The Pacific Research Platform will create a regional "Big Data Freeway System" along the West Coast to support science. It will connect major research institutions with high-speed optical networks, allowing them to share vast amounts of data and computational resources. This will enable new forms of collaborative, data-intensive research for fields like particle physics, astronomy, biomedicine, and earth sciences. The first phase aims to establish a basic networked infrastructure, with later phases advancing capabilities to 100Gbps and beyond with security and distributed technologies.
Berkeley Cloud Computing Meetup, May 2020 (Larry Smarr)
The Pacific Research Platform (PRP) is a high-bandwidth global private "cloud" connected to commercial clouds that provides researchers with distributed computing resources. It links Science DMZs at universities across California and beyond using a high-performance network. The PRP utilizes Data Transfer Nodes called FIONAs to transfer data at near full network speeds. It has adopted Kubernetes to orchestrate software containers across its resources. The PRP provides petabytes of distributed storage and hundreds of GPUs for machine learning. It allows researchers to perform data-intensive science across multiple universities much faster than possible individually.
High Performance Cyberinfrastructure is Needed to Enable Data-Intensive Science and Engineering (Larry Smarr)
11.03.28
Remote Luncheon Presentation from Calit2@UCSD
National Science Board
Expert Panel Discussion on Data Policies
National Science Foundation
Title: High Performance Cyberinfrastructure is Needed to Enable Data-Intensive Science and Engineering
Arlington, Virginia
The Pacific Research Platform: Building a Distributed Big-Data Machine-Learni... (Larry Smarr)
The document summarizes the Pacific Research Platform (PRP) which connects researchers across multiple universities with high-speed networks and computing resources for big data and machine learning applications. Key points:
- PRP connects 15 universities with optical networks, distributed storage devices (FIONAs), and over 350 GPUs for data analysis and AI training.
- It allows researchers to rapidly share and analyze large datasets, with one example reducing a workflow from 19 days to 52 minutes.
- Other projects using PRP resources include climate modeling, astrophysics simulations, and machine learning courses involving thousands of students.
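As a sanity check on the speedup quoted above, the 19-day-to-52-minute reduction works out to roughly a 500-fold improvement. This back-of-envelope calculation is mine, not from the talk:

```python
# Back-of-envelope speedup implied by the workflow example above:
# a 19-day run reduced to 52 minutes.
def speedup(days_before: float, minutes_after: float) -> float:
    """Ratio of old runtime to new runtime, both expressed in minutes."""
    return (days_before * 24 * 60) / minutes_after

factor = speedup(19, 52)
print(f"~{factor:.0f}x faster")  # ~526x faster
```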
National Federated Compute Platforms: The Pacific Research Platform (Larry Smarr)
The Pacific Research Platform (PRP) is a multi-institution hypercluster that connects science DMZs across 25 partner campuses using FIONA data transfer nodes and 10-100Gbps networks. PRP adopted Kubernetes and Rook to orchestrate petabytes of distributed storage and GPUs for data science applications. A CHASE-CI grant added machine learning capabilities. PRP is working to federate with the Open Science Grid and become a prototype for a future National Research Platform connecting regional networks.
SC21: Larry Smarr on The Rise of Supernetwork Data Intensive Computing
Larry Smarr, founding director of Calit2 (now Distinguished Professor Emeritus at the University of California San Diego) and the first director of NCSA, is one of the seminal figures in the U.S. supercomputing community. What began as a personal drive, shared by others, to spur the creation of supercomputers in the U.S. for scientific use, later expanded into a drive to link those supercomputers with high-speed optical networks, and blossomed into the notion of building a distributed, high-performance computing infrastructure – replete with compute, storage and management capabilities – available broadly to the science community.
The document provides an overview of the Pacific Research Platform (PRP) and discusses its role in connecting researchers across institutions and enabling new applications. It summarizes the PRP's key components like Science DMZs, Data Transfer Nodes (FIONAs), and use of Kubernetes for container management. Several examples are given of how the PRP facilitates high-performance distributed data analysis, access to remote supercomputers, and sensor networks coupled to real-time computing. Upcoming work on machine learning applications and expanding the PRP internationally is also outlined.
High Performance Cyberinfrastructure for Data-Intensive Research (Larry Smarr)
This document summarizes a lecture given by Dr. Larry Smarr on high performance cyberinfrastructure for data-intensive research. The summary discusses:
1) The need for dedicated high-bandwidth networks separate from the shared internet to enable big data research due to the increasing volume of digital scientific data.
2) Extensions being made to networks like CENIC in California to provide campus "Big Data Freeways" connecting instruments, computing resources, and remote facilities.
3) The use of networks like HPWREN to provide high-performance wireless access for data-intensive applications in rural areas like astronomy, wildfire detection, and more.
From the Shared Internet to Personal Lightwaves: How the OptIPuter is Transfo... (Larry Smarr)
The document summarizes how the OptIPuter project is transforming scientific research through user-controlled high-speed optical network connections. It provides examples of how 1-10Gbps connections through projects like National LambdaRail are enabling new forms of collaborative work and access to scientific instruments and global data repositories. The OptIPuter creates an environment where researchers can access remote resources through local "OptIPortals" connected to these high-speed optical networks.
The document discusses the growing carbon footprint of information and communication technologies (ICT) and efforts to make cyberinfrastructure more energy efficient and environmentally sustainable. Specifically, it mentions that (1) ICT energy usage is growing rapidly and accounts for 2% of global greenhouse gas emissions, (2) universities are working on initiatives like the GreenLight project to reduce ICT energy usage through techniques like dynamic power management, and (3) further research is needed to develop more energy-efficient computing technologies, data center designs, and videoconferencing solutions to reduce the need for travel.
Peering The Pacific Research Platform With The Great Plains Network (Larry Smarr)
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics.
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way to the International LambdaGrid (Larry Smarr)
05.06.14
Keynote to the 15th Federation of Earth Science Information Partners Assembly Meeting: Linking Data and Information to Decision Makers
Title: The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way to the International LambdaGrid
San Diego, CA
Why Researchers are Using Advanced Networks (Larry Smarr)
07.07.03
Remote Talk from Calit2 to:
Building KAREN Communities for Collaboration Forum
KIWI Advanced Research and Education Network
University of Auckland, Auckland City, New Zealand
Title: Why Researchers are Using Advanced Networks
La Jolla, CA
Toward a Global Interactive Earth Observing Cyberinfrastructure (Larry Smarr)
The document discusses the need for a new generation of cyberinfrastructure to support interactive global earth observation. It outlines several prototyping projects that are building examples of systems enabling real-time control of remote instruments, remote data access and analysis. These projects are driving the development of an emerging cyber-architecture using web and grid services to link distributed data repositories and simulations.
This document discusses several projects related to connecting research institutions through high-speed networks:
1) The Pacific Research Platform connects campuses in California through a "big data superhighway" funded by NSF from 2015-2020.
2) CHASE-CI adds machine learning capabilities for researchers across 10 campuses in California using NSF-funded GPU resources.
3) A pilot project is using CENIC and Internet2 to connect regional research networks on a national scale, funded by NSF from 2018-2019.
08.04.14
Invited Talk
National Astrobiology Institute Executive Council Meeting
Astrobiology Science Conference 2008
Santa Clara Convention Center
Title: High Performance Collaboration
Santa Clara, CA
The Human Microbiome, Supercomputers, and the Advancement of Medicine (Larry Smarr)
The keynote presentation discusses the importance of the human microbiome and how understanding its dynamics can advance medicine. It notes that the human microbiome contains tens of trillions of microbial cells and hundreds of times as many genes as human cells. Understanding the microbiome as an ecology rather than focusing on single pathogens is crucial. The presentation describes research tracking one person's microbiome and biomarkers over time, finding shifts between healthy and diseased states. It advocates developing tools to manage the microbiome and new therapies like fecal transplants. National initiatives now recognize the microbiome's importance in health and disease.
Fifty Years of Supercomputing: From Colliding Black Holes to Dynamic Microbio... (Larry Smarr)
This document provides a summary of a lecture given by Dr. Larry Smarr on the past, present, and future of supercomputing over the last 50 years. The summary discusses:
- How Smarr solved equations for colliding black holes in the 1970s using a megaFLOPS computer, whereas today collisions are detected using petaFLOPS supercomputers - a billion-fold increase in speed.
- How Smarr's research has evolved from modeling astrophysical phenomena to mapping the human gut microbiome using terabytes of sequencing data and hundreds of thousands of core-hours of supercomputing.
- Emerging trends in brain-inspired computing architectures and non-von Neumann systems that are better suited to tasks
The document summarizes a seminar given by Dr. Larry Smarr on supercomputing the human microbiome. Some key points:
- The human microbiome contains 100 trillion microorganisms and their DNA contains 300 times as many genes as human DNA.
- Dr. Smarr has been collecting extensive data from his own body over 7 years to study his personal microbiome and immune system interactions using high performance computing.
- Analyzing microbiome data requires massive computing resources, such as millions of core hours on supercomputers. This reveals details of microbial ecology and genetics in health and disease.
- Computational analysis of microbiome sequencing data from many subjects shows major shifts in microbial populations between healthy and
Quantifying Your Dynamic Human Body (Including Its Microbiome), Will Move Us ... (Larry Smarr)
Invited Presentation
Microbiology and the Microbiome and the Implications for Human Health
Analytic, Life Science & Diagnostic Association (ALDA) 2016 Senior Management Conference
Half Moon Bay, CA
October 3, 2016
Dynamics of Your Gut Microbiome in Health and Disease (Larry Smarr)
This document summarizes a presentation by Dr. Larry Smarr on the dynamics of the gut microbiome in health and disease. It discusses how the gut microbiome contains hundreds of microbial species that vary significantly between healthy and diseased states. Dr. Smarr has tracked his own gut microbiome and biomarkers over time, discovering an autoimmune disease. He is now collaborating on a project combining deep metagenomic sequencing and supercomputing to map differences in the gut microbiome between healthy and inflammatory bowel disease patients.
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R... (Larry Smarr)
Invited Presentation
Symposium on Computational Biology and Bioinformatics:
Remembering John Wooley
National Institutes of Health
Bethesda, MD
July 29, 2016
Positioning University of California Information Technology for the Future: State, National, and International IT Infrastructure Trends and Directions (Larry Smarr)
05.02.15
Invited Talk
The Vice Chancellor of Research and Chief Information Officer Summit
“Information Technology Enabling Research at the University of California”
Title: Positioning University of California Information Technology for the Future: State, National, and International IT Infrastructure Trends and Directions
Oakland, CA
Analyzing Large Earth Data Sets: New Tools from the OptIPuter and LOOKING Pro... (Larry Smarr)
The document discusses two projects, OptIPuter and LOOKING, that aim to analyze large earth data sets using optical networking and grid technologies. OptIPuter extends grid middleware to dedicated optical circuits for earth and medical sciences. LOOKING builds on OptIPuter to provide real-time control of ocean observatories through web and grid services integrated over optical networks. Both projects represent efforts to develop cyberinfrastructure for interactive analysis of remote earth science data and instruments.
Towards a High-Performance National Research Platform Enabling Digital Research (Larry Smarr)
The document summarizes Dr. Larry Smarr's keynote presentation on enabling a high-performance national research platform. It describes how multi-institutional research increasingly relies on access to large datasets, requiring new cyberinfrastructure. The Pacific Research Platform provides high-bandwidth networking between universities to support research collaborations across disciplines. The next steps involve scaling this model into a national and global platform. The presentation highlights how the PRP enables various scientific applications and drives innovation through improved data transfer capabilities and distributed computing resources.
A California-Wide Cyberinfrastructure for Data-Intensive Research (Larry Smarr)
The document discusses creating a California-wide cyberinfrastructure for data-intensive research. It outlines efforts to connect all UC campuses and other research institutions across California with high-speed optical networks. This would create a "big data plane" to share large datasets. Several campuses have received NSF grants to upgrade their networks and implement Science DMZ architectures with 10-100Gbps connections to CENIC. Connecting these resources would provide researchers access to high-performance computing, large scientific instruments, and datasets. This would support collaborative big data science across disciplines like physics, climate modeling, genomics and microscopy.
- The Pacific Research Platform (PRP) interconnects campus DMZs across multiple institutions to provide high-speed connectivity for data-intensive research.
- The PRP utilizes specialized data transfer nodes called FIONAs that provide disk-to-disk transfer speeds of 10-100Gbps.
- Early applications of the PRP include distributing telescope data between UC campuses, connecting particle physics experiments to computing resources, and enabling real-time wildfire sensor data analysis.
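For scale, the disk-to-disk rates quoted above translate into transfer times like these. This is an idealized estimate I am adding for illustration, ignoring protocol and storage overhead:

```python
# Idealized time to move a dataset at the line rates quoted for FIONAs,
# ignoring protocol and disk overhead (decimal units: 1 TB = 10^12 bytes).
def transfer_seconds(terabytes: float, gbps: float) -> float:
    return terabytes * 1e12 * 8 / (gbps * 1e9)

for rate in (1, 10, 100):  # a typical shared campus link vs. FIONA rates
    print(f"1 TB at {rate} Gbps: {transfer_seconds(1, rate):.0f} s")
```

At 100 Gbps a terabyte moves in under a minute and a half; on a shared 1 Gbps campus link the same transfer takes over two hours.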
06.07.26
Invited Talk
Cyberinfrastructure for Humanities, Arts, and Social Sciences, A Summer Institute, SDSC
Title: The OptIPuter and Its Applications
La Jolla, CA
Coupling Australia’s Researchers to the Global Innovation Economy (Larry Smarr)
The document summarizes Dr. Larry Smarr's lecture on connecting Australian researchers to the global innovation economy through high-performance networks. It discusses projects that established dedicated 1Gbps and 10Gbps connections between Australian universities and research centers and international partners. This infrastructure will allow Australian researchers to collaborate globally on issues like climate change, health care, and more. The goal is for Australia to have connectivity on par with the best in the world to attract top researchers and partners.
How Global-Scale Personal Lightwaves are Transforming Scientific Research (Larry Smarr)
07.03.22
Distinguished Lecturer
Technology for a Changing World Series
Baskin School of Engineering, UCSC
Title: How Global-Scale Personal Lightwaves are Transforming Scientific Research
Santa Cruz, CA
Calit2: a View Into the Future of the Wired and Unwired Internet (Larry Smarr)
06.01.23
Invited Talk to the National Research Council's Computer Science and Telecommunications Board
Title: Calit2: a View Into the Future of the Wired and Unwired Internet
La Jolla, CA
Coupling Australia’s Researchers to the Global Innovation Economy (Larry Smarr)
08.10.10
Fifth Lecture in the
Australian American Leadership Dialogue Scholar Tour
University of Queensland
Title: Coupling Australia’s Researchers to the Global Innovation Economy
Brisbane, Australia
The Pacific Research Platform: a Science-Driven Big-Data Freeway System
1. “The Pacific Research Platform:
a Science-Driven Big-Data Freeway System.”
Invited Presentation
2015 Campus Cyberinfrastructure PI Workshop
Austin, TX
September 30, 2015
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. Vision: Creating a West Coast “Big Data Freeway”
Connected by CENIC/Pacific Wave to Internet2 & GLIF
Use Lightpaths to Connect
All Data Generators and Consumers,
Creating a “Big Data” Freeway
Integrated With High Performance Global Networks
“The Bisection Bandwidth of a Cluster Interconnect,
but Deployed on a 20-Campus Scale.”
This Vision Has Been Building for 25 Years
3. The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Freeway System”
NSF CC*DNI $5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2,
• Philip Papadopoulos, UC San Diego SDSC,
• Frank Wuerthwein, UC San Diego Physics
and SDSC
NSF-Funded Workshop
For PRP Members
October 14-16
Calit2@UCSD
4. NCSA Telnet--“Hide the Cray”
Paradigm That We Still Use Today
• NCSA Telnet -- Interactive Access
– From Macintosh or PC Computer
– To Telnet Hosts on TCP/IP Networks
• Allows for Simultaneous
Connections
– To Numerous Computers on The Net
– Standard File Transfer Server (FTP)
– Lets You Transfer Files to and from
Remote Machines and Other Users
John Kogut Simulating
Quantum Chromodynamics
He Uses a Mac—The Mac Uses the Cray
Source: Larry Smarr 1985
[Diagram: Data Generator → Data Portal → Data Transmission]
5. Interactive Supercomputing End-to-End Prototype:
Using Analog Communications to Prototype the Fiber Optic Future
“We’re using satellite technology…
to demo what it might be like to have
high-speed fiber-optic links between
advanced computers
in two different geographic locations.”
― Al Gore, Senator
Chair, US Senate Subcommittee on Science, Technology and Space
Illinois
Boston
SIGGRAPH 1989
“What we really have to do is eliminate distance between
individuals who want to interact with other people and
with other computers.”
― Larry Smarr, Director, NCSA
6. NSF’s PACI Program was Built on the vBNS
to Prototype America’s 21st Century Information Infrastructure
The PACI Grid Testbed
National Computational Science
1997
7. Chesapeake Bay Simulation End-to-End Collaboratory:
vBNS Linked CAVE, ImmersaDesk, Power Wall, and Workstation
Alliance Project: Collaborative Video Production
via Tele-Immersion and Virtual Director
UIC
Donna Cox, Robert Patterson, Stuart Levy, NCSA Virtual Director Team
Glenn Wheless, Old Dominion Univ.
Alliance Application Technologies
Environmental Hydrology Team
4 MPixel PowerWall
Alliance 1997
8. Two New Calit2 Buildings Provide
New Laboratories for “Living in the Future”
• “Convergence” Laboratory Facilities
– Nanotech, BioMEMS, Chips, Radio, Photonics
– Virtual Reality, Digital Cinema, HDTV, Gaming
• Over 1000 Researchers in Two Buildings
– Linked via Dedicated Optical Networks
UC Irvine
www.calit2.net
Preparing for a World in Which
Distance is Eliminated…
9. Linking the Calit2 Auditoriums at UCSD and UCI
With HD Streams
September 8, 2009
Photo by Erik Jepsen, UC San Diego
Sept. 8, 2009
10. NSF’s OptIPuter Project: Using Supernetworks
to Meet the Needs of Data-Intensive Researchers
OptIPortal–
Termination
Device
for the
OptIPuter
Global
Backplane
Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
2003-2009
$13,500,000
In August 2003, Jason Leigh and his students used RBUDP to blast data from NCSA to SDSC over the TeraGrid DTFnet, achieving 18 Gbps file transfer out of the available 20 Gbps.
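The 2003 RBUDP demo described above came remarkably close to line rate. A quick calculation (mine, for illustration) of the link utilization and the implied time to move a 10 TB dataset at that rate:

```python
# Link utilization of the 2003 RBUDP demo described above (18 Gbps achieved
# over a 20 Gbps path), and the implied time to move a 10 TB dataset.
achieved_gbps, capacity_gbps = 18, 20
utilization = achieved_gbps / capacity_gbps
minutes_per_10_tb = 10 * 1e12 * 8 / (achieved_gbps * 1e9) / 60
print(f"{utilization:.0%} of the link, ~{minutes_per_10_tb:.0f} min per 10 TB")
```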
11. Integrated “OptIPlatform” Cyberinfrastructure System:
A 10Gbps Lightpath Cloud
National LambdaRail
Campus
Optical
Switch
Data Repositories & Clusters
HPC
HD/4k Video Images
HD/4k Video Cams
End User
OptIPortal
10G
Lightpath
HD/4k Telepresence
Instruments
LS 2009
12. So Why Don’t We Have a National
Big Data Cyberinfrastructure?
“Research is being stalled by ‘information overload,’ Mr. Bement said, because
data from digital instruments are piling up far faster than researchers can study them.
In particular, he said, campus networks need to be improved. High-speed data
lines crossing the nation are the equivalent of six-lane superhighways, he said.
But networks at colleges and universities are not so capable. ‘Those massive
conduits are reduced to two-lane roads at most college and university
campuses,’ he said. Improving cyberinfrastructure, he said, ‘will transform the
capabilities of campus-based scientists.’”
-- Arden Bement, Director of the National Science Foundation, May 2005
13. Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
See ESnet’s Eli Dart talk on the future of Science DMZs.
14. Creating a “Big Data” Freeway on Campus:
NSF Funded Prism@UCSD and CHeruB
Prism@UCSD, Phil Papadopoulos, SDSC, Calit2, PI
CHERuB, Mike Norman, SDSC PI
CHERuB
15. FIONA – Flash I/O Network Appliance:
Linux PCs Optimized for Big Data
Cost                 $7,700                                   $21,000
CPU                  Intel Xeon Haswell E5-1650 v3 (6-core)   2x E5-2697 v3 (14-core)
RAM                  1 TB                                     16 TB
SSD                  4 TB                                     16 TB
Network Interface    10GbE/40GbE                              100GbE
GPU                  NVIDIA Tesla K80, 24 GB
RAID Drives          0 to 112 TB (add ~$100/TB)
UCOP Rack-Mount Build:
FIONAs Are
Science DMZ Data Transfer Nodes &
Optical Network Termination Devices
UCSD CC-NIE Prism Award & UCOP
Phil Papadopoulos & Tom DeFanti
Joe Keefe & John Graham
16. Customizing Prism@UCSD to Specific Big Data Requirements
for Rob Knight’s Lab – PRP Does This on a Sub-National Scale
FIONA
12 Cores/GPU
128 GB RAM
3.5 TB SSD
48TB Disk
10Gbps NIC
Knight Lab
10Gbps
Gordon
Prism@UCSD
Data Oasis
7.5PB,
100GB/s
Knight 1024 Cluster
In SDSC Co-Lo
CHERuB
100Gbps
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
120Gbps
40Gbps
18. Ten Week Sprint to Demonstrate the West Coast
Big Data Freeway System: PRPv0
Presented at CENIC 2015
March 9, 2015
FIONA DTNs Now Deployed to All UC Campuses
And Most PRP Sites
19. Digital Research Platform: Distributed IPython/Jupyter Notebooks:
Cross-Platform, Browser-Based Application Interleaves Code, Text, & Images
IJulia
IHaskell
IFSharp
IRuby
IGo
IScala
IMathics
Ialdor
LuaJIT/Torch
Lua Kernel
IRKernel (for the R language)
IErlang
IOCaml
IForth
IPerl
IPerl6
Ioctave
Calico Project
• kernels implemented in Mono,
including Java, IronPython,
Boo, Logo, BASIC, and many
others
IScilab
IMatlab
ICSharp
Bash
Clojure Kernel
Hy Kernel
Redis Kernel
jove, a kernel for io.js
IJavascript
Calysto Scheme
Calysto Processing
idl_kernel
Mochi Kernel
Lua (used in Splash)
Spark Kernel
Skulpt Python Kernel
MetaKernel Bash
MetaKernel Python
Brython Kernel
IVisual VPython Kernel
Source: John Graham, QI
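Every language on this list plugs into the same notebook front end through a small kernel.json spec that tells Jupyter how to launch the kernel; a minimal sketch of that registration format, where "my_kernel" is a hypothetical module name:

```python
import json

# kernel.json tells Jupyter how to launch a kernel process and what to
# display in the kernel picker. "my_kernel" is hypothetical; Jupyter
# substitutes the real connection file path for "{connection_file}".
spec = {
    "argv": ["python", "-m", "my_kernel", "-f", "{connection_file}"],
    "display_name": "My Kernel",
    "language": "python",
}
print(json.dumps(spec, indent=2))
```

Because the kernel protocol is language-agnostic, the dozens of kernels listed above all reuse the same browser interface for interleaving code, text, and images.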
20. PRP Has Deployed Powerful FIONA Servers at UCSD and UC Berkeley
to Create a UC-Jupyter Hub Backplane
FIONAs Have GPUs and
Can Spawn Jobs
to SDSC’s Comet
Using inCommon CILogon
Authenticator Module
for Jupyter.
Deep Learning Libraries
Have Been Installed
Source: John Graham, QI
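The CILogon authenticator mentioned above is wired in through JupyterHub's configuration file; a hedged sketch only, since the slide does not show the PRP's actual config, and the callback URL and spawner class here are assumptions:

```python
# jupyterhub_config.py -- illustrative fragment, not the PRP's actual config.
c = get_config()  # provided by JupyterHub when it loads this file

# Authenticate users via CILogon / InCommon federated identity.
c.JupyterHub.authenticator_class = "oauthenticator.CILogonOAuthenticator"
c.CILogonOAuthenticator.oauth_callback_url = (
    "https://hub.example.edu/hub/oauth_callback"  # hypothetical hub URL
)

# Spawning notebook jobs onto a batch resource such as SDSC's Comet would
# use a batch-aware spawner (an assumption; the slide does not name one).
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
```

The key design point is that authentication and job placement are pluggable, so a campus can federate identity through InCommon while dispatching GPU work to whichever cluster has capacity.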
21. PRP Timeline
• PRPv1
– A Layer 2 and Layer 3 System
– Completed In 2 Years
– Tested, Measured, Optimized, With Multi-domain Science Data
– Bring Many Of Our Science Teams Up
– Each Community Thus Will Have Its Own Certificate-Based Access
To its Specific Federated Data Infrastructure.
• PRPv2
– Advanced IPv6-Only Version with Robust Security Features
– e.g. Trusted Platform Module Hardware and SDN/SDX Software
– Support Rates up to 100Gb/s in Bursts And Streams
– Develop Means to Operate a Shared Federation of Caches
22. Pacific Research Platform
Multi-Campus Science Driver Teams
• Particle Physics
• Astronomy and Astrophysics
– Telescope Surveys
– Galaxy Evolution
– Gravitational Wave Astronomy
• Biomedical
– Cancer Genomics Hub/Browser
– Microbiome and Integrative ‘Omics
– Integrative Structural Biology
• Earth Sciences
– Data Analysis and Simulation for Earthquakes and Natural Disasters
– Climate Modeling: NCAR/UCAR
– California/Nevada Regional Climate Data Analysis
– CO2 Subsurface Modeling
• Scalable Visualization, Virtual Reality, and Ultra-Resolution Video
23. Particle Physics: Creating a 10-100 Gbps LambdaGrid
to Support LHC Researchers
ATLAS, CMS
U.S. Institutions
Participating in LHC
LHC Data
Generated by
CMS & ATLAS
Detectors
Analyzed
on OSG
Maps from www.uslhc.us
24. LHC Scientists Across Eight CA Universities Benefit From
Petascale Data & Compute Resources across PRP
SLAC
Data & Compute
Resource
Caltech
Data & Compute
Resource
UCSD & SDSC
Data & Compute
Resources
UCSB
UCSC
UCD
UCR
CSU Fresno
UCI
Harvey Newman and Azher Mughal of Caltech have
been lead researchers in 40Gbps and 100Gbps DTNs
Source: Frank Wuerthwein, UCSD Physics;
SDSC; co-PI PRP
25. Goal: Allow LHC Community to Use Five Major Data & Compute Resources
in CA: SLAC, NERSC, Caltech, UCSD, SDSC
• Aggregate Petabytes of Disk Space & Petaflops of Compute
• Transparently Compute on Data at Their Home Institutions &
These 5 Major Centers
– Uniform Execution Environment
– XrootD Data Federations for ATLAS & CMS
– Serving Local Disks Outbound to Remotely Running Jobs
– Caching Remote Data Inbound for Locally Running Jobs
– HTCondor “Overflow” of Jobs from Local Cluster to Major Centers
– Satisfy Peak Needs to Accelerate Path from Idea to Publication
• Collaboration of PRP, SDSC, and Open Science Grid
– PRP Builds on SDSC LHC-UC Project
Source: Frank Wuerthwein, UCSD Physics; SDSC; co-PI PRP
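In an XRootD data federation like those named above, a job addresses data by logical file name against a redirector, which resolves it to whichever site holds a replica; a sketch of the URL convention, with a hypothetical redirector hostname and path:

```python
# XRootD federation access is by URL: root://<redirector>/<logical-file-name>.
# The redirector resolves the logical name to a server holding a replica,
# which is what lets jobs run anywhere and still reach their data.
def xrootd_url(redirector: str, lfn: str) -> str:
    # The logical file name is an absolute path, so the result has the
    # characteristic double slash after the hostname.
    return f"root://{redirector}/{lfn}"

# Hypothetical CMS-style logical file name and redirector host.
url = xrootd_url("xrootd-redirector.example.edu", "/store/user/example/events.root")
print(url)  # root://xrootd-redirector.example.edu//store/user/example/events.root
```

This indirection is what makes "serving local disks outbound" and "caching remote data inbound" transparent to the running job: the URL never changes, only where the bytes come from.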
26. Two Automated Telescope Surveys
Creating Huge Datasets Will Drive PRP
Survey 1: 300 images per night, 100 MB per raw image
  30 GB per night raw; 120 GB per night when processed at NERSC
Survey 2: 250 images per night, 530 MB per raw image
  150 GB per night raw; 800 GB per night when processed at NERSC
Processing at NERSC increases the data volume by ~4x.
Source: Peter Nugent, Division Deputy for Scientific Engagement, LBL
Professor of Astronomy, UC Berkeley
Precursors to LSST and NCSA
PRP allows researchers to bring datasets from NERSC
to their local clusters for in-depth science analysis
(see UCSC’s Brad Smith talk).
27. Cancer Genomics Hub (UCSC) is Housed in SDSC CoLo:
Large Data Flows to End Users at UCSC, UCB, UCSF, …
[Chart: cumulative TBs of CGH files downloaded (1G, 8G, 15G), totaling 30 PB]
Data Source: David Haussler, Brad Smith, UCSC
28. Dan Cayan
USGS Water Resources Discipline
Scripps Institution of Oceanography, UC San Diego
With much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues
Sponsors:
California Energy Commission
NOAA RISA program
California DWR, DOE, NSF
Planning for climate change in California: substantial shifts on top of already high climate variability
SIO Campus Climate Researchers Need to Download
Results from NCAR Remote Supercomputer Simulations
to Make Regional Climate Change Forecasts
29. Collaboration Between EVL’s CAVE2
and Calit2’s VROOM Over 10Gb Wavelength
EVL
Calit2
Source: NTT Sponsored ON*VECTOR Workshop at Calit2 March 6, 2013
31. Next Step: Use AARnet/PRP to Set Up
Planetary-Scale Shared Virtual Worlds
Digital Arena, UTS Sydney
CAVE2, Monash U, Melbourne
CAVE2, EVL, Chicago
32. The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Freeway System”
Opportunities for Collaboration
with Other Regional Systems