Peering The Pacific Research Platform With The Great Plains Network
1. “Peering The Pacific Research Platform
With The Great Plains Network”
Keynote
Great Plains Network 2018 Annual Meeting
Kansas City, MO
May 31, 2018
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. I Was Born and Raised in the Midwest:
Columbia, Missouri
My Grandfather, Father, and Me
At My MU Graduation
My Mother and Me
On My First Birthday
3. I Earned Three of the Sixteen
University of Missouri Degrees in My Family
Framing by Brother David Smarr
4. 30 Years Ago NSF Brought to University Researchers
a DOE HPC Center Model
NCSA Was Modeled on LLNL; SDSC Was Modeled on MFEnet
1985/6
5. Thirty Years After NSF Adopts DOE Supercomputer Center Model
NSF Adopts DOE ESnet’s Science DMZ for High Performance Applications
• A Science DMZ integrates 4 key concepts into a unified whole:
– A network architecture designed for high-performance applications,
with the science network distinct from the general-purpose network
– The use of dedicated systems as data transfer nodes (DTNs)
– Performance measurement and network testing systems that are
regularly used to characterize and troubleshoot the network
– Security policies and enforcement mechanisms that are tailored for
high performance science environments
http://fasterdata.es.net/science-dmz/
The Term “Science DMZ” Was Coined in 2010
The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program
6. Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Made Over 200 Campus-Level Awards in 44 States
Source: Kevin Thompson, NSF
Logical Next Step: The Pacific Research Platform Networks Campus DMZs
to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2/QI,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
NSF Program Officer: Amy Walton
Source: John Hess, CENIC
8. PRP National-Scale Experimental Distributed Testbed:
Using Internet2 to Connect Early-Adopter Quilt Regional R&E Networks
[Map: Original PRP plus the Extended PRP Testbed.]
Announced May 8, 2018 at the Internet2 Global Summit
9. PRP Science DMZ Data Transfer Nodes (DTNs):
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs to Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10G, 40G, and 100G Networks
FIONAs: 10/40G, $8,000; FIONette: 1G, $250
Phil Papadopoulos, SDSC & Tom DeFanti, Joe Keefe & John Graham, Calit2
Five Racked FIONAs at Calit2, Each Containing:
• Dual 12-Core CPUs
• 96GB RAM
• 1TB SSD
• 2 10GbE Interfaces
• Total ~$10,500; ~$18,500 With 8 GPUs
10. GPN Becomes the First Multi-State Regional Network to Peer with the PRP
Seeing 5 Gb/s Between the PRP-Contributed PWave DTN in Los Angeles and the GPN FIONA at UMC
Source: John Hess, CENIC, and George Rob III, UMissouri
11. Game Changer: Using Kubernetes
to Manage Containers Across the PRP
“Kubernetes is a way of stitching together a collection of machines into, basically, a big computer.”
-- Craig McLuckie, Google, now CEO and Founder of Heptio
“Everything at Google runs in a container.”
-- Joe Beda, Google
“Kubernetes has emerged as the container orchestration engine of choice for many cloud providers, including Google, AWS, Rackspace, and Microsoft, and is now being used in HPC and Science DMZs.”
-- John Graham, Calit2/QI, UC San Diego
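To make the “big computer” idea concrete, here is a minimal sketch using the official Kubernetes Python client to inventory the machines a cluster has stitched together; it assumes cluster credentials are already in the local kubeconfig:

```python
from kubernetes import client, config

config.load_kube_config()  # Read credentials from the local kubeconfig.
v1 = client.CoreV1Api()

# Print each node with its allocatable CPU, memory, and (if present) GPUs.
for node in v1.list_node().items:
    alloc = node.status.allocatable
    print(node.metadata.name,
          "cpu:", alloc.get("cpu"),
          "mem:", alloc.get("memory"),
          "gpus:", alloc.get("nvidia.com/gpu", "0"))
```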
12. Rook Runs Ceph Cloud-Native Object Storage ‘Inside’ Kubernetes
https://rook.io/
Source: John Graham, Calit2/QI
13. Running Kubernetes/Rook/Ceph on PRP Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data
[Map, March 2018, John Graham, UCSD: FIONA8 nodes at Calit2 (100G Gold), SDSC, SDSU (100G Gold NVMe), Caltech (100G NVMe 6.4T), UCSB (100G NVMe 6.4T), UCI, UCR, USC, UCLA, Stanford, UCSC, UCAR, and Hawaii, most with 40G links and 160TB of storage, plus an sdx-controller (controller-0) and a 100G Epyc NVMe node.]
Rook/Ceph Provides Block/Object/FS Storage, with a Swift API Compatible with SDSC, AWS, and Rackspace, on Kubernetes over CentOS 7
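Since the Rook-managed Ceph gateway exposes a Swift-compatible API, posting science data looks like any Swift client call. A hedged sketch using python-swiftclient, with a placeholder endpoint and credentials:

```python
import swiftclient

# Placeholder endpoint and credentials for a Ceph gateway speaking Swift.
conn = swiftclient.Connection(
    authurl="https://ceph-gw.example.edu/auth/v1.0",
    user="science:researcher",
    key="SECRET_KEY",
)

conn.put_container("science-data")
with open("observations.tar", "rb") as f:           # Any large dataset file.
    conn.put_object("science-data", "observations.tar", contents=f)
print(conn.head_container("science-data"))          # Object count, bytes used.
```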
14. Operational Metrics: A Containerized Traceroute Tool Allows Realtime Visualization of the Status of Network Links Between All Kubernetes Nodes on PRP
Source: Dmitry Mishin (SDSC), John Graham (Calit2)
[Node graph showing UCR as the source of the flow to the mesh.]
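At its core, such a tool repeatedly runs traceroute between node pairs and visualizes the parsed hops. A toy Python sketch of the parsing side only (the PRP tool's internals are not shown in the slides):

```python
import re
import subprocess

def trace_hops(host: str):
    """Return (hop, address, rtt_ms) tuples parsed from traceroute output."""
    out = subprocess.run(["traceroute", "-n", host],
                         capture_output=True, text=True).stdout
    hops = []
    for line in out.splitlines()[1:]:          # Skip the header line.
        m = re.match(r"\s*(\d+)\s+(\S+)\s+([\d.]+) ms", line)
        if m:                                  # Non-responding hops ('*') are skipped.
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops

print(trace_hops("dtn.ucr.example.edu"))       # Illustrative hostname.
```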
15. We Measure Disk-to-Disk Throughput with a 10GB File Transfer Using Globus GridFTP, 4 Times Per Day in Both Directions for All PRP Sites
April 24, 2017
Source: John Graham, Calit2
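A sketch of one such probe: time a GridFTP transfer of the 10GB test file with the globus-url-copy CLI and convert elapsed time to Gb/s (URLs are placeholders, and the PRP's actual measurement harness likely differs):

```python
import subprocess
import time

SRC = "gsiftp://dtn-a.example.edu/data/test10g.bin"   # Placeholder URLs.
DST = "gsiftp://dtn-b.example.edu/data/test10g.bin"
FILE_BYTES = 10 * 1000**3                             # The 10GB test file.

start = time.monotonic()
subprocess.run(["globus-url-copy", "-p", "4", SRC, DST], check=True)  # -p: parallel streams
elapsed = time.monotonic() - start
print(f"disk-to-disk: {FILE_BYTES * 8 / elapsed / 1e9:.2f} Gb/s")
```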
16. PRP’s First 2.5 Years:
Connecting Multi-Campus Application Teams and Devices
Earth Sciences
GPN Is Beginning to Define
Its Application Drivers
18. Distributed LHC Data Analysis
Running Over PRP
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
GPN Can Connect Campus
LHC Atlas and CMS Data Analysis
19. PRP Distributed Tier-2 Cache Across Caltech & UCSD: Thousands of Flows Sustaining >10Gbps!
[Diagram: at each of UCSD and Caltech, a set of Cache Servers behind a Redirector, under a Top-Level Redirector/Cache serving the Global Data Federation of CMS.]
Provisioned pilot systems:
PRP UCSD: 9 x 12 SATA Disks of 2TB @ 10Gbps for Each System
PRP Caltech: 2 x 30 SATA Disks of 6TB @ 40Gbps for Each System
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
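In production the redirectors are XRootD services; purely for intuition, here is a toy Python stand-in showing how a redirector can deterministically map a requested file onto one of a site's cache servers:

```python
import hashlib

# Illustrative cache pools sized like the pilot systems on the slide.
CACHES = {
    "UCSD": [f"cache{i}.ucsd.example.edu" for i in range(9)],
    "Caltech": [f"cache{i}.caltech.example.edu" for i in range(2)],
}

def redirect(site: str, filename: str) -> str:
    """Pick a cache server for a file (toy stand-in, not the XRootD logic)."""
    servers = CACHES[site]
    h = int(hashlib.sha1(filename.encode()).hexdigest(), 16)
    return servers[h % len(servers)]           # Same file -> same cache.

print(redirect("UCSD", "/store/cms/run2017/file001.root"))
```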
20. Collaboration Opportunity with OSG & PRP on Distributed Storage
OSG Is Operating a Distributed Caching CI; At Present, 4 Caches Provide Significant Use
[Chart: the total data volume pulled last year is dominated by 4 caches, holding 1.8PB, 1.6PB, 1.2PB, and 210TB.]
PRP Kubernetes Infrastructure Could Either Grow Existing Caches by Adding Servers, or Add Additional Locations
StashCache Users Include: LIGO, DES
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
22. 100 Gbps PRP Over CENIC Couples UC Santa Cruz Astrophysics Cluster to LBNL NERSC Supercomputer
CENIC 2018 Innovations in Networking Award for Research Applications
23. The Great Plains Network
Has Many Campuses With Active Projects at SDSC
GPN Map Source: James Deaton, GPN; Shawn Strande, SDSC
24. SIO Campus Climate Researchers Need to Download Results from NCAR Remote Supercomputer Simulations to Make Regional Climate Change Forecasts
Dan Cayan, USGS Water Resources Discipline, Scripps Institution of Oceanography, UC San Diego, with much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues
Sponsors: California Energy Commission; NOAA RISA Program; California DWR, DOE, NSF
Planning for Climate Change in California Means Substantial Shifts on Top of Already-High Climate Variability
NCAR Is Upgrading to a 100Gbps Link from Wyoming and Boulder to CENIC/PRP
GPN Can Connect Campus NCAR Users
25. PRP Provides High Performance Access to
Multi-Campus Big Data Collaborative Teams
26. PRP Will Link the Laboratories of
the Pacific Earthquake Engineering Research Center
http://peer.berkeley.edu/
PEER Labs: UC Berkeley, Caltech, Stanford,
UC Davis, UC San Diego, and UC Los Angeles
John Graham Installing FIONette at PEER
Feb 10, 2017
28. PRP Provides High Performance Access to
Large Community Data Repositories
29. Cancer Genomics Hub (UCSC) Was Housed at SDSC, But NIH Moved the Dataset from SDSC to UChicago, So the PRP Deployed a FIONA at Chicago’s MREN
[Chart: transfer rates rose from 1G and 8G to 15G by Jan 2016.]
Data Source: David Haussler, Brad Smith, UCSC
30. USGS Earth Resources Observation and Science (EROS) Center Is a Natural GPN/PRP Big Data Repository
In 2011, EROS Sent the Equivalent of the Entire Library of Congress Every 9 Days, and SDSU Was the 3rd-Largest User Downloading Data (GIS)
In 2016, EROS Sent the Equivalent of the Entire Library of Congress Every 6 Hours, a 36-Fold Increase in Five Years
EROS Is Located ~15 Miles North of Sioux Falls, South Dakota
Source: Claude Garelik
31. PRP Provides High Performance Access to
Large Scientific Instruments
32. 100 Gbps FIONA at UCSC Allows for Downloads to the UCSC Hyades Cluster from the LBNL NERSC Supercomputer for DESI Science Analysis
Precursors to LSST and NCSA:
• 300 Images Per Night, 100MB Per Raw Image, 120GB Per Night
• 250 Images Per Night, 530MB Per Raw Image, 800GB Per Night
Source: Peter Nugent, LBNL, Professor of Astronomy, UC Berkeley
NSF-Funded Cyberengineer Shaw Dong @UCSC Receiving FIONA, Feb 7, 2017
33. Global Scientific Instruments Will Produce Ultralarge Datasets Continuously, Requiring Dedicated Optical Fiber and Supercomputers
Large Synoptic Survey Telescope: 3.2 Gpixel Camera Tracks ~40B Objects, Creates 1-10M Alerts/Night Within 1 Minute of Observing; 1000 Supernovas Discovered/Night
2x100Gb/s; “First Light” in 2019
34. The Prototype PRP Has Attracted
New Application Drivers
Scott Sellars, Marty Ralph – Center for Western Weather and Water Extremes
Frank Vernon, Graham Kent, & Ilkay Altintas – Wildfires
Jules Jaffe – Undersea Microscope
Tom Levy – At-Risk Cultural Heritage
35. PRP UC-JupyterHub Backbone Connects
FIONAs At UC Berkeley and UC San Diego
Source: John Graham, Calit2
Goal: Jupyter Everywhere
36. PRP Provides High Performance Access to
SensorNets Coupled to Realtime Computing
37. New PRP Application: Coupling Wireless Wildfire Sensors to Computing
Church Fire, San Diego, CA: Alert SD&E Cameras/HPWREN, October 21, 2017
Thomas Fire, Ventura, CA: Firemap Tool, WIFIRE, December 10, 2017
CENIC 2018 Innovations in Networking Award for Experimental Applications
38. Once a Wildfire Is Spotted, PRP Brings High-Resolution Weather Data to Fire Modeling Workflows in WIFIRE
[Workflow: Real-Time Meteorological Sensors, Weather Forecasts, and Landscape Data Feed the WIFIRE Firemap, Which Produces a Fire Perimeter, with Data Carried Over the PRP.]
Source: Ilkay Altintas, SDSC
39. High-Resolution Ensemble Weather Forecasts at the Center for Analysis and Prediction of Storms (CAPS), University of Oklahoma Hazardous Weather Testbed
Full CONUS Data Volumes:
                                        2014          2015-2017
  Grid Spacing                          4 km          3 km
  Domain Size                           1163x723x53   1683x1155x53
  One Output Time                       4.2 GB        9.7 GB
  Sub-Hourly Interval                   10 min        6 min
  Complete Forecast Size
    (Hourly + Sub-Hourly, 18h-30h)      508 GB        1639 GB
  For 10 Members Per Day                5.08 TB       16.4 TB
  For Approx. 30 Days Per Season        152 TB        492 TB
• In 2017, CAPS started testing the next-generation forecasting model FV3 for convective-scale forecasting.
• For 2018 HWT CLUE, CAPS is producing 5 ensembles using WRF and FV3, with a total of 52 forecasts of up to 84 hours.
As of Dec 2013, CAPS has >1 PB of in-house storage capacity.
Prime Target for GPN/OneNet
Source: Ming Xue and Keith Brewster, CAPS
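The bottom rows of the table follow directly from the complete-forecast size; a quick check of the 2015-2017 column:

```python
forecast_gb = 1639          # Complete forecast size from the table (GB).
members_per_day = 10
days_per_season = 30

per_day_tb = forecast_gb * members_per_day / 1000     # -> ~16.4 TB per day
per_season_tb = per_day_tb * days_per_season          # -> ~492 TB per season
print(f"{per_day_tb:.1f} TB/day, {per_season_tb:.0f} TB/season")
```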
40. The Rise of Brain-Inspired Computers:
Left & Right Brain Computing: Arithmetic vs. Pattern Recognition
Adapted from D-Wave
41. UC San Diego Jaffe Lab (SIO) Scripps Plankton Camera
Off the SIO Pier with Fiber Optic Network
42. Over 1 Billion Images So Far!
Requires Machine Learning for Automated Image Analysis and Classification
Phytoplankton: Diatoms
Zooplankton: Copepods
Zooplankton: Larvaceans
Source: Jules Jaffe, SIO
“We are using the FIONAs for image processing... this includes doing Particle Tracking Velocimetry that is very computationally intense.” -- Jules Jaffe
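For flavor only (the Jaffe Lab's actual model is not described in the slides), a minimal convolutional classifier over the three plankton classes named above, sketched in PyTorch:

```python
import torch
from torch import nn

CLASSES = ["diatom", "copepod", "larvacean"]   # Classes from the slide.

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, len(CLASSES)),     # 64x64 input -> 16x16 maps.
)

x = torch.rand(1, 3, 64, 64)                   # Dummy RGB "plankton image".
probs = model(x).softmax(dim=1)
print(dict(zip(CLASSES, probs[0].tolist())))
```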
43. New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure:
Adding a Machine Learning Layer Built on Top of the Pacific Research Platform
[Map: Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU]
NSF Grant for High Speed “Cloud” of 256 GPUs
For 30 ML Faculty & Their Students at 10 Campuses
for Training AI Algorithms on Big Data
NSF Program Officer: Mimi McClure
44. FIONA8: Adding GPUs to FIONAs Supports Data Science Machine Learning
Multi-Tenant Containerized GPU JupyterHub Running Kubernetes/CoreOS
Eight Nvidia GTX-1080 Ti GPUs, 32GB RAM, 3TB SSD, 40G & Dual 10G Ports: ~$13K
Source: John Graham, Calit2
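On a Kubernetes-managed FIONA8, a user container claims GPUs through the standard device-plugin resource mechanism. A hedged sketch with the Kubernetes Python client (pod name and image are illustrative):

```python
from kubernetes import client, config

config.load_kube_config()

# A single-container pod requesting one of the FIONA8's GPUs.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-notebook"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="jupyter",
            image="jupyter/tensorflow-notebook",      # Illustrative image.
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"},       # GPU via device plugin.
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```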
45. UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure, Devoted to Data Analytics and Machine Learning:
• SunCAVE: 70 GPUs
• WAVE + Vroom: 48 GPUs
• FIONAs with 8 Game GPUs Each: 95 GPUs for Students, 48 GPUs for OSG Applications
CHASE-CI Grant Provides 96 GPUs at UCSD for Training AI Algorithms on Big Data, Plus 288 64-bit GPUs on SDSC’s Comet
46. Next Step: Surrounding the PRP Machine Learning Platform with Clouds of GPUs and Non-Von Neumann Processors
Microsoft Installs Altera FPGAs into Bing Servers & 384 into TACC for Academic Access
CHASE-CI: 64-TrueNorth Cluster; 64-bit GPUs (4352x NVIDIA Tesla V100 GPUs)
GPN Next Step: Add GPUs to FIONAs
47. The Second National Research Platform Workshop
Bozeman, MT, August 6-7, 2018
Announced in the I2 Closing Keynote, Larry Smarr, “Toward a National Big Data Superhighway,” Wednesday, April 26, 2017
Co-Chairs: Larry Smarr, Calit2; Inder Monga, ESnet; Ana Hunsinger, Internet2
Local Host: Jerry Sheehan, MSU
48. Expanding to the Global Research Platform Via CENIC/Pacific Wave, Internet2, and International Links
PRP’s Current International Partners: Netherlands, Guam, Australia, Korea, Japan, Singapore
Korea Shows Distance Is Not the Barrier to Above-5Gb/s Disk-to-Disk Performance
49. Our Support:
• US National Science Foundation (NSF) awards
CNS 0821155, CNS-1338192, CNS-1456638, CNS-1730158,
ACI-1540112, & ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, Pacific Wave, and StarLight
• DOE ESnet