A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political... – Larry Smarr
The document discusses UCSD's plans to build a high-performance campus-scale cyberinfrastructure to support data-intensive research. It outlines investments in fiber networks and large-scale storage resources to connect researchers and instruments campus-wide. A key part of the plan is the Triton resource, a shared cluster and storage system hosted at SDSC to enable large-scale data analysis across campus.
Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analys... – Larry Smarr
06.03.13
Invited Keynote
Annual Meeting CENIC 2006
Title: Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA)
Oakland, CA
Driving Applications on the UCSD Big Data Freeway System – Larry Smarr
This document provides a summary of a keynote lecture about driving data-intensive applications using high-performance cyberinfrastructure at UC San Diego. The lecture discusses:
1) The exponential growth of digital data and need for dedicated high-bandwidth infrastructure to analyze large datasets.
2) Examples of data-intensive applications at UCSD including climate modeling, protein structure analysis, and medical research requiring fast access to remote supercomputers and large datasets.
3) UCSD's development of an optical "Big Data Freeway System" using high-speed fiber to connect resources and enable real-time analysis of large datasets up to 1000 times faster than the shared internet.
Machine Learning in Healthcare Diagnostics – Larry Smarr
Machine learning and artificial intelligence are rapidly transforming healthcare and medicine. Advances in genetic sequencing have enabled the mapping of human and microbial genomes at low costs. Researchers are using machine learning to analyze genomic and microbiome data to better understand health and disease. Non-von Neumann brain-inspired computing architectures are being developed for machine learning applications and could accelerate medical research and diagnostics. These technologies may help create personalized health coaching and move medicine from reactive sickcare to proactive healthcare.
Building an Information Infrastructure to Support Microbial Metagenomic Sciences – Larry Smarr
06.01.14
Presentation for the Microbe Project Interagency Team
Title: Building an Information Infrastructure to Support Microbial Metagenomic Sciences
La Jolla, CA
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra... – Larry Smarr
National Ocean Exploration Forum 2017
Ocean Exploration in a Sea of Data
Calit2’s Qualcomm Institute
University of California, San Diego
October 21, 2017
Towards a High-Performance National Research Platform Enabling Digital Research – Larry Smarr
The document summarizes Dr. Larry Smarr's keynote presentation on enabling a high-performance national research platform. It describes how multi-institutional research increasingly relies on access to large datasets, requiring new cyberinfrastructure. The Pacific Research Platform provides high-bandwidth networking between universities to support research collaborations across disciplines. The next steps involve scaling this model into a national and global platform. The presentation highlights how the PRP enables various scientific applications and drives innovation through improved data transfer capabilities and distributed computing resources.
The Pacific Research Platform (PRP) is a multi-institutional cyberinfrastructure project that connects researchers across California and beyond to share large datasets. It spans the 10 University of California campuses, major private research universities, supercomputer centers, and some out-of-state universities. Fifteen multi-campus research teams in fields like physics, astronomy, earth sciences, biomedicine, and multimedia will drive the technical needs of the PRP over five years. The goal is to create a "big data freeway" to allow high-speed sharing of data between research labs, supercomputers, and repositories across multiple networks without performance loss over long distances.
Calit2 - CSE's Living Laboratory for Applications – Larry Smarr
08.05.27
UCSD CSE 91 - Perspectives in Computer Science (Spring 2008)
Calit2@UCSD
Title: Calit2 - CSE's Living Laboratory for Applications
La Jolla, CA
- Digital mirror worlds are software models of physical systems that are continuously updated with real-time data, allowing them to closely mimic and predict the behavior of the real system.
- Advances in computing power and sensors are enabling increasingly detailed digital twins of objects, human bodies, cities, wildfires, and even the observable universe.
- One trillion sensors are expected to feed the planetary computer within the next decade, driving a global industrial internet and $15 trillion in economic value through digital twins of manufactured products.
- Digital twins powered by consumer sensor data may one day provide early disease detection and personalized health coaching at scale.
Using the Pacific Research Platform for Earth Sciences Big Data – Larry Smarr
Grand Challenge Lecture
Big Data and the Earth Sciences: Grand Challenges Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 31, 2017
CENIC: Pacific Wave and PRP Update: Big News for Big Data – Larry Smarr
The document discusses the Pacific Wave exchange and Pacific Research Platform (PRP). It provides an overview of Pacific Wave, including its history and connectivity across the Pacific and western US. It then discusses how the PRP will build on infrastructure projects to create a high-speed "big data freeway" for science across California universities. This will allow researchers to more easily share and analyze large datasets for projects in areas like climate modeling, cancer genomics, astronomy and particle physics. Details are provided on specific science applications and datasets that will benefit from the enhanced connectivity of the PRP.
Peering The Pacific Research Platform With The Great Plains Network – Larry Smarr
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics.
A 100 Gigabit Highway for Science: Researchers Take a 'Test Drive' on ANI Tes... – balmanme
The document discusses the development of the Advanced Networking Initiative (ANI), a 100 Gbps national prototype network and testbed established by the Department of Energy's Energy Sciences Network (ESnet) to support scientific research. Researchers from various fields have used the ANI testbed to test networking technologies and data transfer tools for moving extremely large datasets, such as climate simulation data and radio astronomy data. The testbed has helped researchers optimize their software and protocols for high-speed data transfer over long-distance 100 Gbps networks.
Fifty Years of Supercomputing: From Colliding Black Holes to Dynamic Microbio... – Larry Smarr
This document provides a summary of a lecture given by Dr. Larry Smarr on the past, present, and future of supercomputing over the last 50 years. The summary discusses:
- How Smarr solved equations for colliding black holes in the 1970s using a megaFLOPs computer, whereas today collisions are detected using petaFLOPs supercomputers, a billion-fold increase in speed.
- How Smarr's research has evolved from modeling astrophysical phenomena to mapping the human gut microbiome using terabytes of sequencing data and hundreds of thousands of core-hours of supercomputing.
- Emerging trends in brain-inspired computing architectures and non-von Neumann systems that are better suited to certain tasks.
From Quantified Self to Quantified Surgery – Larry Smarr
Larry Smarr underwent surgery to remove his diseased sigmoid colon. He used self-quantification of biomarkers over a decade to diagnose chronic inflammation. Pre-surgical planning used 3D imaging from MRI converted to VR. During surgery, 3D organ segmentation guided the surgical team and EGG monitored recovery. Post-surgery, biomarkers like CRP and microbiome returned to healthy levels, showing the benefits of a quantified approach to surgery.
Using Supercomputers and Gene Sequencers to Discover Your Inner Microbiome – Larry Smarr
This keynote talk discusses research using supercomputers and gene sequencing to study the human microbiome. The human microbiome contains 100 trillion microorganisms and their genes outnumber human genes 300 to 1. The speaker has been collecting data from his own body over 7 years to study his microbiome and immune system interactions. Collaborating researchers have sequenced his gut microbiome over time as well as samples from autoimmune disease patients. Supercomputers are needed to analyze the massive amount of sequencing data and reveal details of microbial ecology and genetics in health and disease. Studying the human microbiome will revolutionize medicine in the next decade.
The Pacific Research Platform: a Science-Driven Big-Data Freeway System – Larry Smarr
The Pacific Research Platform will create a regional "Big Data Freeway System" along the West Coast to support science. It will connect major research institutions with high-speed optical networks, allowing them to share vast amounts of data and computational resources. This will enable new forms of collaborative, data-intensive research for fields like particle physics, astronomy, biomedicine, and earth sciences. The first phase aims to establish a basic networked infrastructure, with later phases advancing capabilities to 100Gbps and beyond with security and distributed technologies.
The Human Microbiome, Supercomputers, and the Advancement of Medicine – Larry Smarr
The keynote presentation discusses the importance of the human microbiome and how understanding its dynamics can advance medicine. It notes that the human microbiome contains tens of trillions of microbial cells and hundreds of times as many genes as human cells. Understanding the microbiome as an ecology rather than focusing on single pathogens is crucial. The presentation describes research tracking one person's microbiome and biomarkers over time, finding shifts between healthy and diseased states. It advocates developing tools to manage the microbiome and new therapies like fecal transplants. National initiatives now recognize the microbiome's importance in health and disease.
The document summarizes a seminar given by Dr. Larry Smarr on supercomputing the human microbiome. Some key points:
- The human microbiome contains 100 trillion microorganisms and their DNA contains 300 times as many genes as human DNA.
- Dr. Smarr has been collecting extensive data from his own body over 7 years to study his personal microbiome and immune system interactions using high performance computing.
- Analyzing microbiome data requires massive computing resources, such as millions of core hours on supercomputers. This reveals details of microbial ecology and genetics in health and disease.
- Computational analysis of microbiome sequencing data from many subjects shows major shifts in microbial populations between healthy and diseased states.
Quantifying Your Dynamic Human Body (Including Its Microbiome), Will Move Us ... – Larry Smarr
Invited Presentation
Microbiology and the Microbiome and the Implications for Human Health
Analytic, Life Science & Diagnostic Association (ALDA) 2016 Senior Management Conference
Half Moon Bay, CA
October 3, 2016
Dynamics of Your Gut Microbiome in Health and Disease – Larry Smarr
This document summarizes a presentation by Dr. Larry Smarr on the dynamics of the gut microbiome in health and disease. It discusses how the gut microbiome contains hundreds of microbial species that vary significantly between healthy and diseased states. Dr. Smarr has tracked his own gut microbiome and biomarkers over time, discovering an autoimmune disease. He is now collaborating on a project combining deep metagenomic sequencing and supercomputing to map differences in the gut microbiome between healthy and inflammatory bowel disease patients.
The document summarizes Dr. Larry Smarr's presentation on the Pacific Research Platform (PRP) and its role in working toward a national research platform. It describes how PRP has connected research teams and devices across multiple UC campuses for over 15 years. It also details PRP's innovations like Flash I/O Network Appliances (FIONAs) and use of Kubernetes to manage distributed resources. Finally, it outlines opportunities to further integrate PRP with the Open Science Grid and expand the platform internationally through partnerships.
Global Research Platforms: Past, Present, Future – Larry Smarr
- The Pacific Research Platform (PRP) interconnects campus DMZs across multiple institutions to provide high-speed connectivity for data-intensive research.
- The PRP utilizes specialized data transfer nodes called FIONAs that provide disk-to-disk transfer speeds of 10-100Gbps.
- Early applications of the PRP include distributing telescope data between UC campuses, connecting particle physics experiments to computing resources, and enabling real-time wildfire sensor data analysis.
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
The document discusses the Pacific Research Platform (PRP), a distributed cyberinfrastructure that connects researchers and data across multiple campuses in California and beyond using optical fiber networking. Key points:
- The PRP uses high-speed networking infrastructure like the CENIC network to connect data generators and consumers across 15+ campuses, creating an integrated "big data freeway system".
- It deploys specialized data transfer nodes called FIONAs to enable high-speed transfer of large datasets between sites at near the full network speed.
- Recent additions include using Kubernetes to orchestrate containers across the PRP infrastructure and integrating machine learning resources through the CHASE-CI grant to support data-intensive AI applications.
A California-Wide Cyberinfrastructure for Data-Intensive Research – Larry Smarr
The document discusses creating a California-wide cyberinfrastructure for data-intensive research. It outlines efforts to connect all UC campuses and other research institutions across California with high-speed optical networks. This would create a "big data plane" to share large datasets. Several campuses have received NSF grants to upgrade their networks and implement Science DMZ architectures with 10-100Gbps connections to CENIC. Connecting these resources would provide researchers access to high-performance computing, large scientific instruments, and datasets. This would support collaborative big data science across disciplines like physics, climate modeling, genomics and microscopy.
Positioning University of California Information Technology for the Future: S... – Larry Smarr
05.02.15
Invited Talk
The Vice Chancellor of Research and Chief Information Officer Summit
“Information Technology Enabling Research at the University of California”
Title: Positioning University of California Information Technology for the Future: State, National, and International IT Infrastructure Trends and Directions
Oakland, CA
Looking Back, Looking Forward: NSF CI Funding 1985-2025 – Larry Smarr
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
Berkeley Cloud Computing Meetup, May 2020 – Larry Smarr
The Pacific Research Platform (PRP) is a high-bandwidth global private "cloud" connected to commercial clouds that provides researchers with distributed computing resources. It links Science DMZs at universities across California and beyond using a high-performance network. The PRP utilizes Data Transfer Nodes called FIONAs to transfer data at near full network speeds. It has adopted Kubernetes to orchestrate software containers across its resources. The PRP provides petabytes of distributed storage and hundreds of GPUs for machine learning. It allows researchers to perform data-intensive science across multiple universities much faster than possible individually.
The Rise of Supernetwork Data Intensive Computing – Larry Smarr
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
My Remembrances of Mike Norman Over The Last 45 Years – Larry Smarr
Mike Norman has been a leader in computational astrophysics for over 45 years. Some of his influential work includes:
- Cosmic jet simulations in the early 1980s which helped explain phenomena from galactic centers.
- Pioneering the use of adaptive mesh refinement in the 1990s to achieve dynamic load balancing on supercomputers.
- Massive cosmology simulations in the late 2000s with over 100 trillion particles using thousands of processors across multiple supercomputing sites, producing petabytes of data.
- Developing end-to-end workflows in the 2000s to couple supercomputers, high-speed networks, and large visualization systems to enable real-time analysis of extremely large astrophysics simulations.
1. “Creating a Science-Driven
Big Data Superhighway”
Remote Briefing to the Ad Hoc Big Data Task Force
of the NASA Advisory Council Science Committee
NASA Goddard Space Flight Center
June 28, 2016
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. Vision:
Creating a Pacific Research Platform
Use Optical Fiber Networks to Connect
All Data Generators and Consumers,
Creating a “Big Data” Freeway System
“The Bisection Bandwidth of a Cluster Interconnect,
but Deployed on a 20-Campus Scale.”
This Vision Has Been Building for 15 Years
3. NSF’s OptIPuter Project: Demonstrating How SuperNetworks
Can Meet the Needs of Data-Intensive Researchers
OptIPortal – Termination Device for the OptIPuter Global Backplane
Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
2003-2009
$13,500,000
In August 2003, Jason Leigh and his students used RBUDP to blast data from NCSA to SDSC over the TeraGrid DTFnet, achieving 18 Gbps file transfer out of the available 20 Gbps. (LS slide, 2005)
4. DOE ESnet’s Science DMZ: A Scalable Network
Design Model for Optimizing Science Data Transfers
• A Science DMZ integrates 4 key concepts into a unified whole:
– A network architecture designed for high-performance applications,
with the science network distinct from the general-purpose network
– The use of dedicated systems as data transfer nodes (DTNs)
– Performance measurement and network testing systems that are
regularly used to characterize and troubleshoot the network
– Security policies and enforcement mechanisms that are tailored for
high performance science environments
http://fasterdata.es.net/science-dmz/
Science DMZ
Coined 2010
The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis
for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program
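The DTN idea above — dedicated hosts whose only job is moving science data fast, measured regularly — can be illustrated with a toy throughput probe. This is a minimal sketch, not how a Science DMZ is actually instrumented (production deployments use perfSONAR and tools like iperf3); the function name `measure_gbps` and the loopback setup are illustrative only.

```python
import socket
import threading
import time

CHUNK = 1 << 20        # 1 MiB per send
TOTAL = 64 * CHUNK     # move 64 MiB in this toy run

def _receiver(server_sock):
    # Accept one connection and drain TOTAL bytes from it.
    conn, _ = server_sock.accept()
    received = 0
    while received < TOTAL:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    conn.close()

def measure_gbps(host="127.0.0.1"):
    """Memory-to-memory transfer rate between two sockets, in Gb/s."""
    server = socket.socket()
    server.bind((host, 0))          # ephemeral port
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=_receiver, args=(server,))
    t.start()
    client = socket.create_connection((host, port))
    payload = b"\x00" * CHUNK
    start = time.perf_counter()
    sent = 0
    while sent < TOTAL:
        client.sendall(payload)
        sent += CHUNK
    client.close()
    t.join()
    server.close()
    elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e9

if __name__ == "__main__":
    print(f"loopback: {measure_gbps():.2f} Gb/s")
```

On loopback this mostly measures memory bandwidth, not the network; between two real DTNs, the same send/drain pattern is what disk-to-disk test harnesses build on.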
5. Creating a “Big Data” Freeway on Campus:
NSF-Funded Prism@UCSD and CHERuB Campus CC-NIE Grants
Prism@UCSD: PI Phil Papadopoulos, SDSC, Calit2 (2013-15)
CHERuB: PI Mike Norman, SDSC
6. FIONA – Flash I/O Network Appliance:
Linux PCs Optimized for Big Data on DMZs
FIONAs Are
Science DMZ Data Transfer Nodes (DTNs) &
Optical Network Termination Devices
UCSD CC-NIE Prism Award & UCOP
Phil Papadopoulos & Tom DeFanti
Joe Keefe & John Graham
                    Base build                               High-end build
Cost                $8,000                                   $20,000
CPU                 Intel Xeon Haswell E5-1650 v3, 6-core    2x E5-2697 v3, 14-core
RAM                 128 GB                                   256 GB
SSD                 SATA 3.8 TB                              SATA 3.8 TB
Network Interface   10/40GbE Mellanox                        2x 40GbE Chelsio + Mellanox
GPU                 –                                        NVIDIA Tesla K80
RAID Drives         0 to 112 TB (add ~$100/TB)
[Photo: rack-mount build]
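Since the RAID line prices capacity incrementally, a node's cost is a simple linear function of its storage. A quick sketch (the function name and build labels are mine; the figures are from the slide and approximate):

```python
# FIONA cost model from the slide: base node price plus ~$100/TB of RAID.
BASE_COST = {"base": 8_000, "high_end": 20_000}  # USD
RAID_PER_TB = 100                                # "add ~$100/TB"

def fiona_cost(build: str, raid_tb: int = 0) -> int:
    """Approximate node cost with 0-112 TB of optional RAID drives."""
    if not 0 <= raid_tb <= 112:
        raise ValueError("slide quotes 0 to 112 TB")
    return BASE_COST[build] + RAID_PER_TB * raid_tb

# A base build fully populated with 112 TB of RAID:
print(fiona_cost("base", 112))
```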
7. How Prism@UCSD Transforms Big Data Microbiome Science:
Preparing for Knight/Smarr 1 Million Core-Hour Analysis
[Network diagram: a Knight Lab FIONA (12 cores/GPU, 128 GB RAM, 3.5 TB SSD, 48 TB disk, 10 Gbps NIC) connects at 10 Gbps through Prism@UCSD to Gordon, Data Oasis (7.5 PB, 200 GB/s), and the Knight 1024 cluster in the SDSC co-lo; CHERuB provides 100 Gbps wide-area connectivity, and Emperor and other vis tools drive a 64-Mpixel data analysis wall. Link rates shown include 40 Gbps, 120 Gbps, and 1.3 Tbps.]
8. NSF Has Funded Over 100 Campuses
to Build Local Big Data Freeways
Red: 2012 CC-NIE Awardees
Yellow: 2013 CC-NIE Awardees
Green: 2014 CC*IIE Awardees
Blue: 2015 CC*DNI Awardees
Purple: Multiple-Time Awardees
Source: NSF
9. We Are Building on 15 Years of Member Investment in
CENIC: California’s Research & Education Network
• Members in All 58 Counties Connect
via Fiber-Optics or Leased Circuits
– 3,800+ Miles of Optical Fiber
– Over 10,000 Sites Connect to CENIC
– 20,000,000 Californians Use CENIC
• Funded & Governed by Segment Members
– UC, Cal State, Stanford, Caltech, USC
– Community Colleges, K-12, Libraries
– Collaborate With Over 500 Private Sector Partners
– 88 Other Peering Partners (Google, Microsoft, Amazon …)
10. Next Step: The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Superhighway” System
FIONAs as
Uniform DTN End Points
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
11. Ten Week Sprint to Demonstrate
the West Coast Big Data Freeway System: PRPv0
Presented at CENIC 2015
March 9, 2015
FIONA DTNs Now Deployed to All UC Campuses
And Most PRP Sites
12. PRP Point-to-Point Bandwidth Maps (PRPv1, Layer 3)
GridFTP file transfers, measured January 29, 2016 and June 6, 2016; note the huge improvement over those six months. Green indicates disk-to-disk rates in excess of 5 Gbps.
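The green coding on those maps is just a threshold test on measured disk-to-disk rates; the sketch below applies that rule. The site pairs and numbers are invented for illustration, and the 5 Gb/s cutoff is the one the slide states.

```python
# Hypothetical disk-to-disk GridFTP measurements between site pairs, in Gb/s.
measured = {
    ("UCSD", "UCB"): 7.2,
    ("UCB", "UCSC"): 4.1,
    ("UCSD", "Stanford"): 9.6,
}

def link_status(gbps: float, threshold: float = 5.0) -> str:
    # The PRP maps color a pair green when its disk-to-disk rate
    # exceeds 5 Gb/s; anything slower gets flagged for tuning.
    return "green" if gbps > threshold else "needs tuning"

for (src, dst), rate in sorted(measured.items()):
    print(f"{src} -> {dst}: {rate:.1f} Gb/s [{link_status(rate)}]")
```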
14. PRP Timeline
• PRPv1
– A Routed Layer 3 Architecture
– Tested, Measured, Optimized, With Multi-Domain Science Data
– Bring Many Of Our Science Teams Up
– Each Community Thus Will Have Its Own Certificate-Based Access
To its Specific Federated Data Infrastructure
• PRPv2
– Incorporating SDN/SDX, AutoGOLE / NSI
– Advanced IPv6-Only Version with Robust Security Features
– e.g. Trusted Platform Module Hardware and SDN/SDX Software
– Support Rates up to 100Gb/s in Bursts and Streams
– Develop Means to Operate a Shared Federation of Caches Among Cooperating Research Groups
15. Invitation-Only PRP Workshop Held in Calit2’s Qualcomm Institute
October 14-16, 2015
• 130 Attendees From 40 organizations
– Ten UC Campuses, as well as UCOP Plus 11 Additional US Universities
– Four International Organizations (from Amsterdam, Canada, Korea, and Japan)
– Five Members of Industry Plus NSF
16. PRP First Application: Distributed IPython/Jupyter Notebooks:
Cross-Platform, Browser-Based Application Interleaves Code, Text, & Images
IJulia, IHaskell, IFSharp, IRuby, IGo, IScala, IMathics, Ialdor, LuaJIT/Torch, Lua Kernel, IRKernel (for the R language), IErlang, IOCaml, IForth, IPerl, IPerl6, Ioctave, IScilab, IMatlab, ICSharp, Bash, Clojure Kernel, Hy Kernel, Redis Kernel, jove (a kernel for io.js), IJavascript, Calysto Scheme, Calysto Processing, idl_kernel, Mochi Kernel, Lua (used in Splash), Spark Kernel, Skulpt Python Kernel, MetaKernel Bash, MetaKernel Python, Brython Kernel, IVisual VPython Kernel
Calico Project: kernels implemented in Mono, including Java, IronPython, Boo, Logo, BASIC, and many others
Source: John Graham, QI
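All of these kernels plug into the same document format: a notebook is plain JSON that interleaves code, text, and outputs, which is why one browser application can front so many languages. A minimal nbformat-4 notebook built by hand (the cell contents are illustrative):

```python
import json

# Minimal nbformat-4 notebook: one markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Interleaved text\n"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print('interleaved code')\n"]},
    ],
}

# The .ipynb file on disk is just this JSON; any kernel-aware
# frontend can open it regardless of the language inside.
serialized = json.dumps(notebook, indent=1)
assert json.loads(serialized)["cells"][0]["cell_type"] == "markdown"
```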
17. GPU JupyterHub Nodes on the PRP UC-JupyterHub Backbone (UCB and UCSD)
– 2x 14-core CPUs, 256 GB RAM, 1.2 TB FLASH, 3.8 TB SSD, NVIDIA K80 GPU, dual 40GbE NICs, and a Trusted Platform Module
– 1x 18-core CPU, 128 GB RAM, 3.8 TB SSD, NVIDIA K80 GPU, dual 40GbE NICs, and a Trusted Platform Module
Next Step: Deploy Across PRP
Source: John Graham, Calit2
18. Cancer Genomics Hub (UCSC) is Housed in SDSC:
Large Data Flows to End Users at UCSC, UCB, UCSF, …
[Traffic graph: flows grew from 1 Gb/s to 8 Gb/s, reaching 15 Gb/s by January 2016, roughly 30,000 TB per year.]
Data Source: David Haussler, Brad Smith, UCSC
19. Two Automated Telescope Surveys
Creating Huge Datasets Will Drive PRP
Survey 1: 300 images per night at 100 MB per raw image, 30 GB per night (120 GB per night when processed at NERSC, a 4x increase)
Survey 2: 250 images per night at 530 MB per raw image, 150 GB per night (800 GB per night when processed at NERSC)
Source: Peter Nugent, Division Deputy for Scientific Engagement, LBL, and Professor of Astronomy, UC Berkeley
These surveys are precursors to LSST and NCSA.
The PRP allows researchers to bring datasets from NERSC to their local clusters for in-depth science analysis.
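A quick arithmetic check of the nightly volumes above (decimal units; the second survey's nightly figure appears to be rounded up on the slide):

```python
def nightly_gb(images_per_night: int, mb_per_raw_image: float) -> float:
    """Raw data volume per night in GB (1 GB = 1000 MB)."""
    return images_per_night * mb_per_raw_image / 1000

survey_1_raw = nightly_gb(300, 100)   # 30 GB/night, matching the slide
survey_2_raw = nightly_gb(250, 530)   # 132.5 GB/night; slide quotes ~150 GB

# "Increased by 4x" when processed at NERSC:
survey_1_processed = survey_1_raw * 4  # 120 GB/night, matching the slide
```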
20. Global Scientific Instruments Will Produce Ultralarge Datasets Continuously
Requiring Dedicated Optic Fiber and Supercomputers
Square Kilometer Array (https://tnc15.terena.org/getfile/1939) and Large Synoptic Survey Telescope (www.lsst.org/sites/default/files/documents/DM%20Introduction%20-%20Kantor.pdf)
LSST tracks ~40B objects and creates 10M alerts per night within 1 minute of observing.
2x 40 Gb/s links
21. OSG Federates Clusters in 40/50 States:
Creating a Scientific Compute and Storage "Cloud"
"…community resources. This facility depends on a range of common services, support activities, software, and operational principles that coordinate the production of scientific knowledge through the DHTC model. In April 2012, the OSG project was extended until 2017; it is jointly funded by the Department of Energy and the National Science Foundation."
Source: Miron Livny, Frank Wuerthwein, OSG
22. We are Experimenting with the PRP for Large Hadron Collider Data Analysis
Using The West Coast Open Science Grid on 10-100Gbps Optical Networks
• Crossed 100 Million Core-Hours/Month in December 2015
• Over 1 Billion Data Transfers Moved 200 Petabytes in 2015
• Supported Over 200 Million Jobs in 2015
Source: Miron Livny, Frank Wuerthwein, OSG
Experiments: ATLAS, CMS
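The aggregate numbers above imply some rough per-unit averages, worked out below. This is simple arithmetic on the slide's figures; note that annualizing the December core-hour rate overstates the yearly total, so the per-job figure is an upper bound.

```python
# OSG 2015 aggregates from the slide.
transfers = 1_000_000_000            # over 1 billion data transfers
petabytes_moved = 200                # 200 PB moved in 2015
jobs = 200_000_000                   # over 200 million jobs in 2015
core_hours_per_month = 100_000_000   # monthly rate crossed in Dec 2015

# Average payload per transfer: 200 PB spread over 1e9 transfers.
avg_mb_per_transfer = petabytes_moved * 1e9 / transfers   # 1 PB = 1e9 MB

# Upper-bound average core-hours per job, extrapolating the peak month.
core_hours_per_job = core_hours_per_month * 12 / jobs

print(avg_mb_per_transfer, core_hours_per_job)
```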
23. PRP Prototype of Aggregation of OSG Software & Services
Across California Universities in a Regional DMZ
• Aggregate Petabytes of Disk Space
& PetaFLOPs of Compute,
Connected at 10-100 Gbps
• Transparently Compute on Data
at Their Home Institutions & Systems
at SLAC, NERSC, Caltech, UCSD, & SDSC
Sites: SLAC, UCSD & SDSC, UCSB, UCSC, UCD, UCR, CSU Fresno, UCI, Caltech
Source: Frank Wuerthwein, UCSD Physics; SDSC; co-PI PRP
PRP Builds on SDSC's LHC-UC Project
[Pie chart: OSG Hours 2015 by Science Domain: ATLAS, CMS, other physics, life sciences, other sciences]
25. Dan Cayan, USGS Water Resources Discipline and Scripps Institution of Oceanography, UC San Diego, with much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues
NCAR Upgrading to 10Gbps Link Over Westnet
from Wyoming and Boulder to CENIC/PRP
Sponsors:
California Energy Commission
NOAA RISA program
California DWR, DOE, NSF
Planning for climate change in California: substantial shifts on top of already high climate variability
UCSD Campus Climate Researchers Need to Download
Results from NCAR Remote Supercomputer Simulations
to Make Regional Climate Change Forecasts
26. Downscaling Supercomputer Climate Simulations to Provide High-Resolution Predictions for California Over the Next 50 Years
[Maps: average summer afternoon temperature]
Source: Hugo Hidalgo, Tapash Das, Mike Dettinger
27. Next Step: Global Research Platform
Building on CENIC/Pacific Wave and GLIF
Current International GRP Partners