SuperStack Next Exit: Challenges on CC*IIE at UF
Xiaolin (Andy) Li
Associate Professor; Director, Scalable Software Systems Laboratory (S3Lab)
Area Chair, Computer Engineering Division
Department of Electrical and Computer Engineering, University of Florida, USA
http://www.s3lab.ece.ufl.edu/
Acknowledgement: NSF MRI, CC-NIE, GENI, CAREER, PetaApps
UF Campus CI Units
- RC = Research Computing
- UFIT = UF Information Technology
- CNS = Computing & Networking Services
- FLR = Florida Lambda Rail
- SSERCA = Sunshine State Education and Research Computing Alliance (FAU, FIU, FSU, UCF, UF, UM, USF, FAMU, UNF, UWF)
UF CI Goal
- Increase the research portfolio from about $700 million per year to $1 billion per year
- Meet a wide spectrum of research needs: Health, Engineering, Science, Agriculture, Business, Law
- Computer Science and Engineering: Cloud, Big Data, Future Internet, Storage, Machine Learning, Data Mining, Bioinformatics, CDN, CPS, Internet of Things, Mobile Social Networks
RC Infrastructure Before 2010
- Compute system with 3,500 cores
- Storage system of 200 TB
- Networking of 20 Gbps
- Staff of 3.20 FTE
- 95 research groups from 29 departments
- 400 users
- 35 investors
RC Resources in 2014
- Compute system with 21,000 cores (x6)
- Storage system totaling 5 PB (x25)
- Networking of 200 Gbps (x10): NSF CC-NIE (Gateway), MRI (CRNv2/GatorCloud), ExoGENI Rack, FutureGrid
- Staff of 10.25 FTE (x3)
- UF Informatics Institute
- 327 research groups from 87 departments (x3)
- 1,067 users (x2.5)
- 150 investors (x4)
- Supporting over $300M in grant activity
Science DMZ: Campus Research Network
- Connects to National Lambda Rail, Internet2, and GENI (via Jacksonville)
- FLR links upgraded from 2x10 Gb/s to 2x100 Gb/s
[Network diagram: an SDN fabric (GatorVisor) links the SSRB campus datacenter, CNS Lab, Physics CMS/OSG and HPC centers (Physics, East Side, Engineering), the ECDC and NEB data centers, the CISE, HCS, and S3Lab labs, and cloud zones (Cloud Green, Cloud Orange, VM Cloud, Data Cloud, Golfer Cloud Portal) over 10/40/100G links, with Nets/Hybrid/Apps controllers; Phase 1 SDN at 40G/10G, Phase 2 SDN at 100G, plus a separate SDN control plane]
New Datacenter: Major Data Centers at UF
HiPerGator Supercomputer ranking (top500 supercomputer list):
- #10 among public universities in the US
- #14 among US universities
- #493 among all machines listed
Major data centers:
- HiPerGator Supercomputer
- CMS/OSG Physics HPC Centers
- ICBR: Interdisciplinary Center for Biotechnology Research
- CTSI: Clinical and Translational Science Institute
- ACIS/CAC Data Center
- CHREC Data Center
- NEB Data Center
Stakeholders
- CI units: UFIT/CIO, RC/HPC, CNS
- Research domains: Physics, Biology/Bioinformatics, Chemistry, Astronomy, Health, MAE, Energy, Climate/Agriculture, EE/Smart Grid, Big Data/HPC/CS&E research, and others
- Concerns: security, science artifacts, labor, resources, usage, cost
RC: Research Computing; CNS: Computing & Networking Services; CS&E: Computer Science & Engineering
Example Uses of RC Infrastructure
The RC infrastructure adds value to complex multidisciplinary research endeavors:
- CCMT: Center for Compressible Multiphase Turbulence (PI: S. Balachandar)
- SECIM: Southeast Center for Integrative Metabolomics (PI: A. Edison)
- Super-app for gene sequencing (PI: L. Moroz)
- Medical sensing, analytics, and knowledge fusion (PI: X. Li)
CMS Experiment
- $500M experiment
- Highlighted in red is the endcap muon system, whose design and construction was led by UF
CMS Collaboration
- 38 countries
- 182 institutions
- 3,000 scientists and engineers
US CMS Collaboration
- 48 institutions: 2 national labs and 46 universities
UF CMS
With almost 40 scientists, UF is the 3rd largest US CMS institution, behind only Fermilab and the University of Wisconsin-Madison (and ahead of MIT, Princeton, Caltech, Cornell, CMU, and 39 others).
Compressible Multiphase Turbulence
Goals of the Center:
- Radically advance the field of compressible multiphase turbulence
- Advance predictive simulation science through high-performance computing
- Advance a co-design strategy that combines exascale emulation, exascale algorithms, and exascale M&S
- Educate students/postdocs in simulation science and place them at national laboratories
Integrative Metabolomics
- Core 1: Tim Garrett; Core 2: Art Edison; Core 3: Rick Yost
[Workflow diagram: core data lands in UF RC storage, is processed/normalized (Galaxy + command line), and reaches the DRCC at UCSD within 2 months; client meetings connect the PI and group members working on the project]
Gene sequencing from a ship
Medical Sensing, Analytics, and Fusion
[Architecture diagram: noncontact vital-sensing platforms (BLE sensor devices, 4.8cm x 2.7cm) stream data from patients and elders through pub/sub brokers into VitalCloud, an OpenStack-based noncontact vital sensing cloud (Nova, Swift, Cinder, Glance, Keystone, Neutron) with a distributed file system; processing spans advanced signal processing, near-line stream processing of features/patterns/symptoms, context-aware medical diagnosis, and an offline big data engine, with platform multiplexing, two-way knowledge fusion, and security & privacy mechanisms; web portals/browsers, mobile and third-party apps, and data-exchange middleware (EHR) connect healthcare providers, medical facilities, and social friends/relatives for online query, interaction, and decision making]
Challenges & Opportunities
- Build services on top of this infrastructure
- Large need among non-traditional users: traditional Linux command-line users are just a fraction
- Turbo-charge the desktop: seamless connection over networks to storage and compute
- Cloud is the way people expect services: self-provision, self-configure, always on, accessible from anywhere on any device
Emerging Programming Models for Big Data, Big Systems, Big Science
- Emerging frameworks and rapid innovations: CIEL, Pregel, Percolator, Pig, Dryad
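Pregel, for example, popularized the vertex-centric model: a graph algorithm runs as a sequence of supersteps in which each vertex processes incoming messages, updates its value, and sends messages to its neighbors. A minimal single-machine sketch of that model (the function and variable names are ours, not Pregel's actual API):

```python
# Sketch of Pregel's "think like a vertex" model: in each superstep,
# every vertex with pending messages updates its value and messages
# its neighbors; the computation halts when no messages are in flight.

def pregel_bfs(graph, source):
    """Compute hop distances from source via supersteps."""
    dist = {v: float("inf") for v in graph}
    inbox = {source: [0]}                      # messages delivered this superstep
    while inbox:                               # halt when no messages in flight
        outbox = {}
        for v, msgs in inbox.items():          # each vertex handles its messages
            best = min(msgs)
            if best < dist[v]:                 # improvement: adopt it and
                dist[v] = best                 # notify neighbors next superstep
                for w in graph[v]:
                    outbox.setdefault(w, []).append(best + 1)
        inbox = outbox
    return dist

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(pregel_bfs(g, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

Real frameworks distribute the vertices and message delivery across machines; the superstep structure is what makes that distribution straightforward.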
Conventional Practices
- Relatively static environments
- HPC administrators are responsible for maintaining the software stack
- Use PBS/Torque for batch job execution
- Focus on the MPI framework
- (Optional) Maintain separate clusters for other frameworks, e.g., MapReduce, OpenStack
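The batch model behind PBS/Torque can be sketched as a FIFO queue over a fixed partition of nodes: jobs declare a resource request up front and wait until enough nodes are free, strictly in submission order. This toy simulation (our own code, not Torque's scheduler) shows the head-of-line blocking that makes such static partitioning rigid:

```python
# Toy FIFO batch scheduler over a fixed node partition, in the spirit
# of PBS/Torque without backfill: jobs start strictly in submission
# order, so a small job can wait behind a large one that does not fit.
import heapq
from collections import deque

def fifo_schedule(jobs, total_nodes):
    """jobs: (name, nodes, runtime) in submission order -> {name: start_time}."""
    queue, free, now = deque(jobs), total_nodes, 0
    running, starts = [], {}                   # running: heap of (end_time, nodes)
    while queue or running:
        # start queued jobs strictly in order while resources allow
        while queue and queue[0][1] <= free:
            name, nodes, runtime = queue.popleft()
            starts[name] = now
            free -= nodes
            heapq.heappush(running, (now + runtime, nodes))
        # otherwise advance time to the next completion and reclaim nodes
        end, nodes = heapq.heappop(running)
        now, free = end, free + nodes
    return starts

# 8-node partition: D (2 nodes) fits at t=0 but must wait behind C
print(fifo_schedule([("A", 4, 10), ("B", 4, 5), ("C", 6, 5), ("D", 2, 5)], 8))
# {'A': 0, 'B': 0, 'C': 10, 'D': 10}
```

The rigidity is the point of the slide: resources idle while queued work waits, which motivates the dynamic multiplexing discussed next.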
Inadequate Support: Mismatch Between Rapid Innovation and Relatively Rigid Resource Configuration
Users want to:
- Try new frameworks for big data analytics
- Innovate on current data analysis frameworks
- Cooperate with other organizations
HPC clusters want to:
- Reduce administrative operation overhead
- Improve resource utilization
- Enrich user experience
But today:
- No privileges for users
- No guarantee of cross-organization compatibility
- No method for fine-grained, dynamic resource allocation
Time for Change
- Current: static, partitioned clusters, with each framework in its own Torque-managed silo
- Target: unified, multiplexed, dynamic infrastructure in which frameworks (Hadoop, CIEL, Pregel, Pig, Percolator, OpenStack, Dryad) share containers, virtual machines, and bare metal
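One established way to realize "unified, multiplexed, dynamic" sharing is two-level scheduling in the style of Mesos: a cluster manager owns the machines and offers free resources to each framework in turn, and each framework claims only what its pending tasks need. A toy sketch under that assumption (all class and function names are illustrative, not any real system's API):

```python
# Toy two-level scheduler: the manager offers each node's free CPUs to
# every framework round-robin; frameworks place pending tasks onto the
# offer and return how much they claimed. Leftover capacity remains
# available for later offers, so nothing is statically partitioned.

class Framework:
    def __init__(self, name, task_cpus):
        self.name = name
        self.pending = list(task_cpus)   # CPU demand of each pending task
        self.launched = []               # (node, cpus) placements

    def accept(self, node, free_cpus):
        """Claim resources from an offer; return CPUs actually used."""
        used = 0
        while self.pending and self.pending[0] <= free_cpus - used:
            used += self.pending[0]
            self.launched.append((node, self.pending.pop(0)))
        return used

def run_offers(nodes, frameworks):
    """nodes: {name: free_cpus}. Offer each node's capacity to all frameworks."""
    for node, free in nodes.items():
        for fw in frameworks:            # every framework sees the offer
            free -= fw.accept(node, free)
        nodes[node] = free               # leftover capacity stays available
    return nodes

hadoop = Framework("hadoop", [2, 2, 2])
mpi = Framework("mpi", [4])
leftover = run_offers({"n1": 8, "n2": 4}, [hadoop, mpi])
print(hadoop.launched, mpi.launched, leftover)
```

Here Hadoop's three 2-CPU tasks land on n1 and the 4-CPU MPI task lands on n2, with 2 CPUs on n1 left for future offers: distinct frameworks multiplex one pool instead of each owning a silo.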
SuperStack: Unified Campus Cloud, a Software-Defined Ecosystem
- Built on OpenFlow
- *-as-a-service: compute, storage, network, platform, big data, HPC, CPS, cloud, AppStore/AppEngine (Res, Net, Sec)
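At the heart of an OpenFlow-based ecosystem is the switch flow table: a controller installs prioritized match-action rules, packets are matched against them in priority order, and a table miss is punted to the controller. A simplified sketch of that match-action pipeline (our own illustration, not the OpenFlow wire protocol or any controller's API):

```python
# Simplified OpenFlow-style flow table: rules pair a partial header
# match with an action; the highest-priority matching rule wins, and
# unmatched packets go to the controller (a "table miss").

class FlowTable:
    def __init__(self):
        self.rules = []                        # (priority, match, action)

    def install(self, priority, match, action):
        """match: header fields that must equal the packet's fields."""
        self.rules.append((priority, match, action))
        self.rules.sort(key=lambda r: -r[0])   # highest priority first

    def apply(self, packet):
        for _, match, action in self.rules:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "CONTROLLER"                    # table miss: ask the controller

table = FlowTable()
table.install(10, {"dst_ip": "10.0.0.2"}, "output:port2")
table.install(100, {"dst_ip": "10.0.0.2", "tcp_dst": 22}, "drop")

print(table.apply({"dst_ip": "10.0.0.2", "tcp_dst": 80}))  # output:port2
print(table.apply({"dst_ip": "10.0.0.2", "tcp_dst": 22}))  # drop
print(table.apply({"dst_ip": "10.0.0.9"}))                 # CONTROLLER
```

Because the control plane programs these rules remotely, the same physical fabric can be sliced and re-provisioned per service, which is what makes an *-as-a-service ecosystem over one campus network feasible.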