Future Directions in Canadian Research Computing: Complexities of Big Data
Toronto Research Management Symposium, December 4, 2014
Role of ARC* Today
(* Advanced Research Computing)
  New Paradigms
    - Simulation: in silico before going to the lab
        - Computational Chemistry
        - Physiological Models
        - Virtual Crash Test Dummy
    - Big Data
        - Instruments, Sensors: 24/7 production jobs to reduce data, create intermediate datasets
          (LHC/TRIUMF; IceCube; Ocean Networks Canada (OTN, MEOPAR); BC Genome Sciences Centre; HPC4Health)
        - Aggregating cohorts: life sciences, social sciences
          (OICR Cancer Collaboratory; GenAP; StatsCan; PopDataBC; Brain-Code; CBRAIN; RCMP)
        - Detecting correlations: pattern recognition, text analysis
  From Nice to Have to Essential
  Compute Canada Provides this Essential Capability
    - 2,500 research groups
    - >10,000 researchers
Advocating for ARC
ARC Complexities
  IT Projects Embedded in Research Projects
    - Core competency for research? OR
    - Opportunity for collaboration, platform solutions?
  Scaling for peak demand, project growth/sustainability
  Evolving Technology
    - Interconnects, GPUs, memory and storage options
    - Software stacks
    - Multiple technologies required to support the research
  Data Life Cycle
  Privacy, Security
  Reproducibility
Compute Canada: Simplifying ARC
  Embedded IT Projects
    - Compute Canada: 200 experts in HPC on 35 campuses
        - ~50% with PhDs in biology, chemistry, physics, etc.
        - Expertise in parallel code optimization, software architectures, networking, storage systems, etc.
        - Already supporting 100s of research software packages and systems
  Scaling for peak demand, growth
    - Compute Canada: 200,000 cores / 2 Petaflops, 20 Petabytes today
    - CFI Cyberinfrastructure Initiative: hardware refreshes in 2016, 2018
    - 2020 Targets: 100+ Petaflops, 1.5+ Exabytes
    - Concentrated investments to accommodate large and small jobs
    - 10 data centres today with 13.5 MW total power/cooling capacity, space for 800 racks
  Evolving Technology
    - Regular $30-$40M purchases enable best-of-breed technology
    - Professionally managed software stacks
    - Mixed architectures support diverse workflows on a single system
  Production capabilities
    - On-demand containers/VMs
    - Portals/gateways/workspaces
    - Platform services: data transfer, data management, archiving
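The scaling figures on this slide imply a steep growth curve. As a rough, illustrative sketch only (not part of the original presentation), the arithmetic below compares the stated 2014 capacity with the 2020 targets. It assumes the lower bounds of the "100+" and "1.5+" targets, a decimal conversion of 1 Exabyte = 1,000 Petabytes, and a six-year span from 2014 to 2020; the function and variable names are hypothetical.

```python
# Figures quoted on the slide (2014 capacity and 2020 targets).
PFLOPS_2014 = 2.0          # "2 Petaflops" today
PFLOPS_2020 = 100.0        # "100+ Petaflops" target (lower bound, assumption)
STORAGE_PB_2014 = 20.0     # "20 Petabytes" today
STORAGE_PB_2020 = 1500.0   # "1.5+ Exabytes", assuming 1 EB = 1000 PB

YEARS = 2020 - 2014        # assumed planning horizon


def growth(start, end, years):
    """Return (overall growth factor, implied compound annual factor)."""
    factor = end / start
    annual = factor ** (1.0 / years)
    return factor, annual


compute_factor, compute_annual = growth(PFLOPS_2014, PFLOPS_2020, YEARS)
storage_factor, storage_annual = growth(STORAGE_PB_2014, STORAGE_PB_2020, YEARS)

print(f"Compute: {compute_factor:.0f}x overall, about {compute_annual:.2f}x per year")
print(f"Storage: {storage_factor:.0f}x overall, about {storage_annual:.2f}x per year")
```

Under these assumptions the targets correspond to a 50-fold increase in compute and a 75-fold increase in storage, which works out to roughly a doubling every year in both cases.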
Where is Compute Canada?
  [Map of member institutions and sites]
  Institutions shown: Memorial University of Newfoundland; University of Alberta; St. Francis Xavier University; University of Prince Edward Island; University of Saskatchewan; University of Calgary; University of New Brunswick; University of Regina; Université Laval; Université du Québec à Trois-Rivières; Carleton University; University of Ottawa; University of Manitoba; University of Victoria; Lakehead University; Laurentian University; Queen's University; York University; University of Toronto; Toronto Hospitals; McMaster University; Wilfrid Laurier University; University of Waterloo; University of Western Ontario; University of Windsor; University of Ontario Institute of Technology (UOIT); Brock University; University of Guelph; Dalhousie University; Saint Mary's University; Université de Sherbrooke; McGill University; Concordia University; Université de Montréal; Université du Québec à Montréal; Simon Fraser University; University of British Columbia; Genome Sciences Centre
  Map legend: Member University; Member University and Personnel Site; Member University, Personnel, and Infrastructure Site; Personnel Site; Personnel and Infrastructure Site
What is Compute Canada?
  Federated structure integrates existing HPC consortia across Canada
    [Organization chart: Compute Canada; WestGrid; Ontario; Calcul Quebec; ACEnet; SHARCNet; SciNet; HPCVL]
  Funded by CFI
    - O&M through the Major Science Initiative (MSI) Program
    - Capital: last additions mostly in 2009, from the 2006 National Platform Fund award
    - Cyberinfrastructure Initiative awards expected mid 2015, mid 2016
    - New installations late 2016 and early 2018
Canada's Advanced Research Computing Platform
  2014: 50 Supercomputers, 27 Data Centres, 200 Experts
  [Diagram: sites labelled "Research", "Software", and "Storage", connected by the CANARIE Network]
New Services
  Globus/GridFTP: Dropbox for research, future data management platform
    - Drag and drop, reliable transfer of very large research files and data sets
    - CC negotiating international anchor tenant partnership with Globus
    - Collaborative service development: data management, publishing, discovery
  Single sign-on
    - Testing Canadian Access Federation (CAF) with CANARIE
  Compute Canada Cloud services
    - CC has hosted CANARIE's DAIR services from their introduction
    - Now bringing CC-Cloud into service to provide greater capacity to users in a cloud environment
    - Allows users to create virtual machines that appear to be completely under their control
    - Supporting development of Cloud Scheduler to enable seamless management of multiple cloud systems
  Data Management
    - Pilot implementation of two Canadian-born open source success stories (Islandora, Archivematica) to find a solution for the "long tail" of research data
    - Working with CARL, RDC, CANARIE to validate whether these systems could be a solution for Canada's data management needs
  Making data analysis using advanced systems as easy as using the Internet
    - CBRAIN: Neuroinformatics
    - GenAP: Bioinformatics and genomics
    - Funded by CANARIE and developed in part by Compute Canada personnel
Engagement & Consultation
  Sustainable Planning for Advanced Research Computing (SPARC)
    - Filling in the gap on digital infrastructure planning
    - What resources (compute, storage, services) does Canada need to maintain leadership in science and innovation?
    - What challenges do we need to solve in 2022 and how big are these challenges?
    - Linking needs to leadership in science, research and innovation
    - In line with similar initiatives in the EU (PRACE), Australia, and the US (just starting: National Academies)
  Input from
    - Disciplines and research communities
    - Institutions
    - Partners in the digital infrastructure ecosystem, including TC3+, CANARIE, CARL
    - Research-intensive industry
  Community Engagement
    - Strawman Stage 1 Refresh Plan released later this month
    - Consultations piggybacking on CFI consultations (week of Jan 19)
    - Domain-specific workshops: March 2015 (post EOI)
    - Domain-specific NOI collaborations: April/May 2015
    - National Town Hall at HPCS2015 (Montreal): June 2015
More ARC Complexities, More Solutions?
  Data Life Cycle, Big Data Sharing
    - Want to work with GAGH, GC, CIHR to validate architectures
        - Ideally, to be deployed in our 2016 refresh
    - Driving an international effort to create a federated data sharing platform
    - Data sovereignty
    - Responding to hard research needs
    - Archiving, Preservation: links with US National Data Service, EUDAT, others
  Privacy, Security
    - Conflicting legal, regulatory, ethical frameworks
    - Demise of consent + anonymization
    - Tyranny of the minority
    - Challenging the value of aggregation: correlation both good and bad
    - Several approaches, including HPC4Health, Brain-Code
  Building our Centre of Excellence
    - Platform opportunities: CFI's Challenge 1
Thank you
  Mark J. Dietrich
  President and CEO, Compute Canada
  mjdietrich@computecanada.ca
  www.computecanada.ca
  @computecanada