Responding to Canada's Research Computing Needs: Researcher input and the national planning process Jonathan Dursi, CTO jonathan.dursi@computecanada.ca
Compute Canada Project University of Alberta University of Calgary Simon Fraser University University of British Columbia Genome Sciences Centre University of Victoria University of Saskatchewan University of Regina University of Manitoba Lakehead University Laurentian University Queen's University York University University of Toronto Toronto Hospitals McMaster University University of New Brunswick Université Laval Université du Québec à Trois-Rivières Carleton University University of Ottawa Memorial University of Newfoundland St. Francis Xavier University University of Prince Edward Island Université de Sherbrooke McGill University Concordia University Université de Montréal Université du Québec à Montréal Dalhousie University Saint Mary's University Wilfrid Laurier University University of Waterloo University of Western Ontario University of Windsor University of Ontario Institute of Technology (UOIT) Brock University University of Guelph Member University Member University and Personnel Site Member University, Personnel, and Infrastructure Site Personnel Site Personnel and Infrastructure Site
Compute Canada Project Goals National Platform for Advanced Research Computing Expertise, Services, and Infrastructure. Bring an entire national collection of expertise, services, and compute/data resources to bear on individual research problems. [Map: member universities, personnel sites, and infrastructure sites]
Who is Compute Canada? The Compute Canada project team: a national office of the same name (keeps the funding flowing, keeps coordination on track; advocacy, national/international collaborations, etc.); regional organizations with autonomy, flexibility, and responsibility to implement parts of the platform (WestGrid in Western Canada); national-level coordination on services, interoperability, knowledge exchange. Working toward improving interoperability (systems, experts) and increasing breadth of services.
What's it all for? - Now. Enable Canadian research that needs HPC by: providing access to HPC resources; providing a national network of experts (140 technical staff) to help; providing training, etc.
Researcher Input Cannot be successful without strong mechanisms for researcher input. Must provide what researchers need while also keeping an eye out for upcoming technologies researchers will need.
Researcher Input Many regional organizations are very well integrated into existing research communities. Get constant feedback, but it is not always disseminated or combined across the country. Get different types of input. Weaker connections to, and input from, research communities that are not already big users (= successful with current setup).
National Researcher Input Try to get broad, routine, national researcher input to bring into: operations; longer-term planning; advocacy.
National Researcher Input ACOR (Advisory Council on Research): national, broad panel of researchers; advisory to the board. Recent Researcher Needs survey: national ~15 min online survey, 425 completed. Strategic Plan consultations: ~25 national meetings, 500 people engaged. Chief Science Officer: Dugan O'Neil, SFU.
Today Want to tell you today about: the planning process (Strategic Plan, Management Plan); what the results of the national consultations (strategic plan town halls, survey) have been; how these inputs are shaping plans. Want your feedback on those results. Want your feedback on mechanisms for sustained, continued input.
Strategic Plan High-level statement of the basic purpose and goals of the project. Build written national agreement about what business we're in, and what broadly we should be doing nationally to enable research. Will not lay out specific projects to meet those goals - more cloud computing, different training programs, etc.
Strategic Plan Needed because, though Compute Canada has existed since ~2006, there is still not widespread agreement about its basic mission, or about what does and doesn't fall within its remit. Also, a CFI condition on the MSI grant. Draft to be circulated shortly (?)
Management Plan Or Operations Plan, or Putting meat on the bones of the Management Plan If these are our priorities, what do we do, and how do we measure success? What is process for determining investment priorities?
Strategic Plan Town Halls 24 in-person town halls, 3 online. St. John's to Victoria. Hundreds of attendees.
Strat Plan: What we heard People Storage Ease of use/interoperability Funding New use cases Industry engagement National/Regional organization Cutting edge technologies
What we heard: People Importance of technical staff in getting research done Very high quality of technical staff Consulting/support, training extremely important and must be supported and grow Access to resources vital - but for many groups, without access to that technical expertise, the resources have much less value.
What we heard: Storage Not nearly enough, even for traditional compute/simulation intensive workflows New data-intensive work extremely poorly supported Different types of storage (archival, nearline, on-line) all important
What we heard: Usability Ease of use/interoperability. For many, existing systems throw up roadblocks. Systems are highly heterogeneous: having used (say) a WestGrid system, one starts from scratch to use (say) a Calcul Québec system. Insufficient access to interactive resources. Insufficient access to web-based interfaces. Insufficient access to cloud-type systems.
What we heard: Funding! Wide agreement about need for predictable, sustained funding for hardware refreshes, staff, operations. This concern is shared throughout the entire research community (Digital Infrastructure summit). Advocacy has to be an activity of the national organization.
What we heard: New uses Data-intensive: storage Cloud-type workflows Need more support for researchers in disciplines with emerging compute requirements Humanities Social Sciences Bioinformatics Big Data Hardware, software, processes, expertise
What we heard: Industry! Need to encourage, develop programs for, private-sector engagement. Make funders happy Incoming stream of interesting applied research problems Opportunities for grad students Increase HPC adoption Many mechanisms suggested
What we heard: Structure National/Regional organization How will this work? Many examples of similar organizations Relationship need not be the same in every region But clarity essential
What we heard: New Tech Cutting edge technologies - hardware, software, etc Need leading edge for Researchers who want it now Evaluation for researchers unsure Need to help shape technologies of particular importance to Canadian research
Strat Plan: What we heard People Storage Ease of use/interoperability Funding New use cases Industry engagement National/Regional organization Cutting edge technologies
Strategic Plan What we heard is being reflected in the Strategic Plan. Stronger emphasis on people, services. Growth into broader use cases without sacrificing existing HPC users. Emphasis on engaging with funders for predictable, infrastructure-like funding. Emphasis on engaging with private-sector R&D. Negotiating clarity in the roles of the national office, regions, and sites will take time, but is a goal.
Researcher Needs Survey In the field, Oct 2013 Short (15 min) survey Questions about Current research tasks Current pain points, priorities Future growth Opt-in for further contact
Researcher Needs Survey Well-suited to more immediate management plan drafting Concrete, specific computing needs Will begin followup shortly
Goals of Survey Broad community results Identify groups for followup - to occur One-on-one interviews Deep dive into future needs in representative groups Develop a picture of what is needed by the community in coming 3-5 years
Complementary Data International surveys on academic researcher needs (XSEDE, PRACE) International surveys on commercial researcher needs (IDC, Intersect360) Ontario: ORION/Ontario Needs Assessment (~50-100 interviews) CC user surveys/feedback
Responses ~425 responses. Overwhelmingly academic. Results reflect: relative lack of communications power outside academic circles; consultation fatigue (CIHR, SSHRC, CFI were all in the field simultaneously). [Figure: Responses by Sector. Sectors: Academia, Federal Govt, Prov Govt, Hospital, Other, Not for Profit, Int'l Enterprise, Cdn Enterprise, Cdn SME; axis: # Respondents, 0-500]
Text fields Asked about current research projects, data, analysis, current bottlenecks, problems with existing services/resources, desirable future services/resources.
What we heard: Storage Researchers wasting time shuffling, compressing, re-generating data. More storage of various types needed. Future needs growing rapidly.
Aging Infrastructure Aging infrastructure was a repeated problem ~50% of respondents said they could do more, better research if they had more access to compute/storage Unreliability, related in part to aging infrastructure, also cited
Usage Barriers Lack of interactive access Long queue times Last-mile connectivity Uptime/reliability Lack of large-memory nodes Commercial software licensing
Sharing Many researchers would greatly like to enable web interfaces to their work Data sharing (secured or public) was repeatedly cited as a need
Support Very strong vote of confidence overall in support, training. Many don't seem to know the breadth of support available. Many people need help with automation, optimization. Incoming grad students need much more basic training.
General Research Area Range of disciplines. 2/3 still traditional physical science/HPC areas, but rapid growth in other fields. [Figure: General Research Area. Categories: Arts and Letters, Engineering, Health Sciences, Natural Sciences, Other, Social Sciences, Humanities; axis: % Respondents, 0-40]
Where Computation is Done CC a significant fraction of research computing (amongst respondents). Very little use of commercial cloud. Significant use of independent resources. [Figure: Where Computation is Done. Categories: Other, $$ Cloud, Abroad, CC, Collab Cluster, Dept Cluster, Group Cluster, Dept Server, Group Server, Multiple PCs, PC; axis: % Respondents, 0-80]
Current Problems Current pain points. Existing resources: too slow/small; delays in access; lack of particular hardware/software. Reliability remains an issue. [Figure: Current Problems. Categories: Other, Lack of Viz, Unreliable, Delays in Access, Bad Interface, Mismatched H/W, Missing S/W, Too Small, Too Slow; axis: % Respondents, 0-50]
Current priorities Traditional compute, storage, training are top three priorities. [Figure: Respondents ranking Priority in top 3. Categories: Consultancy, Helpdesk, Viz, Hosting, Collaboration, Traditional Compute, Web/Database, Storage, Training; axis: % Respondents, 0-60]
Training Almost all groups think that having additional expertise to bring to bear on their problems would be useful. [Figure: "Having additional digital expertise would be useful..." Responses: Always, Sometimes, Rarely, Never; axis: % Respondents, 0-40]
Training Almost all groups think that having additional expertise to bring to bear on their problems would be useful for a variety of reasons. [Word cloud: what respondents could accomplish with more advanced computing expertise. Stemmed terms include: comput, data, run, simul, analyz, larger, access, storag, better, resourc, time, system, faster, softwar, resolut, analysi, process, competit]
CC Compute Users mostly satisfied or very satisfied with existing compute resources. [Figure: Satisfaction with CC Computational Resources. Ratings from 1 (Very Dissatisfied) to 5 (Very Satisfied), plus Not applicable; axis: % Respondents, 0-50]
CC Compute Users mostly satisfied or very satisfied with existing compute resources. But ~1/3 of users will need more than 3x as many resources to stay internationally competitive. [Figure: Compute Change, 5 years. Categories: Increase >3x, Increase <3x, Same, Decrease >3x, Decrease <3x, Unsure; axis: % Respondents, 0-40]
Storage Usage Mostly fairly modest. Mostly satisfied with CC storage where applicable. [Figure: Storage Currently Used. Categories: Not sure, Less than 1 TB, 1-10 TB, 10-100 TB, 100-500 TB, 500 TB-1 PB, More than 1 PB; axis: % Respondents, 0-30] [Figure: Satisfaction with CC Storage Resources. Ratings from 1 (Very Dissatisfied) to 5 (Very Satisfied), plus Not applicable; axis: % Respondents, 0-40]
Storage Usage Mostly fairly modest. Mostly satisfied with CC storage where applicable. But will again need substantial increases to stay internationally competitive. [Figure: Storage Change, 5 years. Categories: Increase >3x, Increase <3x, Same, Decrease >3x, Decrease <3x, Unsure; axis: % Respondents, 0-40]
Not using CC Good, in a way: more disuse because respondents weren't aware (fixable) or don't need it (yet) than because it is unsuitable. But still need to work on the unsuitable. [Figure: Why Don't I Use Compute Canada. Categories: Other, Specific Needs, Too hard, Don't Need, Wasn't Aware; axis: % Respondents, 0-40]
Next Steps: Survey Begin followup surveys. Continue push into underrepresented sectors. Compare with complementary data. [Figure: Responses by Sector; axis: # Respondents, 0-500]
Next Steps Broadly Share SP draft in a couple of weeks Begin management plan as SP converges Incorporate Research needs survey into more concrete programs, needs
Future Researcher Consultation What should we do? ACOR New web page, updated with useful information, planning, etc. Annual town halls (+ town hall at HPCS?) Annual surveys?