How do we train Data Scientists and Data Engineers? Eric Rozier, Asst. Prof. of EECS, University of Cincinnati; Faculty Mentor, DSSG, University of Chicago
Training the Next Generation of Data Scientists: We focus on two main programs: DSSG, a three-month intensive summer program, and academic-year curriculum development that supports hands-on, in-class experiences.
Data Science for Social Good
The Eric & Wendy Schmidt Data Science for Social Good Summer Fellowship 2014
Eric & Wendy Schmidt Data Science for Social Good Summer Fellowship http://dssg.uchicago.edu @datascifellows
Data Science for Social Good @datascifellows Intro to DSSG
What is DSSG? The Data Science for Social Good Fellowship brings together 40-50 fellows, working in teams of 3-4 with experienced mentors, for 12 weeks in Chicago on impactful problems with nonprofit and government partners.
Goals of the Fellowship: Train data scientists who care about social problems and understand how to solve them. Expose governments and nonprofits to data-driven decision making and train them to use data to make better decisions. Seed a community of people and organizations working together to make a social impact. Create open source data science tools targeted at the needs of high-impact social problems.
By the Numbers
2013: 36 fellows, 6 mentors, 12 weeks, 12 projects
2014: 48 fellows, 8 mentors, 12 weeks, 14 projects
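A quick back-of-the-envelope check, using only the figures on this slide, shows that the averages line up with the "teams of 3-4" described above, and that the two cohorts sum to the 84 fellows reported for 2013-2014. This is a minimal sketch that assumes every fellow sits on exactly one project team; the `cohorts` dictionary and `ratios` helper are hypothetical names introduced for illustration.

```python
# Back-of-the-envelope ratios from the "By the Numbers" slide.
# Assumption (not stated on the slide): every fellow is on exactly one project team.
cohorts = {
    2013: {"fellows": 36, "mentors": 6, "weeks": 12, "projects": 12},
    2014: {"fellows": 48, "mentors": 8, "weeks": 12, "projects": 14},
}


def ratios(cohort):
    """Return (average fellows per project, fellows per mentor) for one cohort."""
    return cohort["fellows"] / cohort["projects"], cohort["fellows"] / cohort["mentors"]


for year, cohort in cohorts.items():
    team_size, per_mentor = ratios(cohort)
    print(f"{year}: {team_size:.2f} fellows/project, {per_mentor:.1f} fellows/mentor")

# Combined head count across both summers.
total = sum(cohort["fellows"] for cohort in cohorts.values())
print(f"Total fellows, 2013-2014: {total}")
```

Both years come out at 3.0 to about 3.4 fellows per project and 6 fellows per mentor, so the cohort grew without diluting mentor attention.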
Ideal Fellows: Making an impact with data draws on problem formulation, computer science and programming, statistics and machine learning, communication, econometrics and social science methods, experimental design, and databases.
2013-2014 Fellows: ~1000 applicants from 40 countries and ~250 universities; 84 fellows selected. Fields: computer scientists, statisticians, economists, public policy (and other computational and quantitative fields). Universities: CMU, U. of Chicago, Northwestern, Harvard, MIT, Stanford, ITAM, Cornell, Yale, Villanova, Ohio State, USC, U Penn, Notre Dame, U of Minnesota, U of Michigan, Cambridge, McGill, UC Berkeley, U of Colorado, Swarthmore, Oberlin, UIUC, Emory, Duke, Fordham, Johns Hopkins, IIT, SAIC, NYU, Penn State, Simon Fraser, UC Santa Barbara.
Breadth of Projects. Partners: nonprofits, government agencies, and corporations with a social mission. Geographies: local, state, national, and international. Types of problems: impact evaluation, targeting, risk modeling. Types of data: structured data, geospatial data, time series, text data, and network data.
Variety of Project Areas (area: data; problem)
Health: home inspection data; predicting lead poisoning
Energy: smart meter data; reducing energy use via disaggregation
Education: education records; predicting high school dropout
Economic Development: administrative data; targeting and assessing urban revitalization
Corruption: contract data; detecting collusion
Federal Budgeting: congressional bills; identifying earmarks
2014 Project Partners
Sample Projects: Improving high school graduation rates by identifying at-risk students early. Increasing government transparency by identifying earmarks. Developing new strategies to reduce maternal mortality. Preventing lead poisoning through proactive home inspections and health check-ups.
Prediction Saves Time & Money
No prediction: 197,157 buildings; 76 years; $98 million
Current model: 42,695 buildings; 16.4 years; $21.3 million
Model forecast: 378 buildings; 2 months; $189,000
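Dividing the slide's figures through makes the comparison concrete. The sketch below computes the implied cost per building inspected and the implied inspection rate for each scenario. It assumes that cost and time scale linearly with the number of buildings inspected, which the slide does not state; the `scenarios` list and `implied_rates` helper are hypothetical names for illustration.

```python
# Implied per-inspection cost and inspection rate for each scenario on the
# "Prediction Saves Time & Money" slide. Figures are copied from the slide;
# the ratios ASSUME cost and time scale linearly with buildings inspected.
scenarios = [
    # (label, buildings, years, dollars)
    ("No prediction", 197_157, 76.0, 98_000_000),
    ("Current model", 42_695, 16.4, 21_300_000),
    ("Model forecast", 378, 2 / 12, 189_000),  # 2 months expressed in years
]


def implied_rates(buildings, years, dollars):
    """Return (dollars per building, buildings per year) for one scenario."""
    return dollars / buildings, buildings / years


for label, buildings, years, dollars in scenarios:
    cost, rate = implied_rates(buildings, years, dollars)
    print(f"{label:15s} ${cost:,.0f}/building  {rate:,.0f} buildings/year")
```

Every scenario works out to roughly $500 per building and a few thousand inspections per year. The savings therefore come almost entirely from inspecting far fewer buildings, not from cheaper or faster inspections.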
At-Risk Children: Even without detailed child-level features, there are strong, sanity-checked, prediction-capable patterns. [Figure: Lead Levels During Childhood]
Target: Prediction From Birth
The Tool
Who are we looking for? Fellows, mentors, and partners.
Fellows: Expertise in one or more of the following areas: computer science, statistics, public policy, social science, or other quantitative or analytical areas. Some coding experience. Passion for making a social impact. Problem-solving (critical thinking) experience. Enjoy working on a team.
Mentors: Deep expertise in computer science, machine learning, statistics, or social sciences. Experience working on real problems in industry. Experience leading teams and managing projects. Mentors include Ben Yuhas (Principal, Yuhas Consulting Group), Eric Rozier (Assistant Professor of Electrical and Computer Engineering), Kate Cagney (Sociology & Health Studies; Director, Population Research Center, U Chicago), and Joe Walsh (Lead Forecaster for GE Healthcare & Policy Consultant).
Project Partners: Organizations that (1) have an interesting social-impact problem to solve, (2) have data that can help solve it, and (3) have a desire to put our work into action. We are especially interested in longer-term collaborations that extend beyond the 12 weeks of the fellowship. Partner types: governments and government organizations, foundations, nonprofits, and research institutions.
Get Involved! Application deadlines: Fellows: Feb 1, 2015; Mentors: Feb 1, 2015; Partners: Jan 10, 2015. Applications and more info: http://dssg.uchicago.edu or email datasciencefellowship@ci.uchicago.edu
Data Science in the Curriculum with the Digital Observatory
The Data Deluge: Big Data education faces similar challenges. How do we help students drink from the fire hose?
Big Data and the Curriculum: Big Data is putting pressure on the curriculum, not just in CS/ECE but also in business, finance, social science, economics, biology, medicine, and public policy. NIH has held several meetings on Big Data education and wants to integrate Big Data and data science into the regular curriculum.
NIH Conclusions: Teach from case studies. Proper training should include hands-on experience with real data and the use and study of cutting-edge tools and techniques.
NIH Conclusions (continued): Train data scientists to work as team members; the team is one of the most important parts of real data science applications. Emphasize multidisciplinary teams.
New Ways of Thinking: Get students used to the pace of change and to thinking exponentially.
New Ways of Learning
Active Learning: After two weeks, we tend to remember: with passive learning, 10% of what we read, 20% of what we hear, 30% of what we see, and 50% of what we hear and see; with active learning, 70% of what we say and 90% of what we say and do.
Bloom's Taxonomy (highest to lowest): Evaluation, Synthesis, Analysis, Application, Comprehension, Knowledge
Three-Pronged Approach: Reading, presenting, and discussing the current state of the art; hands-on study with real data; original research in the field.
Involving Partners
Creating a Classroom Around a Digital Observatory
Telescopes for Big Data
Transitioning a DSSG-like Environment to the Academic Year: Identify a smaller number of partners to work with larger groups on a longer time scale. Understand that our expectations need to be tempered: the summer program is exclusive and competitive, with international recruitment, while the academic-year course draws from the (admittedly excellent) student body at large, where motivation may be lower.
Frontiers of Data Science Class: Several published papers have resulted from the class. It offers a mixed undergraduate and graduate, interdisciplinary environment. Awarded Frontiers of Engineering Education by the NAE.
Growth of the Course: First year: 8 students (electrical engineers, computer engineers, computer scientists, environmental scientists, economists). Second year: 14 students, with more industry involvement.
Developing Scalable Infrastructures: Understand the financial limitations of the classroom. Develop resources that can be leveraged for both research and curriculum; a practical curriculum based on real experience will have similar needs anyway!
The Need for Practice in the Academy: We need to train ourselves in data science in order to teach it. Many faculty haven't had real industrial experience with data science, and the field and its practice are changing fast. Encourage the development of data science workshops, boot camps, and summer programs for faculty as well as students.
More information http://dssg.io http://dataengineering.org