Ann Arbor is Already Recognized as a Big Data Powerhouse
Ann Arbor is one of the top 5 US cities for Big Data [1]. Ranking criteria:
- Data openness of the city across 19 types of data
- Economic conditions for innovation in data science
- Volume: number of companies specializing in data processing, hosting, and related services per 100,000 residents
- Velocity: Internet download speeds (megabits per second)
- Variety: percentage of the workforce employed in computer and mathematical occupations
Ann Arbor is the most desirable small US city for Millennials [2].
- Value: better quality of life and more satisfying careers for the future workforce
[1] smartasset.com, Aug 16 2015
[2] American Institute of Economic Research, May 15 2015
U-M Data Science Initiative (DSI)
UM Collaborating Units
- Academic Leadership & Engagement: COE, UMMS, LS&A, SI, SPH, SON, ISR, UMBS, others
- Services & Infrastructure: ARC-TS, CSCAR, others
Michigan Institute for Data Science (MIDAS)
- 100+ U-M Faculty Affiliates (2015)
- 20-30 Existing U-M Faculty slots
- 10 New U-M Faculty slots
- 4 Data Science Grand Challenges
- Data Science Methodologies & Analytics
- Data Science Education & Training programs
- Industry Engagement
Data Science Services (CSCAR)
Consulting for:
- Database Creation, Preparation & Ingestion
- Data Visualization
- Data Access
- Data Analytics
Data Science Infrastructure (ARC-TS)
- Hadoop, SPARK
- SQL, NoSQL databases
- Analytics Platforms
- Integration with HPC Flux Platform
MIDAS is Positioning the U-M as a National Leader in Data Science
- Research: Will build on the existing strong Data Science research activities at U-M, bringing additional visibility and impact
- Education: Will train a new category of scientists and professionals to satisfy a growing market, by enhancing existing curricula and adding new content
- Services: Will place U-M at the forefront, benefiting the campus, the state, and the nation
- Industry Engagement: Will engage in local, regional, and national partnerships with big data-oriented companies
Michigan Institute for Data Science (MIDAS)
http://midas.umich.edu/
- Currently have 100+ U-M Faculty Affiliates (Fall 2015)
- Launching Data Science Education & Training programs
- Launching Data Science Services (with CSCAR)
- Launching industry engagement activities
- Will fund 4 Data Science Grand Challenges in 2015-2016
- Will grow to 30 core faculty over the next two years
  - 20 slots for existing U-M faculty
  - 10 slots for recruiting external faculty
Selected MIDAS Faculty Leaders
MIDAS Distinguished Scientists
- H. V. Jagadish, PhD, is the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science.
- Vijay Nair, PhD, is the Donald A. Darling Professor of Statistics and Professor of Industrial and Operations Engineering.
Associate Director, MIDAS Education & Training
- Ivo D. Dinov, PhD, is an Associate Professor of Nursing; Faculty Affiliate, Center for Computational Medicine and Bioinformatics.
MIDAS Management Committee
- George Alter, Institute for Social Research
- Brian Athey, Medical School
- Satinder Baveja, College of Engineering
- Ivo Dinov, School of Nursing, Medical School
- Anna Gilbert, Literature, Science, and the Arts
- Margaret Hedstrom, School of Information
- Al Hero III, College of Engineering
- HV "Jag" Jagadish, College of Engineering
- Timothy McKay, Literature, Science, and the Arts
- Eric Michielssen, College of Engineering
- Vijayan Nair, Literature, Science, and the Arts
- Kerby Shedden, Center for Statistical Consulting and Research
- Kevin Smith, Medical School
- Peter Sweatman, Mobility Transformation Center
- Jeremy Taylor, School of Public Health
- Jieping Ye, Medical School
Current MIDAS Jr. Faculty Members and the Future of Data Science
Our Students
Data Science Training and Education
Ivo Dinov (MIDAS Assoc. Dir)
Erin Shellman, Patrick Harrington, Nandit Soparkar
Data Science Education & Training
New U-M programs launched (2015)
- Rackham Data Science Certificate
- Undergraduate Major in Data Science
Other activities
- Data Science Bootcamps
- Summer Schools, Online Training
Focus session on MIDAS Education
Understand principles of: data collection, access, security, preparation, management, visualization, and analysis
Wide spectrum of disciplines involved: math and statistics, engineering, natural science, health and life science, information science, social and political science; business, economics and law
U-M Data Science Challenge Thrusts
Challenge thrusts:
- Learning Analytics
- Transportation
- Social Sciences
- Health Sciences
- Future Challenge Thrusts
Crosscutting methodologies:
- Analytics and Visualization of Complex Data
- Machine Learning-enabled Analytics
- Temporal, Multi-Scale and Statistical Models
- Integration of Heterogeneous Data
- Data Scrubbing, Wrangling and Provenance Tracking
- Data Privacy and Cybersecurity
Leveraging Data Science Services & Infrastructure
U-M Data Science Challenge Thrusts: Crosscutting methodologies
- Analytics and Visualization of Complex Data: Networked single-user and collaborative visualization of massive multi-modality datasets. (Carley)
- Machine Learning-enabled Analytics: Machine learning methods such as anomaly detection, dictionary learning, reinforcement learning, similarity learning, and transfer learning must be scalable to massive data scales. (Nowak, Murphy, McKeown)
- Temporal, Multi-Scale and Statistical Models: Mathematical, computational and statistical models are needed to integrate multimodal data collected at many different time and length scales. (Murphy)
- Integration of Heterogeneous Data: Integration of numerical data, symbolic data, structured data, and streaming data at various stages of the analysis pipeline. (Carley, Murphy)
- Data Scrubbing, Wrangling and Provenance Tracking: Automation of data preparation steps such as normalization, calibration, outlier treatment, and annotation. (Soparkar)
- Data Privacy and Cybersecurity: The tradeoffs between data privacy/security and data utility must be understood in the context of the specific application, e.g., medicine, transportation, or business analytics, throughout the data storage, management, and analysis pipeline. (Goroff)
MIDAS Health Challenge
- Bio-behavioral Outcomes
- Integrated personal omics profiling (Murphy)
- Pervasive wearable health sensors
- Environment
- Demographics
- Predictive analytics for personalized health and medicine (Poste)
  - Cancer, Obesity, Diabetes, Alzheimer's disease
- Data de-identification and privacy (Goroff)
- Health Domain Expertise (MED, SPH, SoN, Pharmacy, Dentistry, LS&A, LSI, CoE)
- Security & Privacy Expertise (EECS, ISR)
- MIDAS Methodology Expertise (EECS, SPH, DCMB, IOE, SI, Math, Statistics)
MIDAS Learning Analytics Challenge
- Multimodal assessment of learner outcomes (Saxberg)
- Personalized education at scale (Saxberg)
- Predictive modeling and expert advising
- UM: Education at Scale
- Multimodal capture of learning behavior
- Social network characterization and intervention
- Development of Big Data enabled teachers and learners (Saxberg)
- Learning Sciences Domain Expertise (UMSI, SOE, LSA)
- Privacy & Data Handling Expertise (ISR, SPP, EECS)
- MIDAS Methodology Expertise (SI, SPH, SPP, Statistics, Math, EECS)
MIDAS Social Science Challenge
- Information security and privacy (Goroff)
- Business analytics and targeted marketing
- Institute for Social Research
- Media-driven socio-economic prediction (Carley)
- Data aggregation: postings, social, economic, demographic
- Social network dynamics (Carley)
- Social-media survey analytics
- Social Domain Expertise (ISR, LS&A, Ross, SSW)
- Privacy & Data Handling Expertise (ISR, SPH, EECS)
- MIDAS Methodology Expertise (EECS, IOE, SI, SPH, LS&A)
Additional name labels on the slide: Carley; Chinnam, Harringto
MIDAS Transportation Challenge
- Transportation data ecosystems for connected vehicles (Owen)
- Mcity: A 32-Acre Outdoor Lab
- Automotive data analytics (Owen)
- Freight data analytics
- Automotive cybersecurity for connected vehicles
- Accident and safety data analytics
- Data analysis for mass transit
- Transportation Domain Expertise (MTC, UMTRI)
- Security & Privacy Expertise (EECS)
- MIDAS Methodology Expertise (EECS, ME, IOE, SI, Math, Statistics)
Additional name label on the slide: Owen
Staging of Challenge Thrust Proposals
Timeline       Challenge Thrust
Fall 2015      Transportation, Learning Analytics
Winter 2016    Personalized Medicine and Health, Social Sciences
Fall 2016      Transportation, Learning Analytics
Winter 2017    Personalized Medicine and Health, Social Sciences
MIDAS plans to fund a total of 8 proposals in the next 2 years
- Evenly split over the 4 challenge thrusts
- Multi-disciplinary teams
- Funded over 3 years
New Challenge Thrusts will be launched in Year 3 and beyond
University Data Science Efforts are Launching Widely
How can MIDAS Efforts Stand Apart?
The Opportunities for Partnerships are Limitless
http://socr.umich.edu/docs/bd2k/bigdataresourceome.html
Data About Data Scientists
O'Reilly 2015 Data Science Salary Survey
https://www.oreilly.com/ideas/2015-data-science-salary-survey
MIDAS' Value Proposition for Industry Partners
1. Education and recruiting
   - Help shape UM data science education/training programs
   - Provide employee continuing education: degree, non-degree, short course
   - Industry-student interaction via DS Certificate project sponsorships, internships
   - Student scholarships (Northrop Grumman has endowed one of these already)
2. Access to DS experts
   - Machine learning, statistics, signal processing, data mining, database management, security, etc.
3. R&D partnerships
   - Involvement in challenge thrusts as sponsor, data provider, etc.
   - Industry partnering on MIDAS extramural grant proposals
4. Broader visibility
   - Participation in or sponsorship of a seminar series, workshop, or short course
Contact: MIDAS.umich.edu
Henry Kelly, MIDAS Industry Partnership Leader
- Chief Scientist, Energy Policy and Systems Analysis (EPSA), US Department of Energy
- Senior Advisor to the Director of the White House Office of Science and Technology Policy (7/2012-8/2013)
- Principal Deputy Assistant Secretary for Energy Efficiency and Renewable Resources, US Department of Energy (7/2008-7/2012)
- President, Federation of American Scientists (FAS) (6/2000-6/2008)
- Assistant Director for Technology, Office of Science and Technology Policy (1993-2000)
- Acting Associate Director for Technology, Office of Science and Technology Policy (10/96-11/97)
Dr. Kelly will manage NSF MWBH Activities and support building novel industry partnerships on behalf of MIDAS.
Data Science Services and IT Infrastructure
Data Science Services (CSCAR: Center for Statistical Consulting and Research)
Consulting for:
- Database Creation, Preparation & Ingestion
- Data Visualization
- Data Access
- Data Analytics
- Advanced Geographic Information Systems (GIS+)
Data Science Infrastructure (ARC-TS)
- Hadoop, SPARK
- SQL, NoSQL databases
- Analytics Platforms
- Integration with the Flux HPC Platform
MIDAS ARC Cloud Computing and Data Infrastructure Vision
- User analysis
- User data
- Applications/Tools and Visualization
- Flux Computing Systems
- Public Datasets/Michigan Data Sets
UM Data Science Initiative Working Group (N = 32)
Hosted by UMOR Office of Advanced Research Computing (ARC); Michielssen
Co-Chaired by H.V. Jagadish and Brian Athey
- Engineering (n=7): Jagadish (CSE), Singh (CSE), Hero (ECE), Jin (IOE), Powell (Aero)
- Med School (n=4): Athey (DCMB/Psych), Ayanian (IHPI/Int. Med), Friedman (LHS)
- LS&A (n=8): Nair (Statistics), Miller (Astronomy), Gilbert (Math), Smith (EEB), Evrard (Physics)
- School of Information (n=6): Finholt, Mei, Lagoze, van Houweling
- School of Nursing (n=1): Dinov
- School of Public Health (n=2): Abecasis, Little
- ISR (n=4): Alter (ISR/LSA), Gonzales (ISR/LSA), Pasek (ISR/LSA)