Education & Training Program Presenters




Education & Training Program Presenters
o Ivo D. Dinov, Assoc. Director, MIDAS Ed & Training
o Erin Shellman, Research Scientist, AWS
o Patrick Harrington, Co-Founder, Chief Data Scientist, CompGenome.com
o Nandit Soparkar, CEO, Ubiquiti

MIDAS Data Science Education & Training: Challenges & Opportunities
Ivo D. Dinov
Associate Professor, School of Nursing
Associate Education Director, Michigan Institute for Data Science (MIDAS)
University of Michigan
http://midas.umich.edu/education

National Big Data Science Curricula Constellation
Recently established Data Science institutes and curricular programs

MIDAS Big Data Ecosystem
http://socr.umich.edu/docs/bd2k/bigdataresourceome.html

MIDAS Focus on Developing Big Data Skills
o Listening: streams, analyzing sentiment, intent and trends
o Looking: searching, indexing and memory management of heterogeneous datasets; raw, derived or indexed data as well as meta-data
o Programming: handling Map-Reduce/HDFS, No-SQL DB, protocol provenance, algorithm development/optimization, pipeline workflows
o Inferring: principles of data analyses, Bayesian modeling, inference, uncertainty and quantification of likelihoods; reasoning & logic; analytics (regression, feature selection, dimensionality reduction, temporal patterns, validation); actionable knowledge
o Machine Learning: classification, clustering, mining, information extraction, knowledge retrieval, decision making
o Predicting: forecasting, neural models, deep learning, and research topics
o Summarizing: presentation of data, processing protocol, analytics provenance, visualization, synthesis

Specific Data Science Training Challenges
o Technological advances and students' IT skills far outpace educators' technological expertise and the current rate of IT adoption in college curricula
o Lack of open learning resources and interactive interoperable platforms
o Difficulty sharing data, tools, materials & activities across different disciplines
o Limited technology-enhanced continuing instructor education opportunities
o Discipline-specific knowledge boundaries
o Skills for communication and teamwork involving cooperation in dynamic groups of dispersed researchers
o Ability to aggregate & harmonize data (e.g., fusion of qualitative and quantitative elements)

Core MIDAS Education Components
o Drivers: motivations, datasets (6D of Big Data), challenges, applications
o Methods: foundational and trans-disciplinary scientific techniques
o Tools: web-services, software, code, platforms
o Analytics: practice of data interrogation

Existing Data Science Education & Training
o Undergraduate DS Degree Program
    o Stats + Engineering
    o Started Fall 2015
    o www.eecs.umich.edu/eecs/undergraduate/datascience
o DS Summer Institute (Public Health)
    o 20+ faculty, 40+ trainees
    o Full support/in residence (1 month)
    o BigDataSummerInst.sph.umich.edu
o Big Data Summer Bootcamp
    o 5 years running, Business-Engineering
    o Practice of Econ-Bio-Social Analytics
    o ibug-um15.github.io/2015-summercamp
o Graduate DS Certificate Program
    o Transdisciplinary (SM, SPH, LS&A, SN, CoE, SI)
    o 12 cr, Modeling + Technology + Practice
    o http://midas.umich.edu/certificate

MIDAS Education & Training (Going Forward)
o Graduate Data Science Certificate Program
o Enrollment of 100 (UMich) students
o Develop the Online Graduate Data Science Certificate Curriculum
o Develop a 5-yr dual undergrad-grad MS Program in Data Science
o DS Summer Institute (Public Health)
o Big Data Summer Bootcamp
o MIDAS Seminar Series (academia, industry, government, partners)
o Web-streamed and archived for broader impact
o National Events
o BD2K, Midwest Big-Data-Hub, Math, Stats, Computer Vision, Machine Learning, Informatics
o Funding: NIH, NSF, Foundation: Research & Traineeship Applications

Michigan Difference in Data Science Education & Training
o Public-Private-Partnership (PPP) Model
o Integration of Research + Practice + Education
o Problems + Modeling + Technology + Applications
[Diagram labels: Industry, Community, UMich, Fed Orgs; AtheyLab]

Statistics Online Computational Resource (SOCR): Integration of Education, Research & Practice
o Probability and Statistics EBook
    o Motivation, Methods, Techniques, Practice
    o wiki.socr.umich.edu/index.php/ebook
o Data sets
    o 100's of Research, Observed & Simulated
    o wiki.socr.umich.edu/index.php/SOCR_Data
o Hands-on Activities (Team-focused)
    o Modeling, Inference, Concept Demos
    o wiki.socr.umich.edu/index.php/socr_edumaterials
o Applets and Webapps
    o Over 500 Web tools
    o Distributed services based on Java, HTML5/JavaScript
Features: Free, Cloud-service, no-barrier access, LGPL/CC-BY
Ref: Dinov et al., TS (2013), DOI: 10.1111/test.12012
SOCR has over 9.6M users worldwide since 2002
www.socr.umich.edu

MIDAS/SOCR Dashboard: Big Data Fusion Example
http://socr.umich.edu/html5/dashboard

Big Data Analytics
http://socr.umich.edu/html5/dashboar
o Web-service combining and integrating multi-source socioeconomic and medical datasets
o Big data analytic processing
o Interface for exploratory navigation, manipulation and visualization
o Adding/removing of visual queries and interactive exploration of multivariate associations
o Powerful HTML5 technology enabling mobile on-demand computing
Husain, et al., 2015, J Big Data

MIDAS Online Data Science Education Program (DSEP)
MIDAS plans to:
o Develop new hands-on MOOC DS courses (data, learning modules, services, code snippets, applications/partners)
o Deploy MIDAS Training as a Service (TaaS) MOOC platform
o Compile Datasets (research-derived, observational, simulated): identify, aggregate, manage, navigate, and service exemplary Big Data sets
o Faculty/Mentors: instructors, collaborators, partners, students and staff to develop a core DS course-series (4 courses)
o Instructional resources and complete end-to-end learning modules
o Track and Validate the DSEP Program: annual reports (MIDAS ETC), review of evaluations, sustainability (financial, resources, space), impact

How to Engage & Partner in MIDAS Education?
Partners
o Open-Source Community
o Academia
o Trainees
o Industry
o Non-profit
o Government
o Philanthropy
o Community Orgs
Activities
o Drivers
    o Challenges
    o Datasets
    o Applications
o Methods
o Technologies
o Tools/Services
o Training Opps
    o Mentorship
    o DS Projects
    o Internships

MIDAS Education & Training Team

MIDAS Education & Training Committee
Ivo Dinov, Margaret Hedstrom, Honglak Lee, Sebastian Zöllner, Richard Gonzalez, Kerby Shedden

Other Contributing MIDAS Faculty
Engineering
o Alfred Hero: Electrical Engineering and Computer Science; Biomedical Engineering, Stats
o H. V. Jagadish: Electrical Engineering and Computer Science
o Mike Cafarella: Computer Science and Engineering
o Karthik Duraisamy: Atmospheric, Oceanic, and Space Sciences
o Judy Jin: Industrial & Operations Engineering
o Dragomir Radev: School of Information; Computer Science and Engineering; Linguistics
Information & Health Sciences
o Brian Athey: Computational Medicine and Bioinformatics, Medicine
o Carl Lagoze: School of Information
o Qiaozhu Mei: School of Information
o Jeremy Taylor: Biostatistics, Public Health
Basic Sciences
o Vijay Nair: Statistics & Engineering
o George Alter: Institute for Social Research, History
o Christopher Miller: Physics & Astronomy
o August Evrard: Physics & Astronomy
o Anna Gilbert: Mathematics, Engineering
o Stephen Smith: Ecology and Evolutionary Biology
o Ambuj Tewari: Statistics; Computer Science and Engineering

Contact
http://midas.umich.edu
midascontact@umich.edu
Dinov@umich.edu


Big Data Science
o From a descriptive to a constructive definition of Big Data
o IBM's 4V's: volume, variety, velocity (speed) and veracity (reliability)
o MIDAS Big Data Characterization identifies gaps, challenges & needs
[Table: mixture of quantitative & qualitative estimates (Dinov, et al., 2014); surviving labels: Size, Incompleteness, Expected Increase, 2016, 2018]
Example: analyzing observational data of 1,000's of Parkinson's disease patients based on 10,000's of signature biomarkers derived from multi-source imaging, genetics, clinical, physiologic, phenomics and demographic data elements.
Software developments, student training, service platforms and methodological advances associated with the Big Data Discovery Science all present existing opportunities for learners, educators, researchers, practitioners and policy makers.

Kryder's law: Exponential Growth of Data
o Storage doubles every 1.2 years (Walter, Sci Am, 2005)
o Computational power doubles every 1.6 years
o Data volume increases faster than computational power (Dinov, et al., 2014)
[Figure: increase of image resolution over time (1 cm, 1 mm, 100 µm, 10 µm, 1 µm); series Cryo_Color, Cryo_Short, Gryo_Byte]
[Figure: growth of Moore's Law (1000's Trans/CPU), Genomics_BP (GB) and Neuroimaging (GB)]
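The doubling times quoted on this slide (storage every 1.2 years, computational power every 1.6 years) can be turned into a rough growth comparison. The sketch below is illustrative only: it assumes ideal exponential growth, 2^(t/T) for doubling time T, and the function names are hypothetical, not taken from the presentation.

```python
# Illustrative arithmetic only: assumes ideal exponential growth with the
# doubling times quoted on the slide. Function names are hypothetical.

STORAGE_DOUBLING_YEARS = 1.2  # "storage doubles every 1.2 yrs" (slide)
COMPUTE_DOUBLING_YEARS = 1.6  # "computational power doubles each 1.6 years" (slide)


def growth_factor(doubling_time_years, elapsed_years):
    """Multiplicative growth after elapsed_years for a quantity that
    doubles every doubling_time_years: 2 ** (t / T)."""
    return 2.0 ** (elapsed_years / doubling_time_years)


def data_to_compute_gap(elapsed_years):
    """Ratio of storage growth to compute growth over the same period."""
    storage = growth_factor(STORAGE_DOUBLING_YEARS, elapsed_years)
    compute = growth_factor(COMPUTE_DOUBLING_YEARS, elapsed_years)
    return storage / compute


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(
            years,
            round(growth_factor(STORAGE_DOUBLING_YEARS, years), 1),
            round(growth_factor(COMPUTE_DOUBLING_YEARS, years), 1),
            round(data_to_compute_gap(years), 2),
        )
```

Under these assumptions, a decade of growth multiplies storage by roughly 320 and computational power by roughly 76, so data volume outgrows compute by a factor of about 4.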

Data Science Training Program: Core Curriculum

Themes: Data Scrubbing, Data Mining, Data Inference

Training topics and examples:
o Big Data Management: Big Data technology, Searching, indexing, memory management, Information extraction, feature selection, Supervised and unsupervised-learning, stream mining
o Big Data Representation: Matrix Representation of Sets, Minhashing, Jaccard Similarity, Distance Measures, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance, Networks, Graph similarity, Sets as Strings, Prefix Indexing, Data Streams and Processing, Representative Samples, Bloom Filter, Flajolet-Martin Algorithm, TF/IDF, MoM, MLE, Alon-Matias-Szegedy Algorithm for Second Moments, Datar-Gionis-Indyk-Motwani Algorithm (DGIM)
o Mining Social-Network Graphs: Bio-Social Network Graphs, Root-Mean-Square Error, UV-Decomposition, The NetFlix Challenge, Tweeter Challenge, Distances and Clustering of Social-Network Graphs, Network Cluster Betweenness, Complete Bipartite Subgraphs, Graph Partition, Matrices and Graphs, Eigen-values/vectors of the Laplacian Matrix, Graph Neighborhood Properties, Diameter of a Graph, Transitive Closure and Reachability, Crowdsourcing Algorithms
o Modeling: Statistical Modeling, Machine Learning, Computational Modeling, Feature Extraction, Power Laws, Map-Reduce/Hadoop
o Link Analysis: PageRank, Spider Traps, Taxation and loops in Big Data traversal, Random Walks
o Clustering: Curse of Dimensionality, Classification and Regression Trees/Random Forests, Hierarchical Clustering, K-means Algorithms, Bradley, Fayyad, and Reina (BFR) Algorithm, CURE Algorithm, Clusters in the GRGPF Algorithm
o Classification: Random Forest, SVM, Neural Networks, Latent Class Models, Finite Mixture Models
o Dimensionality Reduction: Eigenvalues and Eigenvectors, Principal-Component Analysis, Singular-Value Decomposition, Wrapper-Based vs Filter-Based Feature Selection, Independent Components Analysis, Multidimensional Scaling, CUR Decomposition
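Two of the Big Data Representation topics listed above, Jaccard similarity and minhashing, can be illustrated with a small sketch. This is not course material from the presentation: it is a minimal example that uses explicit random permutations of a small universe, whereas large-scale implementations typically substitute hash functions for permutations. All function names and the toy sets are hypothetical.

```python
import random

# Minimal sketch of Jaccard similarity and MinHash estimation.
# Assumptions: small universe, non-empty sets of sortable items, and
# explicit random permutations instead of hash functions.


def jaccard_similarity(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Jaccard distance is one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


def minhash_signatures(sets, num_permutations=500, seed=0):
    """Return one signature per input set. Each signature entry is the
    smallest rank of any set member under one random permutation."""
    rng = random.Random(seed)
    universe = sorted(set().union(*sets))
    signatures = [[] for _ in sets]
    for _ in range(num_permutations):
        order = universe[:]
        rng.shuffle(order)
        rank = {item: position for position, item in enumerate(order)}
        for signature, members in zip(signatures, sets):
            signature.append(min(rank[item] for item in members))
    return signatures


def estimate_similarity(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates the
    Jaccard similarity of the underlying sets."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = set(range(0, 80))    # toy data, made up for illustration
    b = set(range(40, 120))
    sig_a, sig_b = minhash_signatures([a, b])
    print("exact:", jaccard_similarity(a, b))
    print("estimate:", estimate_similarity(sig_a, sig_b))
```

The estimate works because, under a uniformly random permutation, the probability that two sets share the same minimum-ranked element equals their Jaccard similarity; more permutations tighten the estimate at the cost of longer signatures.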

BD                  I                     K                        A
Big Data            Information           Knowledge                Action
Raw Observations    Processed Data        Maps, Models             Actionable Decisions
Data Aggregation    Data Fusion           Causal Inference         Treatment Regimens
Data Scrubbing      Summary Stats         Networks, Analytics      Forecasts, Predictions
Semantic-Mapping    Derived Biomarkers    Linkages, Associations   Healthcare Outcomes