Data Science Initiative Joint Research Committees Meeting October 27, 2014 Terri L. Lomax Vice Chancellor Research, Innovation + Economic Development
Data Science is a Big Deal
  Managing, processing and exploiting data to improve decision making will continue to grow in importance
Key Insights: after McKinsey (May 2011)
  1. Data have swept into every industry and business function and are now an important factor of production.
  2. Big Data creates value, but ONLY if appropriate and advanced Data Science is used to deal with Big Data.
  3. Use of Big Data and Data Science will become a key basis of competition, growth and pro-active risk management for individual firms.
  4. Big Data underpins new waves of productivity growth and consumer surplus.
  5. Big Data will matter across sectors, but some sectors are poised for greater gains.
  6. There is already a shortage of talent required to meet demand.
From McKinsey (May 2011):
  Demand for deep analytical positions could exceed the supply by 140,000 to 190,000 positions in 2018.
  The need for additional managers and analysts in the US who can ask the right questions and consume the results of data analysis is estimated at 1.5 million.
  In the short-term, retraining existing talent will be necessary for organizations to take advantage of Big Data.
Big data can generate significant financial value across sectors Source: McKinsey Global Institute Analysis
Data Science is multidisciplinary Source: Data Science Research Center, Amsterdam (http://dsrc.nl/what-is-data-science/) Large Scale Databases Distributed Processing Software Engineering Store and Process Decision Theory System/Network Engineering Visual Analytics Understand and decide Business Analytics Security Privacy Provenance Information Retrieval Perception Cognition Reasoning Analyze & Model Machine Learning Knowledge Representation Multimedia Retrieval Modeling & Simulation
Research Triangle - Data Science Powerhouse LAS
Data4Decisions' unique concept is guided by the event's Advisory Council, a powerful compendium of the region's leading research universities, private companies and thought-leading Research Triangle-based associations:
NC State Data Science-Related Centers + Institutes New: NSF I/UCRC on End-to-End Enablement of Data
Laboratory for Analytic Sciences (LAS) NSA's goal is to build an advanced data innovation hub in the Research Triangle with LAS as the anchor tenant
CHANCELLOR'S FACULTY EXCELLENCE CLUSTERS
Data Science Education at NC State
  Institute for Advanced Analytics PSM, primarily SAS tools
  COE/PCOM executive education based on open-source tools
  CSC graduate track in Data Science
  CSC/Stat/Math undergraduate concentration in Data Science
  PCOM executive education based on IBM tools
  UNC GA Research Opportunities Initiative: Data Science
    Institutionalize NC State's Data Science Initiative (together with UNC Charlotte and RENCI)
Data Science Infrastructure at NC State
  Portions of VCL-HPC facilities (large memory machines) run Linux-based, user-provided analytics
  CSC MRC VCL-BigData testbed (x86 and IBM Power7 and Power8 computers with lots of memory, tightly coupled storage, and advanced accelerators) runs IBM Analytics
  NCBP-VCL cluster (lots of memory and disk space) runs SAS analytics
  IAA VCL facilities primarily run SAS analytics
  OSCAR lab: extensive Big Data and Data Science computational and data storage facilities, including a BlueGene/P supercomputer, also LAS low lab
NC State Data Science Initiative
  Goals
    Raise visibility & increase reputation
    Coordinate data science activities, including education
    Increase research funding
    Build industry partnerships
    Establish interdisciplinary undergraduate curriculum
    Provide services & infrastructure to faculty
  Organizational Structure
    Director / Assistant
    Coordinating Council
    Steering Committee
    External Advisory Committee
Data Science Initiative Coordinating Council (formative stage)
  COE: Mladen Vouk (CSC, Director); Dan Stancil (ECE); Jerry Bernholc (CHIPS); Michael Young (DGRC); James Lester (CEI); Paul Turinsky (CASL); Jacob Jones (AIF); Dennis Kekas (ITng); Yousry Azmy (CNEC); Rada Chirkova (new I/UCRC; STEED Lab, CHMPR)
  Provost: Michael Rappa (IAA)
  ORIED: Otis Brown (NCICS)
  COS: Montse Fuentes (Stat); Marie Davidian (CQSB); Alyson Wilson (Cluster, LAS); John Blondin (Phys); Loek Helminck (Math); Tom Banks (CRSC); Fred Wright (Bioinformatics)
  PCOM: Mike Kowolenko (CIMS)
  CED: Glenn Kleiman (WIFIEI)
  CNR: Ross Mietenmeyer (GSA)
  CHASS: Carolyn Miller (Dig. Humanities)
Summary
  Managing and extracting information from complex data sets continues to grow in importance in nearly all sectors of the US economy
  The Research Triangle has significant programs in data science that will be leveraged for future growth
  The importance of data science is recognized by all levels and types of industry, government and academia
  Working together at NC State, we can capitalize on multidisciplinary opportunities, build significant programs, and educate the skilled workforce to maintain our national leadership in analytics and data science
Data to Knowledge Terri L. Lomax research.ncsu.edu
Gap Analysis for Data Science Cluster Proposal Physical models Social models Symbolic models Numerical solvers HPC, HPD, OS Software architectures Storage/Index/Access Privacy Security Statistical methods Discrete mathematics AI/Knowledge Mgmt Database Data integration Natural Language 1 2 3 4 Coverage: Weak Little Strong
Current Gap Analysis for Data Science Cluster (Oct 2014) Physical models Social models Symbolic models Numerical solvers HPC, HPD, OS Software architectures Storage/Index/Access Privacy Security Statistical methods Discrete mathematics AI/Knowledge Mgmt Database Data integration Natural Language 1 2 3 4 Coverage: Weak Little Strong
CSC Data Science Grants: current & recent
  Ensemble and Comparative Visualization of Scientific Datasets (Sandia, Christopher Healey)
  Computer-aided Human Centric Cyber Situation Awareness (Penn State, Peng Ning; Michael Young)
  Runtime System for I/O Staging in Support of In-Situ Processing of Extreme Scale Data (DOE, Nagiza Samatova)
  Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems (DOE, Nagiza Samatova)
  Damsel: A Data Model Storage Library for Exascale Science (DOE, Nagiza Samatova)
  Scalable Data Management, Analysis, and Visualization (SDAV) Institute (DOE, Nagiza Samatova; Anatoli Melechko)
  Scientific Data Management Center (DOE, Vouk)
  Collaborative Research: Understanding Climate Change: A Data Driven Approach (NSF, Nagiza Samatova; Frederick Semazzi)
  Policy-Based Governance for the OOI Cyberinfrastructure (NSF, Munindar Singh)
  Interdisciplinary Cyber-Enabled Crime Reconstruction through Innovative Methodology and Engagement (IC-CRIME) (NSF, David Hinks; Michael Young, ASU, IU-B)
Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Capturing Value from Data
  Create transparency
  Enable experimentation to discover needs, expose variability, and improve performance
  Segment populations to customize actions
  Replace and/or support human decision making with automated algorithms
  Innovate new business models, products, and services
  Source: Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011
Big Data Research and Development Initiative Consider: CQSB NCICS CSC ECE Transportation Pattern Recognition NC B-Prepared RENCI CSC COD CHiPS COE ITng CSC Modeling & Simulation Processing & Preservation Policy/Governance Visualization v-Centennial CSC VCL Virtualization Data Types Structured Unstructured Image Signal Streams Analytics IAA, CSC, CIMS, MEAS, + Data Science Acquisition Applications Biological Sciences Business Climate Engineering Aps. Energy, Health Social, Humanities Physics Security Policy Etc. Computation Sciences ICSE ORSC VCL CSC Math Physics Cyber Infrastructure Cyber Security Data Management ICSE ITng CSC ECE SOSI SDM CSC ITng SoSI CSC Natural Language Processing Fault Tolerance/Recovery ITng CSC ECE Networking Mobility/Wireless ITng CSC, ECE, SOSI Informatics Gaming BRC, CSC Education DGRC IAA CEI CSC Stat
Graphic Representation of Data Science Source: The First Rule of Data Science, http://berkeleysciencereview.com/article/first-rule-data-science/
Big Data Workshops at NC State
  Two one-day Workshops held at NC State
  Hosted by VCs Terri Lomax and Marc Hoit
  Led by Tina Bennefield, HR Senior Consultant & Performance Leadership Program Manager
  Organized by Bonnie Aldridge
  Day 1
    Individual faculty presentations on current research
    Table discussions on trends, barriers and needs
  Day 2
    Developing a shared vision
    Developing recommendations
Big Data Workshop Attendees (* presented)
  CALS: Bird*
  COT: Pasquinelli
  CHASS: McDonald*, Wolfram
  CVM: Breen, Kennedy-Stoskopf*
  CNR: Devine, Whetten*
  PCOM: Kouri*, Kowolenko*, Krishnamurthy*
  PAMS: Blondin, Brown*, Daniels*, Ghosh, Ipsen, Mitasova*, Reading*, Sullivant*, Xie*, Yuter, Zhou
  COE: Baron*, Bolotnov*, Chakrabortty*, Chirkova, Chow*, Dai, Edwards*, Ferguson*, Franzon*, Healey*, Krim*, Misra*, Muth, Overton, Rotenberg, Vouk, Westmoreland*, Xie*
Big Data Workshop Findings
  Research trends
    Analysis of unstructured data sets
    Enhanced visualization methods
    Data interoperability and fusion techniques
    Model-driven vs data-driven approaches
  Barriers
    Infrastructure (bandwidth, storage, power, etc.)
    Human capital
    Privacy, proprietary and standards
    Departmental cultures
  Needs & Vision
    Understand industry funding
    Collaborative data tools
    Communication between producers and consumers
    Overarching coordinating structure