Data Science Initiative




Data Science Initiative. Joint Research Committees Meeting, October 27, 2014. Terri L. Lomax, Vice Chancellor for Research, Innovation + Economic Development

Data Science Is a Big Deal: Managing, processing and exploiting data to improve decision making will continue to grow in importance.

Key Insights (after McKinsey, May 2011):
1. Data have swept into every industry and business function and are now an important factor of production.
2. Big Data creates value, but ONLY if appropriate and advanced Data Science is used to deal with Big Data.
3. Use of Big Data and Data Science will become a key basis of competition, growth and pro-active risk management for individual firms.
4. Big Data underpins new waves of productivity growth and consumer surplus.
5. Big Data will matter across sectors, but some sectors are poised for greater gains.
6. There is already a shortage of talent required to meet demand.

From McKinsey (May 2011): Demand for deep analytical positions could exceed the supply by 140,000 to 190,000 positions in 2018. The need for additional managers and analysts in the US who can ask the right questions and consume the results of data analysis is estimated at 1.5 million. In the short term, retraining existing talent will be necessary for organizations to take advantage of Big Data.

Big data can generate significant financial value across sectors. (Source: McKinsey Global Institute analysis)

Data Science is multidisciplinary. (Source: Data Science Research Center, Amsterdam, http://dsrc.nl/what-is-data-science/)
[Figure: three activity areas (Store and Process; Analyze & Model; Understand and Decide) spanning Large Scale Databases, Distributed Processing, Software Engineering, System/Network Engineering, Decision Theory, Visual Analytics, Business Analytics, Security, Privacy, Provenance, Information Retrieval, Perception, Cognition, Reasoning, Machine Learning, Knowledge Representation, Multimedia Retrieval, and Modeling & Simulation]

Research Triangle: Data Science Powerhouse [figure; label: LAS]

Data4Decisions' unique concept is guided by the event's Advisory Council, a powerful compendium of the region's leading research universities, private companies and thought-leading Research Triangle-based associations.

NC State Data Science-Related Centers + Institutes. New: NSF I/UCRC on End-to-End Enablement of Data

Laboratory for Analytic Sciences (LAS): NSA's goal is to build an advanced data innovation hub in the Research Triangle with LAS as the anchor tenant.

CHANCELLOR'S FACULTY EXCELLENCE CLUSTERS

Data Science Education at NC State: Institute for Advanced Analytics PSM (primarily SAS tools); COE/PCOM executive education based on open-source tools; CSC graduate track in Data Science; CSC/Stat/Math undergraduate concentration in Data Science; PCOM executive education based on IBM tools.
UNC GA Research Opportunities Initiative for Data Science: institutionalize NC State's Data Science Initiative (together with UNC Charlotte and RENCI).

Data Science Infrastructure at NC State: portions of the VCL-HPC facilities (large-memory machines) run Linux-based, user-provided analytics; the CSC MRC VCL-BigData testbed (x86, IBM Power7 and Power8 computers with lots of memory, tightly coupled storage, and advanced accelerators) runs IBM Analytics; the NCBP-VCL cluster (lots of memory and disk space) runs SAS analytics; the IAA VCL facilities primarily run SAS analytics; the OSCAR lab has extensive Big Data and Data Science computational and data storage facilities, including a BlueGene/P supercomputer; also, the LAS low lab.

NC State Data Science Initiative Goals: raise visibility and increase reputation; coordinate data science activities, including education; increase research funding; build industry partnerships; establish an interdisciplinary undergraduate curriculum; provide services and infrastructure to faculty.
Organizational Structure: Director/Assistant; Coordinating Council; Steering Committee; External Advisory Committee.

Data Science Initiative Coordinating Council (formative stage):
COE: Mladen Vouk (CSC, Director), Dan Stancil (ECE), Jerry Bernholc (CHiPS), Michael Young (DGRC), James Lester (CEI), Paul Turinsky (CASL), Jacob Jones (AIF), Dennis Kekas (ITng), Yousry Azmy (CNEC), Rada Chirkova (new I/UCRC; STEED Lab, CHMPR)
Provost: Michael Rappa (IAA)
ORIED: Otis Brown (NCICS)
COS: Montse Fuentes (Stat), Marie Davidian (CQSB), Alyson Wilson (Cluster, LAS), John Blondin (Phys), Loek Helminck (Math), Tom Banks (CRSC), Fred Wright (Bioinformatics)
PCOM: Mike Kowolenko (CIMS)
CED: Glenn Kleiman (WIFIEI)
CNR: Ross Mietenmeyer (GSA)
CHASS: Carolyn Miller (Dig. Humanities)

Summary: Managing and extracting information from complex data sets continues to grow in importance in almost all sectors of the US economy. The Research Triangle has significant programs in data science that will be leveraged for future growth. The importance of data science is recognized at all levels and by all types of industry, government and academia. Working together at NC State, we can capitalize on multidisciplinary opportunities, build significant programs, and educate the skilled workforce needed to maintain our national leadership in analytics and data science.

Data to Knowledge. Terri L. Lomax, research.ncsu.edu

Gap Analysis for Data Science Cluster Proposal [figure: coverage of physical models, social models, symbolic models, numerical solvers, HPC/HPD/OS, software architectures, storage/index/access, privacy, security, statistical methods, discrete mathematics, AI/knowledge management, database, data integration, and natural language; markers 1 to 4; coverage levels: Weak, Little, Strong]

Current Gap Analysis for Data Science Cluster (Oct 2014) [figure: coverage of physical models, social models, symbolic models, numerical solvers, HPC/HPD/OS, software architectures, storage/index/access, privacy, security, statistical methods, discrete mathematics, AI/knowledge management, database, data integration, and natural language; markers 1 to 4; coverage levels: Weak, Little, Strong]

CSC Data Science Grants (current and recent):
Ensemble and Comparative Visualization of Scientific Datasets (Sandia, Christopher Healey)
Computer-aided Human Centric Cyber Situation Awareness (Penn State, Peng Ning; Michael Young)
Runtime System for I/O Staging in Support of In-Situ Processing of Extreme Scale Data (DOE, Nagiza Samatova)
Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems (DOE, Nagiza Samatova)
Damsel: A Data Model Storage Library for Exascale Science (DOE, Nagiza Samatova)
Scalable Data Management, Analysis, and Visualization (SDAV) Institute (DOE, Nagiza Samatova; Anatoli Melechko)
Scientific Data Management Center (DOE, Vouk)
Collaborative Research: Understanding Climate Change: A Data Driven Approach (NSF, Nagiza Samatova; Frederick Semazzi)
Policy-Based Governance for the OOI Cyberinfrastructure (NSF, Munindar Singh)
Interdisciplinary Cyber-Enabled Crime Reconstruction through Innovative Methodology and Engagement (IC-CRIME) (NSF, David Hinks; Michael Young, ASU, IU-B)

The Four V's of Big Data (infographic). Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Capturing Value from Data: create transparency; enable experimentation to discover needs, expose variability, and improve performance; segment populations to customize actions; replace and/or support human decision making with automated algorithms; innovate new business models, products, and services. (Source: Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011)

Big Data Research and Development Initiative [figure: map of NC State units to big data areas. Areas: acquisition; data types (structured, unstructured, image, signal, streams); analytics (IAA, CSC, CIMS, MEAS, + Data Science); computation sciences (ICSE, ORSC, VCL, CSC, Math, Physics); modeling & simulation; processing & preservation; policy/governance; visualization; VCL virtualization; cyber infrastructure; cyber security; data management; natural language processing; fault tolerance/recovery; networking; mobility/wireless; informatics; gaming; transportation; pattern recognition; NC B-Prepared; education (DGRC, IAA, CEI, CSC, Stat). Applications: biological sciences, business, climate, engineering apps., energy, health, social sciences, humanities, physics, security, policy, etc. Other units named: CQSB, NCICS, ECE, RENCI, COD, CHiPS, COE, ITng, SOSI, SDM, BRC, v-Centennial]

Graphic Representation of Data Science. (Source: The First Rule of Data Science, http://berkeleysciencereview.com/article/first-rule-data-science/)

Big Data Workshops at NC State: Two one-day workshops were held at NC State, hosted by Vice Chancellors Terri Lomax and Marc Hoit, led by Tina Bennefield (HR Senior Consultant & Performance Leadership Program Manager), and organized by Bonnie Aldridge.
Day 1: individual faculty presentations on current research; table discussions on trends, barriers and needs.
Day 2: developing a shared vision; developing recommendations.

Big Data Workshop Attendees (* presented):
CALS: Bird*
COT: Pasquinelli
CHASS: McDonald*, Wolfram
CVM: Breen, Kennedy-Stoskopf*
CNR: Devine, Whetten*
PCOM: Kouri*, Kowolenko*, Krishnamurthy*
PAMS: Blondin, Brown*, Daniels*, Ghosh, Ipsen, Mitasova*, Reading*, Sullivant*, Xie*, Yuter, Zhou
COE: Baron*, Bolotnov*, Chakrabortty*, Chirkova, Chow*, Dai, Edwards*, Ferguson*, Franzon*, Healey*, Krim*, Misra*, Muth, Overton, Rotenberg, Vouk, Westmoreland*, Xie*

Big Data Workshop Findings:
Research trends: analysis of unstructured data sets; enhanced visualization methods; data interoperability and fusion techniques; model-driven vs. data-driven approaches.
Barriers: infrastructure (bandwidth, storage, power, etc.); human capital; privacy, proprietary and standards issues; departmental cultures.
Needs & Vision: understand industry funding; collaborative data tools; communication between producers and consumers; an overarching coordinating structure.