1 Demystifying The Data Scientist Natasha Balac, Ph.D. Predictive Analytics Center of Excellence, Director San Diego Supercomputer Center University of California, San Diego
2 Brief History of SDSC : NSF national supercomputer center; managed by General Atomics : NSF PACI program leadership center; managed by UCSD PACI: Partnerships for Advanced Computational Infrastructure : Internal transition to support more diversified research computing still NSF national resource provider 2009-future: Multi-constituency cyberinfrastructure (CI) center provide data-intensive CI resources, services, and expertise for campus, state, and nation Approaching $1B in lifetime contract and grant activity
3 PACE Predictive Analytics Center of Excellence Closing the gap between Government, Industry and Academia
4 PACE: Closing the gap between Government, Industry and Academia Bridge the Industry and Academia Gap Provide Predictive Analytics Services Data Mining Repository of Very Large Data Sets Inform, Educate and Train Predictive Analysis Center of Excellence Foster Research and Collaboration Develop Standards and Methodology High Performance Scalable Data Mining PACE is a non-profit, public educational organization oto promote, educate and innovate in the area of Predictive Analytics oto leverage predictive analytics to improve the education and well being of the global population and economy oto develop and promote a new, multi-level curriculum to broaden participation in the field of predictive analytics
5 Develop Standards and Methodology Bridge the Industry and Academia Gap Provide Predictive Analytics Services Data Mining Repository of Very Large Data Sets Inform, Educate and Train Predictive Analysis Center of Excellence Foster Research and Collaboration Develop Standards and Methodology High Performance Scalable Data Mining CRISP-DM PMML Data Mining Standards: Data Warehouse, Web, API and Grid standards Benchmarking NIST Big Data WG
6 Bridge the Industry and Academia Gap Provide Predictive Analytics Services Data Mining Repository of Very Large Data Sets Inform, Educate and Train Predictive Analysis Center of Excellence Foster Research and Collaboration Develop Standards and Methodology High Performance Scalable Data Mining A data-intensive supercomputer based on SSD flash memory and virtual shared memory Emphasizes MEM and IO over FLOPS Scalable Data Mining Visualizations
7 Foster Research and Collaboration Bridge the Industry and Academia Gap Provide Predictive Analytics Services Data Mining Repository of Very Large Data Sets Inform, Educate and Train Predictive Analysis Center of Excellence Foster Research and Collaboration Develop Standards and Methodology High Performance Scalable Data Mining Fraud Detection Modeling user behaviors Smart Grid Analytics Solar powered system modeling Microgrid anomaly detection Battery Storage Analytics Sport Analytics Genomics
8 UCSD Smart Grid UCSD Smart Grid sensor network data set 45MW peak micro grid; daily population of over 54,000 people Self-generate 92% of its own annual electricity load Smart Grid data over 100,000 measurements/sec Sensor and environmental/weather data Large amount of multivariate and heterogeneous data streaming from complex sensor networks Predictive Analytics throughout the Microgrid
9 Solar Power Integration Through Predictive Analytics Solar + EV integration Predictive analytics Business & Societal change Battery Storage Analytics
10 Sustainable San Diego Partnership Clean Tech San Diego, OSIsoft, SDG&E and UC San Diego Common data infrastructure connects physical assets: electrical, gas, water, waste, buildings, transportation &traffic Platform to securely transfer high volumes of Big Data from multiple, distributed measurement units Crowd-sourced Big Data in a cyber-secure, private cloud Predictive analytics on real-time time-series data White House Big Data Event: Data to Knowledge to Action Launch Partners Award Predictive Analytics Center of Excellence
11 Big Data from Microgrids Complexities introduced by the large amount of multivariate and heterogeneous data streaming from complex sensor networks Extremely large, complex sensor networks, enabling a novel feature reduction method that scales well
12 Number of V s of Big Data IBM, 2012
17 What Is The Value In Big Data?
18 What to do with big data? ERIC SALL s list Big Data Exploration To get an overall understanding of what is there 360 degree view of the customer Combine both internally available and external information to gain a deeper understanding of the customer Monitoring Cyber-security and fraud in real time Operational Analysis Leveraging machine generated data to improve business effectiveness Data Warehouse Augmentation Enhancing warehouse solution with new information models and architecture
19 Big Data Big Training Data Scientist The Hot new gig in town O Reilly report Data Scientist: The Sexiest Job of the 21st Century Harvard Business Review, October 2012 The next sexy job in next 10 years will be statistician Hal Varian, Google Chief Economist Geek Chic Wall Street Journal new cool kids on campus The future belongs to the companies and people that turn data into products The human expertise to capture and analyze big data is both the most expensive and the most constraining factor for most organizations pursuing big data initiatives Thomas Davenport New curriculum Boot camps, Certificates, Data Science Institute, 14 MAS
20 Data scientist: The hot new gig in tech Article in Fortune The unemployment rate in the U.S. continues to be abysmal (9.1% in July), but the tech world has spawned a new kind of highly skilled, nerdy-cool job that companies are scrambling to fill: data scientist McKinsey Global Institute Big data Report By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions
21 Data Science Job Growth By 2018 shortage of ,000 predictive analysts and 1.5M managers / analysts in the US
22 What is Data Science? Who are Data Scientists? The buzz and the rise of Analytics culture Big Data and Data science Defining Data Science Who? What? Where? How? The new Cool Kids on Campus
23 What is Data Science? Who are Data Scientists?
24 Rising Part of the Culture and Enterprises Movies and bestsellers Enterprises competing Chief Data Officer now Chief Data Scientist Data Driven Decisions strategic, tactical, operational
25 Past and Present Traditional approaches have been for DM experts: White-coat PhD statisticians DM tools also fairly expensive Today: approach is designed for those with some Database/Analytics skills DM built into DB, easy to use GUI, Workflows Many jobs available from statistical analyst to data scientist but many people are lacking education and experience! Hands on experience is extremely valuable
26 Adapted from B. Tierney, 2013 Business Analysis Operations Research
27 Successful Data Scientist Characteristics Intellectual curiosity, Intuition Find needle in a haystack Ask the right questions value to the business Communication and engagements Presentation skills Let the data speak but tell a story Story teller drive business value not just data insights Creativity Guide further investigation Business Savvy Discovering patterns that identify risks and opportunities Measure
28 Big Data And Beyond Fueling Data Science Beyond Relational- click streams, machine data Beyond Structure - raw complex data that defines metadata and structure Beyond Warehouse - Hadoop, Mapreduce, NoSQL Advanced analytics - beyond BI, dashboards and warehouses Expanded views Behavioral data, social media data, multiple sources Sample sizes explode Correlation vs. Causality Deep learning
29 Data Scientist Job Description Amazon s Shopper Marketing & Insights team focuses on serving the advertisers and our overall ad business to provide strategic media planning, customer insights, targeting recommendations, and measurement and optimization of advertising. We are hiring outstanding Data Scientists who will use innovative statistical and machine learning approaches to drive advertising optimization and contribute to the creation of scalable insights. The ideal candidate should have one hand on the white-board writing equations and one hand on the keyboard writing code.
30 Data Scientist Qualities
31 Data Scientist Qualities A New Role Exist the Data Scientist One part Scientist/Statistician Two parts Sleuth/Artist One Part Programmer Working with business leaders and process owners to create business value
32 Analyzing the Analyzers O Reilly Strata Survey Harris, Murphy & Vaisman, 2013 Based on how data scientists think about themselves and their work, not Years of experience, Academic degrees, favorite tools Titles, pay scales, org charts Identified four Data Scientist clusters
33 Data Scientist Self-ID O Reilly Strata Survey suggested Self-ID Group, along with the self-id categories most strongly associated with each Group
34 Strata Survey Skills Strata survey, 2013
35 Strata survey, 2013
36 Range of Skills Engineering analogy Define Analyst by the breath of skills most successful data scientists are those with substantial, deep expertise in at least one aspect of data science, be it statistics, big data, or business communication T-Shaped Skills Data science is an inherently collaborative and creative Skills terminology ID career path development Strata survey, 2013
37 Strata survey, 2013
38 Learning and Training Opportunities Many MS, MAS, Courses, Training, Workshops, Certificates, Boot camps, etc. Introduction to Data Science Example Part 1: Data Manipulation at scale Databases and the relational algebra Parallel databases, parallel query processing, in-database analytics, MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages Key-value stores and NoSQL; Entity resolution, record linkage Part 2: Analytics, Predictive Analytics, Text mining Part 3: Communicating Results Visualization, data products, visual data analytics Provenance, privacy, ethics, governance https://www.coursera.org/course/datasci
39 From Theory To Practice Internships large organizations Mentoring Organization Integration Data Science Team direct access to both raw data and decision-makers diversity of skills to make best use of that access Avoid silos rotated among internal teams Career Paths to manage or not to manage?
40 How Long Does It Take For a Beginner to Become a Good Data Scientist? Number of Year Responses < 1 year (6) 2% 1-2 years (33) 12% 2-4 years (85) 31% 5-8 years (91) 33% > 8 years (35) 13% Not sure (28) 10% KDnuggets survey [278 votes total]
41 How long does it take for a beginner to become a good data scientist per Region? Region (Count) Avg Years to become a good data scientist AU/NZ (9) 6.9 years E. Europe (19) 5.9 years US/Canada (143) 4.9 years W. Europe (60) 4.9 years Asia (25) 4.9 years Africa/Middle East (9) 4.4 years Latin America (12) 3.9 years
42 Key to a Great Data Scientist Τεχηνιχαλ σκιλλσ (Προγραµµινγ, Στατιστιχσ, Ματη) + Χοµµιτµεντ +Χρεατιϖιτψ + Ιντυιτιον +Πρεσεντατιον Σκιλλσ +Βυσινεσσ Σαϖϖψ = Γρεατ ατα Σχιεντιστ!
43 Key to a Great Data Scientist Technical skills (Coding, Statistics, Math) + Commitment +Creativity + Intuition +Presentation Skills +Business Savvy = Great Data Scientist!
44 PACE Education Educating the new generation of Data Scientist Data mining Boot Camps Boot Camp 1 February, 2014 Boot Camp 2 March 2014 On-site training Tech Talks - 3rd Wednesday Workshops one day latest and greatest Hadoop, PMML Institute for Data Science and Engineering MAS in Data Science in Fall 14
45 Mike Gualtieri's blog
47 Questions? For further information, contact Natasha Balac Thank you!
Hurwitz ViCtOrY index Advanced Analytics: The Hurwitz Victory Index Report SAP Hurwitz Index d o u b l e v i c t o r Marcia Kaufman COO and Principal Analyst Daniel Kirsch Senior Analyst Table of Contents
NESSI White Paper, December 2012 Big Data A New World of Opportunities Contents 1. Executive Summary... 3 2. Introduction... 4 2.1. Political context... 4 2.2. Research and Big Data... 5 2.3. Purpose of
B ig Data and Analytics in Northern Virginia and the Potomac Region May 2014 Sponsored by Northern Virginia Technology Council 2214 Rock Hill Road Herndon, Virginia 20170 703.904.7878 (phone) 703.904.8009
Big Data in Big Companies Date: May 2013 Authored by: Thomas H. Davenport Jill Dyché Copyright Thomas H. Davenport and SAS Institute Inc. All Rights Reserved. Used with permission Introduction Big data
IBM Industries White paper Business analytics in the cloud Driving business innovation through cloud computing and analytics solutions 2 Business analytics in the cloud Contents 2 Abstract 3 The case for
TDWI research First Quarter 2014 BEST PRACTICES REPORT Predictive Analytics for Business Advantage By Fern Halper Co-sponsored by: tdwi.org TDWI research BEST PRACTICES REPORT First Quarter 2014 Predictive
EXCLUSIVELY FOR TDWI PREMIUM MEMBERS volume 18 number 4 The leading publication for business intelligence and data warehousing professionals Analytics Outsourcing: The Hertz Experience 4 Hugh J. Watson,
BIG DATA Preconditions to Productivity Dr. Steve Hallman, DBA 1, Dr. Michel Plaisent 2, Jasur Rakhimov 3, Dr. Prosper Bernard 4 1,3 MBA Program, Park University, Parkville, MO 64152 U.S.A. 2 Management
White paper Proactive Planning for.. Big Data.. In government, Big Data presents both a challenge and an opportunity that will grow over time. Executive Summary Consider this list of government-adopted
Human Capital Management Trends 2013 It s a Brave New World January 2013 Mollie Lombardi and Madeline Laurano Page 2 Executive Summary Human capital management is a key business initiative. Without insight
2014 The Massachusetts Big Data Report A Foundation For Global Leadership The Massachusetts Big Data Ecosystem Chapter Three MassTech: Who We Are The Massachusetts Technology Collaborative, or MassTech,
IBM Business Services Business Analytics and Optimization In collaboration with Saïd Business School at the University of Oxford Executive Report IBM Institute for Business Value Analytics: The real-world
BIG DATA TECHNOLOGIES, USE CASES, AND RESEARCH ISSUES Il-Yeol Song, Ph.D. College of Computing & Informatics Drexel University Philadelphia, PA 19104 ACM SAC 2015 April 14, 2015 Salamanca, Spain Source:
2014 Market Research Results R E P O R T Data Analytics/Big Data in Financial Services Data Analytics and Big Data in Financial Services Market Research Results Market Research Report Components: Introduction
OPEN DATA CENTER ALLIANCE : sm Big Data Consumer Guide SM Table of Contents Legal Notice...3 Executive Summary...4 Introduction...5 Objective...5 Big Data 101...5 Defining Big Data...5 Big Data Evolution...7
CHAPTER 1.9 Making Big Data Something More than the Next Big Thing ANANT GUPTA HCL Technologies Big data is the business buzzword du jour. But how can you turn this hot topic into a real source of business
For Big Data Analytics There s No Such Thing as Too Big The Compelling Economics and Technology of Big Data Computing March 2012 By: 4syth.com Emerging big data thought leaders Forsyth Communications 2012.
May 2011 Big data: The next frontier for innovation, competition, and productivity The McKinsey Global Institute The McKinsey Global Institute (MGI), established in 1990, is McKinsey & Company s business
CIO Roundtable - Big March 13, 2013 Big and its Dimensions Big refers to internal and external data that is multi-structured, generated from diverse sources in near real-time and in large volumes making
Cover Page DEMYSTIFYING BIG DATA A Practical Guide To Transforming The Business of Government Prepared by TechAmerica Foundation s Federal Big Data Commission 1 TechAmerica Foundation: Federal Big Data
Big Data: Opportunities and Challenges Omar Al-Hujran, Rana Wadi, Reema Dahbour, Mohammad Al-Doughmi Department of Management Information Systems Princess Sumaya University for Technology Amman, Jordan
E N V I R O N M E N T A L S C A N CYBERSECURITY Los Angeles and Orange Counties J U N E 2 0 1 2 E N V I R O N M E N T A L S C A N CENTER OF EXCELLENCE Los Angeles and Orange Counties Audrey Reille, Director
Research Memorandum 94 July 2014 Creating business value from Big Data and business analytics: organizational, managerial and human resource implications Hull University Business School Prof Richard Vidgen
July 2013 Contents 1. Introduction 3 2. What is Big Data? 4 3. Big Data Adoption 5 4. Drivers and Barriers 11 5. Opportunities for Digital Entrepreneurship 14 5.1. Supply-side Business opportunities 14
3 Big Data: Challenges and Opportunities Roberto V. Zicari Contents Introduction... 104 The Story as it is Told from the Business Perspective... 104 The Story as it is Told from the Technology Perspective...
For: Application Development & Delivery professionals The Forrester Wave : Big Data Predictive Analytics Solutions, Q1 2013 by mike Gualtieri, January 3, 2013 updated: January 13, 2013 Key TaKeaWays Vendors