Data Science Applications & Use Cases

Similar documents
Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Statistics for BIG data

An interdisciplinary model for analytics education

Sunnie Chung. Cleveland State University

Integrating a Big Data Platform into Government:

Big Data Explained. An introduction to Big Data Science.

BEYOND POINT AND CLICK THE EXPANDING DEMAND FOR CODING SKILLS BURNING GLASS TECHNOLOGIES JUNE 2016

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Data Science, Predictive Analytics & Big Data Analytics Solutions. Service Presentation

PROGRAM DIRECTOR: Arthur O Connor Contact: URL : THE PROGRAM Careers in Data Analytics Admissions Criteria CURRICULUM Program Requirements

Introduction to Data Mining

Introduction to Data Mining

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Making big data simple with Databricks

ESS event: Big Data in Official Statistics

Navigating Big Data business analytics

ANALYTICS CENTER LEARNING PROGRAM

Predictive Analytics Certificate Program

How To Handle Big Data With A Data Scientist

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

Hurwitz ValuePoint: Predixion

Big Data Executive Survey

This Symposium brought to you by

Big Data and Healthcare Payers WHITE PAPER

Collaborations between Official Statistics and Academia in the Era of Big Data

Doing Multidisciplinary Research in Data Science

Information Management course

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

Advanced Big Data Analytics with R and Hadoop

BIG DATA What it is and how to use?

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges

COMP9321 Web Application Engineering

Big Data and Analytics: Challenges and Opportunities

Data Mining Applications in Higher Education

Data Science Certificate Program

Outline. What is Big data and where they come from? How we deal with Big data?

BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Big Data and Data Science. The globally recognised training program

Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better."

MACHINE LEARNING BASICS WITH R

What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015

ANALYTICS IN BIG DATA ERA

Introduction to Big Data & Basic Data Analysis. Freddy Wetjen, National Library of Norway.

Role of Social Networking in Marketing using Data Mining

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

Big Data: Key Concepts The three Vs

Machine Learning with MATLAB David Willingham Application Engineer

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Proposal for New Program: Minor in Data Science: Computational Analytics

Sunnie Chung. Cleveland State University

DRIVING COMPETITIVE ADVANTAGE BY PREDICTING THE FUTURE

Customer Case Study. Sharethrough

Foundations of Business Intelligence: Databases and Information Management

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

Proposal for New Program: BS in Data Science: Computational Analytics

Big Data. Fast Forward. Putting data to productive use

Statistics Meets Big Data 統 計 遇 見 大 數 據

Big Data and Data Science: Behind the Buzz Words

Demystifying Big Data Government Agencies & The Big Data Phenomenon

Role Description. Position of a Data Scientist Machine Learning at Fractal Analytics

Advanced In-Database Analytics

GETTING STARTED WITH R AND DATA ANALYSIS

Why big data? Lessons from a Decade+ Experiment in Big Data

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

BIG DATA & DATA SCIENCE

A Berkeley View of Big Data

Information Visualization WS 2013/14 11 Visual Analytics

MSCA Introduction to Statistical Concepts

Architecting your Business for Big Data Your Bridge to a Modern Information Architecture

How To Turn Big Data Into An Insight

How To Make Sense Of Data With Altilia

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

Indian Journal of Science The International Journal for Science ISSN EISSN Discovery Publication. All Rights Reserved

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Databricks. A Primer

Data Science at U of U

White Paper. Version 1.2 May 2015 RAID Incorporated

DATA SCIENCE ADVISING NOTES David Wild - updated May 2015

CLUSTER ANALYSIS WITH R

BIG DATA: BIG BOOST TO BIG TECH

Predictive Analytics

Why You Should Consider Graduate Study in the UCLA Department of Biostatistics

HSD. W Business Analytics (M.Sc.) IT in Business Analytics. IT Applications in Business Analytics SS2016 / 01 Introduction Thomas Zeutschler

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Increase Revenue THE JOURNEY TO BIG DATA. Gary Evans. CTO EMC Ireland. Twitter.com/Gary3vans. Copyright 2013 EMC Corporation. All rights reserved.

Data Mining and Machine Learning in Bioinformatics

Information and Decision Sciences (IDS)

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

I. Justification and Program Goals

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

ANALYTICS STRATEGY: creating a roadmap for success

HITACHI DATA SYSTEMS INTRODUCES NEW SOLUTIONS AND SERVICES TO MAKE SOCIETIES SAFER, SMARTER AND HEALTHIER

Transforming the Telecoms Business using Big Data and Analytics

A U T H O R S : G a n e s h S r i n i v a s a n a n d S a n d e e p W a g h Social Media Analytics

Transcription:

Data Science Applications & Use Cases Instructor: Ekpe Okorafor 1. Accenture Big Data Academy 2. Computer Science African University of Science & Technology

Objectives Objectives Understand Big Data Challenges What exactly is Data Science and what do Data Scientists do Data Science contrasted with other disciplines Case Study & Use Cases 2

Outline Big Data & Challenges What is Data Science Data Science & Academia Data Science & Others Case Studies Essential points Conclusion 3

Data All Around Lots of data is being collected and warehoused Scientific Experiments Internet of Things Web data, e-commerce Financial transactions, bank/credit transactions Online trading and purchasing Social Network many more! 4

Big Data Big Data are data sets so large or so complex that traditional methods of storing, accessing, and analyzing their breakdown are too expensive. However, there is a lot of potential value hidden in this data, so organizations are eager to harness it to drive innovation and competitive advantage. Big Data technologies and approaches are used to drive value out of data rich environments in ways that traditional analytics tools and methods cannot. 5

What To Do With These Data? Aggregation and Statistics Data warehousing and OLAP Indexing, Searching, and Querying Keyword based search Pattern matching (XML/RDF) Knowledge discovery Data Mining Statistical Modeling Data Driven Predictive Analytics Deep Learning 6

Big Data & Data Science the sexy job in the next 10 years will be statisticians, Hal Varian, Google Chief Economist The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018. McKinsey Global Institute s June 2011 New Data Science institutes being created or repurposed NYU, Columbia, Washington, UCB,... New degree programs, courses, boot-camps: e.g., at Berkeley: Stats, I-School, CS, Astronomy One proposal (elsewhere) for an MS in Big Data Science Plans for Data Science Stream at AUST RDA-CODATA School of Research Data Science 7

What is Data Science? Some definitions link computational, statistical, and substantive expertise. 8

What is Data Science? Other definitions focus more on technical skills alone. 9

What is Data Science? An area that manages, manipulates, extracts, and interprets knowledge from tremendous amount of data Data science (DS) is a multidisciplinary field of study with goal to address the challenges in big data Data science principles apply to all data big and small 10

What is Data Science? Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education Computer Science Pattern recognition, visualization, data warehousing, High performance computing, Databases, AI Mathematics Mathematical Modeling Statistics Statistical and Stochastic modeling, Probability. 11

Data Science Vs Analysis Vs Software Delivery Component Traditional Analysis Traditional Software Delivery Tools Analytical Methods Team Structure Time Frame SAS, R, Excel, SQL, inhouse tools Regressions, classifications, measuring prediction accuracy and coverage/error, sampling Statisticians, Mathematicians, Scientists Either: Usually on-going research and discovery within a team in the organization Or: Specific project to determine answers Java, source control, Linux, continuous integration, unit testing, bug reports and project management N/A Developers, Project Managers, Systems Engineers Regular software release cycle, continuous delivery, etc. Data Science R, Java, scientific Python libraries, Excel, SQL, Hadoop, Hive, Pig, Mahout and other machine learning libraries, github for source control and issue management Classification, clustering, similarity detection, recommenders, unsupervised and supervised learning, small- and large-scale computations, measuring prediction accuracy and coverage/error Mathematicians, Statisticians, Scientists, Developers, Systems Engineers Either: Discovery/learning phase leading to product development Or: On-going research and product invention/improvement 12

Contrast: Scientific Computing Image General purpose classifier Supernova Not Nugent group / C3 LBL Scientific Modeling Physics-based models Problem-Structured Mostly deterministic, precise Run on Supercomputer or High-end Computing Cluster Data-Driven Approach General inference engine replaces model Structure not related to problem Statistical models handle true randomness, and un-modeled complexity. Run on cheaper computer Clusters (EC2) 13

Contrast: Machine Learning Machine Learning Develop new (individual) models Prove mathematical properties of models Improve/validate on a few, relatively clean, small datasets Publish a paper Data Science Explore many models, build and tune hybrids Understand empirical properties of models Develop/use tools that can handle massive datasets Take action! 14

Contrast: Data Engineering Data Science Data Engineering Approach Scientific (Exploration) Engineering (Development) Problems Unbounded Bounded Path to Solution Education Iterative, exploratory, nonlinear More is better (PhD s common) Mostly linear BS and/or self-trained Presentation Skills Important Not as important Research Experience Programming Skills Important Not as important Not as important Important Data Skills Important Important 15

Data Science & Academia In the words of Alex Szalay, these sorts of researchers must be "Pi-shaped" as opposed to the more traditional "T-shaped" researcher. In Szalay's view, a classic PhD program generates T-shaped researchers: scientists with widebut-shallow general knowledge, but deep skill and expertise in one particular area. The new breed of scientific researchers, the data scientists, must be Pishaped: that is, they maintain the same wide breadth, but push deeper both in their own subject area and in the statistical or computational methods that help drive modern research: 16

Data Science & Academia In a post by Jake Vanderplas in 2014 related to SciFoo discussion on: Academia and Data Science, the following questions below were discussed. I encourage you to develop your own thoughts on them and come up with your assessment Where does Data Science fit within the current structure of the university & research institutions? What is it that academic data scientists want from their career? How can academia offer that? What drivers might shift academia toward recognizing & rewarding data scientists in domain fields? Recognizing that graduates will go on to work in both academia and industry, how do we best prepare them for success in both worlds? 17

Data Science Applications Summary What is happening? What is possible Business Health Care Urban Leaving From car design to insurance to pizza delivery, businesses are using data science to optimize their operations and better meet their customers expectations. Two-Way Street for the Ford Focus Electric Car Better Fraud Detection Boosts Customer Satisfaction E-Commerce Insights: Domino s Secret Sauce Using Social Data to Select Successful Retail Locations. Tomorrow s healthcare may look more efficient thanks to things like electronic health records. It also may look a lot more effective. Reduced readmissions, better care, and earlier detection are on the horizon. Reducing Hospital Readmissions Better Point-of-Care Decisions Medical Exams by Bathroom Mirrors For the first time in human history, more people live in cities than in suburban or rural areas. An emerging field called urban informatics combines data science with the unique challenges facing the world s growing cities Taking on Megacity Traffic Fighting Crime with Data "predictive policing" Instrumenting cities 18

Contrast: Computational Sciences Is there a contrast between Data Science and Computational Science? 19

Data Science: Case Study Cancer Research Cancer is an incredibly complex disease; a single tumor can have more than 100 billion cells, and each cell can acquire mutations individually. The disease is always changing, evolving, and adapting. Employ the power of big data analytics and high-performance computing. Leverage sophisticated pattern and machine learning algorithms to identify patterns that are potentially linked to cancer Huge amount of data processing and recognition 20

Data Science: Case Study Health Care Stanford Medicine, Google team up to harness power of data science for health care Stanford Medicine will use the power, security and scale of Google Cloud Platform to support precision health and more efficient patient care. Analyzing genetic data Focusing on precision health Data as the engine that drives research http://med.stanford.edu/news/all-news/2016/08/stanford-medicine-google-team-up-to-harness-power-of-data-science.html 21

Data Science: Case Study Elections The Obama campaigns in 2008 and 2012 are credited for their successful use of social media and data mining. Micro-targeting in 2012 http://www.theatlantic.com/politics/archive/2012/04/thecreepiness-factor-how-obama-and-romney-are-getting-to-knowyou/255499/ http://www.mediabizbloggers.com/group-m/how-data-and-micro- Targeting-Won-the-2012-Election-for-Obama---Antony-Young- Mindshare-North-America.html Micro-profiles built from multiple sources accessed by aps, realtime updating data based on door-to-door visits, focused media buys, e-mails and Facebook messages highly targeted. 1 million people installed the Obama Facebook app that gave access to info on friends. 22

Data Science: Case Study Internet of Things (IoT) The Internet of Things is rapidly growing. It is predicted that more than 25 billion devices will be connected by 2020. The Internet of Things (IOT) will soon produce a massive volume and variety of data at unprecedented velocity. If "Big Data" is the product of the IOT, "Data Science" is it's soul. 23

Data Science: Case Study Customer Analytics 24

Essential Points Big Data has given rise to Data Science Data science is rooted in solid foundations of mathematics and statistics, computer science, and domain knowledge Sexy profession Data Scientists Not every thing with data or science is Data Science! The use cases for Data Science are compelling 25

Conclusion In this section you have learned What Big Data Challenges are What exactly is Data Science and what do Data Scientists do Data Science contrasted with other disciplines Case Study & Use Cases 26

Questions? 27

Thank You! http://www.ign.com/articles/2015/12/16/star-wars-the-force-awakens-review 28