A RoadMap to Data Science
Dr. Geoffrey Malafsky, CEO, Phasic Systems Inc.
www.phasicsystemsinc.com, 703-945-1378



2 About the Speaker
Geoffrey Malafsky, Ph.D., Founder and CEO; former scientist.
Nanotechnology researcher (Naval Research Laboratory).
Technology advisor and sleuth: DARPA (MEMS; situational awareness via real-time information fusion); Office of Naval Research (MEMS; littoral sensors); Dept. of Energy (nanotechnology dual use).
Applying science to difficult data challenges as consultant, analyst, and system developer.

3 What is Data Science?
It is the latest in a long line of hot IT topics, and IT follows Neil Young: "It is better to burn out than it is to rust."
Data Science is different from past IT hot spots. Science binds it to a well-structured culture, procedures, and ethics. Science is fundamentally rigorous in maintaining an auditable, open lineage of data collection, data rationalization, data analysis, theory comparison, adjudication of possible scenarios, and conclusions.
Data Science is not analytics, Business Intelligence, warehouse design, Big Data, Cloud whatever, Hadoop, ...

4 Big or Small Data: It Is the Quality That Counts
Social media analysis. Source: Kristina Farrah, "Big Beautiful Data: See Our Social Exchange from Twitter to CNN," 2 April 2012, http://siliconangle.com/blog/2012/04/02/big-beautiful-data-see-our-social-exchange-from-twitter-to-cnn/

5 Data Science As A Form of Science
Study the scientific method (Encyclopædia Britannica, Inc.): "mathematical and experimental techniques employed in the natural sciences; more specifically, techniques used in the construction and testing of scientific hypotheses. Many empirical sciences, especially the social sciences, use mathematical tools borrowed from probability theory and statistics, together with such outgrowths of these as decision theory, game theory, utility theory, and operations research. Philosophers of science have addressed general methodological problems, such as the nature of scientific explanation and the justification of induction."
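The definition above ties the scientific method to constructing and testing hypotheses with tools from probability and statistics. As one concrete instance (our choice of technique, not something the slides specify), the minimal Python sketch below runs a permutation test: it asks how often randomly relabelling the pooled data produces a difference in means at least as large as the one observed. The sample values are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: group labels are exchangeable, i.e. a and b come
    from the same distribution. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    before = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
    after = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]
    diff, p = permutation_test(before, after)
    print(f"observed difference = {diff:.2f}, p = {p:.4f}")
```

A small p-value says the observed difference would be unusual if the hypothesis of "no difference" were true; it does not by itself explain why the groups differ.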

6 Data Science From A Practitioner
Mike Loukides, "What is Data Science?", 2 June 2010, http://radar.oreilly.com/:
"Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: 'here's a lot of data, what can you make from it?'"

7 Our Data Science Principles
Data Science is the field that applies the scientific method to data collection, management, analysis, and reporting as a single integrated environment for general business purposes.
Rely on well-known, practiced methods of data collection, correction, integration, pedigree tracking, quality assurance, statistical analysis, model design and testing, tabular and graphical presentation, and visible traceability of conclusions through all analysis and conclusion steps.
Embrace uncertainty and transparency.
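Pedigree tracking and visible traceability become concrete when every dataset carries a log of the steps that produced it. The minimal Python sketch below assumes an in-memory list of records; the `TracedData` class, its `apply` method, and the step and file names are hypothetical illustrations, not part of any Phasic Systems product.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]


@dataclass(frozen=True)
class TracedData:
    """A dataset plus an append-only record of how it was produced."""
    rows: Rows
    lineage: List[str] = field(default_factory=list)

    def apply(self, step: str, fn: Callable[[Rows], Rows]) -> "TracedData":
        """Run one transformation and return a new TracedData with the step logged.

        The input object is left untouched, so every intermediate state
        stays available for audit.
        """
        new_rows = fn(self.rows)
        entry = f"{step}: {len(self.rows)} -> {len(new_rows)} rows"
        return TracedData(new_rows, self.lineage + [entry])


if __name__ == "__main__":
    # Hypothetical source records and file name, for illustration only.
    raw = TracedData(
        [{"id": 1, "price": "100"}, {"id": 2, "price": ""}, {"id": 3, "price": "250"}],
        ["collected from listings_export.csv"],
    )
    cleaned = raw.apply("drop rows with missing price",
                        lambda rows: [r for r in rows if r["price"]])
    typed = cleaned.apply("cast price to int",
                          lambda rows: [{**r, "price": int(r["price"])} for r in rows])
    print("\n".join(typed.lineage))
```

Returning a new object from each step, rather than mutating in place, is what keeps the earlier states inspectable when someone later asks how a conclusion was reached.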

8 Data Science Roadmap
Understand what it is and is not (ignore the cacophony of charlatans and certificate mills).
Identify high-value insights (note: not BI or reports) for your C-level executives, insights they want and can turn into action. This makes Data Science applied instead of basic.
Start small; plan a big win; find a senior management champion; don't wait for organizational clearance (they are waiting for you to succeed or fail first); be prepared for significant resistance and civil disobedience (work around it).
Continuously communicate that the win is a win for everyone and no one has to give up control.
Package results in extremely pretty and informative visualizations (see Tufte for some of the best).

9 Foundations
Data collection: multi-source (warehouse, external structured sets, unstructured high-volume data such as email and social media, images, sensors, metadata); multi-format; raw versus refined and corrected.
Data rationalization: continuous cleaning, correcting, aligning, and adjudicating. Little errors grow exponentially: a little garbage in → a large garbage out.
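The rationalization loop of cleaning, aligning, and adjudicating can be sketched for a single attribute that arrives from two sources. Everything here is an assumption for illustration: the source names, the `STATE_ALIASES` alignment table, and the adjudication rule, which prefers the more trusted source but reports every disagreement instead of silently overwriting it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical alignment table mapping variant spellings to one canonical code.
STATE_ALIASES = {"ky": "KY", "kentucky": "KY", "tx": "TX", "texas": "TX"}


def clean(value: Optional[str]) -> Optional[str]:
    """Cleaning: trim whitespace and treat empty strings as missing."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def align(value: Optional[str]) -> Optional[str]:
    """Aligning: map a cleaned value onto the canonical vocabulary when known."""
    if value is None:
        return None
    return STATE_ALIASES.get(value.lower(), value)


def adjudicate(
    sources: Dict[str, Dict[int, str]], authority: List[str]
) -> Tuple[Dict[int, Optional[str]], List[int]]:
    """Adjudicating: merge per-source values by record key.

    `authority` ranks source names from most to least trusted. Returns the
    merged values and the keys whose sources disagreed, so a person can
    review each conflict.
    """
    merged: Dict[int, Optional[str]] = {}
    conflicts: List[int] = []
    keys = sorted({k for values in sources.values() for k in values})
    for key in keys:
        present = {}
        for name in authority:
            value = align(clean(sources.get(name, {}).get(key)))
            if value is not None:
                present[name] = value
        if len(set(present.values())) > 1:
            conflicts.append(key)
        # Keep the value from the most trusted source that has one.
        merged[key] = next((present[n] for n in authority if n in present), None)
    return merged, conflicts


if __name__ == "__main__":
    warehouse = {1: " Kentucky", 2: "TX", 3: "", 4: "TX"}
    external = {1: "ky", 2: "texas ", 3: "KY", 4: "Kentucky"}
    merged, conflicts = adjudicate(
        {"warehouse": warehouse, "external": external}, ["warehouse", "external"]
    )
    print(merged)     # {1: 'KY', 2: 'TX', 3: 'KY', 4: 'TX'}
    print(conflicts)  # [4]
```

Keys 1 and 2 look different in the raw sources but agree once cleaned and aligned; only key 4 is a genuine conflict that needs a human decision.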

10 Foundations
Data analysis: multi-technique (statistics, models, graphical, linear/non-linear equations). Understand the scope, limits, and biases of each technique, especially statistics (be skeptical).
Making conclusions: you will likely be wrong 80% of the time, and this is a good thing. Keep it to yourself until you challenge, probe, rebuke, and debunk it. Make sure you can support every contention you make with hard facts and figures, or clear, valid analysis steps.
Presenting results: show the main results as simply as possible. Keep the interesting (to you) results and analyses as backup.
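A classic illustration of why statistics alone deserve skepticism is Anscombe's quartet (F. J. Anscombe, "Graphs in Statistical Analysis", 1973), an example we bring in here; it is not cited in the slides. The Python sketch below uses its first two datasets: they share nearly the same mean, variance, correlation, and fitted line, yet a plot shows the first is roughly linear and the second is a smooth curve.

```python
from statistics import mean, variance

# Anscombe's quartet, datasets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summarize(xs, ys):
    """Return mean(y), sample variance(y), Pearson r, slope, and intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    return my, variance(ys), r, slope, intercept


if __name__ == "__main__":
    for name, ys in (("I", Y1), ("II", Y2)):
        my, var, r, b, a = summarize(X, ys)
        print(f"{name}: mean={my:.2f} var={var:.2f} r={r:.3f} fit: y = {a:.2f} + {b:.3f}x")
    # The summaries agree, but the residuals of II rise and fall with x,
    # tracing a parabola: plot the data before trusting the numbers.
```

The point is not this particular dataset but the habit: check each technique's assumptions (here, linearity) before reporting its output as a conclusion.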

11 Focus on Data Rationalization
Most data environments are badly misaligned, with semantically unknown relationships and value conflicts.
There will never be perfect data, but you cannot even start doing analysis until you control your data and understand the good, the bad, and the untrusted.
Data Rationalization is the process of building and managing a continuously adaptive data environment that fuels current and future business needs for decision making and system operations.
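Knowing "the good, the bad, and the untrusted" before analysis starts can be as simple as passing every record through explicit rules and keeping the reason behind each verdict. This Python sketch is illustrative only: the `price` and `source` fields, the validation rules, and the `TRUSTED_SOURCES` set are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical: sources whose pedigree has already been verified.
TRUSTED_SOURCES = {"warehouse"}


def triage(record: dict) -> Tuple[str, str]:
    """Classify one record as 'good', 'bad', or 'untrusted', with the reason."""
    price = record.get("price")
    if price is None:
        return "bad", "missing price"
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
        return "bad", f"invalid price {price!r}"
    source = record.get("source")
    if source not in TRUSTED_SOURCES:
        return "untrusted", f"unverified source {source!r}"
    return "good", "passed all checks"


def triage_all(records: List[dict]) -> Dict[str, List[Tuple[dict, str]]]:
    """Group records by verdict, keeping the reason next to each record."""
    buckets: Dict[str, List[Tuple[dict, str]]] = {"good": [], "bad": [], "untrusted": []}
    for rec in records:
        verdict, reason = triage(rec)
        buckets[verdict].append((rec, reason))
    return buckets


if __name__ == "__main__":
    sample = [
        {"price": 100, "source": "warehouse"},
        {"price": -5, "source": "warehouse"},
        {"price": 250, "source": "scraped_feed"},
        {"source": "warehouse"},
    ]
    for verdict, items in triage_all(sample).items():
        print(verdict, [reason for _, reason in items])
```

"Untrusted" is kept separate from "bad" on purpose: the value may be correct, but its pedigree is unknown, so it should not silently feed a conclusion.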

14 The Ψ KORS System Model
Point-select data models, codes, entities

15 Corporate NoSQL

16 Different Meanings (Legal and Business Activities)
NKY HomeSeekers vs. Texas
Example solution:
1. Create a table title aligned to the business: Garage.
2. Create a vocabulary for distinct use cases (system, value analysis, business use): spaces, spaces.description, spaces.national, spaces.state, listingservice, ...
3. Define the ETL logic.
4. Merge in the warehouse and process in the virtualization layer.
5. Change as needed.
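Steps 1 to 4 can be sketched in Python. The vocabulary terms come from the slide, with the dotted names read as sub-terms of `spaces`; everything else is hypothetical. The slide does not show the two sources' layouts, so the input field names (`GarageSpaces`, `GarageType`, `Garage`, `CarportSpaces`) are invented. It also does not say how `spaces.national` and `spaces.state` differ, so we assume, purely for illustration, that the Texas feed's local convention counts carports as spaces while the shared (national) rule counts enclosed garage spaces only. This is not a statement about actual Texas rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spaces:
    """The 'spaces' vocabulary term and its sub-terms (semantics assumed)."""
    description: str          # spaces.description: wording for business users
    national: Optional[int]   # spaces.national: count under one shared rule
    state: Optional[int]      # spaces.state: count under the source's local rule


@dataclass
class GarageRow:
    """Step 1: one row of the business-aligned 'Garage' table."""
    spaces: Spaces
    listingservice: str


def _leading_int(text: str) -> Optional[int]:
    """Return the integer at the start of text, e.g. '2 Car Attached' -> 2."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else None


def from_nky(row: dict) -> GarageRow:
    """Step 3 ETL for a hypothetical NKY layout: count and type in separate columns."""
    count = _leading_int(row.get("GarageSpaces", ""))
    spaces = Spaces(description=row.get("GarageType", ""), national=count, state=count)
    return GarageRow(spaces, "NKY HomeSeekers")


def from_texas(row: dict) -> GarageRow:
    """Step 3 ETL for a hypothetical Texas layout: free-text field plus carports.

    Illustrative assumption: the local count includes carports; the
    national count does not.
    """
    text = row.get("Garage", "")
    garage = _leading_int(text)
    carports = _leading_int(row.get("CarportSpaces", "")) or 0
    local = garage + carports if garage is not None else None
    spaces = Spaces(description=text, national=garage, state=local)
    return GarageRow(spaces, "Texas")


def merge(nky_rows: List[dict], texas_rows: List[dict]) -> List[GarageRow]:
    """Step 4: land both sources in one table that shares a single vocabulary."""
    return [from_nky(r) for r in nky_rows] + [from_texas(r) for r in texas_rows]


if __name__ == "__main__":
    rows = merge(
        [{"GarageSpaces": "2", "GarageType": "Attached"}],
        [{"Garage": "2 Car Attached", "CarportSpaces": "1"}],
    )
    for row in rows:
        print(row.listingservice, row.spaces)
```

Keeping both counts side by side, instead of forcing one definition, lets value analysis use `spaces.national` while legal or local business use keeps `spaces.state`. Step 5 ("change as needed") then means editing one mapper, not every consumer.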

17 Summary
Data Science is new and exciting.
It is an excellent career opportunity for explorers with discipline and a continuous zeal for investigation and for uncovering important new insights.
To succeed, the result must be important to a senior decision maker.
Get a champion at the beginning by making the business case for a big win from a small investment.
Expect resistance, and work to turn "nay" into "yay" with constant "no one loses" communication.
Use clear, concise, attractive graphics to get people excited.

18 More
Look for an in-depth learning webinar on Data Science and Data Rationalization.
The new PSI-KORS Foundation will promulgate noncommercial use of the Ψ KORS metamodel and Corporate NoSQL.
Contact us to bring success into your career and organization.