True Analytics & Base-Band Visualization: A Return to Tukey's Exploratory Data Analytics and Bloom's Taxonomy




By James P. LaRue
Overwhelmed with the analytics of all that data? Why YOU must reset the lost art of true analytics and lead back to leveraging data in its basic form.
AAS Instrument Electronics, BA Mathematics and BA in Education, MS Mathematics, PhD Applied Science and Engineering
Signal Processor and Data Scientist by Profession
Proprietary Copyright Charter Global, Inc. 2015
May 2015

Outline
Introducing YOUR Eco-System
A hierarchical sales format (with Bloom intro)
Where does Tukey's EDA enter Bloom's Taxonomy? It may surprise you
A formal business and technology problem statement
A sonobuoy big data example (it is equivalent to streaming IP)
What do we mean by base-band visualization? We're not talking pie charts, but practical and meaningful pixel arrays
Finding patterns within the plasticity of 1s and 0s
Revisit the business/tech problem, plus a Model/Simulation example
The advantage to actually increasing the number of data points
A table-based problem in Excel
Returning to YOUR Eco-System
Edureka: Pause for educational advertisement
The Charter Global strategic data analytics reset program
True analytics and the round-table Eco-System

[Diagram: Customer Activity; Systems Architect & Security; A proposed BD/BI question; The BI/BD answer + ECO-derivatives; Data Source Acquisitions and ETL; Data QA (post-ETL/pre-model); Segment, Extract and Model]
The Eco-system of Data requires a base set of thought-provoking visualizations to initiate round-table discussions, drive cross-table observations, empower team consensus, and draw out winning derivatives.

Legacy Data Systems & New Big Data Systems

Hierarchical Sales Format & Bloom's Taxonomy of 1956
Sales phases: Foundation-Orientation; Cursory Evaluation of Blueprint; Big Data Architecture + Tools Implementation; Analytics Team Actualize; Launch & Yield; Future Aspirations
Activities: Assess Current State, Playbook Development, Technology Forensics, Develop Roadmap, Infrastructure Support, Vendor Stack Selection, BD/BI User Trials, Data Aggregation, Analytics Demo, Develop, Augment, Administer, Partnering and Planning, Retained Agency of Record
Bloom's levels: Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation

Bloom's Taxonomy & the Cognitive Domain + Tukey's Exploratory Data Analysis (EDA)
Knowledge: assembling facts and making definitions about the data
Comprehension: translating, interpreting, extrapolating, and organizing the data
Application: solving problems using knowledge and comprehension of the data, with old models
Analysis: breaking the data into its elements, examining the pieces, generalizing the data
Synthesis: partitioning data elements into segments and applying old models or forming new ones
Evaluation: presenting and defending what you think you KNOW about the data, based on the model
Pie chart visualizations are for conveying knowledge, comprehension, and evaluation of data.
Base-band visualization is for analyzing the raw-form elements of data in pixel form.
Formulas are for application and for reference in evaluation.
Creativity lies in synthesis and applies pressure to evaluation.
Fact: John Tukey introduced the term "bit", the contraction of "binary digit".
http://en.wikipedia.org/wiki/bloom%27s_taxonomy/ http://en.wikipedia.org/wiki/john_tukey

A Formal Business & Technology Solution (Business Side / Technology Side)
Business Outcome: An oil company must address environmentalist concerns about disturbing whale habitat: feeding, breeding, and resting. X dollars are available to find a solution.
Premise 1: Underwater blasting for seismic surveys affects habitat.
Premise 2: Whales, and other cetaceans, naturally change habitats.
Premise 3: Shipping traffic affects the habitat domain.
Hypothesis to premise 1: Abrupt changes in pressure due to blasting damage whales' ears.
Hypothesis to premise 3: Shipping noise affects whales' ability to communicate.
Problem Domain: How do changes in pressure link shipping traffic, seismic blasting, and whale movements?
Develop Facets: Use exploitation techniques to uncover hidden attributes, then group them (K-means, higher moments, image processing/computer vision).
Data Source: Sonobuoy recording 12,000 pts/sec x 24 hrs ~ 1 Gpts/day
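The facet step names higher moments and K-means as grouping tools. The sketch below shows one way they could combine. It assumes the stream has already been cut into fixed-length segments. For each segment it computes variance, skewness, and excess kurtosis, then groups the feature vectors with plain Lloyd's k-means. Several details are illustrative choices, not the author's pipeline: the synthetic quiet/spiky segments, the spike size, the `synthetic_segments` helper, and the farthest-point seeding.

```python
import random


def higher_moments(segment):
    """Return (mean, variance, skewness, excess kurtosis) of one data segment."""
    n = len(segment)
    mean = sum(segment) / n
    dev = [x - mean for x in segment]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return mean, 0.0, 0.0, 0.0  # a constant segment has no shape to measure
    skew = sum(d ** 3 for d in dev) / (n * var ** 1.5)
    kurt = sum(d ** 4 for d in dev) / (n * var ** 2) - 3.0
    return mean, var, skew, kurt


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then keep adding the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=50):
    """Lloyd's k-means on feature tuples; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so centroids are already up to date
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def synthetic_segments(n_each=20, seg_len=500, seed=0):
    """Hypothetical data: quiet Gaussian segments, then segments with periodic spikes."""
    rng = random.Random(seed)
    quiet = [[rng.gauss(0.0, 1.0) for _ in range(seg_len)] for _ in range(n_each)]
    impulsive = []
    for _ in range(n_each):
        seg = [rng.gauss(0.0, 1.0) for _ in range(seg_len)]
        for i in range(0, seg_len, 50):
            seg[i] += 20.0  # stand-in for an abrupt pressure event
        impulsive.append(seg)
    return quiet + impulsive


# Features per segment: (variance, skewness, excess kurtosis).
features = [higher_moments(s)[1:] for s in synthetic_segments()]
labels, _ = kmeans(features, k=2)
print(labels)
```

The higher moments do the "uncover hidden attributes" work here. Spiky segments have large kurtosis even when their mean matches the quiet ones. In real use the features would normally be standardized first so that no single moment dominates the distance.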

Base-Band Visualization Part One: 1440 x 900 pixels is a lot of pixels, so let's use them

Base-Band Visualization Part Two: Color the elements
Given the elements of a seven-element code word of 1s and 0s, each element is drawn as one colored pixel; the colorbar ranges from 0 to 1.
[Figure: one code word as a column of pixels, rows 1-7, colorbar 0 to 1]

A little faster now: five seven-element code words to a 7x5 pixel matrix
[Figure: 7 x 5 pixel matrix, rows 1-7, columns 1-5, colorbar 0 to 1]

A 7x50 pixel matrix
[Figure: 7 x 50 pixel matrix of code words, rows 1-7, columns 1-50, colorbar 0 to 1]
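The code-word slides stack equal-length words side by side, so that each word becomes one column of pixels. The following is a minimal sketch of that step. It writes a binary PGM (P5) grayscale image instead of a color map, drawing 1 as white and 0 as black. Several details are assumptions for illustration: the random code words, the `scale` enlargement, and the output filename.

```python
import random


def codewords_to_matrix(codewords):
    """Stack equal-length code words side by side: column c is codewords[c], row r is element r."""
    n_rows = len(codewords[0])
    if any(len(w) != n_rows for w in codewords):
        raise ValueError("all code words must have the same length")
    return [[w[r] for w in codewords] for r in range(n_rows)]


def to_pgm(matrix, scale=10):
    """Encode a 0/1 matrix as a binary PGM (P5) image; each element becomes a
    scale x scale block, with 1 drawn white (255) and 0 drawn black (0)."""
    height = len(matrix) * scale
    width = len(matrix[0]) * scale
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    pixels = bytearray()
    for row in matrix:
        pixel_row = bytes(255 * v for v in row for _ in range(scale))
        pixels += pixel_row * scale  # repeat the row to make square blocks
    return header + bytes(pixels)


if __name__ == "__main__":
    rng = random.Random(1)
    # Fifty hypothetical seven-element code words, matching the 7x50 slide's shape.
    words = [[rng.randint(0, 1) for _ in range(7)] for _ in range(50)]
    matrix = codewords_to_matrix(words)  # 7 rows x 50 columns
    with open("codewords_7x50.pgm", "wb") as f:
        f.write(to_pgm(matrix))
```

Growing from 5 to 50 code words changes only the length of the `words` list. The image simply gets wider, which is the point of spending screen pixels on raw elements.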

Finding Patterns in Patterns of 1s & 0s

Exercise in Pattern Digging
[Figure: a grid of 1s and 0s with numbered regions to search for patterns]

A 1000x1000 pixel matrix: 1000 columns of 1000 random numbers ranging from -5 to +5, with 1,000,000 unique colors being displayed.
[Figure: 1000 x 1000 image, both axes 0-1000, colorbar -5 to +5; the word "Hello" appears in the figure]
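The following sketch builds a matrix like the one described: uniform random values from -5 to +5. It counts how many distinct raw values the matrix holds, then maps the values onto 256 gray levels, as an 8-bit grayscale image would. The seed, the `quantize` helper, and the 8-bit target are assumptions for illustration. The comparison shows how much a fixed-depth display compresses the raw elements.

```python
import random


def random_matrix(n_rows, n_cols, low=-5.0, high=5.0, seed=0):
    """n_rows x n_cols matrix of uniform random numbers between low and high."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_cols)] for _ in range(n_rows)]


def quantize(matrix, low=-5.0, high=5.0, levels=256):
    """Map each value linearly onto integer levels 0..levels-1, clamping the top edge."""
    span = high - low
    return [[min(levels - 1, max(0, int((v - low) / span * levels))) for v in row]
            for row in matrix]


def distinct(matrix):
    """Number of distinct values in a 2-D list."""
    return len({v for row in matrix for v in row})


if __name__ == "__main__":
    m = random_matrix(1000, 1000)
    print(distinct(m))            # distinct raw values in the matrix
    print(distinct(quantize(m)))  # distinct gray levels after 8-bit quantization
```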

Return to the Sonobuoy Example with Tukey's EDA
We took the 1,000,000,000 acoustic sonobuoy points, transformed them a little, and formed a data pool matrix of 1000 x 8000 elements. At a high level, the information appears uniform. However, signal processing of the blue data pool uncovers several underlying structures (buoy carrier, oil explorations, ships, storms, calm seas). These structures form the new elements, so from one data source we form several more data pools. This segmentation is presented to the Eco-system to initiate round-table discussions, drive cross-table observations, and empower team consensus.
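Two steps in the sonobuoy slide can be sketched: folding a long 1-D recording into a 2-D "data pool", and splitting that pool into segments. The sketch below takes the simplest possible version, cutting the stream into rows of consecutive samples. It then tags rows by energy using a hypothetical rule: a row counts as an event if its energy exceeds a multiple of the median. The synthetic stream, the loud passage, and the threshold factor are all assumptions. The deck's own transform and signal processing are not specified, and this code does not reproduce them.

```python
import random
from statistics import median


def to_data_pool(stream, n_cols):
    """Fold a 1-D stream into rows of n_cols consecutive samples (trailing partial row dropped)."""
    n_rows = len(stream) // n_cols
    return [stream[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def row_energy(pool):
    """Mean squared amplitude of each row."""
    return [sum(x * x for x in row) / len(row) for row in pool]


def segment_rows(pool, factor=4.0):
    """Hypothetical rule: a row is an 'event' if its energy exceeds factor x the median row energy."""
    energies = row_energy(pool)
    base = median(energies)
    return ["event" if e > factor * base else "background" for e in energies]


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    for i in range(40_000, 41_000):
        stream[i] *= 10.0  # a hypothetical loud passage, e.g. a passing ship
    pool = to_data_pool(stream, 1000)  # 100 rows x 1000 columns
    tags = segment_rows(pool)
    print([r for r, t in enumerate(tags) if t == "event"])  # -> [40]
```

Each tag class then becomes its own data pool, which is the "one data source, several more data pools" step.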

Why look at two simple plots when you can look at 300 simultaneously? (3-30 MHz in increments of 0.1 MHz)
[Figure: two path-loss curves (Path Loss dB vs. nautical miles) for Sea State 3 @ 28 MHz and Sea State 3 @ 6 MHz, beside a MATLAB surface of path loss (dB) over frequency (3-30 MHz) and range (nautical miles)]
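"Looking at 300 plots simultaneously" amounts to stacking every curve as one row of a 2-D array and viewing that array as an image. The following is a minimal sketch of that stacking. Because the deck does not give its sea-state propagation model, the sketch substitutes the textbook free-space path-loss formula, FSPL(dB) = 20 log10(d km) + 20 log10(f MHz) + 32.44. That formula ignores sea state entirely, so it is a stand-in to show the layout, not the slide's curves. The 10-300 nautical-mile range grid is also an assumption.

```python
import math

NM_TO_KM = 1.852  # one international nautical mile in kilometres


def fspl_db(distance_km, freq_mhz):
    """Free-space path loss in dB, distance in km and frequency in MHz (stand-in model)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def path_loss_matrix(freqs_mhz, ranges_nm):
    """One row per frequency and one column per range: every curve in one 2-D array."""
    return [[fspl_db(r * NM_TO_KM, f) for r in ranges_nm] for f in freqs_mhz]


freqs = [round(3.0 + 0.1 * i, 1) for i in range(271)]  # 3.0 to 30.0 MHz in 0.1 MHz steps
ranges = list(range(10, 301, 10))                        # 10 to 300 nautical miles
loss = path_loss_matrix(freqs, ranges)                   # 271 x 30, ready to show as an image
print(len(loss), "curves x", len(loss[0]), "ranges")
```

Swapping in a sea-state-aware model only changes `fspl_db`. The stacking, and therefore the single-image view of every curve, stays the same.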

A Database Example that Moved from Row Entry to Time Domain
1000 customers were recorded for door open/close activity over 28 days. Activity ranged from 50 to 750 total door open (gold)/close (blue) events per customer. We expanded the table to form a uniform time scale of 100 time slots per day per home, i.e., 2800 time slots for each of the 1000 customers.
Took a spreadsheet of ~78,000 lines of feature events
Engineered the time domain to visualize it as a 2800 x 1000 matrix
Applied a cascade of discovery transforms
Presented the 2,800,000 events in a discovery framework to the BI team
[Figure: customers (rows) vs. time slots from Day 1 to Day 28 (columns), with highlighted boxes]
Red box: 40% of customers did not have the device installed properly
Green box: 30% had late starts
Yellow box: the Data Warehouse dropped 30 hours of (paid-for) recorded data
Analytics at this fundamental level is part of QA.
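The key move in the door-activity example is turning row entries into a uniform time grid: 28 days x 100 slots = 2800 slots per customer. The following sketch shows one way to do that. It assumes a hypothetical event schema of (customer_id, seconds since the start of the 28 days, state). In the sketch, each slot keeps the last state reported inside it, and empty slots are marked as "no data". It is not the author's transform, and `first_active_day` is an illustrative helper.

```python
DAYS = 28
SLOTS_PER_DAY = 100
SECONDS_PER_SLOT = 86_400 // SLOTS_PER_DAY  # 864 s = 14.4 minutes per slot

NO_DATA, CLOSED, OPEN = -1, 0, 1


def events_to_matrix(events, customer_ids):
    """events: iterable of (customer_id, seconds_since_start, state) rows (hypothetical schema).
    Returns one row of DAYS * SLOTS_PER_DAY slots per customer, in customer_ids order.
    A slot holds the last state reported inside it; slots without an event stay NO_DATA."""
    n_slots = DAYS * SLOTS_PER_DAY
    index = {cid: i for i, cid in enumerate(customer_ids)}
    matrix = [[NO_DATA] * n_slots for _ in customer_ids]
    for cid, t, state in sorted(events, key=lambda e: e[1]):
        slot = t // SECONDS_PER_SLOT
        if cid in index and 0 <= slot < n_slots:
            matrix[index[cid]][slot] = state
    return matrix


def first_active_day(row):
    """Day number (0-based) of a customer's first event, or None if the row is empty."""
    for i, v in enumerate(row):
        if v != NO_DATA:
            return i // SLOTS_PER_DAY
    return None


events = [("a", 0, OPEN), ("a", 900, CLOSED), ("b", 3 * 86_400 + 5, OPEN)]
grid = events_to_matrix(events, ["a", "b", "c"])
print([first_active_day(row) for row in grid])
```

Once every customer shares the same time axis, simple row checks become QA probes. In this sketch, those checks are an all-empty row or a late first active day.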

Base-Band Visualization of Analytics Invites a Roundtable Approach
1. Customer Activity: BD task, work schedule
2. ETL asks the Data Warehouse for activity on 1000 customers; the DW returns 78,000 table entries
3. Engineer a structured visualization
4. Signal processing to see what you have, or thought you had
5. Modeling & simulation solution with what you have
6. BD solution: work schedule 8:45 AM to 5:30 PM
Eco-System derivatives of steps 1-6:
Architecture/Data Storage: DW purchase lapse
ETL: data source consistency
Modeling: 20% valid segment
BI: 24 Hr. home habits
BD: ask techs to check sensors
[Figure: 24-hour timeline, 7:00 pm to 6:59 pm]
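Step 6 reports a work schedule derived from door activity, but the deck does not say how it was derived. The following is one hypothetical rule, sketched under stated assumptions. For each day, take the departure time as the last door event before noon and the return time as the first door event after noon. Skip days that lack either, and report the median of each. The noon split, the minutes-since-midnight input, and the sample days are all assumptions for illustration.

```python
from statistics import median

NOON = 12 * 60  # minutes since midnight


def daily_away_window(event_minutes):
    """Hypothetical rule: leave = last event before noon, return = first event after noon."""
    morning = [t for t in event_minutes if t < NOON]
    afternoon = [t for t in event_minutes if t >= NOON]
    if not morning or not afternoon:
        return None  # no usable window on this day
    return max(morning), min(afternoon)


def estimate_schedule(days):
    """Median leave and return times (minutes) over all days that have a window."""
    windows = [w for w in (daily_away_window(d) for d in days) if w is not None]
    if not windows:
        return None
    return median([w[0] for w in windows]), median([w[1] for w in windows])


def hhmm(minutes):
    return f"{int(minutes) // 60:02d}:{int(minutes) % 60:02d}"


days = [[450, 480, 1000], [460, 1010, 1300], [470, 990], [100]]  # hypothetical door events
leave, back = estimate_schedule(days)
print(hhmm(leave), "to", hhmm(back))
```

Medians keep a single odd day (a late departure, a missing afternoon) from dragging the estimate.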

Edureka!! Others that are homing in on EDA and Visualization
From the Computation Institute (University of Chicago/Argonne National Labs) and AT&T Labs: https://www.ci.uchicago.edu/blog/new-algebra-data-visualization and https://www.ci.uchicago.edu/blog/new-computational-commons-cancer-genomic-data
An Algebraic Process for Visualization Design by Kindlmann and Scheidegger (2014): http://algebraicvis.net/assets/vis2014_talk_slides.pdf
Data Mining Challenges for Digital Libraries by Robert Grossman, founder of Open Data Group. Back in 1996 he named three principal purposes for visual analytics: anomaly checks, Tukey's EDA, and checking model assumptions.
John W. Tukey wrote the book "Exploratory Data Analysis" in 1977.
From the Data Visualization Innovation Summit, April 2015, San Jose: Elijah Meeks, Senior Data Visualization Engineer at Netflix, presented "Beyond Line and Pie Charts: Practical Applications of Complex Data Viz".
https://www.codeshowse.com/ Charleston, SC, May 2015, with keynote speaker Jeff Hammerbacher of Cloudera presenting his work with Big Data and predicting the process and treatment of disease.

The Charter Global Strategic Data Analytics Reset Program
True Analytics & the Roundtable Eco-System
BEFORE YOU START your investment path (take a step back):
DEFINE THE GAME: your Business Development Directive (keep it purposely loose)
GET TO KNOW your BI/BD/ETL/Mod/Dev team (collective or stove-piped)
ESTABLISH ACCESS TO your Big Data Repository (a costly and ad-hoc deck of cards)
Call in CGI to set the odds of success: base-band visualization (show what's in the deck)
Now, call in your players and STAND BACK AND LEAD
