Using Normalized Status Change Events Data in Business Intelligence
Mark C. Cooke, Ph.D., Tax Management Associates, Inc.


A note to the audience
TMA is a company that serves state and local government, providing auditing and fraud detection services. I am an anthropologist by education, which means I love behavior and meaning (data as a referent and remnant). By business experience, I am involved in data analysis, solving problems with data transformations, aggregations, and puzzles of all sorts and shapes.

The Solution
Thanks to Rosaria Silipo for making this a complete KNIME solution. By suggesting a Java Snippet node, she resolved two problems, including the reliance on an external R script. Bravissimo!

The Problem
Software developers like to build software that serves a purpose. The database (MySQL) that drives the software is designed to follow two principles: it is normalized to enhance functionality, and it stores data that helps serve the purpose of the product. In other words, the database says a lot about the product, not so much about how the software is used.

The Real Problem
Management likes to know what's going on. In order to know, they need to know not just what is happening, but who is doing it, how it is happening, how long it is taking, etc. Management is concerned with three things: How many widgets did we make in this period? How many resulted from each person's efforts? How much work did each person do, and was it efficient?

Some Clues
The application in question uses a state machine model to force widgets down a production line, metaphorically. Every user-initiated state change in the system is recorded as a normalized instance of who did it, when they did it, and what state it was changed from and to.
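The slide describes the shape of such a record rather than showing it. Below is a minimal sketch of one row as a Java record, reusing the column names that appear later in the talk (ACCOUNT, WHO, KDATE, TO); the name of the from-state field is an assumption for illustration.

import java.time.LocalDateTime;

// One normalized status-change row: who changed what, when, and from/to which state.
public record StatusChangeEvent(
        String account,          // ACCOUNT: the widget the change belongs to
        String who,              // WHO: the user who made the change
        LocalDateTime kdate,     // KDATE: when the change was made
        String fromState,        // assumed name: the state the widget was changed from
        String toState) {        // TO: the state the widget was changed to
}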

The States, generally
New Widget
Collection of Process States
Good Widget
Bad Widget
Quality Review Process
Approved Widget
Review Widget
Client Accepts
Client Rejects
Rejected Widget
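For illustration only, the states named above could be written down as a Java enum; the real application keeps them as rows in its normalized tables, and the names below simply mirror the slide.

// The production-line states from the slide, as an illustrative enum.
public enum WidgetState {
    NEW_WIDGET,
    PROCESS_STATE,      // stand-in for the collection of process states
    GOOD_WIDGET,
    BAD_WIDGET,
    QUALITY_REVIEW,
    APPROVED_WIDGET,
    REVIEW_WIDGET,
    CLIENT_ACCEPTS,
    CLIENT_REJECTS,
    REJECTED_WIDGET
}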

More Problems
A widget (a single entity) may be deemed good multiple times over its life cycle (same state), by different people, within the same time period (even within the same day). Therefore, a query of the number of good widgets will always return more than the unique number of widgets with a good state. Widgets can go from good to bad to good again, or just bad to good, or good to bad. We still want to know who is doing what and where all of these state changes are coming from.
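A small sketch of the counting problem, reusing the StatusChangeEvent record sketched earlier: counting every transition into the good state overstates the result, while counting distinct accounts does not. The state label "Good Widget" is illustrative.

import java.util.List;

public class GoodWidgetCounts {

    // Every transition into the good state, including repeats for the same widget.
    static long goodEventCount(List<StatusChangeEvent> events) {
        return events.stream()
                .filter(e -> "Good Widget".equals(e.toState()))
                .count();
    }

    // Each account counted once, no matter how often it was marked good.
    static long distinctGoodWidgets(List<StatusChangeEvent> events) {
        return events.stream()
                .filter(e -> "Good Widget".equals(e.toState()))
                .map(StatusChangeEvent::account)
                .distinct()
                .count();
    }
}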

Crafting a Solution
First step: recognize the resource, i.e. the state table and the who, what, when. Second step: recognize the limitations of a normalized database structure, which made SQL querying alone impractical. KNIME is the best possible resource because it allows a functional workflow that can be integrated and run on demand to produce business intelligence when required. In our case, this report is run at minimum once a week, sometimes more frequently.

KNIME Highlights
1. File read from a copy of the table
2. Column names cleaned
3. Date extracted to KNIME format
4. Sort by ACCOUNT + KDATE as the ORDER BY operation
5. Use the Java Snippet to add EVENT as a new column
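The slides do not show the Java Snippet code itself. The sketch below assumes the EVENT column is a running sequence number that restarts at 1 for each ACCOUNT once the rows are sorted by ACCOUNT + KDATE, and it is written as plain Java over a row list rather than with the snippet node's own column bindings.

import java.util.List;

public class EventNumbering {

    // After sorting by ACCOUNT + KDATE, assign EVENT = 1, 2, 3, ... within each account.
    static int[] addEventColumn(List<StatusChangeEvent> sortedRows) {
        int[] event = new int[sortedRows.size()];
        String previousAccount = null;
        int counter = 0;
        for (int i = 0; i < sortedRows.size(); i++) {
            String account = sortedRows.get(i).account();
            counter = account.equals(previousAccount) ? counter + 1 : 1;
            event[i] = counter;
            previousAccount = account;
        }
        return event;
    }
}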

KNIME Highlights
A Pivoting node allows us to take the data from long to wide, or from a normalized to a de-normalized table structure. This is very powerful, as we now have the opportunity to complete many analyses, including time series, association pattern mining, etc. We are interested here in looking for widgets made and widgets approved by individuals, two analyses which we can rejoin at the end by employee name for any given period. Rule Engine nodes do a lot of the heavy work in this branch to organize the data and reduce it to what we need.
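A sketch, in plain Java, of the long-to-wide reshape the Pivoting node performs at this point: one entry per ACCOUNT, keyed by EVENT number, holding the state the widget moved TO on that event. The data structure is illustrative, not the node's actual output format.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LongToWide {

    // ACCOUNT -> (EVENT number -> TO state), e.g. "A123" -> {1="New Widget", 2="Good Widget"}.
    static Map<String, Map<Integer, String>> pivot(List<StatusChangeEvent> sortedRows,
                                                   int[] eventNumbers) {
        Map<String, Map<Integer, String>> wide = new LinkedHashMap<>();
        for (int i = 0; i < sortedRows.size(); i++) {
            StatusChangeEvent row = sortedRows.get(i);
            wide.computeIfAbsent(row.account(), a -> new LinkedHashMap<>())
                .put(eventNumbers[i], row.toState());
        }
        return wide;
    }
}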

Rule Engines
Used to logically review the de-normalized data to find widgets that either never shifted out of a state of interest, only progressed forward, or at least ended up in the state of interest with no further changes. They are then also used to pluck out the WHO column entry and the KDATE for each line of data that is the last entry in the time series to meet the logical condition.
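A plain-Java sketch of the kind of check these Rule Engine nodes express declaratively, under the assumption that an account only counts if its final recorded event landed in the state of interest; the WHO and KDATE of that last event are what get plucked out.

import java.util.List;
import java.util.Optional;

public class FinalStateCheck {

    // For one account's time-ordered events, return the last event if the account
    // ended up in the state of interest with no further changes; otherwise empty.
    static Optional<StatusChangeEvent> lastEventInto(List<StatusChangeEvent> accountEvents,
                                                     String stateOfInterest) {
        if (accountEvents.isEmpty()) {
            return Optional.empty();
        }
        StatusChangeEvent last = accountEvents.get(accountEvents.size() - 1);
        if (!stateOfInterest.equals(last.toState())) {
            return Optional.empty();
        }
        return Optional.of(last);   // WHO = last.who(), KDATE = last.kdate()
    }
}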

Second Workflow
Pivoting off the long data, grouping by ACCOUNT and pivoting on TO, so that we have a count of the states changed to, by state name. The Rule Engine is used to create a column of "WHAT?" types, identifying accounts which were declared good more than once or went from good to bad. Filtering by type and joining with the wide table now gives me a class of objects with their history for analyses (cf. patterns).
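A sketch of this branch's logic in plain Java: count how many times each TO state was reached per ACCOUNT, then classify the account's history. The class labels are illustrative stand-ins for the workflow's "WHAT?" types.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AccountHistoryClass {

    // TO state -> number of times one account reached it.
    static Map<String, Long> countsByToState(List<StatusChangeEvent> accountEvents) {
        return accountEvents.stream()
                .collect(Collectors.groupingBy(StatusChangeEvent::toState,
                                               Collectors.counting()));
    }

    // Label the account's history: went good-to-bad, was declared good more than once, or other.
    static String classify(List<StatusChangeEvent> accountEvents) {
        Map<String, Long> counts = countsByToState(accountEvents);
        boolean wentGoodToBad = accountEvents.stream()
                .anyMatch(e -> "Good Widget".equals(e.fromState())
                            && "Bad Widget".equals(e.toState()));
        if (wentGoodToBad) {
            return "GOOD_TO_BAD";
        }
        if (counts.getOrDefault("Good Widget", 0L) > 1) {
            return "GOOD_MORE_THAN_ONCE";
        }
        return "OTHER";
    }
}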

Third Workflow
Branching from the long data structure, the data is reduced down to the previous week. The Pivoting node is used again, but this time grouping by WHO and pivoting on TO, achieving a table with counts by staff member by state change made.
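A plain-Java sketch of this branch, assuming the previous-week filter is a simple date-range check on KDATE: keep last week's events, then count state changes per WHO and per TO state, which is what the WHO-by-TO pivot yields.

import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WeeklyStaffCounts {

    // WHO -> (TO state -> number of changes that person made into that state) for one week.
    static Map<String, Map<String, Long>> countsByWhoAndState(List<StatusChangeEvent> events,
                                                              LocalDateTime weekStart,
                                                              LocalDateTime weekEnd) {
        return events.stream()
                .filter(e -> !e.kdate().isBefore(weekStart) && e.kdate().isBefore(weekEnd))
                .collect(Collectors.groupingBy(StatusChangeEvent::who,
                         Collectors.groupingBy(StatusChangeEvent::toState,
                                               Collectors.counting())));
    }
}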

What We Achieve
Happy managers, hooray! (okay, that depends on what the numbers actually are)
A measure of product velocity: the speed at which an account moves from new to billable
A measure of project velocity: the speed at which a client moves from start to finish
Employee productivity, individual and period-on-period
Review of behavioral patterns which might be indicative of a bottleneck, systemic issue, or possible enhancement
Eventually, analyses of predictive or interpretive analytics related to account success measures (survival and time …)

Thank you!
If you have any ideas on how this process could be improved, or would like to discuss how it was possible, now is the time! If you would like to contact me in the future:
Email: Mark.Cooke@tma1.com
Skype: markccooke
LinkedIn: http://lnkd.in/vyxfaw