On Generating High InfoQ with Bayesian Networks

Similar documents
on doing fine work in statistics Old Huxley Building

Bayesian Networks of Customer Satisfaction Survey Data

Lecture. Simulation and optimization

The Performance Implications and Success Factors of Channel Design and Channel Management on B2B Markets

1. Then f has a relative maximum at x = c if f(c) f(x) for all values of x in some

Master of Science in Marketing Analytics (MSMA)

CUSTOMER RELATIONSHIP MANAGEMENT (CRM) CII Institute of Logistics

WHITE PAPER Speech Analytics for Identifying Agent Skill Gaps and Trainings

Sample Reporting. Analytics and Evaluation

AN ELECTRONIC LEARNING ENVIRONMENT FOR APPLIED STATISTICS: QUALITY CARE AND STATISTICS EDUCATION IN HIGHER EDUCATION

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Evaluation of Machine Learning Techniques for Green Energy Prediction

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm

Big Data Strategies Creating Customer Value In Utilities

TEACHING OF STATISTICS IN KENYA. John W. Odhiambo University of Nairobi Nairobi

MSCA Introduction to Statistical Concepts

How To Understand Customer Satisfaction In Auto Insurance

START Selected Topics in Assurance

FORRESTER CONSULTING INTERNET OF THINGS SURVEY - KEY FINDINGS. Building Value from Visibility: 2012 Enterprise Internet of Things Adoption Outlook

Why do statisticians "hate" us?

AP CALCULUS AB 2007 SCORING GUIDELINES (Form B)

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

Quantitative methods in marketing research

Bridging Branding and Strategic Planning A Practical Toolkit

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

TACKLING SIMPSON'S PARADOX IN BIG DATA USING CLASSIFICATION & REGRESSION TREES

STRATEGIC COST MANAGEMENT ACCOUNTING INSTRUMENTS AND THEIR USAGE IN ALBANIAN COMPANIES ABSTRACT

Testing Random- Number Generators

TEXT ANALYTICS INTEGRATION

RESEARCH METHODS IN I/O PSYCHOLOGY

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

Supply chain intelligence: benefits, techniques and future trends

Statistics Graduate Courses

Introduction to Business Intelligence

Challenges and Implementation of Effort Tracking System

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

ROOT: A data mining tool from CERN What can actuaries do with it?

SAP FINUG Teknologiaseminaari

Week 1: Functions and Equations

COMBINING THE METHODS OF FORECASTING AND DECISION-MAKING TO OPTIMISE THE FINANCIAL PERFORMANCE OF SMALL ENTERPRISES

Computer Graphics AACHEN AACHEN AACHEN AACHEN. Public Perception of CG. Computer Graphics Research. Methodological Approaches

Computation of the Aggregate Claim Amount Distribution Using R and actuar. Vincent Goulet, Ph.D.

RESEARCH METHODS IN I/O PSYCHOLOGY

A Comparison of Novel and State-of-the-Art Polynomial Bayesian Network Learning Algorithms

Monte Carlo analysis used for Contingency estimating.

Outsourcing Survey March 2012

Practice with Proofs

An Assessment of the Performance Evaluation System Used to Evaluate Teachers in Secondary Schools in Meru Central District-Kenya

M E M O R A N D U M. Faculty Senate Approved April 2, 2015

Measurement and measures. Professor Brian Oldenburg

Developing a Clinical Nurse Leader Practice Model

JYSK ANALYSE A/S A full service research company

Beyond CTMS. Beyond Operational Metrics. True Performance.

The 2013 Superannuation Consumer Recommendation & Loyalty Study

Deriving Value From Big Data Visual, Predictive, GeoLocation and Event Analytics

Channel Incentive Travel: A Case Study

Customer Experience Management

The Big Data Deluge: Creating Serious Business Problems. Analytics: Harnessing Big Data Deluge to Acquire Business Power

A Study on Customer Satisfaction in Mobile Telecommunication Market by Using SEM and System Dynamic Method

Vaciado de artículos. Journal of marketing research , v. 50, n. 4, august, p

Ghassan R. Odeh 1 & Hamad R. Alghadeer 2

CONTENT CONNECTIVITY COLLABORATION

MSCA Introduction to Statistical Concepts

LISTEN TO THE VOICE OF CUSTOMER EXPERIENCE

FINDING BIG PROFITS IN THE AGE OF BIG DATA

Realizing the Full Value of GHX. A GHX Education Paper for Healthcare Supply Chain Professionals

Banking Analytics Training Program

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

2WB05 Simulation Lecture 8: Generating random variables

8 9 +

EcoPinion Consumer Survey No. 19 The Conflicted Consumer Landscape in the Utility Sector. May 2014

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

PS 271B: Quantitative Methods II. Lecture Notes

20 Best Practices for Customer Feedback Programs Building a Customer-Centric Company

Classification and Prediction

The Next Frontier in CRM Analytics Oracle Transactional Business Intelligence Enterprise for CRM Cloud Service

Does Trust Matter to Develop Customer Loyalty in Online Business?

OPTIMIZED STAFF SCHEDULING AT SWISSPORT

Linking Employee Satisfaction, Employee Engagement, and Employee Ambassadorship Session 1: Ambassadorship Concept/Framework Introduction and Rationale

Transcription:

On Generating High InfoQ with Bayesian Networks Ron S. Kenett Research Professor, University of Turin Chairman and CEO, KPA Ltd. ron@kpa-group.com Based on joint work with Galit Shmueli, Silvia Salini and Federica Cugnata 1

Applied statistics is about meeting the challenge of solving real world problems with mathematical tools and statistical thinking 2

Mathematical Tools Statistical Thinking 3

Mathematical Tools Statistical Thinking Real World Problems 4The mathematical statistician

Mathematical Tools Statistical Thinking Real World Problems 5The proactive statistician, G. Hahn

Presentation topics Bayesian networks Information Quality Robustness of network Sensitivity of predictions information knowledge numbers data findings statistical analysis Examples from: Customer surveys Risk management of telecom systems Monitoring of bioreactors Managing healthcare of ESRD patients Kenett, Ron S., Applications of Bayesian Networks (2012). Available at SSRN: http://ssrn.com/abstract=2172713 6or http://dx.doi.org/10.2139/ssrn.2172713

Information Quality (InfoQ) The potential of a particular dataset to achieve a particular goal using a given empirical analysis method g X f U A specific analysis goal The available dataset An empirical analysis method A utility measure InfoQ(f,X,g) = U( f(x g) ) Kenett, R.S. and Shmueli, G. (2014) On Information Quality, Journal of the Royal Statistical Society, Series A 7(with discussion), Vol. 177, No. 1, pp. 3-38, 2014. http://ssrn.com/abstract=1464444.

Domain Space Knowledge Analytic Space Information Quality Bayesian networks InfoQ Robustness of network Sensitivity of predictions Goals InfoQ(f,X,g) = U(f(X g)) 1. Data resolution 2. Data structure 8 Data Quality Primary Data Analysis Quality Secondary Data - Experimental - Experimental - Observational - Observational 3. Data integration 4. Temporal relevance 5. Chronology of data and goal 6. Generalizability 7. Operationalization 8. Communication

f = Bayesian Networks Judea Pearl 2011 Turing Medalist 9

10 Causal calculus, Counterfactuals, Do calculus, Transportability, Missing data models, Causal mediation, Graph mutilation, External validity, Causality in missingness,.

Customer Surveys Behaviors Recommendation Loyalty Loyalty Index Attitudes & Perceptions Experiences & Interactions Overall Satisfaction Perceptions Equipment and System Sales Support Technical Support Training Supplies and Orders Software Add On Solutions Customer Website Purchasing Support Contracts and Pricing System Installation 11

Customer Surveys Goals Goal 1. Goal 2. Goal 3. Goal 4. Goal 5. Goal 6. Goal 7. Goal 8. Goal 9. Goal 10. Decide where to launch improvement initiatives Highlight drivers of overall satisfaction Detect positive or negative trends in customer satisfaction Identify best practices by comparing products or marketing channels Determine strengths and weaknesses Set up improvement goals Design a balanced scorecard with customer inputs Communicate the results using graphics Assess the reliability of the questionnaire Improve the questionnaire for future use 12

Bayesian Network Analysis of Customer Surveys 13 Kenett, R.S. and Salini S. (2009). New Frontiers: Bayesian networks give insight into surveydata analysis, Quality Progress, pp. 31-36, August.

14

20% BOT12 Set up improvement goals 15

39% BOT12 Set up improvement goals 16

13% BOT12 Set up improvement goals 17

Robustness and Sensitivity Analysis of Bayesian Networks How are the factors in the model affecting the target variable levels? Which factors affect customer satisfaction and how How are factors affecting the distance between conditioned distributions and observed data? Which factors will have the largest impact if changed 18

Network learning Robustness of Network bnlearn: 1) Grow-Shrink (GS) algorithm, 2) Incremental Association (IAMB) algorithm, 3) Interleaved-IAMB (Inter-IAMB) algorithm, 4) Fast-IAMB (Fast-IAMB) algorithm, 5) Max-Min Parents and Children (MMPC) algorithm, 6) ARACNE and Chow-Liu algorithms, 7) Hill- Climbing (HC) greedy search algorithm, 8)Tabu Search (TABU) algorithm, 9) Max-Min Hill-Climbing (MMHC) algorithm and 10) two-stage Restricted Maximization (RSMAX2) algorithm for both discrete and Gaussian networks 19

20 Robustness Analysis

Network estimation Sensitivity to Driver Variables Generate a full factorial experiments based on driver combinations. Analyse the target variable thus generated (overall satisfaction) with respect to the observed one, in order to study the effect that each driver combinations has on its variability. 21

Sensitivity Analysis 50,0% 45,0% Target variable: Satisfaction 44,0% 40,0% 35,0% 30,0% 25,0% 26,4% 20,0% 15,0% 10,0% 5,0% 4,6% 9,7% 15,3% 0,0% 1 2 3 4 5 22

Sensitivity Analysis 1000 simulated target variable using the BN estimated on the observed dataset. Distribution of simulated target variable for each level of each variable Equipment Sales Technical Support Mean 2.995 3.005 3.015 Mean 2.995 3.005 3.015 Mean 2.995 3.005 3.015 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Impact of factors on target variable Suppliers Administrative Support Terms and Conditions Mean 23 2.995 3.005 3.015 1 2 3 4 5 Mean 2.995 3.005 3.015 1 2 3 4 5 Mean of simulate target variable for each level of each variable (ABC) Mean 2.995 3.005 3.015 1 2 3 4 5

Sensitivity Analysis Compare the observed distribution with the simulated one (1000 runs) using Kolmogorov-Smirnov test Equipment Sales Technical Support Mean of KS 0.256 0.260 0.264 Mean of KS 0.256 0.260 0.264 Mean of KS 0.256 0.260 0.264 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Impact of factors on conditioned target variable distribution Suppliers Administrative Support Terms and Conditions Mean of KS 24 0.256 0.260 0.264 Mean of KS 0.256 0.260 0.264 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Mean of Kolmogorov-Smirnov for each level of each variable (ABC) Mean of KS 0.256 0.260 0.264

Information Quality (InfoQ) of Integrated Analysis f(f 1,f 2, f n ) Models f 1 f 2 f 3 f 4. N f Goals g 1 4 g 2 2 g 3 2 g 4 1 N g 1 2 3 3 Kenett R.S. and Salini S. (2011). Modern Analysis of Customer Surveys: comparison of models and integrated 25 analysis, with discussion, Applied Stochastic Models in Business and Industry, 27, pp. 465 475

Goal BN CUB Rasch CC 1 Decide where to launch improvement initiatives 2 Highlight drivers of overall satisfaction 3 Detect positive or negative trends in customer satisfaction 4 Identify best practices by comparing products or marketing channels 5 Determine strengths and weaknesses 6 Set up improvement goals 7 Design a balanced scorecard with customer inputs 8 Communicate the results using graphics 9 Assess the reliability of the questionnaire 10 Improve the questionnaire for future use Kenett R.S. and Salini S. (2011). Modern Analysis of Customer Surveys: comparison of models and integrated 26 analysis, with discussion, Applied Stochastic Models in Business and Industry, 27, pp. 465 475

Goal 1: Identify causes of risks that materialized Goal 2: Design risk mitigation strategies Goal 3: Provide a risk management dashboard 27

28 Bayesian network of communication network data

Conditioned BN Experimental Array Robustness of Prediction 29

Conditioned BN Experimental Array Robustness of Prediction Classification error runs i 1 I ( ( x) s s ) i i 30 BAYESIAN NETWORKS: ROBUSTNESS AND SENSITIVITY ISSUES By Federica Cugnatay, Ron Kenetty,z and Silvia Salini http://users.unimi.it/salini/rsbn/data.zip http://users.unimi.it/salini/rsbn/rcodes.zip http://users.unimi.it/salini/rsbn/materials.html

Analysis: Main effect plots of mean GoF Main Effects Plot (data means) for GoF mean 61.0 Lines Extensions 60.5 60.0 Mean of GoF mean 59.5 61.0 60.5 60.0 59.5 1 1 2 2 3 Smart phones 3 4 5 4 6 1 2 3 Analyzing the mean GoF for each factor, the most robust levels are Lines=3, Smart Phone=3, and Extensions=3 31 4 5

Analysis: Main effect plots of mean GoF 5 4 Contour Plot of GoF mean vs Extensions, Lines GoF mean < 59.0 59.0-59.5 59.5-60.0 60.0-60.5 60.5-61.0 > 61.0 Extensions 3 Hold Values Smart phones 3.5 2 1 1.0 1.5 2.0 2.5 Lines 3.0 3.5 4.0 A central number of lines and a central level for extension is associated with a low mean GoF 32

Analysis: Main effect plots of mean GoF Smart phones Contour Plot of GoF mean vs Smart phones, Lines 6 5 4 3 GoF mean < 59 59-60 60-61 61-62 > 62 Hold Values Extensions 3 2 1 1.0 1.5 2.0 2.5 Lines 3.0 3.5 4.0 A limited number of lines and a limited level for smart 33 phone is associated with a low mean GoF

Analysis: Main effect plots of mean GoF Contour Plot of GoF mean vs Smart phones, Extensions Smart phones 6 5 4 3 GoF mean < 58.5 58.5-59.0 59.0-59.5 59.5-60.0 60.0-60.5 60.5-61.0 61.0-61.5 > 61.5 Hold Values Lines 2.5 2 1 1 2 3 Extensions 4 5 A limited level both for extension and smart phone is associated 34 with a low mean GoF

Knowledge Goals Information Quality InfoQ(f,X,g) = U(f(X g)) Information Quality Data Quality Primary Data Analysis Quality Secondary Data - Experimental - Experimental - Observational - Observational g X f U A specific analysis goal The available dataset An empirical analysis method A utility measure 1. Data resolution 2. Data structure 3. Data integration 4. Temporal relevance What How 5. Chronology of data and goal 6. Generalizability 7. Operationalization 8. Communication 35

Big Data Analytics InfoQ(f,X,g) = U(f(X g)) 1. Data resolution 2. Data structure 3. Data integration 4. Temporal relevance 5. Chronology of data and goal 6. Generalizability 7. Operationalization 8. Communication Russom, P., Big Data Analytics, TDWI Best Practices Report, Q4 2011 36

Knowledge 1. Data resolution 2. Data structure 3. Data integration 4. Temporal relevance 5. Chronology of data and goal 6. Generalizability 7. Construct operationalization 8. Communication Data Quality Information Quality Analysis Quality Goals Bayesian Network 37

Thank you for your attention What did we cover? Statistical thinking Bayesian networks Information Quality (InfoQ) Robustness of network Sensitivity of predictions