



On doing fine work in statistics. Old Huxley Building, RSS 2013

"Much fine work in statistics involves minimal mathematics; some bad work in statistics gets by because of its apparent mathematical content." David Cox (1981), Theory and general principle in statistics, JRSS(A), 144, pp. 289-297.

[Figure: Wordle of Ron Kenett's CV]

The challenge of solving real world problems with mathematical tools and statistical thinking

[Diagram: Mathematical Tools and Statistical Thinking]

[Diagram: Mathematical Tools, Statistical Thinking and Real World Problems: the mathematical statistician]

[Diagram: Mathematical Tools, Statistical Thinking and Real World Problems: the proactive statistician (G. Hahn)]

[Diagram: Mathematical Tools and Statistical Thinking]

Warning: we do not teach tools and methods for doing that. This is not as the analyst of a single set of data, nor even as the designer and analyzer of a single experiment, but rather as a colleague working with an investigator throughout the whole course of iterative deductive-inductive investigation.

1 Problem elicitation, 2 RLD, 3 CED, 4 PSE, 5 InfoQ

Statistical Thinking and Other Disciplines:
Are we generating knowledge? (InfoQ)
Data integration (ETL, data fusion)
Problem elicitation (cognitive science)
Visualization and data exploration (HMI)
Causality (CS, BN)
Are we making an impact? (PSE)
Statistical education (concept science)
Confidentiality
Integrated Models
Unstructured data (semantic data, networks)
Reproducible research (Sweave, CFR Part11)

1 Problem Elicitation
Greenfield, T. (1987) Consultant's cameos: A chapter of encounters. pp. 11-25 in Hand, D.J. and Everitt, B.S. (eds), The statistical consultant in action, Cambridge University Press.

1 Problem Elicitation
The teacher acts as a mentor in training the student in unstructured problem solving while the computer stores and retrieves information. This sets the mind free to do what it does best: be inductively creative (Box, 1997).
Most iterative investigations involve multi-dimensional learning: in particular, how to study the problem is a necessary precursor to how to solve the problem (Box, 2000).
The Theory of Applied Statistics: Kenett, R.S. (2012) A Note on the Theory of Applied Statistics. Available at SSRN: http://ssrn.com/abstract=2171179 or http://dx.doi.org/10.2139/ssrn.2171179

2 Visualization: Some (old) references
The Visual Display of Quantitative Information, E. Tufte, Graphics Press, 1983.
Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods, W. Cleveland and R. McGill, Journal of the American Statistical Association, 79(387): 531-554, 1984.
Graphs in Scientific Publications, W. Cleveland, The American Statistician, 38(4): 261-269, 1984.
How to Display Data Badly, H. Wainer, The American Statistician, 38(2): 137-147, 1984.
Data-Based Graphics: Visual Display in the Decades to Come, J. Tukey, Statistical Science, 5(3): 327-339, 1990.
Communicating Statistics, T. Greenfield, Journal of the Royal Statistical Society, Series A (Statistics in Society), 156(2): 287-297, 1993.

2 Business Intelligence and Analytics Platforms Market Segment
The dominant theme of the market in 2012 was that data discovery became a mainstream BI and analytic architecture. The market also saw increased activity in real-time, content and predictive analytics. (Gartner, 2013)

2 Gartner's Magic Quadrant for Business Intelligence and Analytics Platforms, 5 February 2013. Analyst(s): K. Schlegel, R. Sallam, D. Yuen, J. Tapadinhas

2 Obama, Depression: Do they interact?

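2 The Simplex (Bishop, Fienberg, & Holland, 1975)

        RHS    ^RHS
LHS     x_1    x_2
^LHS    x_3    x_4

$D = x_1 x_4 - x_2 x_3 = 0$

$\sum_{i=1}^{4} x_i = 1, \quad 0 \le x_i, \quad i = 1, \dots, 4$

[Figure: simplex with vertices x_1, x_2, x_3, x_4 and the surface D = 0; labels: Obama, Depression]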

2 Dynamic simplex representation of two way interaction
[Figure labels: (obama, depression), (obama, not depression), (not obama, depression), (not obama, not depression)]
Kenett, R. "On an Exploratory Analysis of Contingency Tables", Journal of the Royal Statistical Society, Series D, 32, pp. 395-403, 1983.

2 Linkage Disequilibrium

        RHS    ^RHS
LHS     x_1    x_2
^LHS    x_3    x_4

$D = x_1 x_4 - x_2 x_3$

$f = x_1 + x_3, \quad g = x_1 + x_2$

$x_1 = fg + D$
$x_2 = (1 - f)g - D$
$x_3 = f(1 - g) - D$
$x_4 = (1 - f)(1 - g) + D$

$\sum_{i=1}^{4} x_i = 1, \quad 0 \le x_i, \quad i = 1, \dots, 4$

D can be extended to more dimensions
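
As a rough numerical check of these relations, the following Python sketch computes f, g and D from a 2x2 table of relative frequencies and then rebuilds the four cells from them. The function names `decompose` and `rebuild` and the example table are illustrative assumptions, not part of the slides; the cell layout (x1, x2 in the LHS row, x3, x4 in the ^LHS row) follows the slide.

```python
def decompose(x1, x2, x3, x4):
    """Return (f, g, D) for a 2x2 table of relative frequencies.

    Layout assumed from the slide: row LHS holds (x1, x2),
    row ^LHS holds (x3, x4); columns are RHS and ^RHS.
    """
    cells = (x1, x2, x3, x4)
    if any(c < 0 for c in cells) or abs(sum(cells) - 1.0) > 1e-9:
        raise ValueError("cells must be non-negative and sum to 1")
    f = x1 + x3            # marginal relative frequency of RHS
    g = x1 + x2            # marginal relative frequency of LHS
    d = x1 * x4 - x2 * x3  # linkage disequilibrium D
    return f, g, d


def rebuild(f, g, d):
    """Rebuild (x1, x2, x3, x4) from the two margins and D."""
    return (
        f * g + d,
        (1 - f) * g - d,
        f * (1 - g) - d,
        (1 - f) * (1 - g) + d,
    )


# Hypothetical table, used only to illustrate the round trip.
table = (0.4, 0.1, 0.2, 0.3)
f, g, d = decompose(*table)
recovered = rebuild(f, g, d)
```

With D = 0 the rebuilt cells reduce to the products of the margins, which is the independence surface referred to on the simplex slides.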

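2 Linkage Disequilibrium: an algebraic observation

$X = \mathbf{f} \otimes \mathbf{g} + D \, \mathbf{e} \otimes \mathbf{e}$

where $X = (x_1, x_2, x_3, x_4)$, $\mathbf{f} = (f, 1 - f)$, $\mathbf{g} = (g, 1 - g)$, $\mathbf{e} = (1, -1)$, with $f = x_1 + x_3$ and $g = x_1 + x_2$. Componentwise, in the order $(x_1, x_2, x_3, x_4)$, the two products are $(fg, (1 - f)g, f(1 - g), (1 - f)(1 - g))$ and $(1, -1, -1, 1)$.

$\sum_{i=1}^{4} x_i = 1, \quad 0 \le x_i, \quad i = 1, \dots, 4$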

2 Relative Linkage Disequilibrium

$D$ is the distance from the point corresponding to the contingency table in the simplex to the surface $D = 0$, in the $e \otimes e$ direction.

$RLD = \dfrac{D}{D_M}$

$D_M$ is the distance from the point corresponding to the contingency table on the surface $D = 0$, in the $e \otimes e$ direction, to the surface of the simplex, in that direction.

If $D > 0$
  then if $x_3 < x_2$ then $RLD = \dfrac{D}{D + x_3}$ else $RLD = \dfrac{D}{D + x_2}$
  else if $x_1 < x_4$ then $RLD = \dfrac{D}{D - x_1}$ else $RLD = \dfrac{D}{D - x_4}$
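
The branching rule can be sketched in Python as follows. This is a minimal illustration, not the arules implementation: the function name `rld` and the example tables are assumptions, and the comparison and plus/minus signs are reconstructed from the geometric definition of D_M, since they did not survive in the transcribed slide text.

```python
def rld(x1, x2, x3, x4):
    """Relative linkage disequilibrium of a 2x2 table of relative frequencies.

    Cell layout assumed from the slides: (x1, x2) in the LHS row,
    (x3, x4) in the ^LHS row. Signs are a reconstruction (see lead-in).
    """
    d = x1 * x4 - x2 * x3
    if d > 0:
        # Positive association: the off-diagonal cells shrink to zero first.
        denom = d + (x3 if x3 < x2 else x2)
    else:
        # Negative (or no) association: the diagonal cells shrink first.
        denom = d - (x1 if x1 < x4 else x4)
    if denom == 0:
        # Degenerate table (D = 0 with an empty diagonal cell); treated
        # here as no measurable association. This is an assumption.
        return 0.0
    return d / denom


# Hypothetical tables, for illustration only.
positive = rld(0.4, 0.1, 0.2, 0.3)       # D = 0.1, smaller off-diagonal cell = 0.1
independent = rld(0.25, 0.25, 0.25, 0.25)
perfect = rld(0.5, 0.0, 0.0, 0.5)
negative = rld(0.1, 0.4, 0.3, 0.2)       # D = -0.1, smaller diagonal cell = 0.1
```

Under this reconstruction the value lies between 0 (independence) and 1 (the table sits on the boundary of the simplex), for both positive and negative D.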

2 $RLD = D / D_M$
[Figure: the distances D and D_M relative to the surface D = 0]
Kenett, R. and Salini, S., "Relative Linkage Disequilibrium Applications to Aircraft Accidents and Operational Risks". Trans. on Machine Learning and Data Mining, 1(2), pp. 83-96, 2008.
Implemented in the arules R package, version 0.6-6, Mining Association Rules and Frequent Itemsets.


3 (ARL_0 and ARL_1) or (PFA and CED)?

3 (ARL_0 and ARL_1) or (PFA and CED)?
[Figure: run charts of Y1, Y2, Y3, Y4 against Index; panel labels 0.5σ, 1σ, 1.5σ, 2σ]
Kenett, R.S. and Pollak, M. (2012) On Assessing the Performance of Sequential Procedures for Detecting a Change, Quality and Reliability Engineering International, 28, pp. 500-507.

3 (ARL_0 and ARL_1) or (PFA and CED)?
[Figure: four panels labelled 0.5σ, 1σ, 1.5σ and 2σ, annotated with:]
PFA = 2.3%, CED = 18.5, ARL_1 = 27.8
PFA = 11.7%, CED = 6.5, ARL_1 = 14.6
PFA = 16.6%, CED = 4.3, ARL_1 = 11.9
PFA = 19.8%, CED = 3.6, ARL_1 = 10.9
Kenett, R.S. and Pollak, M. (2012) On Assessing the Performance of Sequential Procedures for Detecting a Change, Quality and Reliability Engineering International, 28, pp. 500-507.

4 Are we making an impact? Practical Statistical Efficiency (PSE)

$PSE = E\{R\} \times T\{I\} \times P\{I\} \times V\{PS\} \times P\{S\} \times V\{P\} \times V\{M\} \times V\{D\}$

V{D} = value of the data actually collected
V{M} = value of the statistical method employed
V{P} = value of the problem to be solved
P{S} = probability that the problem actually gets solved
V{PS} = value of the problem being solved
P{I} = probability the solution is actually implemented
T{I} = time the solution stays implemented
E{R} = expected number of replications

Names shown on the slide: Kenett, Coleman, Stewardson; C. Eisenhart; B. Hoadley; A.B. Godfrey; E.J.G. Pitman

Kenett, R.S., Coleman, S.Y. and Stewardson, D. (2003). Statistical Efficiency: The Practical Perspective, Quality and Reliability Engineering International, 19: 265-272.
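
Because PSE is a plain product of the eight components, a small Python sketch can show how one weak component pulls the whole figure down. The function name, the dictionary keys and every number below are hypothetical; the slide does not say on which scale each component is assessed.

```python
from math import prod

# Component labels in the order they appear in the PSE formula.
PSE_COMPONENTS = ("E_R", "T_I", "P_I", "V_PS", "P_S", "V_P", "V_M", "V_D")


def practical_statistical_efficiency(components):
    """Multiply the eight PSE components given as a dict keyed by label."""
    missing = [name for name in PSE_COMPONENTS if name not in components]
    if missing:
        raise KeyError(f"missing PSE components: {missing}")
    return prod(components[name] for name in PSE_COMPONENTS)


# Hypothetical assessment of one project (all values are made up).
project = {
    "E_R": 3.0,   # expected number of replications
    "T_I": 2.0,   # time the solution stays implemented
    "P_I": 0.5,   # probability the solution is implemented
    "V_PS": 1.0,  # value of the problem being solved
    "P_S": 0.5,   # probability the problem gets solved
    "V_P": 1.0,   # value of the problem to be solved
    "V_M": 0.9,   # value of the statistical method employed
    "V_D": 0.8,   # value of the data actually collected
}
pse = practical_statistical_efficiency(project)

# The same project when the solution is never implemented.
never_implemented = practical_statistical_efficiency({**project, "P_I": 0.0})
```

The multiplicative form means that a zero in any component, for example a solution that is never implemented, gives a PSE of zero regardless of the quality of the data or the method.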

5 Are we generating knowledge? Information Quality (InfoQ)

[Diagram: Goals; Primary Data (experimental, observational); Secondary Data (experimental, observational); Data Quality; Analysis Quality; Information Quality; Knowledge; What; How]

g: a specific analysis goal
X: the available dataset
f: an empirical analysis method
U: a utility measure

1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Chronology of data and goal
6. Generalizability
7. Construct operationalization
8. Communication

Kenett, R.S. and Shmueli, G. (2013) On Information Quality, Journal of the Royal Statistical Society, Series A (with discussion), 176(4). http://ssrn.com/abstract=1464444

5 Big Data VVV and InfoQ

1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Chronology of data and goal
6. Generalizability
7. Construct operationalization
8. Communication

Russom, P. (2011) Big Data Analytics, TDWI Best Practices Report, Q4.

5 Dimensions of Information Quality (InfoQ)

1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Chronology of data and goal
6. Generalizability
7. Construct operationalization
8. Communication

Names shown on the slide: Mallows, Hand, Huber, Cox, Juran, Box, Deming, Greenfield

Kenett, R.S. and Shmueli, G. (2013) On Information Quality, Journal of the Royal Statistical Society, Series A (with discussion), 176(4). http://ssrn.com/abstract=1464444
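
The slides list the eight dimensions but do not say how an assessment of them might be summarised. One possible approach, assumed here purely for illustration, is to rate each dimension on a 1 to 5 scale, map each rating linearly to a desirability between 0 and 1, and take the geometric mean. The rating scale, the linear mapping and the function names are assumptions, not content of the slides.

```python
from math import prod

INFOQ_DIMENSIONS = (
    "Data resolution",
    "Data structure",
    "Data integration",
    "Temporal relevance",
    "Chronology of data and goal",
    "Generalizability",
    "Construct operationalization",
    "Communication",
)


def desirability(rating):
    """Map a 1-5 rating linearly onto [0, 1] (assumed scale)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    return (rating - 1) / 4


def infoq_score(ratings):
    """Geometric mean of the desirabilities of the eight dimensions."""
    values = [desirability(ratings[name]) for name in INFOQ_DIMENSIONS]
    return prod(values) ** (1 / len(values))


# Hypothetical ratings for a single study.
all_middling = {name: 3 for name in INFOQ_DIMENSIONS}
one_failure = {**{name: 5 for name in INFOQ_DIMENSIONS}, "Communication": 1}

score_middling = infoq_score(all_middling)
score_one_failure = infoq_score(one_failure)
```

A geometric mean is used in this sketch so that a dimension rated at the bottom of the scale drives the overall score to zero, rather than being averaged away by strong ratings elsewhere.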

Modern Industrial Statistics with R, MINITAB and JMP


RSS Greenfield Medalists

Year  Name
1991  D Price
1992  T P Davis & D M Grove
1995  A Racine-Poon
1998  R Caulcutt & M Gerson
2000  L Furlong
2005  Susan Lewis
2007  S J Morrison
2010  D Montgomery

1 Problem elicitation, 2 RLD, 3 CED, 4 PSE, 5 InfoQ