Decontextualizing + Assumptions = Fallacies: And It's Worse for Big Data


1 Decontextualizing + Assumptions = Fallacies: And It's Worse for Big Data

Michael Smithson
Research School of Psychology, The Australian National University

2 Setting the Scene

This talk presents several commonplace myths about public data and problematic assumptions regarding models of such data. These problems are intensified and less corrigible in big data. The myths and assumptions fall under three headings:

1. Big data have integrity
2. Models of big data extract patterns that reflect underlying truths
3. Big data are better than small data

3 Big data have integrity?

Accuracy? Data may be subject to recording distortion, recording errors, and/or measurement confounds. These may be worse for big data because big data often are an assemblage of multiple second-hand data sets taken out of context and used for purposes other than those originally intended.

Distortion and error: impact of shadow economies on official economic indicators (e.g., employment rates, inflation); gaming the indicators (e.g., Australian universities and the ERA); making it all up (e.g., the recent Canberra hospital records scandal).

Measurement confounds: differing or shifting criteria (e.g., definitions of crime, suicide in Catholic populations); measurement contamination (e.g., webpage number of visits and dwell-times).

4 Big data have integrity?

Precision? Big data often are sample data rather than population data, and the samples may not be representative of their referent populations. Nevertheless, decision makers and policy analysts usually treat sample data or estimates as though they are population data or estimates.

Stability? Data often are not recorded just once, but re-recorded as better information becomes available or as errors are discovered. For example, in November 2012 the first official estimate of U.S. net employment increase was 146,000 new jobs. By the third revision that number had increased by about 69% to 247,000.

Completeness? Data collection schemes often are set up by groups who lack the necessary expertise. The Australian Transport Safety Bureau, e.g., collects lots of data on civil aviation flights that have resulted in an incident, but collects no data on incident-free flights.

5 Models of big data reveal underlying truths?

Big data increasingly require automated data analysis, i.e., data mining. Data mining has no quality control beyond the assumptions built into its algorithms and post-hoc interpretations by humans.

Spurious correlations: There is no guarantee that a pattern (e.g., a correlation between two variables) uncovered by data mining is meaningful or useful. Other unmeasured factors may render those correlations spurious. Autocorrelation may account for an apparent correlation over time. However, humans will make sense out of nearly anything.

6 Terms for uncertainty fall out of fashion in English-language books:

As the plot below demonstrates, the terms "ignorance", "ignorant", "unknown", "uncertain", and "falsehood" display a steady decline in relative frequency of occurrence in Google Books, starting around 1830 and ending with a slight upturn at the start of the new century.

[Figure: logit against year for Ignorance, Ignorant, Unknown, Uncertain, and Falsehood]

7 Terms for uncertainty fall out of fashion in English-language books:

What is driving this? Could it be God?

[Figure: logit(god) against year]

8 Terms for uncertainty fall out of fashion in English-language books:

It looks plausible; the correlations are very high.

[Table: correlations among God, Ignorance, Ignorant, Unknown, Uncertain, and Falsehood]

Also, a search through the books containing such references reveals a potential link between mentions of these terms and references to God in the context of theological arguments.

9 Terms for uncertainty fall out of fashion in English-language books:

However, the partial autocorrelation functions below suggest that all of these variables are strongly autocorrelated (AR(1) or AR(2) processes).

10 Terms for uncertainty fall out of fashion in English-language books:

When autocorrelation is taken into account, the residual series no longer display strong correlations. The original correlations were inflated due to autocorrelation.

[Table: correlations among the residual series for God, Ignorance, Ignorant, Unknown, Uncertain, and Falsehood]
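The inflation described on this slide can be reproduced with simulated data. The sketch below is an illustration only, not the analysis behind the slides: it generates pairs of independent AR(1) series (the coefficient 0.95, the series length, and all function names are assumptions chosen for the example), then compares the average absolute correlation of the raw series with that of the residuals left after fitting an AR(1) model to each series by least squares.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ar1_series(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + noise with standard normal noise."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ar1_residuals(x):
    """Regress x[t] on x[t-1] by least squares and return the residuals."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_p, mean_c = sum(prev) / n, sum(curr) / n
    slope = (sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
             / sum((p - mean_p) ** 2 for p in prev))
    intercept = mean_c - slope * mean_p
    return [c - (intercept + slope * p) for p, c in zip(prev, curr)]


def mean_abs_correlations(reps=200, n=200, phi=0.95, seed=1):
    """Average |r| for raw and residual series over many independent pairs."""
    rng = random.Random(seed)
    raw_total, resid_total = 0.0, 0.0
    for _ in range(reps):
        x = ar1_series(n, phi, rng)
        y = ar1_series(n, phi, rng)  # generated independently of x
        raw_total += abs(pearson(x, y))
        resid_total += abs(pearson(ar1_residuals(x), ar1_residuals(y)))
    return raw_total / reps, resid_total / reps


if __name__ == "__main__":
    raw, resid = mean_abs_correlations()
    print(f"mean |r|, raw series:      {raw:.3f}")
    print(f"mean |r|, residual series: {resid:.3f}")
```

Because the two series in each pair are unrelated by construction, any sizeable raw correlation is produced by autocorrelation alone. Real series with trends, such as the word frequencies discussed here, may need a richer model (for example AR(2) or differencing) before the residuals are compared.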

11 Models of big data reveal underlying truths?

Correlation ain't causation, but what if there's a time-lag? People tend to attribute causal status to X if X always precedes Y, and especially if they would like to attribute Y to X (e.g., my AFP experience).

Nonstationarity: Most data-mining procedures assume that the processes generating the data are stable over time (i.e., stationarity). This often may be untrue, and the changes in those processes may go unaccounted for.
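A small simulation makes the cost of the stationarity assumption concrete. This is a minimal sketch under assumed settings (a mean shift of 3 units, unit-variance normal noise, and hypothetical function names), not a procedure taken from the talk: a forecaster that predicts every future value by the historical mean is scored once on a future that follows the same process and once on a future whose mean has shifted.

```python
import random
import statistics


def forecast_mse(train, test):
    """Mean squared error when each test value is predicted by the training mean."""
    prediction = statistics.fmean(train)
    return statistics.fmean([(value - prediction) ** 2 for value in test])


def compare_regimes(n=500, shift=3.0, seed=7):
    """Return (MSE on a stable future, MSE on a future whose mean has shifted)."""
    rng = random.Random(seed)
    train = [rng.gauss(0.0, 1.0) for _ in range(n)]
    stable_future = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shifted_future = [rng.gauss(shift, 1.0) for _ in range(n)]
    return forecast_mse(train, stable_future), forecast_mse(train, shifted_future)


if __name__ == "__main__":
    stable, shifted = compare_regimes()
    print(f"MSE, stable process:  {stable:.2f}")
    print(f"MSE, shifted process: {shifted:.2f}")
```

The model is identical in both cases; only the data-generating process differs. Collecting more history from the old regime would not reduce the error in the shifted case.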

12 Big data are better than small data?

Big data can be big in at least three senses: large samples, lots of different pieces of information, and more intense surveillance. These may not be unalloyed goods.

Belief in small numbers: Large samples indeed do give more accurate and precise estimates than small samples, ceteris paribus. But most non-statisticians don't know or understand this. Instead, they over-estimate how representative of a population a small sample is.

More information can make us worse decision makers: Several experimental studies have shown that people are more confident and make worse predictions when given additional, but irrelevant, data. This is especially problematic for unstructured fishing expeditions for data.
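The claim about large samples follows from the standard error of a sample mean, which for independent observations with standard deviation σ is σ/√n. The sketch below checks this by simulation under assumed settings (standard normal data, sample sizes of 10 and 1,000, and a hypothetical function name): it draws many samples of each size and measures how much their means vary.

```python
import random
import statistics


def spread_of_sample_means(sample_size, reps=500, seed=11):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(reps)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for n in (10, 1000):
        observed = spread_of_sample_means(n)
        expected = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1
        print(f"n={n:5d}  observed spread={observed:.3f}  theory={expected:.3f}")
```

The means of samples of size 10 scatter roughly ten times as widely as those of size 1,000. This precision gain applies to random samples; it does not correct for a sample that is unrepresentative of its referent population.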

13 Big data are better than small data?

Bigger data yield better models for predicting the future? One of the potential advantages of big data is long memory. Longer-term data sets should give us a better model of the past, which in turn should enable us to better predict the future. But does it? Model inflation and nonstationarity pose problems here.

Bigger data result in greater control? Bigger data clearly won't yield better control if underlying causes are not understood and/or are not malleable by us. Surveillance and data-gathering about people may destroy trust and privacy (which are means of control as well as social capital), and also may engender reactance among those under surveillance (and thus a loss of control).

Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index

Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index Rickard Nyman *, Paul Ormerod Centre for the Study of Decision Making Under Uncertainty,

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

Sensitivity of an Environmental Risk Ranking System

Sensitivity of an Environmental Risk Ranking System Sensitivity of an Environmental Risk Ranking System SUMMARY Robert B. Hutchison and Howard H. Witt ANSTO Safety and Reliability CERES is a simple PC tool to rank environmental risks and to assess the cost-benefit

More information

Quality Factors in Big Data and Big Data Analytics and Their Legal Implications

Quality Factors in Big Data and Big Data Analytics and Their Legal Implications Quality Factors in Big Data and Big Data Analytics and Their Legal Implications Roger Clarke Xamax Consultancy, Canberra Visiting Professor in Computer Science, ANU and in Cyberspace Law & Policy, UNSW

More information

Comments of the World Privacy Forum To: Office of Science and Technology Policy Re: Big Data Request for Information. Via email to bigdata@ostp.

Comments of the World Privacy Forum To: Office of Science and Technology Policy Re: Big Data Request for Information. Via email to bigdata@ostp. 3108 Fifth Avenue Suite B San Diego, CA 92103 Comments of the World Privacy Forum To: Office of Science and Technology Policy Re: Big Data Request for Information Via email to bigdata@ostp.gov Big Data

More information

Integrated Resource Plan

Integrated Resource Plan Integrated Resource Plan March 19, 2004 PREPARED FOR KAUA I ISLAND UTILITY COOPERATIVE LCG Consulting 4962 El Camino Real, Suite 112 Los Altos, CA 94022 650-962-9670 1 IRP 1 ELECTRIC LOAD FORECASTING 1.1

More information

Business Intelligence and Decision Support Systems

Business Intelligence and Decision Support Systems Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley

More information

ABSTRACT OF THE DOCTORAL THESIS BY Cătălin Ovidiu Obuf Buhăianu

ABSTRACT OF THE DOCTORAL THESIS BY Cătălin Ovidiu Obuf Buhăianu ABSTRACT OF THE DOCTORAL THESIS BY Cătălin Ovidiu Obuf Buhăianu Thesis submitted to: NATIONAL UNIVERSITY OF PHYSICAL EDUCATION AND SPORTS, Bucharest, Romania, 2011 Thesis Advisor: Prof. Dr. Adrian Gagea

More information

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

3. Data Analysis, Statistics, and Probability

3. Data Analysis, Statistics, and Probability 3. Data Analysis, Statistics, and Probability Data and probability sense provides students with tools to understand information and uncertainty. Students ask questions and gather and use data to answer

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Objectivity and the Measurement of Operational Risk. Dr. Lasse B. Andersen

Objectivity and the Measurement of Operational Risk. Dr. Lasse B. Andersen Objectivity and the Measurement of Operational Risk Dr. Lasse B. Andersen Background - The OpRisk Project Societal Safety & Risk Mng. Research group: 18 professors, 15 assoc. professors, 25 Ph.D students,

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

Forecasting. Sales and Revenue Forecasting

Forecasting. Sales and Revenue Forecasting Forecasting To plan, managers must make assumptions about future events. But unlike Harry Potter and his friends, planners cannot simply look into a crystal ball or wave a wand. Instead, they must develop

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Workshop Discussion Notes: Housing

Workshop Discussion Notes: Housing Workshop Discussion Notes: Housing Data & Civil Rights October 30, 2014 Washington, D.C. http://www.datacivilrights.org/ This document was produced based on notes taken during the Housing workshop of the

More information

How to Ensure Adequate Retirement Income from DC Pension Plans

How to Ensure Adequate Retirement Income from DC Pension Plans ISSN 1995-2864 Financial Market Trends OECD 2009 Pre-publication version for Vol. 2009/2 Private Pensions and 0B the Financial Crisis: How to Ensure Adequate Retirement Income from DC Pension Plans Pablo

More information

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep Neil Raden Hired Brains Research, LLC Traditionally, the job of gathering and integrating data for analytics fell on data warehouses.

More information

Smarter Planet evolution

Smarter Planet evolution Smarter Planet evolution 13/03/2012 2012 IBM Corporation Ignacio Pérez González Enterprise Architect ignacio.perez@es.ibm.com @ignaciopr Mike May Technologies of the Change Capabilities Tendencies Vision

More information

Management Solution. Key Criteria for Maximizing Value and Reducing Risk. Author: Mark Bouchard WHITE PAPER

Management Solution. Key Criteria for Maximizing Value and Reducing Risk. Author: Mark Bouchard WHITE PAPER WHITE PAPER Demand More from Your Log Management Solution Key Criteria for Maximizing Value and Reducing Risk Author: Mark Bouchard 2009 AimPoint Group, LLC. All rights reserved. Introduction Every IT

More information

Fiduciary Duty in Support of Responsible Investment

Fiduciary Duty in Support of Responsible Investment CONVENING REPORT Fiduciary Duty in Support of Responsible Investment January 14, 2015 Introduction On January 14, 2015, the Initiative for Responsible Investment held a Convening to discuss Fiduciary Duty

More information

TOWARD A DISTRIBUTED DATA MINING SYSTEM FOR TOURISM INDUSTRY

TOWARD A DISTRIBUTED DATA MINING SYSTEM FOR TOURISM INDUSTRY TOWARD A DISTRIBUTED DATA MINING SYSTEM FOR TOURISM INDUSTRY Danubianu Mirela Stefan cel Mare University of Suceava Faculty of Electrical Engineering andcomputer Science 13 Universitatii Street, Suceava

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Recapturing CLIs. How a Diversified Data Strategy Can Help Card Issuers Restore Credit Line Increases and Boost Revenue. Michael Blix Analytic Expert

Recapturing CLIs. How a Diversified Data Strategy Can Help Card Issuers Restore Credit Line Increases and Boost Revenue. Michael Blix Analytic Expert Recapturing CLIs How a Diversified Data Strategy Can Help Card Issuers Restore Credit Line Increases and Boost Revenue Michael Blix Analytic Expert September 2013 Table of Contents 1 The CLI conundrum

More information

TIPS DATA QUALITY STANDARDS ABOUT TIPS

TIPS DATA QUALITY STANDARDS ABOUT TIPS 2009, NUMBER 12 2 ND EDITION PERFORMANCE MONITORING & EVALUATION TIPS DATA QUALITY STANDARDS ABOUT TIPS These TIPS provide practical advice and suggestions to USAID managers on issues related to performance

More information

Demand (Energy & Maximum Demand) Forecast - IRP 2010 O Parameter Overview sheet

Demand (Energy & Maximum Demand) Forecast - IRP 2010 O Parameter Overview sheet Demand (Energy & Maximum Demand) Forecast - IRP 2010 O Parameter Overview sheet This sheet is to be used as the primary stakeholder engagement tool. This document provides the information that will allow

More information

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado Does Big Data spell Big Trouble for Lawyers? Paul Karlzen Director HR Information & Analytics April 1, 2015 Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

More information

Mark Elliot October 2014

Mark Elliot October 2014 Final Report on the Disclosure Risk Associated with the Synthetic Data Produced by the SYLLS Team Mark Elliot October 2014 1. Background I have been asked by the SYLLS project, University of Edinburgh

More information

Insightful Analytics: Leveraging the data explosion for business optimisation. Top Ten Challenges for Investment Banks 2015

Insightful Analytics: Leveraging the data explosion for business optimisation. Top Ten Challenges for Investment Banks 2015 Insightful Analytics: Leveraging the data explosion for business optimisation 09 Top Ten Challenges for Investment Banks 2015 Insightful Analytics: Leveraging the data explosion for business optimisation

More information

Customer Perception and Reality: Unraveling the Energy Customer Equation

Customer Perception and Reality: Unraveling the Energy Customer Equation Paper 1686-2014 Customer Perception and Reality: Unraveling the Energy Customer Equation Mark Konya, P.E., Ameren Missouri; Kathy Ball, SAS Institute ABSTRACT Energy companies that operate in a highly

More information

Comparing return to work outcomes between vocational rehabilitation providers after adjusting for case mix using statistical models

Comparing return to work outcomes between vocational rehabilitation providers after adjusting for case mix using statistical models Comparing return to work outcomes between vocational rehabilitation providers after adjusting for case mix using statistical models Prepared by Jim Gaetjens Presented to the Institute of Actuaries of Australia

More information

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept Statistics 215b 11/20/03 D.R. Brillinger Data mining A field in search of a definition a vague concept D. Hand, H. Mannila and P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge. Some definitions/descriptions

More information

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

How to Prepare for your Deposition in a Personal Injury Case

How to Prepare for your Deposition in a Personal Injury Case How to Prepare for your Deposition in a Personal Injury Case A whitepaper by Travis Mayor, Attorney If you have filed a civil lawsuit in your personal injury case against the at fault driver, person, corporation,

More information

Pascal is here expressing a kind of skepticism about the ability of human reason to deliver an answer to this question.

Pascal is here expressing a kind of skepticism about the ability of human reason to deliver an answer to this question. Pascal s wager So far we have discussed a number of arguments for or against the existence of God. In the reading for today, Pascal asks not Does God exist? but Should we believe in God? What is distinctive

More information

Research Design. Recap. Problem Formulation and Approach. Step 3: Specify the Research Design

Research Design. Recap. Problem Formulation and Approach. Step 3: Specify the Research Design Recap Step 1: Identify and define the Problem or Opportunity Step 2: Define the Marketing Problem Management Problem Focus on symptoms Action oriented Marketing Problems Focus on causes Data oriented Problem

More information

Time Series Analysis

Time Series Analysis JUNE 2012 Time Series Analysis CONTENT A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken at regular intervals (days, months, years),

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

Alex Vidras, David Tysinger. Merkle Inc.

Alex Vidras, David Tysinger. Merkle Inc. Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

The Decline of the U.S. Labor Share. by Michael Elsby (University of Edinburgh), Bart Hobijn (FRB SF), and Aysegul Sahin (FRB NY)

The Decline of the U.S. Labor Share. by Michael Elsby (University of Edinburgh), Bart Hobijn (FRB SF), and Aysegul Sahin (FRB NY) The Decline of the U.S. Labor Share by Michael Elsby (University of Edinburgh), Bart Hobijn (FRB SF), and Aysegul Sahin (FRB NY) Comments by: Brent Neiman University of Chicago Prepared for: Brookings

More information

Power & Water Corporation. Review of Benchmarking Methods Applied

Power & Water Corporation. Review of Benchmarking Methods Applied 2014 Power & Water Corporation Review of Benchmarking Methods Applied PWC Power Networks Operational Expenditure Benchmarking Review A review of the benchmarking analysis that supports a recommendation

More information

ABA. History of ABA. Interventions 8/24/2011. Late 1800 s and Early 1900 s. Mentalistic Approachs

ABA. History of ABA. Interventions 8/24/2011. Late 1800 s and Early 1900 s. Mentalistic Approachs ABA Is an extension of Experimental Analysis of Behavior to applied settings Is not the same as modification Uses cognition in its approach Focuses on clinically or socially relevant s Is used in many

More information

The Future of the Advanced SOC

The Future of the Advanced SOC The Future of the Advanced SOC Developing a platform for more effective security management and compliance Steven Van Ormer RSA Technical Security Consultant 1 Agenda Today s Security Landscape and Why

More information

Executive Summary. Summary - 1

Executive Summary. Summary - 1 Executive Summary For as long as human beings have deceived one another, people have tried to develop techniques for detecting deception and finding truth. Lie detection took on aspects of modern science

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

The Four-Step Guide to Understanding Cyber Risk

The Four-Step Guide to Understanding Cyber Risk Lifecycle Solutions & Services The Four-Step Guide to Understanding Cyber Risk Identifying Cyber Risks and Addressing the Cyber Security Gap TABLE OF CONTENTS Introduction: A Real Danger It is estimated

More information

Innovations and Value Creation in Predictive Modeling. David Cummings Vice President - Research

Innovations and Value Creation in Predictive Modeling. David Cummings Vice President - Research Innovations and Value Creation in Predictive Modeling David Cummings Vice President - Research ISO Innovative Analytics 1 Innovations and Value Creation in Predictive Modeling A look back at the past decade

More information

A Note on the Optimal Supply of Public Goods and the Distortionary Cost of Taxation

A Note on the Optimal Supply of Public Goods and the Distortionary Cost of Taxation A Note on the Optimal Supply of Public Goods and the Distortionary Cost of Taxation Louis Kaplow * Abstract In a recent article, I demonstrated that, under standard simplifying assumptions, it is possible

More information

Application of Predictive Model for Elementary Students with Special Needs in New Era University

Application of Predictive Model for Elementary Students with Special Needs in New Era University Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia

More information

Big Data: Uses and Limitations

Big Data: Uses and Limitations Big Data: Uses and Limitations Nathaniel Schenker Associate Director for Research and Methodology National Center for Health Statistics Centers for Disease Control and Prevention Presentation for discussion

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Abstract from the Journal of Alcohol and Clinical Experimental Research, 1987; 11 [5]: 416 23

Abstract from the Journal of Alcohol and Clinical Experimental Research, 1987; 11 [5]: 416 23 I would like to state from the outset, that I have no concerns when it comes to questioning the efficacy of 12-step-based treatments in the treatment of addiction. However, I have great concern when the

More information

Data Mining Report. DHS Privacy Office Response to House Report 108-774

Data Mining Report. DHS Privacy Office Response to House Report 108-774 Data Mining Report Response to House Report 108-774 Report to Congress on the Impact of Data Mining Technologies on Privacy and Civil Liberties Respectfully submitted Maureen Cooney Acting Chief Privacy

More information

How To Help Your Business With Benefits

How To Help Your Business With Benefits Myths and Misperceptions: What employee benefits can do for small businesses Brighter ideas in small business benefits Table of Contents Myths and Misperceptions: What Employee Benefits Can Do for Small

More information

The Business Credit Index

The Business Credit Index The Business Credit Index April 8 Published by the Credit Management Research Centre, Leeds University Business School April 8 1 April 8 THE BUSINESS CREDIT INDEX During the last ten years the Credit Management

More information

Decision Theory. 36.1 Rational prospecting

Decision Theory. 36.1 Rational prospecting 36 Decision Theory Decision theory is trivial, apart from computational details (just like playing chess!). You have a choice of various actions, a. The world may be in one of many states x; which one

More information

Example G Cost of construction of nuclear power plants

Example G Cost of construction of nuclear power plants 1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor

More information

Luciano Rispoli Department of Economics, Mathematics and Statistics Birkbeck College (University of London)

Luciano Rispoli Department of Economics, Mathematics and Statistics Birkbeck College (University of London) Luciano Rispoli Department of Economics, Mathematics and Statistics Birkbeck College (University of London) 1 Forecasting: definition Forecasting is the process of making statements about events whose

More information

The human sex odds at birth after the atmospheric atomic bomb tests, after Chernobyl, and in the vicinity of nuclear facilities: Comment.

The human sex odds at birth after the atmospheric atomic bomb tests, after Chernobyl, and in the vicinity of nuclear facilities: Comment. Torture numbers, and they'll confess to anything. Gregg Easterbrook The human sex odds at birth after the atmospheric atomic bomb tests, after Chernobyl, and in the vicinity of nuclear facilities: Comment.

More information





