# Big Data: a new era for Statistics

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Big Data: a new era for Statistics Richard J. Samworth Abstract Richard Samworth (1996) is a Professor of Statistics in the University s Statistical Laboratory, and has been a Fellow of St John s since 23. In 212 he was awarded a five-year Engineering and Physical Sciences Research Council Early Career Fellowship a grant worth 1.2million to study New challenges in high-dimensional statistical inference. Big Data is all the rage in the media these days. Few people seem to be able to define exactly what they mean by it, but there is nevertheless consensus that in fields as diverse as genetics, medical imaging, astronomy, social networks and commerce, to name but a few, modern technology allows the collection and storage of data on scales unimaginable only a decade ago. Such a data deluge creates a huge range of challenges and opportunities for statisticians, and the subject is currently undergoing an exciting period of rapid growth and development. Hal Varian, Chief Economist at Google, famously said in 29, I keep saying the sexy job in the next ten years will be statisticians. This might raise eyebrows among those more familiar with Mark Twain s lies, damned lies and statistics, but there s no doubt that recent high-profile success stories such as Nate Silver s predictions of the 212 US presidential election have given statisticians a welcome image makeover. Let me begin by describing a simple, traditional statistical problem, in order to contrast it with today s situation. In the 192s, an experiment was carried out to understand the relationship between a car s initial speed, x, and its stopping distance, y. The stopping distances of fifty cars, having a range of different initial speeds, were measured and are plotted in Figure 1(a). Our understanding of the physics of car braking suggests that y ought to depend on x in a quadratic way, though from the figure we see that we can t expect an exact fit to the data. We therefore model the relationship as y = ax+bx 2 +ǫ, where ǫ represents the statistical error. Our aim is to estimate the unknown parameters a and b, which reflect the strength of the dependence of y on each of x and x 2. Here we don t need to 1

2 Stopping distance (ft) (a) (b) Stopping distance (ft) Initial speed (mph) Initial speed (mph) Figure 1: Panel (a) gives the stopping distances of 5 cars having a range of different initial speeds. Panel (b) also shows the fitted curve, as well as a 95 per cent prediction interval for the stopping distance of a car having an initial speed of 21mph. include a constant term in the quadratic, because a car with zero initial speed doesn t take long to stop. We would like to choose our estimates of a and b in such a way that the curve ax+bx 2 reflects the trend seen in the data. For any such curve, we can imagine drawing vertical lines from each data point to the curve, and a standard way to estimate a and b is to choose them to minimise the sum of the squares of the lengths of these lines. For a statistician, this is a straightforward problem to solve, yielding estimates â = 1.24 and ˆb =.9 of a and b respectively, and the fitted curve displayed in Figure 1(b). From this, we can predict that a car with initial speed 21mph would take 65.8ft to stop. In fact, we can also quantify the uncertainty in this prediction: with 95 per cent probability, a car with this initial speed would take between 34.9ft and 96.6ft to stop. (Incidentally, a modern car would typically take around 43ft to stop at that initial speed.) Of course, one can easily imagine that the stopping distance of a car would depend on many factors that weren t recorded in this experiment: the weather and tyre conditions, the state of the road, the make of car, and so on. Such additional information should allow us to refine our model, and make more accurate predictions, with less uncertainty. My point is that, by contrast with the 192s, in today s world we can often record a whole raft of variables whose values may influence the response of interest. In genetics, for instance, 2

3 Figure 2: The left panel shows a photograph of a typical microarray. The right panel shows the complexity of the output from a typical microarray experiment. microarrays (see Figure 2) are used in laboratories to measure simultaneously the expression levels of many thousands of genes in order to study the effects of a treatment or disease. An initial statistical model analogous to our car model, then, would require at least one variable for each gene. Interestingly, for microarray data, there may still be only around fifty replications of the experiment, though many other modern applications have vast numbers of replications too. Suddenly, estimating the unknown parameters in the model is not so easy. The method of least squares we used with our car data, for example, can t be used when we have more variables than replications. What saves us here is a belief in what is often called sparsity: most of the genes should be irrelevant for the particular treatment or disease under study. The statistical challenge, then, is that of variable selection which variables do I need in my model, and which can I safely discard? Many methods for finding these few important needles among the huge haystack of variables have been proposed over the last two decades. One could simply look for variables that are highly correlated with the response, or use the exotically-named Lasso (Tibshirani, 1996), which can be regarded as a modification of the least squares estimate. In Shah and Samworth (213), we gave a very general method for improving the performance of any existing variable selection method: instead of applying it once to the whole dataset, we showed that there are advantages to applying it to several random subsamples of the data, each of half the original sample size, eventually choosing the variables that are chosen on a high proportion of the subsamples. We were able to prove bounds that allow the practitioner to choose the threshold for this proportion to control the trade-off between false negatives and false positives. 3

4 Another problem I ve worked on recently is classification. Imagine that a doctor wants to diagnose diabetes. On a sample of diabetics, she makes measurements that she thinks are relevant for determining whether or not someone has the disease. She also makes the same measurements on a sample of non-diabetics. So when a new patient arrives for diagnosis, she again takes the same measurements. On what basis should she classify (diagnose) the new individual as coming from the diabetic or non-diabetic population? From a statistical point of view, the problem is the same as that encountered by banks that have to decide whether or not to give someone a loan, or an filter that has to decide whether a message is genuine or spam. One can imagine that an experienced doctor might have a notion of distance between any two individuals sets of measurements. So one very simple method of classification would be to assign the new patient to the group of his nearest neighbour (i.e. the person closest according to the doctor s distance) among all n people, say, in our clinical trial. But intuitively, we might feel there was too much chance about whether or not the nearest neighbour happened to be diabetic, so a slightly more sophisticated procedure would look at the patient s k nearest neighbours, and would assign him to the population having at least half of those k nearest neighbours. In Hall, Park and Samworth (28), we derived the optimal choice of k, in the sense of minimising the probability of misclassifying thenew individual. For those interested, it shouldbechosen proportional to n 4/(d+4), where d is the number of measurements made on each individual. An obvious drawback of the k-nearest neighbour classifier is that it gives equal importance to the group associated with the nearest neighbour as it does the kth nearest neighbour. This observation prompts us to consider weighted nearest neighbours, with weights that decay as one moves further from the individual to be classified. In Samworth (212), I derived the optimal weighting scheme, as well as a formula for the improvement attainable over the unweighted k nearest neighbour classifier. It s between a 5 per cent and 1 per cent improvement when d 15, which might not seem like much, until it s you that requires the diagnosis! Five years ago, I set up the Statistics Clinic, where once a fortnight any member of the University can come and receive advice on their statistical problems from one of a team of helpers (mainly my PhD students and post-docs). The sheer range of subjects covered and the diversity of problems they present to us provide convincing evidence that statistics is finally being recognised for its importance in making rational decisions in an uncertain world. The twenty-first century is undoubtedly the information age even Mark Twain would agree! Professor Richard J. Samworth 4

5 References P. Hall, B. U. Park and R. J. Samworth Choice of neighbor order in nearest-neighbor classification, Annals of Statistics, 36 (28), R. J. Samworth Optimal weighted nearest neighbour classifiers, Annals of Statistics, 4 (212), R. D. Shah and R. J. Samworth Variable selection with error control: Another look at Stability Selection, Journal of the Royal Statistical Society, Ser. B, 75 (213), R. Tibshirani Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Ser. B, 58 (1996),

### MACHINE LEARNING BASICS WITH R

MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### 1. How different is the t distribution from the normal?

Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

### Local classification and local likelihoods

Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

### International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS

PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, chittiprolumounika@gmail.com; Third C.

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### Validation of measurement procedures

Validation of measurement procedures R. Haeckel and I.Püntmann Zentralkrankenhaus Bremen The new ISO standard 15189 which has already been accepted by most nations will soon become the basis for accreditation

### Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

### Simple Linear Regression

STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

### 1 Uncertainty and Preferences

In this chapter, we present the theory of consumer preferences on risky outcomes. The theory is then applied to study the demand for insurance. Consider the following story. John wants to mail a package

### Teaching Multivariate Analysis to Business-Major Students

Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis

### STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS 1. If two events (both with probability greater than 0) are mutually exclusive, then: A. They also must be independent. B. They also could

### Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index

Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index Rickard Nyman *, Paul Ormerod Centre for the Study of Decision Making Under Uncertainty,

### Simple Random Sampling

Source: Frerichs, R.R. Rapid Surveys (unpublished), 2008. NOT FOR COMMERCIAL DISTRIBUTION 3 Simple Random Sampling 3.1 INTRODUCTION Everyone mentions simple random sampling, but few use this method for

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Estimating the Average Value of a Function

Estimating the Average Value of a Function Problem: Determine the average value of the function f(x) over the interval [a, b]. Strategy: Choose sample points a = x 0 < x 1 < x 2 < < x n 1 < x n = b and

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### Section 1.1. Introduction to R n

The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

### 6 EXTENDING ALGEBRA. 6.0 Introduction. 6.1 The cubic equation. Objectives

6 EXTENDING ALGEBRA Chapter 6 Extending Algebra Objectives After studying this chapter you should understand techniques whereby equations of cubic degree and higher can be solved; be able to factorise

### Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -

Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create

### Revised Version of Chapter 23. We learned long ago how to solve linear congruences. ax c (mod m)

Chapter 23 Squares Modulo p Revised Version of Chapter 23 We learned long ago how to solve linear congruences ax c (mod m) (see Chapter 8). It s now time to take the plunge and move on to quadratic equations.

### Reasoning with Uncertainty More about Hypothesis Testing. P-values, types of errors, power of a test

Reasoning with Uncertainty More about Hypothesis Testing P-values, types of errors, power of a test P-Values and Decisions Your conclusion about any null hypothesis should be accompanied by the P-value

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

### Health Care and Life Sciences

Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations Wen Zhu 1, Nancy Zeng 2, Ning Wang 2 1 K&L consulting services, Inc, Fort Washington,

### CHAOS LIMITATION OR EVEN END OF SUPPLY CHAIN MANAGEMENT

CHAOS LIMITATION OR EVEN END OF SUPPLY CHAIN MANAGEMENT Michael Grabinski 1 Abstract Proven in the early 196s, weather forecast is not possible for an arbitrarily long period of time for principle reasons.

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Notes on Factoring. MA 206 Kurt Bryan

The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

### Equilibrium: Illustrations

Draft chapter from An introduction to game theory by Martin J. Osborne. Version: 2002/7/23. Martin.Osborne@utoronto.ca http://www.economics.utoronto.ca/osborne Copyright 1995 2002 by Martin J. Osborne.

### Breakthrough Lung Cancer Treatment Approved Webcast September 9, 2011 Renato Martins, M.D., M.P.H. Introduction

Breakthrough Lung Cancer Treatment Approved Webcast September 9, 2011 Renato Martins, M.D., M.P.H. Please remember the opinions expressed on Patient Power are not necessarily the views of Seattle Cancer

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Model Selection. Introduction. Model Selection

Model Selection Introduction This user guide provides information about the Partek Model Selection tool. Topics covered include using a Down syndrome data set to demonstrate the usage of the Partek Model

### CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

### A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### Introduction. example of a AA curve appears at the end of this presentation.

1 Introduction The High Quality Market (HQM) Corporate Bond Yield Curve for the Pension Protection Act (PPA) uses a methodology developed at Treasury to construct yield curves from extended regressions

### Sensitivity of Hospital Clinic queues to patient non-attendance. Neville J. Ford, Paul D. Crofts, and Richard H. T. Edwards

ISSN 1360-1725 UMIST Sensitivity of Hospital Clinic queues to patient non-attendance Neville J. Ford, Paul D. Crofts, and Richard H. T. Edwards Numerical Analysis Report No. 383 A report in association

### Homework 6 Solutions

Math 17, Section 2 Spring 2011 Assignment Chapter 20: 12, 14, 20, 24, 34 Chapter 21: 2, 8, 14, 16, 18 Chapter 20 20.12] Got Milk? The student made a number of mistakes here: Homework 6 Solutions 1. Null

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Gaussian Classifiers CS498

Gaussian Classifiers CS498 Today s lecture The Gaussian Gaussian classifiers A slightly more sophisticated classifier Nearest Neighbors We can classify with nearest neighbors x m 1 m 2 Decision boundary

### Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### The Contextualization of Project Management Practice and Best Practice

The Contextualization of Project Management Practice and Best Practice Claude Besner PhD, University of Quebec at Montreal Brian Hobbs PhD, University of Quebec at Montreal Abstract This research aims

### What Is Probability?

1 What Is Probability? The idea: Uncertainty can often be "quantified" i.e., we can talk about degrees of certainty or uncertainty. This is the idea of probability: a higher probability expresses a higher

### Why You Should Consider Graduate Study in the UCLA Department of Biostatistics

Why You Should Consider Graduate Study in the UCLA Department of Biostatistics Biostatistics as an academic discipline Natural direction for individuals with strengths in applied mathematics Linear algebra,

### Straightening Data in a Scatterplot Selecting a Good Re-Expression Model

Straightening Data in a Scatterplot Selecting a Good Re-Expression What Is All This Stuff? Here s what is included: Page 3: Graphs of the three main patterns of data points that the student is likely to

### Better decision making under uncertain conditions using Monte Carlo Simulation

IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

### AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

### Finding funding for your research and carrying it out

Finding funding for your research and carrying it out Prof John Durodola ICONS 2010 John Durodola Department of Mechanical Engineering Presentation Overview 1. Background 2. Motivation for research 3.

### Application Project: Diabetes Data

Background: Application Project: Diabetes Data Joshua Bradley, Joshua Brulé In this diabetes study (by Dr. Monifa Vaughn Cooke), each patient was instructed to test their blood glucose a prescribed number

### Schools Value-added Information System Technical Manual

Schools Value-added Information System Technical Manual Quality Assurance & School-based Support Division Education Bureau 2015 Contents Unit 1 Overview... 1 Unit 2 The Concept of VA... 2 Unit 3 Control

### An analysis of the 2003 HEFCE national student survey pilot data.

An analysis of the 2003 HEFCE national student survey pilot data. by Harvey Goldstein Institute of Education, University of London h.goldstein@ioe.ac.uk Abstract The summary report produced from the first

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Teaching Business Statistics through Problem Solving

Teaching Business Statistics through Problem Solving David M. Levine, Baruch College, CUNY with David F. Stephan, Two Bridges Instructional Technology CONTACT: davidlevine@davidlevinestatistics.com Typical

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook

### Chapter 5 Estimating Demand Functions

Chapter 5 Estimating Demand Functions 1 Why do you need statistics and regression analysis? Ability to read market research papers Analyze your own data in a simple way Assist you in pricing and marketing

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### The Big Picture. Correlation. Scatter Plots. Data

The Big Picture Correlation Bret Hanlon and Bret Larget Department of Statistics Universit of Wisconsin Madison December 6, We have just completed a length series of lectures on ANOVA where we considered

### Predicting daily incoming solar energy from weather data

Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University - CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting

### Medical research has led to

Transforming the patient care experience through research and innovation Medical research has led to improvements in care that would have been unimaginable to a physician practicing a century ago. The

### Pascal is here expressing a kind of skepticism about the ability of human reason to deliver an answer to this question.

Pascal s wager So far we have discussed a number of arguments for or against the existence of God. In the reading for today, Pascal asks not Does God exist? but Should we believe in God? What is distinctive

### Factoring Polynomials

Factoring Polynomials Hoste, Miller, Murieka September 12, 2011 1 Factoring In the previous section, we discussed how to determine the product of two or more terms. Consider, for instance, the equations

### Discrete Structures for Computer Science

Discrete Structures for Computer Science Adam J. Lee adamlee@cs.pitt.edu 6111 Sennott Square Lecture #20: Bayes Theorem November 5, 2013 How can we incorporate prior knowledge? Sometimes we want to know

### Section 6: Model Selection, Logistic Regression and more...

Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building

### Reputation Network Analysis for Email Filtering

Reputation Network Analysis for Email Filtering Jennifer Golbeck, James Hendler University of Maryland, College Park MINDSWAP 8400 Baltimore Avenue College Park, MD 20742 {golbeck, hendler}@cs.umd.edu

### Odgers Berndtson Board Survey. Among CEOs in Denmark s largest corporations

Boards and CEOs preparing for growth Almost half of the CEOs in Denmark s largest corporations consider the financial crisis to be over and expect positive growth in the near future. This calls for preparation

### Module 5. Attitude to risk. In this module we take a look at risk management and its importance. TradeSense Australia, June 2011, Edition 10

Attitude to risk Module 5 Attitude to risk In this module we take a look at risk management and its importance. TradeSense Australia, June 2011, Edition 10 Attitude to risk In the previous module we looked

### The Evaluation of Screening Policies for Diabetic Retinopathy using Simulation

The Evaluation of Screening Policies for Diabetic Retinopathy using Simulation R. Davies 1, S.C. Brailsford 1, P.J. Roderick 2 and C.R.Canning 3 1 School of Management, University of Southampton, UK 2

### A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

### Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I

Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting

### Annual General Meeting. Fresenius Medical Care AG & Co. KGaA

Annual General Meeting Fresenius Medical Care AG & Co. KGaA Speech to the shareholders by Rice Powell May 12, 2016 The spoken word shall prevail. 1 Fellow shareholders, shareholder representatives, ladies

### 9. Sampling Distributions

9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

### There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:

Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are

### Fairer schools funding. Arrangements for 2015 to 2016

Fairer schools funding Arrangements for 2015 to 2016 July 2014 Contents Introduction 3 Chapter 1 Fairer funding for schools 5 Chapter 2 Long-term reform of high needs and early years funding 10 Chapter

### Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

### COS 116 The Computational Universe Laboratory 11: Machine Learning

COS 116 The Computational Universe Laboratory 11: Machine Learning In last Tuesday s lecture, we surveyed many machine learning algorithms and their applications. In this lab, you will explore algorithms

### Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

parent ROADMAP MATHEMATICS SUPPORTING YOUR CHILD IN HIGH SCHOOL HS America s schools are working to provide higher quality instruction than ever before. The way we taught students in the past simply does

### TAKING PART IN CANCER TREATMENT RESEARCH STUDIES

For more infomation about Cancer Clinical Trials at Upstate Cancer Center please call Upstate Connect 1.800.464.8668 TAKING PART IN CANCER TREATMENT RESEARCH STUDIES Information provided by: National Cancer

### Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

### High School Algebra Reasoning with Equations and Inequalities Solve equations and inequalities in one variable.

Performance Assessment Task Quadratic (2009) Grade 9 The task challenges a student to demonstrate an understanding of quadratic functions in various forms. A student must make sense of the meaning of relations

### Lecture 13. Understanding Probability and Long-Term Expectations

Lecture 13 Understanding Probability and Long-Term Expectations Thinking Challenge What s the probability of getting a head on the toss of a single fair coin? Use a scale from 0 (no way) to 1 (sure thing).

### Market Efficiency: Definitions and Tests. Aswath Damodaran

Market Efficiency: Definitions and Tests 1 Why market efficiency matters.. Question of whether markets are efficient, and if not, where the inefficiencies lie, is central to investment valuation. If markets

### Abstract. Medical websites online are starting to take over the roles of doctors by providing ample

1 Abstract The dangers of medical websites and the addiction surrounding them is explored in this paper. Medical websites online are starting to take over the roles of doctors by providing ample information

### ATTITUDE TO RISK. In this module we take a look at risk management and its importance. MODULE 5 INTRODUCTION PROGRAMME NOVEMBER 2012, EDITION 18

INTRODUCTION PROGRAMME MODULE 5 ATTITUDE TO RISK In this module we take a look at risk management and its importance. NOVEMBER 2012, EDITION 18 CONTENTS 3 6 RISK MANAGEMENT 2 In the previous module we

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Physics 182 - Fall 2014 - Experiment #1 1 Experiment #1, Analyze Data using Excel, Calculator and Graphs. 1 Purpose (5 Points, Including Title. Points apply to your lab report.) Before we start measuring

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### \$2 4 40 + ( \$1) = 40

THE EXPECTED VALUE FOR THE SUM OF THE DRAWS In the game of Keno there are 80 balls, numbered 1 through 80. On each play, the casino chooses 20 balls at random without replacement. Suppose you bet on the

### ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS

PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 49th ANNUAL MEETING 200 2 ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS Jeff