The Random Sampling Road to Reasonableness. Reduce Risk and Cost by Employing a Complete and Integrated Validation Process

By: Michael R. Wade, Executive Vice President and Chief Technology Officer, Planet Data
June 2012

Table of Contents

Introduction
A Few Random Thoughts
Why It Works
Confidence Level
Confidence Interval
Conclusion

Planet Data Solutions. All Rights Reserved.

Since 2001 Planet Data has provided high-quality Discovery Management Services and Solutions to its clients. Planet Data today is also recognized for our strides in technology development, including leveraging The Cerulean Engine to create our proprietary processing platform, now known as Exego.

All data and examples presented in this white paper are intended for illustrative purposes only. Sampling rates and methodologies are always dictated by each individual case. We try to provide quality information, but we make no claims, promises or guarantees about the accuracy, completeness, or adequacy of the information contained herein. This white paper does not constitute legal advice in any jurisdiction. No part of this document may be reproduced or distributed without written consent and credit acknowledgement. The views expressed here are our own.

Contact: Laura Marques, Vice President, Marketing and Communications, LMarques@PlanetDS.com

Introduction

This paper was primarily inspired by the upsurge in discussions about predictive coding workflow and how to verify that the process is truly effective. In discussions with clients and other practitioners it became quite clear that there is a wide range of understanding and opinions about how best to employ Random Sampling techniques for validation purposes.

This paper will address some fundamental aspects of Random Sampling to help users understand the basic concepts and make better use of these techniques. In particular, Random Sampling can be used to reduce cost and risk when it is properly incorporated into your ESI workflow. To achieve these goals, practitioners must have a clear understanding of the impact of the two basic settings, Confidence Level and Confidence Interval.

A Few Random Thoughts

For a lawyer or IT professional, random events are what we often fear most. They are by definition unpredictable, and for the most part uncontrollable. So it seems contradictory when we utilize the concept of randomness to validate and draw scientifically definable conclusions about the composition of large data sets.

Many people are highly skeptical of Random Sampling results. In many cases this is due to gross misuses of polling information in public settings, as well as a lack of comfort with the mathematics behind Random Sampling. As a result we are not always comfortable with how to incorporate these techniques into a defensible and reasonable litigation strategy. This paper will address several important concepts that should help overcome this distrust and put us on the path to using Random Sampling techniques in everyday practice.

First, it seems counterintuitive to most of us that looking at randomly selected documents from a large set of documents could tell us very much about the collection.

Second, the application of Confidence Levels and Intervals is not clearly understood, particularly in terms of exactly what is being measured and how to build a sound and defensible document review strategy based upon those values. For example, when should we use a higher Confidence Level (CL) vs. a smaller Confidence Interval (CI)?

[1] This calculation is based upon a 95% Confidence Level, a +/- 5% Confidence Interval, and a population size of one million documents.

The Confidence Level states the percentage of the time (on average) that we expect the value observed in a random sample to fall within the Confidence Interval of the real value. The Confidence Interval defines the margin of error (e.g. +/- 5%) within which we expect the real value to lie relative to the observed value.

NOTE: For the purposes of this article we will be measuring the number of documents that are responsive to some unknown criteria.

Why It Works

One of the most common questions concerning Random Sampling is how a small sample from a very large number of documents can give us information that is largely representative of the entire population. In basic terms, the answer is that we are measuring a very simple property that has two possible values (yes/no), and that, when sampling is properly implemented, the estimates of this property produced by random samples follow what is known as a normal distribution.

A key requirement for Random Sampling is that every document in the population has an equal chance of being selected. On average we would expect to see about the same percentage of responsive documents in our random sample as we would in the overall population (within the specified margin of error or CI).

Figure 1: Ten trials of 384 samples at 95% CL and +/- 5% CI

Figure 1 illustrates the result of performing ten trials of 384 random samples (95% CL and +/- 5% CI) against a population of 250,000 documents. Of this population, 50% are known (a priori) to be responsive. The aqua blue background shows the margin of error (or Confidence Interval) that you would use to predict the window that the actual value would fall within. As this chart shows, in all ten trials the actual value for the population was within the +/- 5% margin that we specified. If we were to run more trials, we would expect that over time approximately 5% of the trials would predict a window that does not contain the true value (i.e., the true value would fall outside of the margin of error, or CI).

By taking a sample of only 384 documents, we can predict with a 95% CL that the actual percentage of responsive documents in this population falls between 43.96% and 53.96% (using the first sample, which returned 48.96% as the percentage of responsive documents). Since we know that the actual value is 50% in this case, we can see that this prediction holds true.

Just as importantly, the results of the random sample trials follow what is known as a Normal Distribution. You may remember the famous Bell Curve in Figure 2 from high school statistics, and now you will see why that lesson was actually worthwhile!

Figure 2: Normal Distribution Curve

It is important to note that approximately 68% of the measurements (responsive/nonresponsive) from your random sample trials will fall within one standard deviation of the actual value in your overall population. This means that your results tend to fall fairly closely around the actual value.

To illustrate this with real world data, we have run 50,000 random trials against a population of one million documents and plotted a frequency diagram (i.e., a chart showing the percentage of responsive documents measured by each random sample trial and the number of times that each particular measured percentage occurred).

As can be seen in Figure 3, the results form a Normal Distribution, or Bell Curve. This further illustrates how the actual results of Random Sampling match what the theory predicts. In fact, the results fall within the statistical range specified by the CL and CI.
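The repeated-trial experiments described above can be approximated with a short simulation. The following is a minimal sketch, not the tooling used to produce the paper's figures: the function name `run_trials`, the fixed seed, and the choice of 2,000 trials (rather than 50,000) are assumptions made to keep the example small. It draws samples of 384 documents without replacement from a population of 250,000 in which 50% are responsive, then reports how often a trial lands within +/- 5% of the true value and within one theoretical standard deviation of it.

```python
import math
import random
from statistics import mean, pstdev


def run_trials(population_size, responsive_fraction, sample_size, trials, seed=0):
    """Return the responsive fraction observed in each random-sample trial."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    responsive_count = round(population_size * responsive_fraction)
    results = []
    for _ in range(trials):
        # Documents are numbered 0..N-1 and the first `responsive_count` of them
        # are treated as responsive. rng.sample draws without replacement, so
        # every document has an equal chance of being selected.
        picked = rng.sample(range(population_size), sample_size)
        hits = sum(1 for doc_id in picked if doc_id < responsive_count)
        results.append(hits / sample_size)
    return results


true_value = 0.50
sample_size = 384
observed = run_trials(250_000, true_value, sample_size, trials=2_000)

# Share of trials whose observed value is within the +/- 5% Confidence Interval.
within_ci = sum(abs(x - true_value) <= 0.05 for x in observed) / len(observed)

# Standard deviation predicted for a yes/no property, compared with the
# spread actually seen across the simulated trials.
theoretical_sd = math.sqrt(true_value * (1 - true_value) / sample_size)
empirical_sd = pstdev(observed)
within_one_sd = sum(abs(x - true_value) <= theoretical_sd for x in observed) / len(observed)

print(f"mean of observed values : {mean(observed):.4f}")
print(f"within +/- 5% of truth  : {within_ci:.1%}")
print(f"theoretical / empirical standard deviation: {theoretical_sd:.4f} / {empirical_sd:.4f}")
print(f"within one standard deviation: {within_one_sd:.1%}")
```

Under these assumptions the share of trials inside the +/- 5% window should come out near 95%, and the share within one standard deviation should be roughly two thirds; because a sample of 384 can only take a limited set of values, the second figure will typically sit slightly below the 68% of the idealized curve.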

The bell-shaped distribution of the sample results is in agreement with what is commonly referred to as the Central Limit Theorem. [2]

Figure 3: Frequency Distribution of 50,000 Random Sampling trials against one million documents where 20% are responsive, using a 95% CL and +/- 2.5% CI.

Confidence Level (CL)

The Confidence Level is often poorly understood, and because of this, sampling validation decisions are often based upon faulty assumptions. Let's begin by discussing exactly what the CL means and how it impacts our decision making.

A CL of 95% simply states that when a series of random samples (trials) is taken, we expect on average that 95% of those measurements will fall within the CI (e.g. +/- 2.5%) around the actual true value. Put another way, for 95% of trials the actual value will fall within the CI of the value observed in that trial. [3]

This seems pretty simple, but there are some assumptions that many of us are making without even realizing it. For example, if only ONE random sample is taken and it finds that 0% of the documents are responsive, can you be sure that this value falls within the CI of the actual value? As always, it depends upon what "sure" means. In this case, we used a 95% CL, so we can say that 95% of the time the actual value will fall within the CI of the observed measurement.

NOTE: Values that fall outside of the CI are referred to as outliers.

Unfortunately this also means that there is a 1 in 20 chance (on average) that this measurement is an outlier and that the actual number could be very different from what this particular random sample trial would predict. Depending upon the importance of the data being measured, this may be too high a risk.

To mitigate this risk, you could perform additional random sample trials. For example, what are the odds that two random samples would both produce an outlier? Assuming the two samples are drawn independently, the odds would be 5% x 5%, or 0.25% (using a 95% CL). So by taking two random samples we have reduced the chance that both are outliers to 0.25% (which is 1 out of 400 times). Additional sampling would reduce the odds even further.

It is imperative to understand that you will never know when the random sample that you are taking will produce an outlier. You will only know that if you take multiple samples, the number of responsive documents predicted should fall within the +/- CI percentage of the actual value at the specified CL percentage rate. It never means that the first sample taken is a good result, or even the second, only that over time we would expect that 95% of the random sample trials would fall within the CI of the actual value.

Lesson learned: perform more than one random sample trial when possible.

Figure 4: 100 trials against one million documents with 95% CL and +/- 5% CI; 20% of documents were actually responsive.

In Figure 4 we ran 100 trials at a 95% CL and found that five of the results were outliers (exactly as predicted by the math). It is important to note that it would not be uncommon to have only three outliers or even six outliers when running this test. Remember, the 95% is a prediction that holds true over a large number of samples and is NOT a guarantee that it will be EXACTLY 95 out of every 100.

If we are concerned and want to reduce the risk of an outlier, we can change the CL, for example to 99%. When you increase the CL percentage you will notice that the results of the random samples tend to cluster more closely around the actual value. We have reduced the likelihood that any one random sample will be an outlier from 5% to 1% (this is five times better).

Figure 5 illustrates how we ran 100 trials using a 99% CL and +/- 5% CI and saw zero outliers. Notice that while the CI is still +/- 5%, the results from each random sample are more closely grouped around the 20% level (the actual percentage of responsive documents in this test set).

Figure 5: 100 trials against one million documents using 99% CL and +/- 5% CI; 20% were actually responsive.

To achieve this more precise result, we increased our sample sizes minimally, from 246 documents to 424 documents. [4] In effect, we reduced the likelihood of an outlier result from a random sample by a factor of 5x.

[4] When the actual proportion of responsive documents is known, the formula used to calculate the sample size will incorporate that proportion. However, when the proportion is unknown, as is normally the case, you must use 50%, which is the worst case scenario. This is why in this figure the sample size was 246 instead of 384 documents: the proportion of 20% was already known. This means that when sampling populations where the proportion is significantly different from 50%, the actual CL and CI are better than what was specified.
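The outlier arithmetic in this section can be checked with a few lines of code. This is a minimal sketch that assumes each random sample trial is independent and that each trial is an outlier with probability exactly 1 minus the CL; the function names are illustrative and do not come from the paper. It computes the chance that every one of several samples is an outlier, and the chance of seeing exactly k outliers in a run of 100 trials.

```python
from math import comb


def prob_all_outliers(confidence_level, samples):
    """Chance that every one of `samples` independent trials is an outlier."""
    return (1 - confidence_level) ** samples


def prob_k_outliers(confidence_level, trials, k):
    """Binomial chance of exactly k outliers among `trials` independent trials."""
    p = 1 - confidence_level
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)


# Two samples at a 95% CL: 5% x 5% = 0.25%, or 1 in 400.
print(f"both of two samples are outliers: {prob_all_outliers(0.95, 2):.4%}")

# How likely are three, five, or six outliers in 100 trials at a 95% CL?
for k in (3, 5, 6):
    print(f"exactly {k} outliers in 100 trials at 95% CL: {prob_k_outliers(0.95, 100, k):.1%}")

# Chance of seeing no outliers at all in 100 trials at a 99% CL.
print(f"zero outliers in 100 trials at 99% CL: {prob_k_outliers(0.99, 100, 0):.1%}")
```

Under these assumptions, exactly five outliers in 100 trials is the single most likely count at a 95% CL, but it occurs in fewer than one run in five, which is consistent with the remark that three or six outliers would not be unusual.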

The reduction in outlier risk from 5% to 1% was achieved by only slightly increasing the number of documents that we had to review. As before, if we perform more than one random sample against the document population, we can greatly reduce the chance that our predicted value was an outlier (i.e. outside of the CI window from the actual value).

Confidence Interval (CI)

The Confidence Interval (CI), or margin of error, may be the most important concept that has to be understood to successfully employ Random Sampling techniques. This parameter sets the range of possible values that the actual number of responsive documents is likely to fall within. In simpler terms, it lets us know how close to the actual number of responsive documents in the full population we can claim to be!

So if we are sampling against one million documents using a 95% CL and a +/- 5% CI, and our random sample predicts that 20% of those documents are responsive, we can still only say that we are 95% certain that the ACTUAL NUMBER of responsive documents lies between 150,000 and 250,000 documents (a range of 100,000 documents).

Now think about the case where only 1% of the documents are actually responsive. Because the actual value is much smaller than the CI, the difference between what we measure and the actual value could be very large on a relative basis. It is possible that your prediction of the actual number of documents could be off by a factor of five (because the CI window is so much larger than the actual value) and the prediction would still fall within the CI.

There are three primary methods for dealing with this issue:

First, you can decrease the CI percentage. As an example, you could reduce it from +/- 5% to +/- 1%. However, this significantly increases the number of documents that will need to be reviewed. For example, with a population size of one million documents, you would have to review 16,317 documents to achieve a 99% CL with a +/- 1% CI vs. 663 documents to achieve a 99% CL with a +/- 5% CI.

This change will reduce the window from a range of 100,000 documents (at +/- 5%) to a range of 20,000 documents (at +/- 1%). While this is still a large range, particularly when we are looking for documents that have a low frequency of occurrence, it is still a significantly better prediction.

The second method is the use of Judgmental Sampling. For example, if we know that the responsive documents are most likely going to be found within two custodians who have a total of 10,000 documents between them, we can take a sample of just that set of documents. If we use a 99% CL and a +/- 1% CI, we would have to sample 6,239 of the 10,000 documents, but the window would be only 200 documents wide. You can then sample the remaining population (all documents except those from these two custodians) using a wider CI (or even the same one) to confirm the assumption that the responsive documents fall predominantly within the two custodians.

Finally, you can use iterative Random Sampling to significantly reduce the overall risk of leaving behind responsive materials or being misled by an outlier. A typical method to look for responsive documents is to take an initial sample using a fairly low CL (e.g. 95%) and a moderate CI (+/- 5%) to search for responsive documents. [5] Based on the responsive documents actually found, new search criteria are then developed (using any combination of keyword search, metadata filtering, concept search, and document similarity). The responsive documents these searches return can be removed from the population and a new round of sampling is performed. If any new responsive documents are found, you repeat the entire process.

After multiple rounds are finished and no new responsive documents are found, you can then do an additional round of Random Sampling using tighter statistics (e.g. 99% CL and +/- 2% CI). If at this point more responsive documents are found, you can then repeat the entire process. Because sampling with the looser constraints does not involve looking at a large number of documents, this is still a very efficient process. The final sampling rounds are done at higher CLs and lower CIs to ensure that we were not missing anything in the other sampling rounds.

[5] The CL and CI used in this example are for illustrative purposes only. Every activity in a case must be weighed against the risk associated with getting it wrong.
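The trade-off between CI width and review effort discussed in this section can be made concrete with a sample-size calculation. The paper does not state the formula it used, so the following is a sketch under assumptions: it uses the common normal-approximation formula for a proportion with a finite population correction and rounds to the nearest whole document, a combination that happens to reproduce the sample sizes quoted for populations of 250,000 and one million documents. The function names are illustrative.

```python
from statistics import NormalDist


def sample_size(confidence_level, margin, population, proportion=0.5):
    """Documents to sample for a given CL and +/- margin (CI).

    Uses n0 = z^2 * p * (1 - p) / margin^2 with a finite population
    correction. `proportion` defaults to 0.5, the worst case when the true
    share of responsive documents is unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # two-sided z-score
    n0 = z**2 * proportion * (1 - proportion) / margin**2
    n = n0 / (1 + (n0 - 1) / population)
    # Rounding to nearest is an assumption; many practitioners round up instead.
    return round(n)


def window_in_documents(margin, population):
    """Width of the +/- margin window expressed as a number of documents."""
    return round(2 * margin * population)


one_million = 1_000_000
for cl, ci in ((0.95, 0.05), (0.99, 0.05), (0.99, 0.01)):
    n = sample_size(cl, ci, one_million)
    width = window_in_documents(ci, one_million)
    print(f"CL {cl:.0%}, CI +/- {ci:.0%}: review {n:,} documents, window {width:,} documents")

# When the responsive proportion is already known (20% here), fewer documents are needed.
print(sample_size(0.95, 0.05, one_million, proportion=0.2))
print(sample_size(0.99, 0.05, one_million, proportion=0.2))
```

Tightening the CI from +/- 5% to +/- 1% shrinks the window by a factor of five but multiplies the review effort by roughly twenty-five, because the sample size grows with the inverse square of the margin.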

Conclusion

Random Sampling without being incorporated into an overall workflow and strategy has limited value. However, by using it to help validate all parts of your ESI and Review processes, you can greatly reduce both costs and risk at the same time. When you understand how the Confidence Level and Confidence Interval affect the outcome of sampling, you will be ready to employ these techniques throughout the entire process.

Random Sampling is not only used to protect against missing data; it can also be used to ensure that you are using efficient processes to find responsive documents from the very beginning of the case. Instead of just testing to see if anything was left behind, use Random Sampling to test how effective the searching methodology is in finding responsive documents. Before sending large numbers of documents for review (or deciding which documents will NOT be reviewed), take samples and form statistically valid opinions about the effectiveness of the techniques employed. For example, if only 5% of the documents being returned by a sample are responsive, we can be fairly certain that the process used to find those documents can be significantly improved.

In conclusion, combining Random Sampling with a strong and repeatable workflow is the key to good results and a defensible process.

Contact: Michael R. Wade, Mike@PlanetDS.com

About the Author: Mr. Wade has led several developmental efforts in the information, knowledge and document management areas. In 1988 Mr. Wade was CTO and principal of Switzerland's Tecomac AG, a company that focused on developing knowledge management solutions for large European corporations and private banks. During his time with Tecomac, he secured several patents for data compression technologies. Mr. Wade became involved in the litigation support industry more than a decade ago as one of the founders and CTO of Cerulean LLC, which was acquired by Planet Data. Mr. Wade has a B.S. in Accounting with a minor in Computer Science from Virginia Tech.

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

Problem of the Month: Fair Games

Problem of the Month: Fair Games Problem of the Month: The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards:

More information

Sample Size Issues for Conjoint Analysis

Sample Size Issues for Conjoint Analysis Chapter 7 Sample Size Issues for Conjoint Analysis I m about to conduct a conjoint analysis study. How large a sample size do I need? What will be the margin of error of my estimates if I use a sample

More information

Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test. Dr. Tom Pierce Radford University Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

More information

Point and Interval Estimates

Point and Interval Estimates Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

McKinsey Problem Solving Test Top Tips

McKinsey Problem Solving Test Top Tips McKinsey Problem Solving Test Top Tips 1 McKinsey Problem Solving Test You re probably reading this because you ve been invited to take the McKinsey Problem Solving Test. Don t stress out as part of the

More information

What Have I Learned In This Class?

What Have I Learned In This Class? xxx Lesson 26 Learning Skills Review What Have I Learned In This Class? Overview: The Learning Skills review focuses on what a learner has learned during Learning Skills. More importantly this lesson gives

More information

Chapter 8: Quantitative Sampling

Chapter 8: Quantitative Sampling Chapter 8: Quantitative Sampling I. Introduction to Sampling a. The primary goal of sampling is to get a representative sample, or a small collection of units or cases from a much larger collection or

More information

Experimental Analysis

Experimental Analysis Experimental Analysis Instructors: If your institution does not have the Fish Farm computer simulation, contact the project directors for information on obtaining it free of charge. The ESA21 project team

More information

Software Metrics. Lord Kelvin, a physicist. George Miller, a psychologist

Software Metrics. Lord Kelvin, a physicist. George Miller, a psychologist Software Metrics 1. Lord Kelvin, a physicist 2. George Miller, a psychologist Software Metrics Product vs. process Most metrics are indirect: No way to measure property directly or Final product does not

More information

WRITING PROOFS. Christopher Heil Georgia Institute of Technology

WRITING PROOFS. Christopher Heil Georgia Institute of Technology WRITING PROOFS Christopher Heil Georgia Institute of Technology A theorem is just a statement of fact A proof of the theorem is a logical explanation of why the theorem is true Many theorems have this

More information

Teaching & Learning Plans. Plan 1: Introduction to Probability. Junior Certificate Syllabus Leaving Certificate Syllabus

Teaching & Learning Plans. Plan 1: Introduction to Probability. Junior Certificate Syllabus Leaving Certificate Syllabus Teaching & Learning Plans Plan 1: Introduction to Probability Junior Certificate Syllabus Leaving Certificate Syllabus The Teaching & Learning Plans are structured as follows: Aims outline what the lesson,

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Significant Figures, Propagation of Error, Graphs and Graphing

Significant Figures, Propagation of Error, Graphs and Graphing Chapter Two Significant Figures, Propagation of Error, Graphs and Graphing Every measurement has an error associated with it. If you were to put an object on a balance and weight it several times you will

More information

MATH 140 Lab 4: Probability and the Standard Normal Distribution

MATH 140 Lab 4: Probability and the Standard Normal Distribution MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes

More information

z-scores AND THE NORMAL CURVE MODEL

z-scores AND THE NORMAL CURVE MODEL z-scores AND THE NORMAL CURVE MODEL 1 Understanding z-scores 2 z-scores A z-score is a location on the distribution. A z- score also automatically communicates the raw score s distance from the mean A

More information

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS CENTRAL LIMIT THEOREM (SECTION 7.2 OF UNDERSTANDABLE STATISTICS) The Central Limit Theorem says that if x is a random variable with any distribution having

More information

Five Core Principles of Successful Business Architecture

Five Core Principles of Successful Business Architecture Five Core Principles of Successful Business Architecture Authors: Greg Suddreth and Whynde Melaragno Strategic Technology Architects (STA Group, LLC) Sponsored by MEGA Presents a White Paper on: Five Core

More information

8. THE NORMAL DISTRIBUTION

8. THE NORMAL DISTRIBUTION 8. THE NORMAL DISTRIBUTION The normal distribution with mean μ and variance σ 2 has the following density function: The normal distribution is sometimes called a Gaussian Distribution, after its inventor,

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

Performance Assessment Task Bikes and Trikes Grade 4. Common Core State Standards Math - Content Standards

Performance Assessment Task Bikes and Trikes Grade 4. Common Core State Standards Math - Content Standards Performance Assessment Task Bikes and Trikes Grade 4 The task challenges a student to demonstrate understanding of concepts involved in multiplication. A student must make sense of equal sized groups of

More information

Application of Simple Random Sampling 1 (SRS) in ediscovery

Application of Simple Random Sampling 1 (SRS) in ediscovery Manuscript submitted to the Organizing Committee of the Fourth DESI Workshop on Setting Standards for Electronically Stored Information in Discovery Proceedings on April 20, 2011. Updated May 18, 2011.

More information

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015 Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation

More information

Planet Data parlays software and services into e-discovery's new legal model

Planet Data parlays software and services into e-discovery's new legal model Planet Data parlays software and services into e-discovery's new legal model Analyst: David Horrigan 19 Jun, 2012 Not that long ago, the components of e-discovery came in separate silos. E-discovery service

More information

Understanding Options: Calls and Puts

Understanding Options: Calls and Puts 2 Understanding Options: Calls and Puts Important: in their simplest forms, options trades sound like, and are, very high risk investments. If reading about options makes you think they are too risky for

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

Conn Valuation Services Ltd.

Conn Valuation Services Ltd. CAPITALIZED EARNINGS VS. DISCOUNTED CASH FLOW: Which is the more accurate business valuation tool? By Richard R. Conn CMA, MBA, CPA, ABV, ERP Is the capitalized earnings 1 method or discounted cash flow

More information

TEACHER NOTES MATH NSPIRED

TEACHER NOTES MATH NSPIRED Math Objectives Students will understand that normal distributions can be used to approximate binomial distributions whenever both np and n(1 p) are sufficiently large. Students will understand that when

More information

360 feedback. Manager. Development Report. Sample Example. name: email: date: sample@example.com

360 feedback. Manager. Development Report. Sample Example. name: email: date: sample@example.com 60 feedback Manager Development Report name: email: date: Sample Example sample@example.com 9 January 200 Introduction 60 feedback enables you to get a clear view of how others perceive the way you work.

More information

Risk Analysis and Quantification
