Testing Random- Number Generators



Similar documents
Random-Number Generation

Hypothesis testing - Steps

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

Lecture 8. Confidence intervals and the central limit theorem

Chapter 3 RANDOM VARIATE GENERATION

Confidence Intervals for Cp

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Digital Signature. Raj Jain. Washington University in St. Louis

Simple Linear Regression Inference

Math 115 Spring 2011 Written Homework 5 Solutions

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Jitter Measurements in Serial Data Signals

Simple Regression Theory II 2010 Samuel L. Baker

Plot the following two points on a graph and draw the line that passes through those two points. Find the rise, run and slope of that line.

Confidence Intervals for One Standard Deviation Using Standard Deviation

Normality Testing in Excel

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Lecture 5 : The Poisson Distribution

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Part 2: Analysis of Relationship Between Two Variables

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

E3: PROBABILITY AND STATISTICS lecture notes

Zeros of Polynomial Functions

Descriptive Statistics

Lecture 2. Summarizing the Sample

4. Continuous Random Variables, the Pareto and Normal Distributions

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Homework 11. Part 1. Name: Score: / null

Joint Exam 1/P Sample Exam 1

How To Check For Differences In The One Way Anova

Systems of Linear Equations

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Introduction to Quantitative Methods

Section 1.1 Linear Equations: Slope and Equations of Lines

Probability and Random Variables. Generation of random variables (r.v.)

PCHS ALGEBRA PLACEMENT TEST

Regression III: Advanced Methods

On Correlating Performance Metrics

Nonparametric Tests for Randomness

PTC Thermistor: Time Interval to Trip Study

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

Definition 8.1 Two inequalities are equivalent if they have the same solution set. Add or Subtract the same value on both sides of the inequality.

1 The Brownian bridge construction

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Weight of Evidence Module

1 The Line vs Point Test

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

Review of Fundamental Mathematics

Using Excel for inferential statistics

START Selected Topics in Assurance

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Algebra I Vocabulary Cards

MBA 611 STATISTICS AND QUANTITATIVE METHODS

5: Magnitude 6: Convert to Polar 7: Convert to Rectangular

Confidence Intervals for Exponential Reliability

Introduction to Regression and Data Analysis

Projects Involving Statistics (& SPSS)

Non Parametric Inference

Confidence Intervals for the Difference Between Two Means

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

II. DISTRIBUTIONS distribution normal distribution. standard scores

3.1 Solving Systems Using Tables and Graphs

Autocovariance and Autocorrelation

PROBABILITY AND SAMPLING DISTRIBUTIONS

Algebra 1 Course Information

Vector and Matrix Norms

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

3.4 Statistical inference for 2 populations based on two samples

MTH 140 Statistics Videos

Chapter 4. Probability Distributions

One-Way ANOVA using SPSS SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

How Does My TI-84 Do That

Chapter G08 Nonparametric Statistics

Regression Analysis: A Complete Example

Metric Spaces. Chapter Metrics

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Common Core State Standards for Mathematics Accelerated 7th Grade

Simple linear regression

Time Series Analysis

LINEAR EQUATIONS IN TWO VARIABLES

Additional sources Compilation of sources:

Description. Textbook. Grading. Objective

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

What are the place values to the left of the decimal point and their associated powers of ten?

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Lean Six Sigma Analyze Phase Introduction. TECH QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

EQUATIONS and INEQUALITIES

TOPIC 4: DERIVATIVES

Data Preprocessing. Week 2

Prentice Hall Mathematics Courses 1-3 Common Core Edition 2013

Transcription:

Testing Random- Number Generators Raj Jain Washington University Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse574-08/ 27-1

Overview 1. Chi-square test 2. Kolmogorov-Smirnov Test 3. Serial-correlation Test 4. Two-level tests 5. K-dimensional uniformity or k-distributivity 6. Serial Test 7. Spectral Test 27-2

Testing Random-Number Generators Goal: To ensure that the random number generator produces a random stream.! Plot histograms! Plot quantile-quantile plot! Use other tests! Passing a test is necessary but not sufficient! Pass Good Fail Bad! New tests Old generators fail the test! Tests can be adapted for other distributions 27-3

Chi-Square Test! Most commonly used test! Can be used for any distribution! Prepare a histogram of the observed data! Compare observed frequencies with theoretical k = Number of cells o i = Observed frequency for ith cell e i = Expected frequency! D=0 Exact fit! D has a chi-square distribution with k-1 degrees of freedom. Compare D with χ 2 [1-α; k-1] Pass with confidence α if D is less 27-4

! 1000 random numbers with x 0 = 1! Example 27.1! Observed difference = 10.380! Observed is Less Accept IID U(0, 1) 27-5

Chi-Square for Other Distributions! Errors in cells with a small e i affect the chi-square statistic more! Best when e i 's are equal. Use an equi-probable histogram with variable cell sizes! Combine adjoining cells so that the new cell probabilities are approximately equal.! The number of degrees of freedom should be reduced to k-r-1 (in place of k-1), where r is the number of parameters estimated from the sample.! Designed for discrete distributions and for large sample sizes only Lower significance for finite sample sizes and continuous distributions! If less than 5 observations, combine neighboring cells 27-6

Kolmogorov-Smirnov Test! Developed by A. N. Kolmogorov and N. V. Smirnov! Designed for continuous distributions! Difference between the observed CDF (cumulative distribution function) F o (x) and the expected cdf F e (x) should be small. 27-7

Kolmogorov-Smirnov Test! K + = maximum observed deviation below the expected cdf! K - = minimum observed deviation below the expected cdf! K + < K [1-α;n] and K - < K [1-α;n] Pass at α level of significance.! Don't use max/min of Fe(x i )-F o (x i )! Use F e (x i+1 )-F o (x i ) for K -! For U(0, 1): F e (x)=x! F o (x) = j/n, where x > x 1, x 2,..., x j-1 27-8

Example 27.2 30 Random numbers using a seed of x 0 =15:! The numbers are: 14, 11, 2, 6, 18, 23, 7, 21, 1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28, 22, 4, 12, 5, 15. 27-9

Example 27.2 (Cont) The normalized numbers obtained by dividing the sequence by 31 are: 0.45161, 0.35484, 0.06452, 0.19355, 0.58065, 0.74194, 0.22581, 0.67742, 0.03226, 0.09677, 0.29032, 0.87097, 0.61290, 0.83871, 0.51613, 0.54839, 0.64516, 0.93548, 0.80645, 0.41935, 0.25806, 0.77419, 0.32258, 0.96774, 0.90323, 0.70968, 0.12903, 0.38710, 0.16129, 0.48387. 27-10

Example 27.2 (Cont)! K [0.9;n] value for n = 30 and a = 0.1 is 1.0424! Observed<Table Pass 27-11

Chi-square vs. K-S K S Test 27-12

Serial-Correlation Test! Nonzero covariance Dependence. The inverse is not true! R k = Autocovariance at lag k = Cov[x n, x n+k ]! For large n, R k is normally distributed with a mean of zero and a variance of 1/[144(n-k)]! 100(1-α)% confidence interval for the autocovariance is: For k 1 Check if CI includes zero! For k = 0, R 0 = variance of the sequence Expected to be 1/12 for IID U(0,1) 27-13

Example 27.3: Serial Correlation Test 10,000 random numbers with x 0 =1: 27-14

Example 27.3 (Cont)! All confidence intervals include zero All covariances are statistically insignificant at 90% confidence. 27-15

Two-Level Tests! If the sample size is too small, the test results may apply locally, but not globally to the complete cycle.! Similarly, global test may not apply locally! Use two-level tests Use Chi-square test on n samples of size k each and then use a Chi-square test on the set of n Chi-square statistics so obtained Chi-square on Chi-square test.! Similarly, K-S on K-S! Can also use this to find a ``nonrandom'' segment of an otherwise random sequence. 27-16

! k-dimensional Uniformity k-distributivity! Chi-square uniformity in one dimension Given two real numbers a 1 and b 1 between 0 and 1 such that b 1 > a 1! This is known as 1-distributivity property of u n.! The 2-distributivity is a generalization of this property in two dimensions: For all choices of a 1, b 1, a 2, b 2 in [0, 1], b 1 >a 1 and b 2 >a 2 27-17

! k-distributed if: k-distributivity (Cont)! For all choices of a i, b i in [0, 1], with b i >a i, i=1, 2,..., k.! k-distributed sequence is always (k-1)-distributed. The inverse is not true.! Two tests: 1. Serial test 2. Spectral test 3. Visual test for 2-dimensions: Plot successive overlapping pairs of numbers 27-18

! Tausworthe sequence generated by: Example 27.4! The sequence is k- distributed for k up to d /l e, that is, k=1.! In two dimensions: Successive overlapping pairs (x n, x n+1 ) 27-19

Example 27.5! Consider the polynomial:! Better 2-distributivity than Example 27.4 27-20

Serial Test! Goal: To test for uniformity in two dimensions or higher.! In two dimensions, divide the space between 0 and 1 into K 2 cells of equal area x n +1 x n 27-21

Serial Test (Cont)! Given {x 1, x 2,, x n }, use n/2 non-overlapping pairs (x 1, x 2 ), (x 3, x 4 ), and count the points in each of the K 2 cells.! Expected= n/(2k 2 ) points in each cell.! Use chi-square test to find the deviation of the actual counts from the expected counts.! The degrees of freedom in this case are K 2-1.! For k-dimensions: use k-tuples of non-overlapping values.! k-tuples must be non-overlapping.! Overlapping number of points in the cells are not independent chi-square test cannot be used! In visual check one can use overlapping or non-overlapping.! In the spectral test overlapping tuples are used.! Given n numbers, there are n-1 overlapping pairs, n/2 nonoverlapping pairs. 27-22

Spectral Test! Goal: To determine how densely the k-tuples {x 1, x 2,, x k }can fill up the k-dimensional hyperspace.! The k-tuples from an LCG fall on a finite number of parallel hyper-planes.! Successive pairs would lie on a finite number of lines! In three dimensions, successive triplets lie on a finite number of planes. 27-23

Example 27.6: Spectral Test Plot of overlapping pairs! All points lie on three straight lines.! Or: 27-24

Example 27.6 (Cont)! In three dimensions, the points (x n, x n-1, x n-2 ) for the above generator would lie on five planes given by: Obtained by adding the following to equation Note that k+k 1 will be an integer between 0 and 4. 27-25

Spectral Test (More)! Marsaglia (1968): Successive k-tuples obtained from an LCG fall on, at most, (k!m) 1/k parallel hyper-planes, where m is the modulus used in the LCG.! Example: m = 2 32, fewer than 2,953 hyper-planes will contain all 3-tuples, fewer than 566 hyper-planes will contain all 4- tuples, and fewer than 41 hyper-planes will contain all 10- tuples. Thus, this is a weakness of LCGs.! Spectral Test: Determine the max distance between adjacent hyper-planes.! Larger distance worse generator! In some cases, it can be done by complete enumeration 27-26

Example 27.7! Compare the following two generators:! Using a seed of x 0 =15, first generator:! Using the same seed in the second generator: 27-27

Example 27.7 (Cont)! Every number between 1 and 30 occurs once and only once Both sequences will pass the chi-square test for uniformity 27-28

! First Generator: Example 27.7 (Cont) 27-29

Example 27.7 (Cont)! Three straight lines of positive slope or ten lines of negative slope! Since the distance between the lines of positive slope is more, consider only the lines with positive slope.! Distance between two parallel lines y=ax+c 1 and y=ax+c 2 is given by! The distance between the above lines is or 9.80. 27-30

! Second Generator: Example 27.7 (Cont) 27-31

Example 27.7 (Cont)! All points fall on seven straight lines of positive slope or six straight lines of negative slope.! Considering lines with negative slopes:! The distance between lines is: or 5.76.! The second generator has a smaller maximum distance and, hence, the second generator has a better 2-distributivity.! The set with a larger distance may not always be the set with fewer lines. 27-32

Example 27.7 (Cont)! Either overlapping or non-overlapping k-tuples can be used. " With overlapping k-tuples, we have k times as many points, which makes the graph visually more complete.the number of hyper-planes and the distance between them are the same with either choice.! With serial test, only non-overlapping k-tuples should be used.! For generators with a large m and for higher dimensions, finding the maximum distance becomes quite complex. See Knuth (1981) 27-33

Summary 1. Chi-square test is a one-dimensional test Designed for discrete distributions and large sample sizes 2. K-S test is designed for continuous variables 3. Serial correlation test for independence 4. Two level tests find local non-uniformity 5. k-dimensional uniformity = k-distributivity tested by spectral test or serial test 27-34

Homework 27! Submit detailed answer to Exercise 27.3. Print 10,000 th number also. 27-35

Exercise 27.1 Generate 10,000 numbers using a seed of x 0 =1 in the following generator: Classify the numbers into ten equal size cells and test for uniformity using the chi-square test at 90% confidence. 27-36

Exercise 27.2 Generate 15 numbers using a seed of x 0 =1 in the following generator: Perform a K-S test and check whether the sequence passes the test at a 95% confidence level. 27-37

Exercise 27.3 Generate 10,000 numbers using a seed of x 0 =1 in the following LCG: Perform the serial correlation test of randomness at 90% confidence and report the result. 27-38

Exercise 27.4 Using the spectral test, compare the following two generators Which generator has a better 2-distributivity? 27-39