Hypothesis testing using complex survey data
A Short Course presented by Peter Lynn, University of Essex, in association with the conference of the European Survey Research Association, Prague, 25 June 2007
1. Objective: Simple Hypothesis Tests

Survey data are often used to test hypotheses. Hypotheses of interest are typically complex, involving several variables, for example:

- Differences in pay between men and women in urban areas can be explained by differences in occupation, hours worked and length of time in post

But in this course the examples we will use will be simple hypotheses. The ideas extend to more complex hypotheses.

Consider the following question, which is asked on the European Social Survey (ESS):

Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people? Please tell me on a score of 0 to 10, where 0 means you can't be too careful and 10 means that most people can be trusted.

Response scale: from "You can't be too careful" (0) to "Most people can be trusted" (10); (Don't know)

We might be interested in whether the mean score given in reply to this question (ppltrst) differs between nations. If the mean score is higher in one nation than another, then we might conclude that people in the first nation are more trusting than people in the second nation. The mean scores given in the Czech Republic, Hungary, Slovenia, France and Portugal by ESS round 1 respondents (2002-03) were as follows:

[Table: mean ppltrst score by nation]

It would appear that the French are the most trusting amongst these five nations, with the Slovenians the least trusting. But are these differences in means significant? In other words, are we confident that they reflect true differences in means between the respective populations as a whole? To answer this question, we need more information than just the means themselves. To see just what information we need, we must consider sampling theory.
2. Revision of some basic sampling theory

Sampling theory allows us to make statements about the precision of a sample estimate. Essentially, these are statements about how likely it is that a sample estimate falls within a particular distance of the true population value of which it is an estimate. This likelihood, or probability, depends solely on the sample design.

A sample design, D, defines a large set of possible samples that could be selected. For a particular estimator, E (e.g. the mean score on the ESS trust question), each of those samples will provide an estimate. The estimates will vary over the samples. The complete set of possible estimates is known as the sampling distribution of estimator E under sample design D.

For most sample designs used in social surveys and for many of the kinds of estimators in which we are typically interested, sampling distributions are approximately normally distributed, meaning that they have a bell shape:

[Figure: bell-shaped sampling distribution; horizontal axis: Estimate, vertical axis: Frequency]

The normal distribution has some useful properties. It is symmetric. And there is a known relationship between the distance from the centre of the distribution (in terms of standard deviations) and the proportion of the area under the curve covered. For example, plus or minus 1.96 standard deviations covers 95% of the area under the curve. In the case of a sampling distribution, this means that 95% of the samples that might be selected under design D will produce an estimate that is within 1.96 standard deviations of the true population value (assuming that the sampling distribution is centred on the true value).

So, to make a precision statement of the form "there is a 95% chance that the true value is within plus or minus z units of our sample estimate", we need only to be able to estimate the standard deviation of the sampling distribution of the estimator, otherwise known as the standard error of the estimate. This is the extra information that we need in order to assess whether observed differences in means are significant.
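The precision statement above can be sketched in a few lines of Python. This is only an illustration, assuming an approximately normal sampling distribution; the function name `confidence_interval` and the input values are hypothetical and are not ESS results.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Return (lower, upper) for a normal-theory confidence interval."""
    # Two-sided critical value: about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Illustrative inputs only (a mean score and its standard error).
lower, upper = confidence_interval(4.5, 0.05)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

The only design-dependent input is the standard error, which is why the rest of the course concentrates on estimating it correctly.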
Let's consider the case of simple random sampling (SRS). It is a somewhat artificial case as SRS is rarely used in practice. But it is useful, for three reasons:

- The theory is relatively simple, so it is a comfortable place to start;
- SRS provides a standard design which we can use as a benchmark, against which to compare other, more realistic, designs;
- Much data analysis software carries out calculations under the assumption that the data are from a SRS, either by default or as the only option. We should try to understand what our software is doing.

SRS is a sample design where every unit in the study population has an equal, and independent, probability of selection. Note that many of the features often used in practical sample design, such as stratification, clustering and the use of variable sampling fractions, are not permissible within the definition of SRS. Stratification and clustering both cause selection probabilities to be dependent; variable sampling fractions cause selection probabilities to be unequal.

If we select a SRS of $n$ units from a population of $N$ units, the (sampling) variance of the sample mean of a variable $y$ will be:

$$\mathrm{Var}(\bar{y}) = \frac{S^2}{n}\left(1 - \frac{n}{N}\right) \qquad (1)$$

where $S^2 = \mathrm{Var}(y) = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N - 1}$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

In most data analysis software, if you request the variance of a mean, this is the quantity that will be estimated (by default). In fact, the term $\left(1 - \frac{n}{N}\right)$, known as the finite population correction, will almost certainly be ignored, as the software does not know $N$, the size of the population. Ignoring this term usually makes no difference as the value of this term is typically very close to 1.0. And $S^2$ will most likely be estimated by its sample analogue, $s^2$. So the estimate provided by the software will be:

$$\widehat{\mathrm{Var}}(\bar{y}) = \frac{s^2}{n} \qquad (2)$$

The standard error is the square root of the variance, so the estimated standard error is simply the square root of the estimated variance as in (2).
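As a small worked illustration of the SRS variance of a mean, the following Python sketch estimates it from a sample, with the finite population correction applied only when the population size is supplied. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import variance


def srs_variance_of_mean(sample, population_size=None):
    """Estimate the variance of the sample mean under SRS.

    With population_size=None the finite population correction is ignored,
    which is what most software does by default.
    """
    n = len(sample)
    s2 = variance(sample)  # sample analogue s^2, divisor n - 1
    fpc = 1.0 if population_size is None else 1.0 - n / population_size
    return fpc * s2 / n


scores = [2, 4, 4, 6]  # made-up values, not survey data
var_default = srs_variance_of_mean(scores)
var_with_fpc = srs_variance_of_mean(scores, population_size=40)
print(sqrt(var_default), sqrt(var_with_fpc))  # estimated standard errors
```

With a sampling fraction of 4/40 the correction is 0.9; in a national survey the fraction is tiny, so the two estimates are practically identical.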
3. Testing Differences in Mean Scores

The estimated standard errors of the mean trust scores (assuming SRS) are:

Nation   Mean   Std. Err.

So now we can estimate 95% confidence intervals around the means, as these are plus or minus 1.96 standard errors. Our software gives us:

Nation   Mean   Std. Err.   [95% Conf. Interval]

But how does this help us to assess whether the means are different from one another? Well, if we compare the confidence intervals for […] and […] we see that they do not overlap at all. So it seems very unlikely that the true values for those two countries are the same. But if we compare, say, […] and […] we find that the intervals overlap (slightly). So we still cannot be sure whether the difference is significant.

We need to state a formal hypothesis. We usually do this in terms of a null hypothesis, for example:

$$H_0: \bar{Y}_{\text{France}} = \bar{Y}_{\text{Czech Republic}}$$

We then carry out a test to determine whether the data contain evidence to reject the null hypothesis. If the test rejects the null hypothesis then we would say that we have evidence that the means for France and the Czech Republic differ. An appropriate test for a difference in means is a Wald test. We can ask our software to perform this for us:

[ppltrst] - [ppltrst] = 0: F(1, 30970) = 5.87; Prob > F = 0.0154

So, there appears to be a probability of only 0.0154, or 1.54%, that we would have observed a difference in means at least as large as the one actually observed, if the true means were the same. We might say that at the 0.05 level we would reject the null hypothesis of equal means in France and the Czech Republic. So, the French are more trusting than the Czechs!

The figure below shows the estimated confidence intervals for the mean trust score for all five nations:

[Figure: estimated 95% confidence intervals for the mean trust score, by nation]
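The link between the F statistic and its P-value can be checked by hand. The Python sketch below is an approximation under stated assumptions: the two samples are treated as independent, and the denominator degrees of freedom (30970 here) are large enough that an F(1, d) statistic behaves like the square of a standard normal variable. The function names are hypothetical, and the means and standard errors passed to `wald_f` are made up.

```python
from math import sqrt
from statistics import NormalDist


def wald_f(mean_a, se_a, mean_b, se_b):
    """Wald F statistic for H0: equal means, assuming independent samples."""
    return (mean_a - mean_b) ** 2 / (se_a ** 2 + se_b ** 2)


def p_value_large_df(f_stat):
    """Two-sided P-value for F(1, d) when d is large (normal approximation)."""
    z = sqrt(f_stat)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Reported statistic from the text: F(1, 30970) = 5.87.
print(round(p_value_large_df(5.87), 4))

# Made-up means and standard errors, for illustration only.
print(wald_f(5.0, 0.1, 4.7, 0.1))
```

Applied to the reported F = 5.87, the approximation reproduces the 1.54% quoted above.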
F-test results of comparisons between […] and each of the other countries are as follows:

[ppltrst] - [ppltrst] = 0: F(1, 30970) = 5.87; Prob > F = 0.0154
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 5; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 8.19; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 8.99; Prob > F =

4. Variable Sampling Fractions

However, the estimates presented so far all assume that the sample in each nation is SRS. In fact, the ESS sample design is not SRS in any of these nations (see Lynn et al 2007). […] and […] both selected their ESS round 1 sample from their national population register, enabling them to select persons with equal probabilities. But the other three nations used sample designs in which selection probabilities varied between units (persons). In all three cases the units listed and selected were addresses or households rather than persons. Then, in the field, interviewers would randomly select one person at the address to interview. This results in persons living alone having greater selection probabilities than persons living in 2-person households, etc.

When a sample design involves variable sampling fractions, design weights should be used in order to permit design-unbiased estimation. Design weights simply make each observation contribute to the estimate in inverse proportion to its selection probability. If households were selected with equal probabilities and then one person selected at random at each household then, compared to persons living alone, those living in 2-person households would receive a relative design weight of 2.0, those in 3-person households a weight of 3.0, and so on. A weighted sample mean (estimate of population mean) would be calculated as follows:
$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \qquad (3)$$

We should take the design weights into account in estimating the mean trust scores. If we ask our software to estimate means using (3), having specified which variable on the data set contains the design weight, $w_i$, we obtain:

Nation   Mean   Std. Err.   [95% Conf. Interval]

Note that the estimates for both […] and […] are exactly the same as before, but for the other three nations both the estimate of the mean and the width of the confidence intervals have changed. These changes also affect the results of our tests of differences, which are now as follows:

[ppltrst] - [ppltrst] = 0: F(1, 30970) = 9; Prob > F = 0.03
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 5.73; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 1.98; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 11.49; Prob > F =

It seems that by ignoring the design weights, as we did earlier, we were over-estimating the significance of the differences between […] and both […] and […], but under-estimating the significance of the difference between […] and […]. This can be seen in the plot of the estimated confidence intervals, using weighted data:

[Figure: estimated 95% confidence intervals for the mean trust score, by nation, using weighted data]
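To make the weighted mean in expression (3) concrete, here is a minimal Python sketch. The household sizes and scores are invented for illustration; the design weight is taken to be the number of persons in the household, as in the one-person-per-household example described earlier.

```python
def weighted_mean(values, weights):
    """Design-weighted mean: sum(w_i * y_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Three hypothetical respondents from 1-, 2- and 3-person households.
scores = [8, 4, 6]
household_sizes = [1, 2, 3]  # relative design weights

print(sum(scores) / len(scores))               # unweighted mean
print(weighted_mean(scores, household_sizes))  # weighted mean
```

When all weights are equal the two means coincide, which is why the estimates for the two equal-probability designs do not change.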
The intervals for both […] and […] now overlap with that for […] more than before, while the interval for […] overlaps with […] less than before.

5. Some More Sampling Theory

In fact, design weights affect not only estimates of means but also the variance of those estimates. This can be seen in the expression for the variance of a mean under stratified simple random sampling, as we can think of the weighting classes as strata (compare this with expression (1)):

$$\mathrm{Var}(\bar{y}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \frac{S_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right) \qquad (4)$$

Note that the design weights are $w_h = N_h / n_h$ and that $N = \sum_{h=1}^{H} n_h w_h$, so (if we ignore the finite population corrections), we can rewrite this as:

$$\mathrm{Var}(\bar{y}) = \frac{\sum_{h=1}^{H} n_h w_h^2 S_h^2}{\left(\sum_{h=1}^{H} n_h w_h\right)^2} \qquad (5)$$

This can be estimated from the survey data provided we know the design weights for each sample unit (the $s_h^2$ provide estimates of $S_h^2$). We can ask our software to estimate standard errors and confidence intervals taking into account the design weights:

Nation   Mean   Std. Err.   [95% Conf. Interval]

Note that the estimates of standard error are now larger than in the previous analysis for the three nations that do not have equal-probability designs. The standard error estimate has increased by a factor of 1.39 for […], 1.11 for […] and 1.11 for […]. These factors may be referred to as "mis-specification factors": the factor by which the standard error is underestimated due to mis-specifying the data structure. The mis-specification factor is closely related to, though not identical to, the design factor. The design factor due to the use of variable sampling fractions is the increase in standard errors relative to a SRS.

The tests of differences are now as follows:
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 3.71; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 5.06; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 1.7; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 10.6; Prob > F =

The P-values have increased in all cases. In particular, the P-value for the […]-[…] difference is now larger than 0.05, so we would no longer reject at this level the null hypothesis of equal means in […] and […]. Remember that the P-value for this comparison was only […] in our initial analysis, where we ignored design weights completely. Again, we can see this graphically, as the confidence intervals for […] and […] clearly overlap more than in the previous analyses:

[Figure: estimated 95% confidence intervals for the mean trust score, by nation, with design weights used in variance estimation]

6. Clustering

The use of variable sampling fractions (and hence design weights) is not the only way in which the ESS sample designs differ from SRS. In all five countries, multi-stage samples are selected, resulting in samples that are clustered. This has the potential to affect standard errors of estimates. In general, if clusters are more homogeneous than the overall population, which is often the case, sample clustering will increase the size of standard errors. The form of the variance of a mean gets complicated if we have both variable sampling fractions and a multi-stage clustered design (see, e.g., StataCorp 2005, p.61), but the approximate effect of a clustered design is to increase the variance by a factor of:

$$\mathrm{Deff}_c(\bar{y}) = 1 + (b^* - 1)\,\rho_y \qquad (6)$$

where $b^*$ is a weighted mean cluster sample size and $\rho_y$ is the intra-cluster correlation for $y$ (see Kish 1965; Lynn & Gabler 2005).
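Expression (6) is easy to evaluate directly. The Python sketch below computes the approximate design effect due to clustering and the corresponding inflation of the standard error (the square root of the variance inflation). The cluster size and intra-cluster correlation used here are assumed values chosen for illustration, not ESS estimates.

```python
from math import sqrt


def design_effect_clustering(mean_cluster_size, intra_cluster_corr):
    """Approximate variance inflation due to clustering: 1 + (b* - 1) * rho."""
    return 1.0 + (mean_cluster_size - 1.0) * intra_cluster_corr


# Assumed values: 10 interviews per cluster, intra-cluster correlation 0.05.
deff = design_effect_clustering(10, 0.05)
print(deff)        # factor by which the variance increases
print(sqrt(deff))  # factor by which the standard error increases
```

Even a modest intra-cluster correlation matters once clusters are large, because the correlation is multiplied by the cluster size minus one.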
If we ask our software to take into account the sample clustering as well as the design weights, we get the following estimates:

Nation   Mean   Std. Err.   [95% Conf. Interval]

[ppltrst] - [ppltrst] = 0: F(1, 30970) = 2.65; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 3.04; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 0.73; Prob > F =
[ppltrst] - [ppltrst] = 0: F(1, 30970) = 6.55; Prob > F =

[Figure: estimated 95% confidence intervals for the mean trust score, by nation, taking weighting and clustering into account]

What we observe is that if we take the relevant features of the sample design into account, the mean for […] is not significantly different from the mean for […], […] or […] at the 0.05 level. It is different from the mean for […] at the 0.05 level, but not at the 0.01 level. This contrasts sharply with the results that we obtained with our naïve analysis, assuming SRS. In that case it seemed that all four of the differences were significant at the 0.05 level and two of them at the 0.01 level. Taking the sample design correctly into account alters the conclusions!

Furthermore, we have seen that the differences in the estimates of standard errors are partly due to the effect of variable sampling fractions and partly due to the effect of clustering of the sample, so it is important to take both these factors into account.
7. Another Example

Another question on the ESS (pplhlp) has a similar structure to the one analysed above, but a different topic:

Do you think that most people would try to take advantage of you if they got the chance, or would they try to be fair?

Response scale: from "Most people would try to take advantage of me" to "Most people would try to be fair"; (Don't know)

If we run equivalent analyses to those presented above, again using ESS round 1 data, we obtain the following results:

7.1: Results assuming SRS

Nation   Mean   Std. Err.   [95% Conf. Interval]

[pplhlp] - [pplhlp] = 0: F(1, 30970) = 9.6; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 5.95; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 3.63; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 10.59; Prob > F =

[Figure: estimated 95% confidence intervals for the mean pplhlp score, by nation, assuming SRS]
Differences between the mean for […] and those for […] and […] appear highly significant (P<0.01); the difference with […] appears significant at the 0.05 level (P=0.015) and the difference with […] is almost significant at the 0.05 level (P=0.064).

7.2: Results using weighted means but assuming SRS in variance estimation

Nation   Mean   Std. Err.   [95% Conf. Interval]

[pplhlp] - [pplhlp] = 0: F(1, 30970) = 3.77; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 5.85; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 0.7; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 10.47; Prob > F =

[Figure: estimated 95% confidence intervals for the mean pplhlp score, by nation, using weighted means]

The main change here is that the weighted mean for […] is higher than the unweighted mean, with the result that the mean for […] no longer appears significantly different from that for […].

7.3: Results taking account of weighting, but not clustering, in variance estimation

Nation   Mean   Std. Err.   [95% Conf. Interval]
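[pplhlp] - [pplhlp] = 0: F(1, 30970) = 18.88; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 5.15; Prob > F = 0.03
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 0.18; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 9.3; Prob > F =

[Figure: estimated 95% confidence intervals for the mean pplhlp score, by nation, with design weights used in variance estimation]

P-values have increased for all four tests, but the differences are unlikely to affect conclusions.

7.4: Results taking account of both weighting and clustering

Nation   Mean   Std. Err.   [95% Conf. Interval]

[Figure: estimated 95% confidence intervals for the mean pplhlp score, by nation, taking weighting and clustering into account]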
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 15.38; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 3.0; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 0.10; Prob > F =
[pplhlp] - [pplhlp] = 0: F(1, 30970) = 6.39; Prob > F =

In this example, the most dramatic impact of mis-specification is to over-state the difference in means between […] and […]. However, this is mainly caused by failure to apply design weights in estimating the mean: the analysis in section 7.2 already showed no significant difference between […] and […], even without taking the design into account. The other noticeable impact of mis-specification is to over-state the evidence of a difference between […] and […]. This is caused entirely by the failure to estimate the variance of the estimates correctly (P=0.016 in 7.2, cf. P=0.074 in 7.4).

8. A Third Example: Change Between Rounds

Here we are interested in testing whether the mean score changes between rounds 1 (2002-03) and 2 (2004-05) of the ESS. We carry out the estimation in the same four ways as previously, for the variable ppltrst for Luxembourg:

8.1: Results assuming SRS

Round   Mean   Std. Err.   [95% Conf. Interval]

[ppltrst]1 - [ppltrst]2 = 0: F(1, 30970) = 5.89; Prob > F =

8.2: Results using weighted means but assuming SRS in variance estimation

Round   Mean   Std. Err.   [95% Conf. Interval]

[ppltrst]1 - [ppltrst]2 = 0: F(1, 30970) = 4.05; Prob > F =

8.3: Results taking account of weighting, but not clustering, in variance estimation

Round   Mean   Std. Err.   [95% Conf. Interval]

[ppltrst]1 - [ppltrst]2 = 0: F(1, 30970) = 2.93; Prob > F =
The sample design in Luxembourg was unclustered, so there is no need to take into account clustering. In this example, the test of a difference in means, correctly taking into account the sample design, provides no evidence at the 0.05 level of a difference (P=0.087). But ignoring the weights in variance estimation would suggest evidence of a reduction in trusting between ESS rounds 1 and 2 (P=0.044). And additionally ignoring the weights in estimating the means would suggest even stronger evidence of a reduction (P=0.015).

9. Some Comments on Software Implementation

The analyses presented here were carried out in Stata. The commands are quite simple to implement, using the SVY commands to take into account the sample design. It is necessary to have a variable that contains the design weight (dweight) and a variable that indicates the cluster, or primary sampling unit (psunit).

For comparing the mean of ppltrst between the five countries:

    svyset [pw = dweight], psu(psunit)
    svy: mean ppltrst if (set==1 & essround==1), over(ctcode)
    test [ppltrst]4 = [ppltrst]9
    test [ppltrst]4 = [ppltrst]1
    test [ppltrst]4 = [ppltrst]19
    test [ppltrst]4 = [ppltrst]1

For comparing the mean of ppltrst between rounds 1 and 2 for Luxembourg:

    svyset [pw = dweight]
    svy: mean ppltrst if ctcode==15, over(essround)
    test [ppltrst]1 = [ppltrst]2

Similar commands are available in SPSS (in the Advanced Statistics module) and in SUDAAN.

References

Kish L (1965) Survey Sampling. New York: Wiley.
Lynn P & Gabler S (2005) Approximations to b* in the prediction of design effects due to clustering, Survey Methodology 31.
Lynn P, Häder S, Gabler S & Laaksonen S (2007) Methods for achieving equivalence of samples in cross-national surveys: the European Social Survey experience, Journal of Official Statistics 23.
StataCorp (2005) Stata Survey Data Reference Manual Release 9. Stata Press: College Station, Texas.
Hypothesis testing. Null and alternative hypotheses
Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate
1. C. The formula for the confidence interval for a population mean is: x t, which was
s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value
Analyzing Longitudinal Data from Complex Surveys Using SUDAAN
Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical
Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval
Chapter 8 Tests of Statistical Hypotheses 8. Tests about Proportios HT - Iferece o Proportio Parameter: Populatio Proportio p (or π) (Percetage of people has o health isurace) x Statistic: Sample Proportio
Confidence Intervals for One Mean
Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a
PSYCHOLOGICAL STATISTICS
UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics
Determining the sample size
Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors
Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown
Z-TEST / Z-STATISTIC: used to test hypotheses about µ whe the populatio stadard deviatio is kow ad populatio distributio is ormal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses about
A STRATIFIED SAMPLING PLAN FOR BILLING ACCURACY IN HEALTHCARE SYSTEMS
A STRATIFIED SAMPLING PLAN FOR BILLING ACCURACY IN HEALTHCARE SYSTEMS Jiracai Buddakulsomsiri a Partaa Partaadee b Swatatra Kacal a a Departmet of Idustrial ad Maufacturig Systems Egieerig, Uiversity of
5: Introduction to Estimation
5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample
Output Analysis (2, Chapters 10 &11 Law)
B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should
Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals
Overview Estimatig the Value of a Parameter Usig Cofidece Itervals We apply the results about the sample mea the problem of estimatio Estimatio is the process of usig sample data estimate the value of
Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean
1 Social Studies 201 October 13, 2004 Note: The examples i these otes may be differet tha used i class. However, the examples are similar ad the methods used are idetical to what was preseted i class.
Center, Spread, and Shape in Inference: Claims, Caveats, and Insights
Ceter, Spread, ad Shape i Iferece: Claims, Caveats, ad Isights Dr. Nacy Pfeig (Uiversity of Pittsburgh) AMATYC November 2008 Prelimiary Activities 1. I would like to produce a iterval estimate for the
I. Chi-squared Distributions
1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.
0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5
Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.
Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.
Cofidece Itervals A cofidece iterval is a iterval whose purpose is to estimate a parameter (a umber that could, i theory, be calculated from the populatio, if measuremets were available for the whole populatio).
One-sample test of proportions
Oe-sample test of proportios The Settig: Idividuals i some populatio ca be classified ito oe of two categories. You wat to make iferece about the proportio i each category, so you draw a sample. Examples:
Practice Problems for Test 3
Practice Problems for Test 3 Note: these problems oly cover CIs ad hypothesis testig You are also resposible for kowig the samplig distributio of the sample meas, ad the Cetral Limit Theorem Review all
Chapter 7: Confidence Interval and Sample Size
Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum
Case Study. Normal and t Distributions. Density Plot. Normal Distributions
Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca
15.075 Exam 3. Instructor: Cynthia Rudin TA: Dimitrios Bisias. November 22, 2011
15.075 Exam 3 Istructor: Cythia Rudi TA: Dimitrios Bisias November 22, 2011 Gradig is based o demostratio of coceptual uderstadig, so you eed to show all of your work. Problem 1 A compay makes high-defiitio
CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)
CHAPTER 7: Cetral Limit Theorem: CLT for Averages (Meas) X = the umber obtaied whe rollig oe six sided die oce. If we roll a six sided die oce, the mea of the probability distributio is X P(X = x) Simulatio:
Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:
Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries
Quadrat Sampling in Population Ecology
Quadrat Samplig i Populatio Ecology Backgroud Estimatig the abudace of orgaisms. Ecology is ofte referred to as the "study of distributio ad abudace". This beig true, we would ofte like to kow how may
1 Correlation and Regression Analysis
1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio
Lesson 15 ANOVA (analysis of variance)
Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi
Confidence Intervals
Cofidece Itervals Cofidece Itervals are a extesio of the cocept of Margi of Error which we met earlier i this course. Remember we saw: The sample proportio will differ from the populatio proportio by more
Research Method (I) --Knowledge on Sampling (Simple Random Sampling)
Research Method (I) --Kowledge o Samplig (Simple Radom Samplig) 1. Itroductio to samplig 1.1 Defiitio of samplig Samplig ca be defied as selectig part of the elemets i a populatio. It results i the fact
Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring
No-life isurace mathematics Nils F. Haavardsso, Uiversity of Oslo ad DNB Skadeforsikrig Mai issues so far Why does isurace work? How is risk premium defied ad why is it importat? How ca claim frequecy
1 Computing the Standard Deviation of Sample Means
Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.
Statistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
Chapter 7 Methods of Finding Estimators
Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of
Lesson 17 Pearson s Correlation Coefficient
Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig
Math C067 Sampling Distributions
Math C067 Samplig Distributios Sample Mea ad Sample Proportio Richard Beigel Some time betwee April 16, 2007 ad April 16, 2007 Examples of Samplig A pollster may try to estimate the proportio of voters
THE TWO-VARIABLE LINEAR REGRESSION MODEL
THE TWO-VARIABLE LINEAR REGRESSION MODEL Herma J. Bieres Pesylvaia State Uiversity April 30, 202. Itroductio Suppose you are a ecoomics or busiess maor i a college close to the beach i the souther part
MEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book)
MEI Mathematics i Educatio ad Idustry MEI Structured Mathematics Module Summary Sheets Statistics (Versio B: referece to ew book) Topic : The Poisso Distributio Topic : The Normal Distributio Topic 3:
LECTURE 13: Cross-validation
LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M
The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles
The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio
Mann-Whitney U 2 Sample Test (a.k.a. Wilcoxon Rank Sum Test)
No-Parametric ivariate Statistics: Wilcoxo-Ma-Whitey 2 Sample Test 1 Ma-Whitey 2 Sample Test (a.k.a. Wilcoxo Rak Sum Test) The (Wilcoxo-) Ma-Whitey (WMW) test is the o-parametric equivalet of a pooled
OMG! Excessive Texting Tied to Risky Teen Behaviors
BUSIESS WEEK: EXECUTIVE EALT ovember 09, 2010 OMG! Excessive Textig Tied to Risky Tee Behaviors Kids who sed more tha 120 a day more likely to try drugs, alcohol ad sex, researchers fid TUESDAY, ov. 9
Subject CT5 Contingencies Core Technical Syllabus
Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value
Confidence intervals and hypothesis tests
