Chapter 2
Determination of Appropriate Sample Size

The discussion in this chapter is based on two of our published papers: "Importance of the size of sample and its determination in the context of data related to the schools of Guwahati", published in the Bulletin of the Gauhati University Mathematics Association, Vol. 12, 2012, and "An investigation on effect of bias on determination of sample size on the basis of data related to the students of schools of Guwahati", published in the International Journal of Applied Mathematics and Statistical Sciences, Vol. 2, Issue 1, 2013.

In survey studies, once data are collected, the most important objective of a statistical analysis is to draw inferences about the population using sample information. "How big a sample is required?" is one of the questions most frequently asked by investigators. If the sample size is not chosen properly, conclusions drawn from the investigation may not reflect the real situation of the whole population. In this chapter we therefore discuss the importance of the sample size, the method of determining it and the sampling procedure adopted in our study. We also examine whether bias has any effect on the determination of the sample size.

2.00 Introduction:

In spite of the application of the scientific method and the refinement of research techniques, tools and designs, educational research has not attained the perfection and scientific status of the physical sciences. Therefore, there is a great need to study the different tools and techniques of research methodology properly. While studying a particular phenomenon, researchers in this field face a problem at the very beginning as to
what may be the representative sample. Very few research articles deal with the issue of determination of sample size. Sample size calculation for a study from a population is shown in many books, e.g. Cochran (1977), Mark (2005) and Singh and Chaudhury (1985). The aim of the calculation is to determine an adequate sample size which can estimate results for the whole population with good precision. In other words, one has to draw inferences about, or generalize to, the population from the sample data. The inference to be drawn relates to some parameter of the population such as the mean, the standard deviation or some other feature like the proportion of an attribute occurring in the population. Note that a parameter is a descriptive measure of some characteristic of the population, whereas a descriptive measure computed from the observations in a sample is called a statistic. A parameter is constant for a population, but the corresponding statistic may vary from sample to sample.

Statistical inference generally adopts one of two techniques, namely estimation of population parameters or testing of a hypothesis. The process of obtaining an estimate of the unknown value of a parameter by a statistic is known as estimation [39, 71, 86]. There are two types of estimation, viz. point estimation and interval estimation.

If the inference about the population is to be drawn on the basis of the sample, the sample must conform to certain criteria: it must be representative of the whole population [7, 64]. The question arises as to what a representative sample is and how such a sample can be selected from a population.

The computation of the appropriate sample size is generally considered to be one of the most important steps in a statistical study. But it is observed that in most studies this particular step has been overlooked.
The sample size computation must be done appropriately, because if the sample size is not appropriate for a particular study then the inference drawn from the sample will not be authentic and it might lead to wrong conclusions [49].
Again, when we draw an inference about a parameter from a statistic, some kind of error arises. The error which arises because only a sample is used to estimate the population parameters is termed sampling error or sampling fluctuation. However cautiously the sample is selected, there will always be a difference between the parameter and its corresponding estimate. A sample with the smallest sampling error will always be considered a good representative of the population. Bigger samples have smaller sampling errors. When the sample survey becomes a census survey, the sampling error becomes zero. On the other hand, smaller samples may be easier to manage and have less non-sampling error. Handling bigger samples is more expensive than handling smaller ones. The non-sampling error increases with the increase in sample size [116].

Fig. 2.1, 2.2: Figures showing the relationship between sampling error and sample size
There are various approaches for computing the sample size [5, 57, 117]. To determine the appropriate sample size, the basic factors to be considered are the level of precision required by users, the confidence level desired and the degree of variability.

i) Level of precision: The sample size is to be determined according to some pre-assigned degree of precision. The degree of precision is the margin of permissible error between the estimated value and the population value. In other words, it is a measure of how close an estimate is to the actual characteristic in the population. The level of precision may be termed the sampling error, i.e. the difference between the sample statistic and the related population parameter. According to W. G. Cochran (1977), the precision desired may be specified by giving the amount of error that one is willing to tolerate in the sample estimates. It depends on the amount of risk a researcher is willing to accept while using the data to make decisions, and it is often expressed as a percentage. If the sampling error or margin of error is ±5% and 70% of the units in the sample have a given attribute, then it can be concluded that between 65% and 75% of the units in the population have that attribute. A high level of precision requires larger sample sizes and a higher cost to achieve those samples.

ii) Confidence level desired: The confidence or risk level is ascertained through the well established probability model called the normal distribution and an associated theorem called the Central Limit Theorem. The probability density function (p.d.f.) of the normal distribution with parameters $\mu$ and $\sigma$ is given by

\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad -\infty < x < \infty, \]

where $\mu$ is the mean and $\sigma$ is the standard deviation.
In general, the normal curve results whenever a large number of independent small factors influence the final outcome. It is for this reason that many practical distributions, be it the distribution of annual rainfall, the weight of babies at birth, the heights of individuals etc., are all more or less normal if a sufficiently large number of items is included in the population. The significance of the normal curve is much greater than this. It can be shown that even when the original population is not normal, if we draw samples of n items from it and obtain the distribution of the sample means, the distribution of the sample means becomes more and more normal as the sample size increases. This fact is proved mathematically in the Central Limit Theorem. The theorem says that if we take samples of size n from any arbitrary population (with any arbitrary distribution) and calculate $\bar{x}$, then the sampling distribution of $\bar{x}$ will approach the normal distribution as the sample size n increases, with mean $\mu$ and standard error $\sigma/\sqrt{n}$, i.e.

\[ \bar{x} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right). \]

A sample statistic is employed to estimate the population parameter. If more than one sample is drawn from the same population, then all the sample statistics deviate in one way or another from the population parameter. In the case of large samples, where n > 30, the distribution of these sample statistics is approximately normal. Generally, the question arises as to how far a sample statistic may miss the population parameter and still be taken as a trustworthy estimate of the parameter.

The confidence level tells how confident one can be that the error does not exceed the tolerance planned for in the precision specification. Usually 95% and 99% probability are taken as the two standard degrees of confidence for specifying the interval within which one may ascertain the existence of the population parameter (e.g. the mean). A 95% confidence level means that if an investigator takes 100 independent samples from the same population, then about 95 out of the 100 samples will provide an estimate within the precision set by him. Again, if the level of
confidence is 99%, then out of 100 samples about 99 will be within the error tolerance specified by the precision.

In the case of the normal distribution, the curve is said to extend from a distance of $-3\sigma$ on the left to $+3\sigma$ on the right. A well known result of distribution theory says that if $X \sim N(\mu, \sigma^{2})$ then

\[ Z = \frac{X-\mu}{\sigma} \]

is a standard normal variate, i.e. $Z \sim N(0, 1)$.

While calculating the sample size, the desired confidence level is specified by the z-value. The z-value is a point along the abscissa of the standard normal distribution. It is known from the table of the normal curve that 95 percent of the total area under the curve falls within the limits $\pm 1.96\sigma$, where $\sigma$ is the standard deviation of the distribution, and 99 percent falls within the limits $\pm 2.58\sigma$. In other words, a z-value of 1.96 specifies 95% of the area under the normal curve and a z-value of 2.58 specifies 99% of it. These represent confidence levels of 95% and 99% respectively.

Fig. 2.3: Standard Normal Curve

iii) Degree of variability: The degree of variability in the attributes being measured refers to the distribution of attributes in the population. The more heterogeneous a population, the larger the
sample size required to obtain a given level of precision. For a less variable (more homogeneous) population, smaller sample sizes work well. Note that a proportion of 50% indicates a greater level of variability than one of 20% or 80%. This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest. Because a proportion of 0.5 indicates the maximum variability in a population, it is often used in determining a more conservative sample size.

2.01 Strategies for determining sample size:

To determine a representative sample size from the target population, different strategies can be used according to the needs of the research work. The use of formulae for determining the required sample size under different situations is one of the most important strategies.

There are different formulae for determining the appropriate sample size when different sampling techniques are used. Here, we discuss the formulae for determining a representative sample size when the simple random sampling technique is used. Simple random sampling is the most common and the simplest method of sampling. Each unit of the population has an equal chance of being drawn into the sample. It is therefore a method of selecting n units out of a population of size N by giving equal probability to all units.

(a) Formula for proportions:

i) Cochran's formula for calculating sample size when the population is infinite:

Cochran (1977) developed a formula to calculate a representative sample for proportions as

\[ n_{0} = \frac{z^{2}\, p\, q}{e^{2}} \qquad (2.1) \]
where $n_0$ is the sample size, z is the selected critical value for the desired confidence level, p is the estimated proportion of an attribute that is present in the population, $q = 1 - p$ and e is the desired level of precision [].

For example, suppose we want to calculate the sample size for a large population whose degree of variability is not known. Assuming the maximum variability, which is equal to 50% (p = 0.5), and taking a 95% confidence level with ±5% precision, the required sample size is calculated as follows:

p = 0.5 and hence q = 1 - 0.5 = 0.5; e = 0.05; z = 1.96. So,

\[ n_{0} = \frac{(1.96)^{2}(0.5)(0.5)}{(0.05)^{2}} = 384.16 \approx 384 \]

Again, taking a 99% confidence level with ±5% precision, the required sample size is calculated as follows:

p = 0.5 and hence q = 1 - 0.5 = 0.5; e = 0.05; z = 2.58. So,

\[ n_{0} = \frac{(2.58)^{2}(0.5)(0.5)}{(0.05)^{2}} = 665.64 \approx 666 \]

The following table shows sample sizes for different confidence levels and levels of precision.

Table 2.1: Sample size calculated for different confidence levels and levels of precision

Confidence level    Sample size (n0)
                    e = 0.03    e = 0.05    e = 0.1
95%                 1067        384         96
99%                 1849        666         166
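The entries of the table above follow directly from Cochran's formula for proportions. The short Python sketch below recomputes them; the function and variable names are illustrative choices of ours, and the results are rounded to the nearest integer, which is the rounding used in the worked examples (384.16 to 384, 665.64 to 666).

```python
def cochran_n0(z: float, p: float, e: float) -> float:
    """Unrounded sample size for a proportion in a very large population."""
    q = 1.0 - p
    return (z ** 2) * p * q / (e ** 2)


# z-values used in the text for the two confidence levels
Z_VALUES = {"95%": 1.96, "99%": 2.58}
# levels of precision (margins of error) used in the table
MARGINS = (0.03, 0.05, 0.10)


def sample_size_table(p: float = 0.5) -> dict:
    """Rounded n0 for every confidence level and margin of error."""
    return {
        level: [round(cochran_n0(z, p, e)) for e in MARGINS]
        for level, z in Z_VALUES.items()
    }


if __name__ == "__main__":
    for level, sizes in sample_size_table().items():
        print(level, sizes)
```

Taking p = 0.5 is the conservative choice discussed earlier, since the product pq is largest at that value.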
ii) Cochran's formula for calculating sample size when the population size is finite:

Cochran pointed out that if the population is finite, then the sample size can be reduced slightly. This is because a given sample size provides proportionally more information for a small population than for a large one. He proposed a correction formula to calculate the final sample size in this case, which is given below:

\[ n = \frac{n_{0}}{1 + \dfrac{n_{0} - 1}{N}} \qquad (2.2) \]

Here, $n_0$ is the sample size derived from equation (2.1) and N is the population size.

Now, suppose we want to calculate the sample size for the population of our study, where the population size is N = 13191. According to formula (2.1), the sample size $n_0$ will be 666 at the 99% confidence level with a margin of error equal to 0.05. If $n_0/N$ is negligible then $n_0$ is a satisfactory approximation to the sample size. But in this case the sample size (666) exceeds 5% of the population size (13191). So we need to use the correction formula to calculate the final sample size. Here, N = 13191 and $n_0$ = 666 (determined by using (2.1)), so

\[ n = \frac{666}{1 + \dfrac{666 - 1}{13191}} = 634.03 \approx 634 \]

But if the sample size is calculated at the 95% confidence level with a margin of error equal to 0.05, the sample size becomes 384, which does not need the correction formula. So, in this case the representative sample size for our study is 384.

iii) Yamane's formula for calculating sample size:

Yamane (1967) suggested another simplified formula for calculating the sample size from a population, which is an alternative to Cochran's formula. According to him, for a 95% confidence level and p = 0.5, the size of the sample should be
\[ n = \frac{N}{1 + N e^{2}} \qquad (2.3) \]

where N is the population size and e is the level of precision [131].

Let this formula be used for our population, in which N = 13191, with ±5% precision. Assuming a 95% confidence level and p = 0.5, we get the sample size as

\[ n = \frac{13191}{1 + 13191\,(0.05)^{2}} \approx 388 \]

To see which formula gives a better measure of the sample size, we calculated sample sizes for different schools from their respective populations, which we gathered during our investigation. Tables 2.2 and 2.3 respectively show the sample sizes calculated by Yamane's formula and by Cochran's formula, and these values are plotted in Fig. 2.4. The figure shows that the values calculated through the two formulae are in quite good agreement.

Table 2.2: Sample sizes calculated by Yamane's formula

Sl. no. of   Population   Sample size, n, at 95% confidence level
school       size, N      ±5%     ±7%     ±10%
 1            450         212     136      82
 2            582         229     150      85
 3            693         254     158      87
 4            799         266     163      89
 5            806         267     163      89
 6            845         272     164      89
 7            858         273     165      90
 8            892         276     166      90
 9            909         278     167      90
10            922         279     167      90
11            985         285     169      91
12           1009         287     170      91
13           1058         290     171      91
14           1073         292     171      91
15           1115         294     173      92
16           1167         299     174      92
17           1184         299     174      92
18           1256         303     176      93
19           1298         305     176      93
20           1322         307     177      93
21           1584         319     181      94
22           1908         330     184      95

Table 2.3: Sample sizes calculated by Cochran's formula

Sl. no. of   Population   n at 95% confidence level    n at 99% confidence level
school       size, N      ±5%     ±7%     ±10%         ±5%     ±7%     ±10%
 1            450         208     137      79          269     194     121
 2            582         231     146      83          311     215     130
 3            693         248     153      84          340     228     134
 4            799         259     158      86          364     239     137
 5            806         259     158      86          364     239     137
 6            845         265     159      86          372     243     138
 7            858         265     161      86          374     243     139
 8            892         269     161      86          381     246     141
 9            909         270     162      87          385     248     141
10            922         270     162      87          387     248     141
11            985         276     163      87          396     253     142
12           1009         278     165      88          398     253     143
13           1058         282     166      88          409     256     143
14           1073         282     166      88          411     256     144
15           1115         286     168      88          416     262     144
16           1167         289     168      89          424     264     146
17           1184         291     169      89          427     264     146
18           1256         295     169      89          435     268     147
19           1298         295     170      90          441     270     147
20           1322         298     170      90          444     270     148
21           1584         310     175      91          469     281     151
22           1908         320     178      91          493     288     152
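The two finite-population calculations worked out for N = 13191 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are ours, and the 5% threshold in `needs_correction` is the rule of thumb applied in the text, not a general requirement.

```python
def cochran_finite(n0: float, N: int) -> float:
    """Cochran's corrected sample size for a finite population of size N."""
    return n0 / (1.0 + (n0 - 1.0) / N)


def yamane(N: int, e: float) -> float:
    """Yamane's simplified sample size (95% confidence level, p = 0.5)."""
    return N / (1.0 + N * e ** 2)


def needs_correction(n0: float, N: int, threshold: float = 0.05) -> bool:
    """True when the uncorrected sample exceeds the given fraction of N."""
    return n0 / N > threshold


if __name__ == "__main__":
    N = 13191
    print(round(cochran_finite(666, N)))   # corrected size at 99% confidence, +/-5%
    print(round(yamane(N, 0.05)))          # Yamane's size at +/-5%
    print(needs_correction(666, N), needs_correction(384, N))
```

For this population the two approaches differ by only a handful of units at the 95% level (384 against 388), which is in line with the close agreement seen in the two tables.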
Fig. 2.4: Sample size (y-axis) against population size (x-axis) for the values calculated according to Yamane's formula and according to Cochran's formula. The uppermost pair of curves is for ±5%, the middle pair for ±7% and the lowest pair for ±10% precision.

We want to mention here that though other formulae are also available in the literature, the above two formulae are used far more extensively than the others.

After calculating the representative sample size, the main aim of an investigator is to find the proper method of selecting the sample. Sampling is simply the process of learning about the population on the basis of a sample collected from it. A sample is constituted by a part or fraction of the population. Thus, in the sampling technique, instead of every unit of the population, only a part of it is studied and the conclusions are drawn for the entire population on the basis of the sample.

2.02 Comparative study of two different methods of allocation:

In our study, the stratified random sampling technique has been adopted for the selection of samples. The three categories of schools, namely Government and Government Provincialised schools under SEBA (Secondary Education Board of Assam),
Permitted private schools under SEBA, and Affiliated private schools under CBSE (Central Board of Secondary Education) of Guwahati, were considered as the three strata. The sample from each stratum is taken through the simple random sampling technique. The stratification is done to produce a gain in precision in the estimates of the characteristics of the whole population.

The stratification was done following the principles that

i) The strata (i.e. categories of schools) are non-overlapping and together comprise the whole population.

ii) The strata (i.e. categories of schools) are homogeneous within themselves with respect to the characteristics under study.

All the class VIII students of the government and private schools of Guwahati, including SEBA and CBSE schools, formed the population of the study. Initially, we estimated the sample size from a total of 13191 students of class VIII at the 95% confidence level with ±5% level of precision, which was found to be 384. Thus, a sample of 384 students from 13 selected schools is considered in the present study to examine the performance of students in mathematics. This sample can be considered representative of the student population of Guwahati, with students coming from a wide range of socio-economic backgrounds and from each of the four types of schools, namely normal co-educational, co-educational segregated by gender, boys only and girls only schools.

The allocation of the sample to the different categories of schools was carried out through both the proportional allocation method and the optimum allocation method of stratified random sampling.

A. Sample size through the proportional allocation method:

The proportional allocation method was originally proposed by Bowley (1926). In this method, the sampling fraction $n/N$ is the same in all strata. This allocation was used to obtain a sample that yields estimates with greater speed and a higher degree of precision. The allocation of a given sample of size n to the different strata was done in proportion to their sizes, i.e. in the i-th stratum,
\[ n_{i} = n\,\frac{N_{i}}{N}, \qquad i = 1, 2, 3, \]

where $n_i$ represents the sample size and $N_i$ the population size of the i-th stratum, and N represents the total population size. In our study, N = 13191 and n = 384.

B. Sample size through the optimum allocation method:

The allocation of the sample units to the different strata is determined with a view to minimizing the variance for a specified cost of conducting the survey, or to minimizing the cost for a specified value of the variance. The cost function is given by

\[ C = a + \sum_{i=1}^{k} c_{i} n_{i}, \]

where a is the overhead cost, which is constant, and $c_i$ is the average cost of surveying one unit in the i-th stratum. The required sample size in the different strata is therefore given by

\[ n_{i} = n\,\frac{N_{i} S_{i} / \sqrt{c_{i}}}{\sum_{i=1}^{k} N_{i} S_{i} / \sqrt{c_{i}}} \qquad (2.4) \]

where n = sample size for the study, N = population size for the study, $N_i$ = population size of the i-th stratum and $S_i^{2}$ = variance of the i-th stratum.

If the average cost of surveying per unit (i.e. $c_i$) is the same in all the strata, then the optimum allocation becomes the Neyman allocation. As the costs incurred by us during the survey, such as printing, sending and collecting the questionnaires, were almost the same for the different categories of schools, we can use the Neyman allocation to determine the sample size for each category of school. So, in our case, the sample size in the different categories of schools is given by a simplified form of (2.4), which is given by
\[ n_{i} = \frac{n\, N_{i} S_{i}}{\sum_{i} N_{i} S_{i}}, \]

where $S_{i}^{2} = \dfrac{N_{i}}{N_{i} - 1}\, P_{i} Q_{i}$ is the population variance of the i-th stratum, $N_i$ = population size of the i-th stratum,

$P_i$ = proportion of students in the i-th stratum who secured 50% or more marks in the annual examination
      = (number of students in the i-th category of school who secured 50% or more marks in mathematics) / (total number of students in the i-th category of school),

and $Q_i = 1 - P_i$.

The following table illustrates the distribution of the sample sizes in the different strata for the proportional and optimum allocation methods, calculated on the basis of the above discussion.

Table 2.4: Distribution of sample students by category of schools

Category of school    Total students, N_i    n_i (Prop)    n_i (Opt)
Govt. (SEBA)           5609                  163           181
Private (SEBA)         3498                  102           106
Private (CBSE)         4084                  119            97
TOTAL                 13191                  384           384
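The two allocations in the table above can be recomputed from the stratum sizes and the stratum proportions. The sketch below is illustrative only: the function names are ours, and the proportions 0.34, 0.73 and 0.86 are the rounded P values reported later in this chapter for the three categories of schools, so the authors' own calculation may have used unrounded values.

```python
import math

# category -> (stratum size N_i, proportion P_i securing 50% or more marks);
# the P_i are rounded values taken from the chapter, used here as an assumption
STRATA = {
    "Govt. (SEBA)": (5609, 0.34),
    "Private (SEBA)": (3498, 0.73),
    "Private (CBSE)": (4084, 0.86),
}


def stratum_sd(size: int, p: float) -> float:
    """S_i, the square root of N_i * P_i * Q_i / (N_i - 1)."""
    return math.sqrt(size * p * (1.0 - p) / (size - 1))


def proportional_allocation(strata: dict, n: int) -> dict:
    """n_i = n * N_i / N, rounded to the nearest unit."""
    total = sum(size for size, _ in strata.values())
    return {name: round(n * size / total) for name, (size, _) in strata.items()}


def neyman_allocation(strata: dict, n: int) -> dict:
    """n_i = n * N_i * S_i / sum(N_i * S_i), i.e. equal cost in every stratum."""
    weights = {name: size * stratum_sd(size, p) for name, (size, p) in strata.items()}
    total = sum(weights.values())
    return {name: round(n * w / total) for name, w in weights.items()}


if __name__ == "__main__":
    print(proportional_allocation(STRATA, 384))
    print(neyman_allocation(STRATA, 384))
```

Independent rounding of each stratum does not guarantee in general that the parts add up to n; for these data both allocations happen to sum to 384.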
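2.03 Calculation of variances:

The formulae to calculate the variance of the mean for the different sampling methods are given below:

i) For simple random sampling:

\[ \mathrm{Var}(\hat{\mu})_{R} = \frac{N - n}{N}\,\frac{s^{2}}{n}, \qquad \text{where } s^{2} = \frac{n\,p\,q}{n - 1}, \]

p = proportion of students in all the selected schools who secured 50% or more marks in mathematics in the annual examination, q = 1 - p, N = population size, n = sample size.

ii) For stratified random sampling:

\[ \mathrm{Var}(\hat{\mu})_{St} = \frac{1}{N^{2}} \sum_{i} N_{i}\,(N_{i} - n_{i})\,\frac{S_{i}^{2}}{n_{i}}, \qquad \text{where } S_{i}^{2} = \frac{N_{i} P_{i} Q_{i}}{N_{i} - 1}, \]

N = total population size, $N_i$ = population size of the i-th stratum, $n_i$ = sample size of the i-th stratum.

a) For proportional allocation:

\[ \mathrm{Var}(\hat{\mu})_{St(prop)} = \frac{N - n}{N n} \sum_{i} \frac{N_{i} S_{i}^{2}}{N} \]

b) For optimum allocation:

\[ \mathrm{Var}(\hat{\mu})_{St(opt)} = \frac{\left(\sum_{i} w_{i} S_{i}\right)^{2}}{n} - \frac{\sum_{i} w_{i} S_{i}^{2}}{N}, \qquad \text{where } w_{i} = \frac{N_{i}}{N}. \]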
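As a rough numerical check of these formulae, the sketch below evaluates the simple random sampling variance and the general stratified variance for the two allocations of this study. The function names are ours; the inputs (N = 13191, n = 384, p = 0.60, the stratum sizes, the allocated sample sizes and the rounded stratum proportions 0.34, 0.73, 0.86) are values quoted elsewhere in this chapter, and because the proportions are rounded the results should only be expected to agree with the reported variances to about the sixth decimal place.

```python
def var_srs(N: int, n: int, p: float) -> float:
    """Variance of the estimated proportion under simple random sampling."""
    s2 = n * p * (1.0 - p) / (n - 1)
    return (N - n) / N * s2 / n


def var_stratified(strata: list) -> float:
    """General stratified variance; strata is a list of (N_i, n_i, P_i)."""
    N = sum(N_i for N_i, _, _ in strata)
    total = 0.0
    for N_i, n_i, P_i in strata:
        S2 = N_i * P_i * (1.0 - P_i) / (N_i - 1)   # stratum variance S_i^2
        total += N_i * (N_i - n_i) * S2 / n_i
    return total / N ** 2


# (N_i, n_i, P_i) for Govt. (SEBA), Private (SEBA), Private (CBSE)
PROPORTIONAL = [(5609, 163, 0.34), (3498, 102, 0.73), (4084, 119, 0.86)]
OPTIMUM = [(5609, 181, 0.34), (3498, 106, 0.73), (4084, 97, 0.86)]

if __name__ == "__main__":
    print(var_srs(13191, 384, 0.60))
    print(var_stratified(PROPORTIONAL))
    print(var_stratified(OPTIMUM))
```

Using the general formula with the allocated $n_i$ avoids coding the two special cases separately; with the rounded allocations it gives values very close to the closed forms a) and b).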
The following table shows the variances for all the schools obtained through the different methods.

Table 2.5: Table showing variances

Method       Var(μ̂)_R       Var(μ̂)_St(prop)    Var(μ̂)_St(opt)
Variances    0.00060839     0.0004673          0.00046

2.04 Gain in efficiency (GE) in stratified random sampling over simple random sampling without replacement:

In order to observe how the sample size is affected by the different types of allocation, an analysis of the gain in efficiency (GE) due to each type of allocation is required.

1) Gain in efficiency (GE) due to proportional allocation:

\[ GE_{prop} = \frac{\mathrm{Var}(\hat{\mu})_{R} - \mathrm{Var}(\hat{\mu})_{St(prop)}}{\mathrm{Var}(\hat{\mu})_{St(prop)}} = \frac{0.00060839 - 0.0004673}{0.0004673} = 0.3017333 \approx 0.30 \]

2) Gain in efficiency (GE) due to optimum allocation:

\[ GE_{opt} = \frac{\mathrm{Var}(\hat{\mu})_{R} - \mathrm{Var}(\hat{\mu})_{St(opt)}}{\mathrm{Var}(\hat{\mu})_{St(opt)}} = \frac{0.00060839 - 0.00046}{0.00046} = 0.3223913 \approx 0.32 \]

From the above results it can be said that optimum allocation provides slightly better estimates than proportional allocation. But the most serious drawback of optimum allocation is that the population variances of the different strata, i.e. the $S_i^{2}$, are not known in advance. In that case, the calculations are carried out by performing a pilot survey and by drawing simple random samples without replacement from each stratum, as suggested by P. V. Sukhatme (1935) [51].
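The gain in efficiency is a one-line ratio, shown here as a small Python sketch with a function name of our own choosing. Recomputing from the three variances as reported gives about 0.302 and 0.323, which round to the same 0.30 and 0.32 quoted above.

```python
def gain_in_efficiency(var_srs: float, var_stratified: float) -> float:
    """Relative reduction in variance of stratified over simple random sampling."""
    return (var_srs - var_stratified) / var_stratified


if __name__ == "__main__":
    VAR_R, VAR_PROP, VAR_OPT = 0.00060839, 0.0004673, 0.00046
    print(round(gain_in_efficiency(VAR_R, VAR_PROP), 2))
    print(round(gain_in_efficiency(VAR_R, VAR_OPT), 2))
```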
Due to the above mentioned drawback, the allocation of the sample to the different strata in our study has been calculated by the proportional allocation method. As shown above, by using this method we have gained an efficiency of 0.30 over simple random sampling.

After examining the gain in efficiency (GE) for the allocation of the sample to each category of school, students were selected randomly from different schools within that category. In the present study, students were selected from each school by using Cochran's formula at the 95% confidence level with ±15% margin of error. Out of the 13 schools, 6 are Government SEBA schools, 3 are Private SEBA schools and 4 are Private CBSE schools. In the case of the Private CBSE schools the total sample size is 119. But when the students of the 4 schools are taken into consideration, it becomes 131. Hence, to make it 119, three students from each of the 4 schools were not taken into account. The following table illustrates the distribution of the sample by gender and category of schools.

Table 2.6: The distribution of sample size for class VIII students of different schools of Guwahati

Category      Sl.   Name of school                  Population   Sample size   Allotted sample size
of schools    No.                                   size         (max)         Boys   Girls   Total
SEBA (Govt.)   1    Ulubari H.S.                     95           30            16     14      30
               2    Dispur Vidyalaya                 88           29            16     13      29
               3    Ganesh Mandir Vidyalaya         112           31            17     14      31
               4    Noonmati M.E. School             79           28            12     16      28
               5    Uzan Bazaar Girls School         43           22             -     22      22
               6    Arya Vidyapeeth High School      46           23            23      -      23
SEBA (Pvt.)    7    Nichol's School                 125           32            22     10      32
               8    Asom Jatiya Vidyalaya           200           36            26     10      36
               9    Holy Child School               170           34             -     34      34
CBSE (Pvt.)   10    Gurukul Grammar School          154           34            14     17      31
              11    Maharishi Vidya Mandir School   160           34            17     14      31
              12    Sarala Birla Gyan Jyoti         115           31            13     15      28
              13    Shankar Academy                 118           32            17     12      29
Total                                                                          193    191     384

2.05 Comparative study of effect of bias in the context of data of our study:

It is well known that during the collection of sample units, both sampling and non-sampling errors creep into the process. Non-sampling errors occur because the procedures of observation (data collection) may not be perfect, and their contribution to the total error of the survey may be substantially large, which may affect the survey results adversely. Sampling errors, on the other hand, arise because only a part (the sample) of the whole (the population) is taken for observation in the survey.

Since the sample size in our study is 384, which is quite large, by virtue of the Central Limit Theorem (CLT) we can use the normal probability table to calculate the effect of bias for the questionnaires used to collect the data.

The total error is expressed as:

Total Error (TE) = Mean Square Error (MSE) = Variance of mean + Square of Bias

Here, bias is the difference between the mean of the sampling distribution of the estimate and the true population value. Even with estimators that are unbiased in probability sampling, errors of measurement and non-response may produce biases in the numbers that we compute from the data.

To examine the effect of bias, let us suppose that the estimate $\hat{\mu}$ is normally distributed about a mean m which is at a distance B from the true population value $\mu$. The amount of bias is therefore $B = m - \mu$. As a statement about the accuracy of the estimate, we declare that the probability that the estimate $\hat{\mu}$ is in error by more than $1.96\sigma$ is 0.05.
This can be calculated with the help of the following transformation:

\[ \frac{1}{\sigma\sqrt{2\pi}} \int_{\mu + 1.96\sigma}^{\infty} e^{-\frac{(\hat{\mu} - m)^{2}}{2\sigma^{2}}}\, d\hat{\mu} = \phi(\mu + 1.96\sigma) \]

Now putting $\hat{\mu} - m = \sigma t$ in the above integral, we get the lower limit of the range of integration for t as

\[ \frac{\mu - m}{\sigma} + 1.96 = 1.96 - \frac{B}{\sigma}, \]

where $B = m - \mu$ is the amount of bias that occurs in adjusting the sample size for each stratum. Thus, we calculate the effect of bias by consulting the normal probability table with the help of the following:

\[ \frac{1}{\sqrt{2\pi}} \int_{1.96 - \frac{B}{\sigma}}^{\infty} e^{-\frac{t^{2}}{2}}\, dt = \phi\!\left(1.96 - \frac{B}{\sigma}\right) \]

Here $\phi(\cdot)$ denotes the area under the corresponding normal curve to the right of its argument.

In Table 2.7, the effect of a bias B on the probability of an error greater than $1.96\sigma$ is shown in tabular form. The calculations were carried out using the normal probability table. The data in Table 2.7 are plotted in Fig. 2.5, where the probabilities of an error less than $-1.96\sigma$ and greater than $1.96\sigma$ are plotted against the B/σ values (x-axis).

Table 2.7: Effect of a bias B on the probability of an error greater than 1.96σ

B/σ      Probability of error      Total
         < -1.96σ     > 1.96σ
0.01     0.0244       0.0256       0.0500
0.03     0.0233       0.0268       0.0501
0.05     0.0222       0.0281       0.0503
0.07     0.0212       0.0294       0.0506
0.09     0.0202       0.0307       0.0509
0.10     0.0197       0.0314       0.0511
0.25     0.0136       0.0436       0.0572
0.40     0.0091       0.0594       0.0685
0.55     0.0060       0.0793       0.0853
0.70     0.0039       0.1038       0.1077
0.85     0.0025       0.1335       0.1360
1.00     0.0015       0.1685       0.1700
1.50     0.0003       0.3228       0.3231

Fig. 2.5: Probability of error (less than -1.96σ) and (greater than 1.96σ) vs B/σ values (x-axis) (generated from the above table)
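The tail probabilities in the table on the effect of bias can be reproduced without a printed normal table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the loop over B/σ values are ours. When the estimate is centred B/σ standard deviations above the true value, an error below -1.96σ corresponds to a standard normal value below -1.96 - B/σ and an error above +1.96σ to a value above 1.96 - B/σ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def error_probabilities(b: float, z: float = 1.96) -> tuple:
    """Return (P(error < -z*sigma), P(error > z*sigma), total) for bias ratio b = B/sigma."""
    lower = STD_NORMAL.cdf(-z - b)
    upper = 1.0 - STD_NORMAL.cdf(z - b)
    return lower, upper, lower + upper


if __name__ == "__main__":
    for b in (0.01, 0.03, 0.05, 0.07, 0.09, 0.10, 0.25,
              0.40, 0.55, 0.70, 0.85, 1.00, 1.50):
        lower, upper, total = error_probabilities(b)
        print(f"{b:4.2f}  {lower:.4f}  {upper:.4f}  {total:.4f}")
```

A total computed this way can differ from a tabulated one in the last decimal place when the tabulated total is the sum of two already rounded tail areas.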
It is known that in order to compare a biased estimator with an unbiased estimator, or two estimators with different amounts of bias, a useful criterion is the mean square error (MSE) of the estimates, measured from the population values that are being estimated. The relationship between MSE and bias is given by

\[ \mathrm{MSE}(\hat{\mu}) = \left(\text{Variance of } \hat{\mu}\right) + (\text{Bias})^{2} \]

The following table shows the variances for the different categories of schools of Guwahati included in the sample. The total sample size and the sample sizes in the different strata have been calculated with margin of error ±0.05. But while calculating the sample sizes in the 13 selected schools, the margin of error was taken to be ±0.15, because greater precision requires larger sample sizes, which is not practicable when selecting samples from individual schools. Because of this difference in precision, some bias may occur in the process, and hence it becomes very important to calculate the bias and its effect.

Table 2.8: Variances for different categories of schools

Strata        Sample size, n    No. of students securing 50% or more    P      Q      Variance
SEBA Govt.    163                55                                     0.34   0.66   0.00134493
SEBA Pvt.     102                74                                     0.73   0.27   0.00189458
CBSE Pvt.     119               102                                     0.86   0.14   0.000990608

In the following tables the MSE and the absolute error limits 1·√MSE and 1.96·√MSE for the different categories of schools are given. Below each table, graphs of MSE, 1·√MSE and 1.96·√MSE versus B/σ values (on the x-axis) are shown.
Tables showing MSE and the absolute error limits 1·√MSE and 1.96·√MSE

Table 2.9: For SEBA Govt.: V = 0.00134493, p = 0.34, q = 0.66

B/σ      MSE           1·√MSE       1.96·√MSE
0.01     0.00384493    0.0620075    0.121535
0.03     0.00384493    0.0620075    0.121535
0.05     0.00384493    0.0620075    0.121535
0.07     0.00394493    0.0628087    0.123105
0.09     0.00394493    0.0628087    0.123105
0.10     0.00394493    0.0628087    0.123105
0.25     0.00444493    0.0666703    0.130674
0.40     0.00604493    0.0777492    0.152388
0.55     0.00864493    0.0929781    0.182237
0.70     0.0129449     0.113776     0.223001
0.85     0.0198449     0.140872     0.276109
1.00     0.0302449     0.173911     0.340865
1.50     0.105745      0.325184     0.637362

Fig. 2.6: For SEBA Govt.: B/σ values (x-axis) vs MSE, 1·√MSE and 1.96·√MSE
Table 2.10: For SEBA Pvt.: V = 0.00189458, p = 0.73, q = 0.27

B/σ      MSE           1·√MSE       1.96·√MSE
0.01     0.00439458    0.0662916    0.129932
0.03     0.00439458    0.0662916    0.129932
0.05     0.00439458    0.0662916    0.129932
0.07     0.00449458    0.0670416    0.131402
0.09     0.00449458    0.0670416    0.131402
0.10     0.00449458    0.0670416    0.131402
0.25     0.00499458    0.0706723    0.138518
0.40     0.00659458    0.081207     0.159166
0.55     0.00919458    0.0958884    0.187941
0.70     0.0134946     0.116166     0.227686
0.85     0.0203946     0.14281      0.279907
1.00     0.0307946     0.175484     0.343948
1.50     0.106295      0.326028     0.639016

Fig. 2.7: For SEBA Pvt.: B/σ values (x-axis) vs MSE, 1·√MSE and 1.96·√MSE
Table 2.11: For CBSE Pvt.: V = 0.000990608, p = 0.86, q = 0.14

B/σ      MSE           1·√MSE       1.96·√MSE
0.01     0.00349061    0.0590814    0.115799
0.03     0.00349061    0.0590814    0.115799
0.05     0.00349061    0.0590814    0.115799
0.07     0.00359061    0.0599217    0.117447
0.09     0.00359061    0.0599217    0.117447
0.10     0.00359061    0.0599217    0.117447
0.25     0.00409061    0.0639579    0.125357
0.40     0.00569061    0.0754361    0.147855
0.55     0.00829061    0.0910528    0.178463
0.70     0.0125906     0.112208     0.219927
0.85     0.0194906     0.139609     0.273633
1.00     0.0298906     0.172889     0.338862
1.50     0.105391      0.324639     0.636293

Fig. 2.8: For CBSE Pvt.: B/σ values (x-axis) vs MSE, 1·√MSE and 1.96·√MSE
Table 2.12: For all the schools: V = 0.0004673, p = 0.60, q = 0.40

B/σ      MSE          1·√MSE       1.96·√MSE
0.01     0.0029673    0.0544729    0.1067668
0.03     0.0029673    0.0544729    0.1067668
0.05     0.0029673    0.0544729    0.1067668
0.07     0.0030673    0.0553832    0.108551
0.09     0.0030673    0.0553832    0.108551
0.10     0.0030673    0.0553832    0.108551
0.25     0.0035673    0.0597268    0.1170645
0.40     0.0051673    0.0718839    0.1408924
0.55     0.0077673    0.088132     0.1727391
0.70     0.0120673    0.109851     0.2153083
0.85     0.0189673    0.1377218    0.2699347
1.00     0.0293673    0.1713689    0.335883
1.50     0.1048673    0.3238322    0.6347111

Fig. 2.9: For all the schools: B/σ values (x-axis) vs MSE, 1·√MSE and 1.96·√MSE
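The last two columns of each of these tables are simple functions of the MSE column, and the MSE itself is the variance plus the squared bias. The Python sketch below illustrates the arithmetic for the first row of the SEBA Govt. table; the function names are ours, and the bias value 0.05 is a hypothetical input chosen only because its square, 0.0025, equals the gap between that row's MSE (0.00384493) and the stratum variance (0.00134493).

```python
import math


def mse(variance: float, bias: float) -> float:
    """Mean square error = variance of the estimate + square of the bias."""
    return variance + bias ** 2


def error_limits(mse_value: float, z: float = 1.96) -> tuple:
    """Return (1 * sqrt(MSE), z * sqrt(MSE)) for a given MSE."""
    root = math.sqrt(mse_value)
    return root, z * root


if __name__ == "__main__":
    V_SEBA_GOVT = 0.00134493          # variance reported for SEBA Govt.
    m = mse(V_SEBA_GOVT, 0.05)        # hypothetical bias, see the note above
    root, limit = error_limits(m)
    print(f"{m:.8f}  {root:.7f}  {limit:.6f}")
```

The same two functions applied to any other MSE entry give the corresponding values of the remaining columns.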
The following figure shows the comparison between the MSE of the different categories mentioned above.

Fig. 2.10: Comparison for all the schools: B/σ values (x-axis) vs MSE of all categories of schools (SEBA Govt., SEBA Pvt., CBSE Pvt. and all schools)

Use of the MSE as a criterion to determine the accuracy of an estimator amounts to regarding two estimates that have the same MSE as equivalent. It has been shown by Hansen, Hurwitz and Madow that if, for a given B/σ, the MSE is less than one half, then the estimator can be considered almost identical with its true value [5]. Tables 2.9, 2.10, 2.11 and 2.12 and their corresponding graphs in Figures 2.6, 2.7, 2.8 and 2.9 highlight this criterion in the case of our study. So, we can conclude that the effect of bias in our study is negligible and that the estimates derived from the selected samples will be in good agreement with their corresponding values for the whole population.
2.06 Conclusions:

There are different formulae given by different educationists for the determination of appropriate sample sizes. Researchers should choose the formula according to their needs and convenience. In choosing the right one, the researcher has to take into consideration the maximum budget, the time limit and the nature of the study, along with the desired level of precision, the confidence level and the variability within the population of interest. Using an adequate sample along with high quality data collection will result in more reliable and valid results.