Introduction to the article Degrees of Freedom.




The article by Walker, H. M., "Degrees of Freedom," Journal of Educational Psychology, 31(4) (1940), 253-269, was transcribed from the original by Chris Olsen, George Washington High School, Cedar Rapids, Iowa. Chris has made every attempt to reproduce the "look and feel" of the article as well as the article itself, and did not attempt in any way to update the symbols to more "modern" notation. Three typographical errors were found in the paper. These errors are noted in the paragraphs below. The article, except for pagination and placement of diagrams, is as it originally appears. The transcribed pages are not numbered to avoid confusion with pagination in the original article.

Typographical errors:

(1) In the section on the t-distribution (the 7th page of these notes) the last sentence should read "The curve is always symmetrical, but is less peaked than the normal when n is small."

(2) In the section (b) Variance of Regressed Values about Total Mean (the 12th page of these notes) $s_x$ and $s_y$ are reversed in the expression $\tilde{Y} - M_y = r \frac{s_x}{s_y}(X - M_x)$. It should read $\tilde{Y} - M_y = r \frac{s_y}{s_x}(X - M_x)$.

(3) In the section Tests Based on Ratio of Two Variances (the 14th page of these notes), the second sentence, "we may divide ... by ... obtaining ...," should read "we may divide ... by ... obtaining $\frac{r^2 (N - 2)}{1 - r^2}$."

Another possible confusion to modern ears may come in the section entitled "F-distribution and z-distribution." The z-distribution mentioned is NOT the standardized normal distribution, but is a distribution known as "Fisher's z distribution."

A potential problem in reading this file (other than not having Word!) is the equations, which were inserted using MathType from Design Science. Chris used MathType 4.0, and if you have anything less it could be a problem. A MathType reader program can be downloaded from the web (www.mathtype.com; follow the path to support).

Degrees of Freedom. Journal of Educational Psychology. 31(4) (1940) 253-269

DEGREES OF FREEDOM

HELEN M. WALKER
Associate Professor of Education, Teachers College, Columbia University

A concept of central importance to modern statistical theory which few textbooks have attempted to clarify is that of "degrees of freedom." For the mathematician who reads the original papers in which statistical theory is now making such rapid advances, the concept is a familiar one needing no particular explanation. For the person who is unfamiliar with N-dimensional geometry or who knows the contributions to modern sampling theory only from secondhand sources such as textbooks, this concept often seems almost mystical, with no practical meaning.

Tippett, one of the few textbook writers who attempt to make any general explanation of the concept, begins his account (p. 64) with the sentence, "This conception of degrees of freedom is not altogether easy to attain, and we cannot attempt a full justification of it here; but we shall show its reasonableness and shall illustrate it, hoping that as a result of familiarity with its use the reader will appreciate it." Not only do most texts omit all mention of the concept but many actually give incorrect formulas and procedures because of ignoring it.

In the work of modern statisticians, the concept of degrees of freedom is not found before "Student's" paper of 1908, it was first made explicit by the writings of R. A. Fisher, beginning with his paper of 1915 on the distribution of the correlation coefficient, and has only within the decade or so received general recognition. Nevertheless the concept was familiar to Gauss and his astronomical associates. In his classical work on the Theory of the Combination of Observations (Theoria Combinationis Observationum Erroribus Minimis Obnoxiae) and also in a work generalizing the theory of least squares with reference to the combination of observations (Ergänzung zur Theorie der den kleinsten Fehlern unterworfen Combination der Beobachtungen, 1826), he states both in words and by formula that the number of observations is to be decreased by the number of unknowns estimated from the data to serve as divisor in estimating the standard error of a set of observations, or in our terminology $\sigma^2 = \frac{\sum x^2}{N - r}$ where r is the number of parameters to be estimated from the data.

The present paper is an attempt to bridge the gap between mathematical theory and common practice, to state as simply as possible what degrees of freedom represent, why the concept is important, and how the appropriate number may be readily determined. The treatment has been made as non-technical as possible, but this is a case where the mathematical notion is simpler than any non-mathematical interpretation of it. The paper will be developed in four sections: (I) The freedom of movement of a point in space when subject to certain limiting conditions, (II) The representation of a statistical sample by a single point in N-dimensional space, (III) The import of the concept of degrees of freedom, and (IV) Illustrations of how to determine the number of degrees of freedom appropriate for use in certain common situations.

I. THE FREEDOM OF MOVEMENT OF A POINT IN SPACE WHEN SUBJECT TO CERTAIN LIMITING CONDITIONS

As a preliminary introduction to the idea, it may be helpful to consider the freedom of motion possessed by certain familiar objects, each of which is treated as if it were a mere moving point without size. A drop of oil sliding along a coil spring or a bead on a wire has only one degree of freedom for it can move only on a one-dimensional path, no matter how complicated the shape of that path may be. A drop of mercury on a plane surface has two degrees of freedom, moving freely on a two-dimensional surface. A mosquito moving freely in three-dimensional space, has three degrees of freedom.

Considered as a moving point, a railroad train moves backward and forward on a linear path which is a one-dimensional space lying on a two-dimensional space, the earth's surface, which in turn lies within a three-dimensional universe. A single coördinate, distance from some origin, is sufficient to locate the train at any given moment of time. If we consider a four-dimensional universe in which one dimension is of time and the other three dimensions of space, two coördinates will be needed to locate the train, distance in linear units from a spatial origin and distance in time units from a time origin. The train's path which had only one dimension in a space universe has two dimensions in a space-time universe.

A canoe or an automobile moves over a two-dimensional surface which lies upon a three-dimensional space, is a section of a three-dimensional space. At any given moment, the position of the canoe, or auto, can be given by two coördinates.
Referred to a four-dimensional space-time universe, three coördinates would be needed to give its location, and its path would be a space of three dimensions, lying upon one of four.

In the same sense an airplane has three degrees of freedom in the usual universe of space, and can be located only if three coördinates are known. These might be latitude, longitude, and altitude; or might be altitude, horizontal distance from some origin, and an angle; or might be direct distance from some origin, and two direction angles. If we consider a given instant of time as a section through the space-time universe, the airplane moves in a four-dimensional path and can be located by four coördinates, the three previously named and a time coördinate.

The degrees of freedom we have been considering relate to the motion of a point, or freedom of translation. In mechanics freedom of rotation would be equally important. A point, which has position only, and no size, can be translated but not rotated. A real canoe can turn over, a real airplane can turn on its axis or make a nose dive, and so these real bodies have degrees of freedom of rotation as well as of translation. The parallelism between the sampling problems we are about to discuss and the movement of bodies in space can be brought out more clearly by discussing freedom of translation, and disregarding freedom of rotation, and that has been done in what follows.

If you are asked to choose a pair of numbers (x, y) at random, you have complete freedom of choice with regard to each of the two numbers, have two degrees of freedom. The number pair may be represented by the coördinates of a point located in the x, y plane, which is a two-dimensional space. The point is free to move anywhere in the horizontal direction parallel to the xx' axis, and is also free to move anywhere in the vertical direction, parallel to the yy' axis. There are two independent variables and the point has two degrees of freedom.

Now suppose you are asked to choose a pair of numbers whose sum is 7. It is readily apparent that only one number can be chosen freely, the second being fixed as soon as the first is chosen. Although there are two variables in the situation, there is only one independent variable. The number of degrees of freedom is reduced from two to one by the imposition of the condition $x + y = 7$. The point is not now free to move anywhere in the xy plane but is constrained to remain on the line whose graph is $x + y = 7$, and this line is a one-dimensional space lying in the original two-dimensional space.

Suppose you are asked to choose a pair of numbers such that the sum of their squares is 25. Again it is apparent that only one number can be chosen arbitrarily, the second being fixed as soon as the first is chosen. The point represented by a pair of numbers must lie on a circle with center at the origin and radius 5. This circle is a one-dimensional space lying in the original two-dimensional plane. The point can move only forward or backward along this circle, and has one degree of freedom only. There were two numbers to be chosen ($N = 2$) subject to one limiting relationship ($r = 1$) and the resultant number of degrees of freedom is $N - r = 2 - 1 = 1$.
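The loss of one degree of freedom under a single condition can be made concrete with a minimal Python sketch. The function names are illustrative assumptions, not anything from the article. Each function takes the one number that may be chosen freely and returns what the condition then forces; for the circle the sign of the second number remains open, but no continuous freedom is left.

```python
import math

def forced_by_sum(x, total=7):
    """Return the y that x + y = total forces once x has been chosen."""
    return total - x

def forced_by_circle(x, radius=5):
    """Return the y values that x**2 + y**2 = radius**2 allows for a chosen x."""
    y = math.sqrt(radius ** 2 - x ** 2)  # requires |x| <= radius
    return (y, -y)

# Only x is chosen freely; everything else follows from the condition.
for x in (0, 3, 4):
    print(x, forced_by_sum(x), forced_by_circle(x))
```

In both cases there is a single free argument, which corresponds to the count $N - r = 2 - 1 = 1$.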
Suppose we simultaneously impose the two conditions $x + y = 7$ and $x^2 + y^2 = 25$. If we solve these equations algebraically we get only two possible solutions, x = 3, y = 4, or x = 4, y = 3. Neither variable can be chosen at will. The point, once free to move in two directions, is now constrained by the equation $x + y = 7$ to move only along a straight line, and is constrained by the equation $x^2 + y^2 = 25$ to move only along the circumference of a circle, and by the two together is confined to the intersection of that line and circle. There is no freedom of motion for the point. $N = 2$ and $r = 2$. The number of degrees of freedom is $N - r = 2 - 2 = 0$.

Consider now a point (x, y, z) in three-dimensional space ($N = 3$). If no restrictions are placed on its coördinates, it can move with freedom in each of three directions, has three degrees of freedom. All three variables are independent. If we set up the restriction $x + y + z = c$, where c is any constant, only two of the numbers can be freely chosen, only two are independent observations. For example, let $x - y - z = 10$. If now we choose, say, x = 7 and y = 9, then z is forced to be -12. The equation $x - y - z = c$ is the equation of a plane, a two-dimensional space cutting across the original three-dimensional space, and a point lying on this space has two degrees of freedom. ($N - r = 3 - 1 = 2$.)

If the coördinates of the (x, y, z) point are made to conform to the condition $x^2 + y^2 + z^2 = k^2$, the point will be forced to lie on the surface of a sphere whose center is at the origin and whose radius is k. The surface of a sphere is a two-dimensional space. ($N = 3$, $r = 1$, $N - r = 3 - 1 = 2$.)

If both conditions are imposed simultaneously, the point can lie only on the intersection of the sphere and the plane, that is, it can move only along the circumference of a circle, which is a one-dimensional figure lying in the original space of three dimensions. ($N - r = 3 - 2 = 1$.) Considered algebraically, we note that solving the pair of equations in three variables leaves us a single equation in two variables. There can be complete freedom of choice for one of these, no freedom for the other. There is one degree of freedom.

The condition x = y = z is really a pair of independent conditions, x = y and x = z, the condition y = z being derived from the other two. Each of these is the equation of a plane, and their intersection gives a straight line through the origin making equal angles with the three axes. If x = y = z, it is clear that only one variable can be chosen arbitrarily, there is only one independent variable, the point is constrained to move along a single line, there is one degree of freedom.

These ideas must be generalized for N larger than 3, and this generalization is necessarily abstract. Too ardent an attempt to visualize the outcome leads only to confusion. Any set of N numbers determine a single point in N-dimensional space, each number providing one of the N coördinates of that point. If no relationship is imposed upon these numbers, each is free to vary independently of the others, and the number of degrees of freedom is N. Every necessary relationship imposed upon them reduces the number of degrees of freedom by one. Any equation of the first degree connecting the N variables is the equation of what may be called a hyperplane (Better not try to visualize!) and is a space of $N - 1$ dimensions. If, for example, we consider only points such that the sum of their coördinates is constant, $\sum X = c$, we have limited the point to an $N - 1$ space. If we consider only points such that $\sum (X - M)^2 = k^2$, the locus is the surface of a hypersphere with center at the origin and radius equal to k. This surface is called the locus of the point and is a space of $N - 1$ dimensions lying within the original N space; when r such conditions are imposed together, the locus is a space of $N - r$ dimensions. The number of degrees of freedom would be $N - r$.

II. THE REPRESENTATION OF A STATISTICAL SAMPLE BY A POINT IN N-DIMENSIONAL SPACE

If any N numbers can be represented by a single point in a space of N dimensions, obviously a statistical sample of N cases can be so represented by a single sample point. This device, first employed by R. A. Fisher in 1915 in a celebrated paper ("Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population") has been an enormously fruitful one, and must be understood by those who hope to follow recent developments.

Let us consider a sample space of N dimensions, with the origin taken at the true population mean, which we will call µ so that $X_1 - \mu = x_1$, $X_2 - \mu = x_2$, etc., where $X_1, X_2, \dots, X_N$ are the raw scores of the N individuals in the sample. Let M be the mean and s the standard deviation of a sample of N cases. Any set of N observations determines a single sample point, such as S. This point has N degrees of freedom if no conditions are imposed upon its coördinates.

All samples with the same mean will be represented by sample points lying on the hyper-plane $(X_1 - \mu) + (X_2 - \mu) + \dots + (X_N - \mu) = N(M - \mu)$, or $\sum X = NM$, a space of $N - 1$ dimensions.

If all cases in a sample were exactly uniform, the sample point would lie upon the line $(X_1 - \mu) = (X_2 - \mu) = (X_3 - \mu) = \dots = (X_N - \mu) = M - \mu$ which is the line OR in Fig. 1, a line making equal angles with all the coördinate axes. This line cuts the plane $\sum X = NM$ at right angles at a point we may call A. Therefore, A is a point whose coördinates are each equal to $M - \mu$.

FIG. 1.

By a well-known geometric relationship,

$$OS^2 = (X_1 - \mu)^2 + (X_2 - \mu)^2 + \dots + (X_N - \mu)^2$$
$$OA^2 = N(M - \mu)^2$$
$$OS^2 = OA^2 + AS^2$$
$$AS^2 = \sum X^2 - NM^2 = Ns^2$$

Therefore, $OA = (M - \mu)\sqrt{N}$ and $AS = s\sqrt{N}$. The ratio $OA/AS$ is thus $\frac{M - \mu}{s}$ and is proportional to the ratio of the amount by which a sample mean deviates from the population mean to its own standard error. The fluctuation of this ratio from sample to sample produces what is known as the t-distribution.

For computing the variability of the scores in a sample around a population mean which is known a priori, there are available N degrees of freedom because the point S moves in N-dimensional space about O; but for computing the variability of these same scores about the mean of their own sample, there are available only $N - 1$ degrees of freedom, because one degree has been expended in the computation of that mean, so that the point S moves about A in a space of only $N - 1$ dimensions.

Fisher has used these spatial concepts to derive the sampling distribution of the correlation coefficient. The full derivation is outside the scope of this paper but certain aspects are of interest here. When we have N individuals each measured in two traits, it is customary to represent the N pairs of numbers by a correlation diagram of N points in two-dimensional space. The same data can, however, be represented by two points in N-dimensional space, one point representing the N values of X and the other the N values of Y. In this frame of reference the correlation coefficient can be shown to be equal to the cosine of the angle between the vectors to the two points, and to have $N - 2$ degrees of freedom.

III. THE IMPORT OF THE CONCEPT

If the normal curve adequately described all sampling distributions, as some elementary treatises seem to imply, the concept of degrees of freedom would be relatively unimportant, for this number does not appear in the equation of the normal curve, the shape of the curve being the same no matter what the size of the sample.
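Looking back to the sample-point geometry of Section II, the right-triangle relation $OS^2 = OA^2 + AS^2$ and the reading of the correlation coefficient as a cosine can both be checked numerically. The following is a minimal Python sketch. The scores, the value taken for µ, and the function names are illustrative assumptions, not part of the article. The cosine is computed between the vectors of deviations from the two sample means, which is taken here to be the frame of reference in which it equals r.

```python
import math

def triangle_sides(scores, mu):
    """Return (OS^2, OA^2, AS^2) for a sample point S, with origin O at mu."""
    n = len(scores)
    m = sum(scores) / n                        # sample mean M
    os2 = sum((x - mu) ** 2 for x in scores)   # squared distance from O to S
    oa2 = n * (m - mu) ** 2                    # squared distance from O to A
    as2 = sum((x - m) ** 2 for x in scores)    # squared distance from A to S
    return os2, oa2, as2

def cosine_between_deviation_vectors(xs, ys):
    """Cosine of the angle between the two vectors of deviations from the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    dot = sum(a * b for a, b in zip(dx, dy))
    return dot / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

xs = [4, 7, 9, 12]   # hypothetical scores on trait X
ys = [1, 3, 2, 6]    # hypothetical scores on trait Y
mu = 6               # hypothetical population mean of X

os2, oa2, as2 = triangle_sides(xs, mu)
print(os2, oa2 + as2)                            # the two values agree
print(cosine_between_deviation_vectors(xs, ys))  # equals the product-moment r
```

Here $AS^2$ is built from deviations about the sample mean, the quantity that carries $N - 1$ degrees of freedom, while $OS^2$ is built from deviations about µ and carries N.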
In certain other important sampling distributions, as for example the Poisson, the same thing is true, that the shape of the distribution is independent of the number of degrees of freedom involved. Modern statistical analysis, however, makes much use of several very important sampling distributions for which the shape of the curve changes with the effective size of the sample. In the equations of such curves, the number of degrees of freedom appears as a parameter (called n in the equations which follow) and probability tables built from these curves must be entered with the correct value of n. If a mistake is made in determining n from the data, the wrong probability value will be obtained from the table, and the significance of the test employed will be wrongly interpreted. The Chi-square distribution, the t-distribution, and the F and z distributions are now commonly used even in elementary work, and the tables for each of these must be entered with the appropriate value of n.

Let us now look at a few of these equations to see the rôle played in them by the number of degrees of freedom. In the formulas which follow, C represents a constant whose value is determined in such a way as to make the total area under the curve equal to unity. Although this constant involves the number of degrees of freedom, it does not need to be considered in reading probability tables because, being a constant multiplier, it does not affect the proportion of area under any given segment of the curve, but serves only to change the scale of the entire figure.

Normal Curve.

$$y = C_1 e^{-\frac{x^2}{2\sigma^2}}$$

The number of degrees of freedom does not appear in the equation, and so the shape of the curve is independent of it. The only variables to be shown in a probability table are $x/\sigma$ and y or some function of y such as a probability value.

Chi-square.

$$y = C_2 (\chi^2)^{\frac{n-2}{2}} e^{-\frac{\chi^2}{2}}$$

The number of degrees of freedom appears in the exponent. When n = 1, the curve is J-shaped. When n = 2, the equation reduces to $y = C_2 e^{-\chi^2/2}$ and has the form of the positive half of a normal curve. The curve is always positively skewed, but as n increases it becomes more and more nearly like the normal, and becomes approximately normal when n is 30 or so. A probability table must take account of three variables, the size of Chi-square, the number of degrees of freedom, and the related probability value.

t-distribution

$$y = C_3 \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$

The number of degrees of freedom appears both in the exponent and in the fraction $t^2/n$. The curve is always symmetrical, but is more peaked than the normal when n is small. This curve also approaches the normal form as n increases. A table of probability values must be entered with the computed value of t and also with the appropriate value of n. A few selected values will show the comparison between estimates of significance read from a table of the normal curve and a t-table.

For a normal curve, the proportion of area in both tails of the curve beyond $3\sigma$ is .0027. For a t-distribution the proportion is as follows:

n      1      2      5      10     20
p      .204   .096   .030   .014   .007

Again, for a normal curve, the point such that .01 of the area is in the tails is $2.56\sigma$ from the mean. For a t-distribution, the position of this point is as follows:

n          1      2      3      5      10     20     30
x/σ        63.6   9.9    5.8    4.0    3.2    2.8    2.75

F-distribution and z-distribution

$$y = C_4 \frac{F^{\frac{n_1 - 2}{2}}}{(n_1 F + n_2)^{\frac{n_1 + n_2}{2}}} \quad \text{and} \quad y = C_5 \frac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac{n_1 + n_2}{2}}}$$

In each of these equations, which provide the tables used in analysis of variance problems, there occurs not only the computed value of F (or of z), but also the two parameters $n_1$ and $n_2$, $n_1$ being the number of degrees of freedom for the mean square in the numerator of F and $n_2$ the number of degrees of freedom for that in the denominator. Because a probability table must be entered with all three, such a table often shows the values for selected probability values only. The tables published by Fisher give values for p = .05, p = .01, and p = .001; those by Snedecor give p = .05 and p = .01.
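The dependence of the t-table on n can be reproduced directly from the curve $y = C_3 (1 + t^2/n)^{-(n+1)/2}$. The sketch below is a hedged illustration and not the method by which the article's table was computed. It assumes the standard normalizing constant $C_3 = \Gamma\left(\frac{n+1}{2}\right) / \left(\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)\right)$, which the article does not spell out, and it integrates the curve with Simpson's rule. The function names and step count are assumptions made for the example.

```python
import math

def t_density(t, n):
    """Ordinate of the t-curve with n degrees of freedom (standard constant assumed)."""
    c3 = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c3 * (1 + t * t / n) ** (-(n + 1) / 2)

def two_tail_area(t0, n, steps=2000):
    """Proportion of area in both tails beyond +/- t0, via Simpson's rule on [0, t0]."""
    h = t0 / steps                       # steps must be even for Simpson's rule
    total = t_density(0.0, n) + t_density(t0, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, n)
    area_0_to_t0 = total * h / 3
    return 1 - 2 * area_0_to_t0          # the curve is symmetrical about 0

for n in (1, 2, 5, 10, 20):
    print(n, round(two_tail_area(3.0, n), 3))
```

Run as written, the loop prints the two-tail proportion beyond 3 for each n, which can be set beside the row of p values in the article; small differences in the third decimal are to be expected from rounding in the printed table.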

Sampling Distribution of r.

This is a complicated equation involving as parameters the true correlation in the population, ρ; the observed correlation in the sample, r; and the number of degrees of freedom. If ρ = 0 the distribution is symmetrical. If ρ ≠ 0 and n is large, the distribution becomes normal. If ρ ≠ 0 and n is small the curve is definitely skewed. David's Tables of the Correlation Coefficient (Issued by the Biometrika Office, University College, London, 1938) must be entered with all three parameters.

IV. DETERMINING THE APPROPRIATE NUMBER OF DEGREES OF FREEDOM

A universal rule holds: the number of degrees of freedom is always equal to the number of observations minus the number of necessary relations obtaining among these observations. In geometric terms, the number of observations is the dimensionality of the original space and each relationship represents a section through that space restricting the sample point to a space of one lower dimension. Imposing a relationship upon the observations is equivalent to estimating a parameter from them. For example, the relationship $\sum X = NM$ indicates that the mean of the population has been estimated from observations. The number of degrees of freedom is also equal to the number of independent observations, which is the number of original observations minus the number of parameters estimated from them.

Standard Error of a Mean.--This is $\sigma_{mean} = \frac{\sigma}{\sqrt{N}}$ when σ is known for the population. As σ is seldom known a priori, we are usually forced to make use of the observed standard deviation in the sample, which we will call s. In this case $\sigma_{mean} \doteq \frac{s}{\sqrt{N - 1}}$, one degree of freedom being lost because deviations have been taken around the sample mean, so that we have imposed one limiting relationship, $\sum X = NM$, and have thus restricted the sample point to a hyperplane of $N - 1$ dimensions.

Without any reference to geometry, it can be shown by an algebraic solution that $Ns^2 \doteq \sigma^2 (N - 1)$.
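The relation $Ns^2 \doteq \sigma^2(N - 1)$ can be illustrated by exhaustive enumeration rather than algebra. The sketch below is a minimal Python example under stated assumptions: the population (the six faces of a die), the sample size N = 3, and sampling with replacement are all hypothetical choices made for the illustration. Here $Ns^2$ is the sum of squared deviations about the sample mean, and its average over every possible sample is compared with $\sigma^2(N - 1)$.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: faces of a die
N = 3                             # hypothetical sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

def sum_sq_about_sample_mean(sample):
    """N * s^2: the sum of squared deviations about the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - m) ** 2 for x in sample)

# Every possible ordered sample of size N drawn with replacement.
samples = list(product(population, repeat=N))
average_Ns2 = sum(sum_sq_about_sample_mean(s) for s in samples) / len(samples)

print(average_Ns2, sigma2 * (N - 1))   # prints 35/6 35/6
```

Exact fractions are used so that the agreement is not blurred by rounding. Dividing the sum of squares by N instead of N - 1 would, on the average, understate the population variance by the factor (N - 1)/N.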
(The symbol $\doteq$ is to be read "tends to equal" or "approximates.")

Goodness of Fit of Normal Curve to a Set of Data.--The number of observations is the number of intervals in the frequency distribution for which an observed frequency is compared with the frequency to be expected on the assumption of a normal distribution. If this normal curve has an arbitrary mean and standard deviation agreed upon in advance, the number of degrees of freedom with which we enter the Chi-square table to test goodness of fit is one less than the number of intervals. In this case one restriction is imposed; namely $\sum f = \sum f'$, where f is an observed and f' a theoretical frequency. If, however, as is more common, the theoretical curve is made to conform to the observed data in its mean and standard deviation, two additional restrictions are imposed; namely $\sum fX = \sum f'X$ and $\sum f(X - M)^2 = \sum f'(X - M)^2$, so that the number of degrees of freedom is three less than the number of intervals compared. It is clear that when the curves are made to agree in mean and standard deviation, the discrepancy between observed and theoretical frequencies will be reduced, so the number of degrees of freedom in relation to which that discrepancy is interpreted should also be reduced.

Relationship in a Contingency Table.--Suppose we wish to test the existence of a relationship between trait A, for which there are three categories, and trait B, for which there are five, as shown in Fig. 2. We have fifteen cells in the table, giving us fifteen observations, inasmuch as an "observation" is now the frequency in a single cell. If we want to ask whether there is sufficient evidence to believe that in the population from which this sample is drawn A and B are independent, we need to know the cell frequencies which would be expected under that hypothesis. There are then fifteen comparisons to be made between observed frequencies and expected frequencies. But are all fifteen of these comparisons independent?

If we had a priori information as to how the traits would be distributed theoretically, then all but one of the cell comparisons would be independent, the last cell frequency being fixed in order to make up the proper total of one hundred fifty, and the degrees of freedom would be 15 - 1 = 14. This is the situation Karl Pearson had in mind when he first developed his Chi-square test of goodness of fit, and Table XII in Vol. I of his Tables for Statisticians and Biometricians is made up on the assumption that the number of degrees of freedom is one less than the number of observations. To use it when that is not the case we merely readjust the value of n with which we enter the table.

In practice we almost never have a priori estimates of theoretical frequencies, but must obtain them from the observations themselves, thus imposing restrictions on the number of independent observations and reducing the degrees of freedom available for estimating reliability.
In this case, if we estimate the theoretical frequencies from the data, we would estimate the frequency $f'_{11} = (20)(40)/150$ and others in similar fashion. Getting the expected cell frequencies from the observed marginal frequencies imposes the following relationships:

(a) $f_{11} + f_{21} + f_{31} + f_{41} + f_{51} = 40$
    $f_{12} + f_{22} + f_{32} + f_{42} + f_{52} = 60$
    $f_{13} + f_{23} + f_{33} + f_{43} + f_{53} = 50$

(b) $f_{11} + f_{12} + f_{13} = 20$
    $f_{21} + f_{22} + f_{23} = 20$
    $f_{31} + f_{32} + f_{33} = 35$
    $f_{41} + f_{42} + f_{43} = 30$
    $f_{51} + f_{52} + f_{53} = 45$

(c) $f_{11} + f_{21} + \dots + f_{51} + f_{12} + \dots + f_{53} = 150$
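The relationships (a), (b) and (c) can be examined mechanically. The sketch below is a minimal Python illustration, not a procedure given in the article: it computes the expected frequencies from the marginal totals in the manner of $f'_{11} = (20)(40)/150$, and then counts how many of the nine listed relationships are linearly independent, since that count is what is subtracted from the fifteen cells. The helper `rank`, the cell ordering, and the variable names are assumptions made for the example.

```python
from fractions import Fraction

row_totals = [20, 20, 35, 30, 45]   # marginal totals for B1..B5
col_totals = [40, 60, 50]           # marginal totals for A1..A3
N = sum(row_totals)                 # 150

# Expected frequency for each cell: (row total)(column total) / N.
expected = [[Fraction(r * c, N) for c in col_totals] for r in row_totals]

def rank(rows):
    """Rank of a matrix by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

# Each relationship becomes a row of 0/1 coefficients over the 15 cells,
# with cell (row i, column j) stored at index 3 * i + j.
n_rows, n_cols = len(row_totals), len(col_totals)
cells = n_rows * n_cols
relationships = []
for j in range(n_cols):                                   # the three (a) equations
    relationships.append([1 if k % n_cols == j else 0 for k in range(cells)])
for i in range(n_rows):                                   # the five (b) equations
    relationships.append([1 if k // n_cols == i else 0 for k in range(cells)])
relationships.append([1] * cells)                         # equation (c)

independent = rank(relationships)
print(expected[0][0], independent, cells - independent)
```

The expected frequencies reproduce the marginal totals exactly, so the restrictions are built into them; the rank of the coefficient matrix gives the number of independent restrictions without any manipulation of the equations by hand.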

[FIG. 12.--Observed joint frequency distribution of two traits A and B: a table of five rows ($B_1$ to $B_5$) by three columns ($A_1$ to $A_3$), with row totals 20, 20, 35, 30, 45, column totals 40, 60, 50, and grand total 150.]

$$
\begin{array}{c|ccc|c}
 & A_1 & A_2 & A_3 & \\ \hline
B_1 & f_{11} & f_{12} & f_{13} & 20 \\
B_2 & f_{21} & f_{22} & f_{23} & 20 \\
B_3 & f_{31} & f_{32} & f_{33} & 35 \\
B_4 & f_{41} & f_{42} & f_{43} & 30 \\
B_5 & f_{51} & f_{52} & f_{53} & 45 \\ \hline
 & 40 & 60 & 50 & 150
\end{array}
$$

FIG. 13.--Observed marginal frequencies of two traits A and B.

At first sight, there seem to be nine relationships, but it is immediately apparent that (c) is not a new one, for it can be obtained either by adding the three (a) equations or the five (b) equations. Also any one of the remaining eight can be obtained by appropriate manipulation of the other seven. There are then only seven independent necessary relationships imposed upon the cell frequencies by requiring them to add up to the observed marginal totals. Thus $n = 15 - 7 = 8$ and if we compute Chi-square, we must enter the Chi-square table with eight degrees of freedom. The same result can be obtained by noting that two entries in each row and four in each column can be chosen arbitrarily and there is then no freedom of choice for the remaining entries. In general in a contingency table, if $c$ = number of columns and $r$ = number of rows, the number of degrees of freedom is $n = (c - 1)(r - 1)$ or $n = rc - (r + c - 1)$.

Variance in a Correlation Table.--Suppose we have a scatter diagram with $c$ columns, the frequencies in the various columns being $n_1, n_2, \dots, n_c$, the mean values of $Y$ for the columns being $m_1, m_2, \dots, m_c$, and the regression values of $Y$ estimated from $X$ being $\tilde{Y}_1, \tilde{Y}_2, \dots, \tilde{Y}_c$. Thus for any given column, the sum of the $Y$'s is $\sum^{n_i} fY = n_i m_i$. For the entire table $N = n_1 + n_2 + \dots + n_c$ and $NM = \sum^{c}\sum^{n_i} fY$, so that $NM = n_1 m_1 + n_2 m_2 + \dots + n_c m_c$.

Now we may be interested in the variance of all the scores about the total mean, of all the scores about their own column means, of all the scores about the regression line, of regressed values about the total mean, of column means about the total mean, or of column means about the regression line, and we may be interested in comparing two such variances. It is necessary to know how many degrees of freedom are available for such comparisons.

(a) Total Variance.--For the variance of all scores about the total mean, that is, $N s_y^2 = \sum (Y - M)^2$, we have $N$ observations and only one restriction; namely, $\sum^{N} fY = NM$. Thus there are $N - 1$ degrees of freedom.

(b) Variance of Regressed Values about Total Mean.--The equation for the regressed values being $\tilde{Y} - M_y = r \frac{s_y}{s_x}(X - M_x)$, it is clear that as soon as $x$ is known, $y$ is also known. The sample point can move only on a straight line. There is only one degree of freedom available for the variance of regressed values.

(c) Variance of Scores about Regression Line.--There are $N$ residuals of the form $Y - \tilde{Y}$ and their variance is the square of the standard error of estimate, or $s_y^2(1 - r_{xy}^2)$. There are $N$ observations and two restrictions; namely, $\sum f(Y - \tilde{Y}) = 0$ and $\sum f(Y - \tilde{Y})^2 = N s_y^2(1 - r_{xy}^2)$. Thus there are $N - 2$ degrees of freedom available.

(d) Variance of Scores about Column Means.--If from each score we subtract not the regression value but the mean of the column in which it stands, the variance of the residuals thus obtained will be $s_y^2(1 - E^2)$, where $E$ is the correlation ratio obtained from the sample. There are $N$ such residuals. For each column we have the restriction $\sum^{n_i} fY = n_i m_i$, making $c$ restrictions in all. The number of degrees of freedom for the variance within columns is therefore $N - c$.

(e) Variance of Column Means about Total Mean.--To compute this variance we have $c$ observations, i.e., the means of $c$ columns, restricted by the single relation $NM = \sum^{c} n_i m_i$, and therefore have $c - 1$ degrees of freedom. The variance itself can be proved to be $s_y^2 E^2$, and represents the variance among the means of columns.

(f) Variance of Column Means about Regression Line.--If for each column we find the difference $m_i - \tilde{Y}_i$ between the column mean and the regression value, and then find $\frac{1}{N}\sum^{c} f_i (m_i - \tilde{Y}_i)^2$ (where $f_i$ is the column frequency, written $n_i$ above), the result will be $s_y^2(E^2 - r^2)$, which is a variance representing the departure of the means from linearity. There is one such difference for each column, giving us $c$ observations, and these observations are restricted by the two relationships $\sum^{c} f_i(m_i - \tilde{Y}_i) = 0$ and $\sum^{c} f_i(m_i - \tilde{Y}_i)^2 = N s_y^2(E^2 - r^2)$. Therefore, we have $c - 2$ degrees of freedom.

The following scheme shows these relationships in summary form:

$$
\begin{array}{llc}
\text{Source of variation} & \text{Formula (proportion of } s_y^2\text{)} & \text{Degrees of freedom} \\ \hline
\text{(d) Scores about column means} & 1 - E^2 & N - c \\
\text{(e) Means about total mean} & E^2 & c - 1 \\
\text{(a) Total} & 1 & N - 1 \\ \hline
\text{(c) Scores about regression line} & 1 - r^2 & N - 2 \\
\text{(b) Regressed values about total mean} & r^2 & 1 \\
\text{(a) Total} & 1 & N - 1 \\ \hline
\text{(d) Scores about column means} & 1 - E^2 & N - c \\
\text{(f) Column means about regression line} & E^2 - r^2 & c - 2 \\
\text{(c) Scores about regression line} & 1 - r^2 & N - 2 \\ \hline
\text{(b) Regressed values about total mean} & r^2 & 1 \\
\text{(f) Column means about regression line} & E^2 - r^2 & c - 2 \\
\text{(e) Column means about total mean} & E^2 & c - 1 \\ \hline
\text{(b) Regressed values about total mean} & r^2 & 1 \\
\text{(f) Column means about regression line} & E^2 - r^2 & c - 2 \\
\text{(d) Scores about column means} & 1 - E^2 & N - c \\
\text{(a) Total} & 1 & N - 1
\end{array}
$$

It is apparent that these variances have additive relationships and that their respective degrees of freedom have exactly the same additive relationships.
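The additive relationships among the six variances (a) to (f) and among their degrees of freedom can be checked numerically. The sketch below uses a small made-up scatter diagram (the data in `COLUMNS` and the function name `decompose` are assumptions for illustration, not values from the article) and works with sums of squares, that is, each variance multiplied by $N$, in exact fractions.

```python
from fractions import Fraction

# Hypothetical scatter diagram: X value of each column -> Y scores in that column.
COLUMNS = {1: [2, 4], 2: [3, 5, 7], 3: [6, 8], 4: [7, 9, 12]}


def decompose(columns):
    """Return sums of squares and degrees of freedom for sources (a) to (f)."""
    pairs = [(Fraction(x), Fraction(y)) for x, col in columns.items() for y in col]
    n, c = len(pairs), len(columns)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Least-squares regression of Y on X.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
             / sum((x - mean_x) ** 2 for x, _ in pairs))

    def fitted(x):
        return mean_y + slope * (x - mean_x)

    # (column size, column mean, regression value) for each column.
    cols = [(len(col), Fraction(sum(col), len(col)), fitted(Fraction(x)))
            for x, col in columns.items()]

    ss = {
        "a": sum((y - mean_y) ** 2 for _, y in pairs),           # total
        "b": sum(k * (yt - mean_y) ** 2 for k, _, yt in cols),   # regressed values about total mean
        "c": sum((y - fitted(x)) ** 2 for x, y in pairs),        # scores about regression line
        "d": sum((Fraction(y) - Fraction(sum(col), len(col))) ** 2
                 for col in columns.values() for y in col),      # scores about column means
        "e": sum(k * (m - mean_y) ** 2 for k, m, _ in cols),     # column means about total mean
        "f": sum(k * (m - yt) ** 2 for k, m, yt in cols),        # column means about regression line
    }
    df = {"a": n - 1, "b": 1, "c": n - 2, "d": n - c, "e": c - 1, "f": c - 2}
    return ss, df


ss, df = decompose(COLUMNS)
for total, parts in [("a", "de"), ("a", "cb"), ("c", "df"), ("e", "bf")]:
    print(total, "=", " + ".join(parts),
          ss[total] == sum(ss[p] for p in parts),
          df[total] == sum(df[p] for p in parts))
```

Each printed line corresponds to one of the first four groups of the summary scheme. The fifth group follows by combining the first and the fourth.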

Tests Based on Ratio of Two Variances.--From any pair of these additive variances, we may make an important statistical test. Thus, to test whether linear correlation exists in the population or not, we may divide $\frac{r^2}{1}$ by $\frac{1 - r^2}{N - 2}$, obtaining $\frac{r^2 (N - 2)}{1 - r^2}$. To test whether a relationship measurable by the correlation ratio exists in the population, we may divide $\frac{E^2}{c - 1}$ by $\frac{1 - E^2}{N - c}$, obtaining $\frac{E^2 (N - c)}{(1 - E^2)(c - 1)}$. To test whether correlation is linear, we may divide $\frac{E^2 - r^2}{c - 2}$ by $\frac{r^2}{1}$, obtaining $\frac{E^2 - r^2}{r^2 (c - 2)}$, or may divide $\frac{E^2 - r^2}{c - 2}$ by $\frac{1 - E^2}{N - c}$, obtaining $\frac{(E^2 - r^2)(N - c)}{(1 - E^2)(c - 2)}$. In each case, the resulting value is referred to Snedecor's F-table, which must be entered with the appropriate number of degrees of freedom for each variance. Or we may find the logarithm of the ratio to the base $e$, take half of it, and refer the result to Fisher's z-table, which also must be entered with the appropriate number of degrees of freedom for each variance.

Partial Correlation.--For a coefficient of correlation of zero order, there are $N - 2$ degrees of freedom. This is obvious, since a straight regression line can be fitted to any two points without residuals, and the first two observations furnish no estimate of the size of $r$. For each variable that is held constant in a partial correlation, one additional degree of freedom is lost, so that for a correlation coefficient of the $p$th order, the degrees of freedom are $N - p - 2$. This places a limit upon the number of meaningful interrelationships which can be obtained from a small sample. As an extreme illustration, suppose twenty-five variables have been measured for a sample of twenty-five cases only, and all the intercorrelations computed, as well as all possible partial correlations: the partials of the twenty-third order will of necessity be either $+1$ or $-1$, and thus are meaningless. Each such partial will be associated with $25 - 23 - 2 = 0$ degrees of freedom. If the partial were not $+1$ or $-1$, the error variance $\sigma_r^2 = \frac{(1 - r^2)^2}{N - p - 2}$ would become infinite, a fantastic situation.
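The variance ratios and the degrees-of-freedom count for partial correlations described above can be written out as a short Python sketch. The function names are hypothetical, and the numerical inputs in the usage lines are made-up values chosen only to exercise the formulas. Each ratio function returns the ratio together with the pair of degrees of freedom with which an F-table would be entered.

```python
import math


def f_linear_correlation(r, n):
    """(r^2 / 1) divided by ((1 - r^2) / (N - 2)); undefined when r is +1 or -1."""
    return r ** 2 * (n - 2) / (1 - r ** 2), (1, n - 2)


def f_correlation_ratio(e, n, c):
    """(E^2 / (c - 1)) divided by ((1 - E^2) / (N - c))."""
    return e ** 2 * (n - c) / ((1 - e ** 2) * (c - 1)), (c - 1, n - c)


def f_linearity(e, r, n, c):
    """((E^2 - r^2) / (c - 2)) divided by ((1 - E^2) / (N - c))."""
    return (e ** 2 - r ** 2) * (n - c) / ((1 - e ** 2) * (c - 2)), (c - 2, n - c)


def fisher_z(variance_ratio):
    """Half the natural logarithm of a variance ratio, for use with a z-table."""
    return 0.5 * math.log(variance_ratio)


def partial_correlation_df(n, p):
    """Degrees of freedom of a partial correlation of order p from N cases."""
    return n - p - 2


# Made-up inputs: r = 0.6 and E = 0.7 from N = 30 cases in c = 5 columns.
ratio, dfs = f_linearity(e=0.7, r=0.6, n=30, c=5)
print("linearity ratio:", ratio, "with degrees of freedom", dfs)
print("z:", fisher_z(ratio))
print("df of a 23rd-order partial from 25 cases:", partial_correlation_df(25, 23))
```

Returning the degrees of freedom alongside each ratio keeps the two together, since the text stresses that the table must be entered with the appropriate degrees of freedom for each variance.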

BIBLIOGRAPHY

Dawson, S.: An Introduction to the Computation of Statistics. University of London Press, 1933, p. 4. No general discussion. Gives rules for $\chi^2$ only.

Ezekiel, M.: Methods of Correlation Analysis. John Wiley & Sons, 1930.

Fisher, R. A.: "Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population." Biometrika, Vol. x, 1915, pp. 507-521. First application of n-dimensional geometry to sampling theory.

Fisher, R. A.: Statistical Methods for Research Workers. Oliver and Boyd. This has now gone through seven editions. The term "degrees of freedom" does not appear in the index, but the concept occurs constantly throughout the book.

Goulden, C. H.: Methods of Statistical Analysis. John Wiley and Sons, Inc., 1939. See index.

Guilford, J. P.: Psychometric Methods. McGraw-Hill, 1936, p. 308.

Mills, F. C.: Statistical Methods Applied to Economics and Business. Henry Holt & Co., 2nd ed., 1938. See index.

Rider, P. R.: "A survey of the theory of small samples." Annals of Mathematics, Vol. xxxi, 1930, pp. 577-628. Published as a separate monograph by Princeton University Press. Gives geometric approach to sampling distributions.

Rider, P. R.: An Introduction to Modern Statistical Methods. John Wiley and Sons, Inc., 1939. See index. While there is no general explanation of the meaning of degrees of freedom, this book gives a careful and detailed explanation of how the number of degrees of freedom is to be found in a large variety of situations.

Snedecor, G. W.: Statistical Methods. Collegiate Press, Inc., 1937, 1938. See index.

Snedecor, G. W.: Calculation and Interpretation of Analysis of Variance and Covariance. Collegiate Press, Inc., 1934, pp. 9-0.

Tippett, L. H. C.: The Methods of Statistics. Williams and Norgate, Ltd., 1931. One of the few attempts to treat the concept of degrees of freedom in general terms, but without geometric background, is made on pages 64-65.

Yule and Kendall: Introduction to the Theory of Statistics. Charles Griffin & Co., London, 1937, pp. 45-46, 436.