A Short Introduction
Prepared by Mirya Holman
There are three kinds of data: qualitative, quantitative, and ordinal.
Qualitative (also called nominal) data is distinguished by being a set of unordered categories. Qualitative variables differ in quality, not in quantity or magnitude. Examples: race, gender.
Quantitative (or interval) data varies in magnitude. Each possible value of a quantitative variable is greater than or smaller than any other possible value. Examples: education, income.
Quantitative data can be either discrete, if it can take on only a finite number of values (for example, the number of visits to the dentist last year), or continuous, if it can take on any value along a continuum of real numbers (for example, the number of minutes it takes to finish a book).
Ordinal data consists of categorical scales that have a natural ordering of values but no defined distances between those values. Ordinal data is often transformed into interval data, that is, data on a scale with defined distances between the values. Examples: political identification (Strong Democrat to Strong Republican) or class (low, middle, high).
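As a minimal sketch of coding an ordinal variable, the Python below codes the class example (low, middle, high) as 1, 2, 3 so the numbers keep the natural order. The dictionary and function names are hypothetical, and spacing the codes one unit apart is itself an assumption: it is exactly the step that treats ordinal data as if it were interval data.

```python
# Ordered categories for the class example, lowest to highest.
CLASS_LEVELS = ["low", "middle", "high"]

# Assign codes 1, 2, 3 in that order so the numbers preserve the ranking.
# Equal 1-unit spacing is an assumption, not something the data tells us.
CLASS_CODES = {level: rank for rank, level in enumerate(CLASS_LEVELS, start=1)}


def code_class(value: str) -> int:
    """Return the ordinal code for a class label (case-insensitive)."""
    return CLASS_CODES[value.strip().lower()]


responses = ["middle", "low", "High", "middle"]
coded = [code_class(r) for r in responses]
print(coded)  # [2, 1, 3, 2]
```

Because the codes follow the ordering, comparisons such as "high > low" still hold after coding; differences between codes only mean something if the equal-spacing assumption is reasonable.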
Coding variables is a way to change qualitative data into quantitative data. We normally do this so that we can perform statistical analysis on the qualitative data. Coding a variable consistently assigns a numerical value to each qualitative trait.
Example: Gender is a qualitative trait (a variable without a natural ordering). We can assign male and female each a numerical value (say, zero and one). Now we have numbers to do statistics with!
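A minimal Python sketch of the gender example, assuming male = 0 and female = 1 as suggested above. The names `GENDER_CODES` and `code_gender` are hypothetical. The function refuses values that have no code, which is one simple way to keep the coding consistent.

```python
# Codebook entry for gender: male = 0, female = 1 (the example's choice).
GENDER_CODES = {"male": 0, "female": 1}


def code_gender(value: str) -> int:
    """Map a gender label to its numeric code, rejecting uncoded values."""
    key = value.strip().lower()
    if key not in GENDER_CODES:
        raise ValueError(f"No code defined for gender value {value!r}")
    return GENDER_CODES[key]


respondents = ["Female", "male", "female", "Male", "female"]
coded = [code_gender(g) for g in respondents]
print(coded)                    # [1, 0, 1, 0, 1]
print(sum(coded) / len(coded))  # 0.6, the share of respondents coded as female
```

With a 0/1 code, the mean of the column is simply the proportion of cases coded 1, which is one reason this scheme is convenient for statistics.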
We code variables for three primary reasons:
1: We can run statistical models.
2: Our computer programs will understand the variables.
3: Accountability: we can run models blind, without knowing what the variables stand for, in order to reduce programmer or author bias.
Say that we want to look at employment discrimination settlements. We are interested in whether the type of representation affects the outcome of the case. We look at four types: pro se, EEOC, appointed counsel, and other. These are qualitative data. But we want to know what effect the type of representation has on the amount received in a settlement.
So we assign a consistent numerical value to each type of representation:
Pro se = 1
EEOC = 2
Appointed counsel = 3
Other = 4
Now we can run an analysis of variance (ANOVA), which statistically compares the mean settlement amount for each type of representation and determines whether the differences between those means are statistically significant. Here the codes 1 to 4 serve only as group labels; ANOVA does not treat "Other" (4) as more than "Pro se" (1).
NOTE: Statistically significant, in this and many other applications, means that the differences you find would be unlikely to arise by chance alone if there were no real differences between the groups.
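To make the ANOVA step concrete, here is a minimal pure-Python sketch of a one-way ANOVA F statistic, using the representation codes above as group labels. The settlement amounts are invented for illustration only (hypothetical, in thousands of dollars), and the helper name `one_way_anova` is hypothetical. The sketch stops at the F statistic; turning F into a p-value requires the F distribution, which a statistics package provides (for example, SciPy's `scipy.stats.f_oneway` returns both).

```python
from collections import defaultdict

# Coding scheme: representation type -> numeric code.
REPRESENTATION_CODES = {"Pro se": 1, "EEOC": 2, "Appointed counsel": 3, "Other": 4}

# Hypothetical cases: (representation, settlement in thousands of dollars).
cases = [
    ("Pro se", 10), ("Pro se", 12), ("Pro se", 14),
    ("EEOC", 20), ("EEOC", 22), ("EEOC", 24),
    ("Appointed counsel", 30), ("Appointed counsel", 32), ("Appointed counsel", 34),
    ("Other", 16), ("Other", 18), ("Other", 20),
]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (v - sum(vals) / len(vals)) ** 2 for vals in groups.values() for v in vals
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Group settlements by their numeric code; the code acts only as a label.
groups = defaultdict(list)
for rep, amount in cases:
    groups[REPRESENTATION_CODES[rep]].append(amount)

f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(3, 8) = 53.0
```

A large F means the group means differ by much more than the variation inside each group; whether that is "statistically significant" depends on the p-value for F with these degrees of freedom.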
Asbestos cases: I want to investigate whether the nature of asbestos litigation changed between 1992 and … How? By coding!
Example 2: What is the process?
Step 1: Each case is entered into a spreadsheet, including the number of plaintiffs, the number of defendants, the award amount (if any), the type(s) of award, the claim, etc.
Step 2: Each time we deal with a qualitative element of the case, we transform it into a quantitative descriptor.
Step 3: We run statistical analysis on the data.
How to: This is what the data looks like when we enter it:
Case # | Plaintiff | Defendant | Award | Type of Award | Claim
 | DAVID and Susan TAYLOR | JOHN CRANE INC | | compensatory, loss of consortium | mesothelioma
01L781 | James and Terry Crawford | ACandS Inc., et al | | compensatory, punitive, loss of consortium | mesothelioma
 | Andrew and Marietta Prebehall | Harbison & Walker Co | | wrongful death, loss of consortium | Lung cancer
This is in qualitative form!
We want to code the data to transform it into quantitative data, so let's start with the claim. We decide to consistently assign each type of claim a numerical identifier:
Case # | Plaintiff | Defendant | Award | Type of Award | Claim | Claim code
 | DAVID and Susan TAYLOR | JOHN CRANE INC | | compensatory, loss of consortium | mesothelioma | 1
01L781 | James and Terry Crawford | ACandS Inc., et al | | compensatory, punitive, loss of consortium | mesothelioma | 1
 | Andrew and Marietta Prebehall | Harbison & Walker Co | | wrongful death, loss of consortium | Lung cancer | 3
The number we assign matters less than the consistency with which we assign it.
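A minimal Python sketch of the claim coding, using only the two codes the table shows (mesothelioma = 1, lung cancer = 3); any other claim type would need its own code added first. The names `CLAIM_CODES` and `code_claim` are hypothetical. Normalizing case and whitespace means "Mesothelioma" and "mesothelioma " receive the same code, which is the consistency the table relies on.

```python
# Claim codes from the table; other claim types must be added before use.
CLAIM_CODES = {"mesothelioma": 1, "lung cancer": 3}


def code_claim(claim: str) -> int:
    """Return the numeric code for a claim, normalizing case and spacing."""
    key = claim.strip().lower()
    if key not in CLAIM_CODES:
        raise ValueError(f"Claim {claim!r} is not in the codebook yet; add it first")
    return CLAIM_CODES[key]


# Raw claims as entered in the spreadsheet (capitalization may vary).
raw_claims = ["mesothelioma", "Mesothelioma ", "Lung cancer"]
print([code_claim(c) for c in raw_claims])  # [1, 1, 3]
```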
Next, we tackle damages. Here it is easier to make a separate column for each type of damage and then indicate with a 0/1 whether that damage was awarded:
Award | Type of Award | Compensatory | Punitive | Loss of consortium | Wrongful death
… | compensatory, loss of consortium | 1 | 0 | 1 | 0
… | compensatory, punitive, loss of consortium | 1 | 1 | 1 | 0
… | wrongful death, loss of consortium | 0 | 0 | 1 | 1
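The 0/1 columns can be produced mechanically from the "Type of Award" text. The Python below is a minimal sketch that assumes the four damage types shown in the table and a comma-separated cell; the function name `damage_indicators` is hypothetical. An unrecognized damage type raises an error instead of being silently coded 0.

```python
# The four damage types that get their own 0/1 column.
DAMAGE_TYPES = ["compensatory", "punitive", "loss of consortium", "wrongful death"]


def damage_indicators(type_of_award: str) -> dict:
    """Split a comma-separated 'Type of Award' cell into 0/1 indicator columns."""
    awarded = {part.strip().lower() for part in type_of_award.split(",") if part.strip()}
    unknown = awarded - set(DAMAGE_TYPES)
    if unknown:
        raise ValueError(f"Uncoded damage type(s): {sorted(unknown)}")
    # 1 if the damage type appears in the cell, otherwise 0.
    return {dtype: int(dtype in awarded) for dtype in DAMAGE_TYPES}


rows = [
    "compensatory, loss of consortium",
    "compensatory, punitive, loss of consortium",
    "wrongful death, loss of consortium",
]
for row in rows:
    print(damage_indicators(row))
```

Separate indicator columns let one case record several kinds of damages at once, which a single "award type" code could not express.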
We can leave the award amount alone, since it is already in numerical form. We can transform the plaintiffs by coding the number of plaintiffs or the type of plaintiff:
Case # | Plaintiff | Num_plt | Type_plt | Defendant | Award
 | DAVID and Susan TAYLOR | 2 | 2 | JOHN CRANE INC |
01L781 | James and Terry Crawford | 2 | 2 | ACandS Inc., et al |
 | Andrew and Marietta Prebehall | 2 | 2 | Harbison & Walker Co |
Here, all our plaintiffs are married couples, so there are 2 plaintiffs (Num_plt = 2), and we give them a type code of 2 (Type_plt = 2). We could, for example, give a single plaintiff a type code of 1 and a surviving spouse suing on behalf of the estate a type code of 3.
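Counting plaintiffs can be partly automated. The Python below is a minimal sketch built on a hypothetical heuristic: it counts names joined by "and" or "&" in the Plaintiff cell. It does not handle comma-separated lists (such as "Smith, Jones and Brown"), so its output should be checked by a coder. Type_plt is not computed here, because deciding whether plaintiffs are a married couple or a surviving spouse is a judgment the coder makes from the case itself.

```python
import re


def count_plaintiffs(plaintiff_field: str) -> int:
    """Hypothetical heuristic: count names joined by 'and' or '&'."""
    # Require whitespace around the joiner so names like "Anderson" are not split.
    parts = re.split(r"\s+(?:and|&)\s+", plaintiff_field.strip(), flags=re.IGNORECASE)
    return len([p for p in parts if p])


plaintiffs = [
    "DAVID and Susan TAYLOR",
    "James and Terry Crawford",
    "Andrew and Marietta Prebehall",
]
print([count_plaintiffs(p) for p in plaintiffs])  # [2, 2, 2]
```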
Codebook! When we are coding, it is important to keep track of what we code and how we code it. This is usually done in a codebook, which documents what each variable and each value means. So, for the asbestos cases, our codebook would include:
Type_plt = Type of plaintiff. 1 = single plaintiff; 2 = married plaintiffs; 3 = surviving spouse, suing on behalf of the estate.
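A codebook can also be kept in a machine-readable form next to the data. The Python below is a minimal sketch, assuming a simple nested dictionary (a hypothetical structure, not a standard format) that holds the Type_plt entry. It shows two uses: translating a stored code back into its meaning, and flagging codes in a column that the codebook does not define.

```python
# Codebook: variable name -> description and value labels.
CODEBOOK = {
    "Type_plt": {
        "description": "Type of plaintiff",
        "values": {
            1: "single plaintiff",
            2: "married plaintiffs",
            3: "surviving spouse, suing on behalf of the estate",
        },
    },
}


def decode(variable: str, code: int) -> str:
    """Translate a stored numeric code back into its codebook label."""
    return CODEBOOK[variable]["values"][code]


def check_column(variable: str, codes: list) -> list:
    """Return any codes the codebook does not define (ideally an empty list)."""
    allowed = CODEBOOK[variable]["values"]
    return [c for c in codes if c not in allowed]


print(decode("Type_plt", 2))                   # married plaintiffs
print(check_column("Type_plt", [2, 2, 2, 4]))  # [4]
```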
Now we have the data in a form which allows us to model or manipulate it, in order to better understand trends and relationships.
Final thoughts
In order to code correctly, we MUST:
Be consistent in our coding: if female = 1 once, female = 1 always.
Know what we are coding! Coding is NOT an exact science in most circumstances. Knowing the context can help you decide where to put a case, plaintiff, or award when it does not exactly fit your categories.
When in doubt, have someone else code a sample of your data and check how consistent the two sets of codes are.
Keep track of what you do! Use a codebook!
This is an intuitive process, and everyone makes mistakes! Take your time!
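One way to "see the level of consistency" when a second person codes a sample is to compare the two sets of codes directly. The Python below is a minimal sketch computing simple percent agreement and Cohen's kappa, a common statistic that adjusts agreement for what two coders would match on by chance. The two coders' Type_plt codes are invented for illustration (hypothetical data), and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of cases where both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same, non-empty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same code independently.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical Type_plt codes from two coders for the same 10 cases.
coder_a = [1, 2, 2, 3, 1, 2, 2, 1, 3, 2]
coder_b = [1, 2, 2, 3, 2, 2, 2, 1, 1, 2]

print(percent_agreement(coder_a, coder_b))       # 0.8
print(f"{cohens_kappa(coder_a, coder_b):.2f}")   # 0.66
```

The cases where the coders disagree (here, the fifth and ninth) are the ones worth discussing and, if needed, clarifying in the codebook.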