SURVEY OF ADVANCED ENVIRONMENTAL PROTECTION PROGRAMS IN TEXAS. REBECCA JANE COSNER, B.S. A THESIS IN STATISTICS


SURVEY OF ADVANCED ENVIRONMENTAL PROTECTION PROGRAMS IN TEXAS
by
REBECCA JANE COSNER, B.S.
A THESIS IN STATISTICS
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE
Approved
Robert L. Paige, Chairperson of the Committee
Brian Gerber
Accepted
John Borrelli, Dean of the Graduate School
May, 2005

ACKNOWLEDGEMENTS

I would like to take the time to recognize a few very important people who helped me in this process. First, Dr. Paige, thank you very much for your insight, patience, hard work, and dedication to this thesis project. To Dr. Gerber, thanks for helping me work through some of the areas of political science that I was unsure about. Finally, I am so grateful to have a wonderful family. I love you very much.

CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
I. INTRODUCTION
  1.1 Overview of Advanced Environmental Programs in Texas
  1.2 Methodology
II. SURVEY TOOLS
  2.1 Overview of Questionnaire Design
  2.2 The AEP Survey Instrument
III. EXPLORATORY DATA ANALYSES
IV. SAMPLING WEIGHTS
  4.1 Overview of Sampling Weights Calculation
  4.2 Sampling Weights for Non-Participating Cities
  4.3 Sampling Weights for Participating Cities
  4.4 Final Sampling Weights
V. WALD STATISTICS
  5.1 Introduction to Wald Statistics
  5.2 Wald Test of Independence Between Occupation and Participation
VI. LOGISTIC REGRESSION
  6.1 Overview Of Logistic Regression
  6.2 Model Selection
  6.3 Logistic Regression Results
  6.4 First Model Chosen
  6.5 Second Model Chosen
  6.6 Hosmer-Lemeshow Goodness of Fit
  6.7 Marginal Response Curves with Confidence Bands
  6.8 Odds and Odds Ratios
VII. CONCLUSIONS
BIBLIOGRAPHY
APPENDIX
  A. THE SURVEY INSTRUMENT
  B. MAPLE PROGRAM
  C. SAS PROGRAM

ABSTRACT

Environmental policy is an important public policy issue in the state of Texas. Charged with implementing and enforcing both federal and state minimum environmental standards in over 1,000 cities, the state needs effective programs that allow local governments the flexibility and control that are essential to their participation. The federal government, under the direction of the Environmental Protection Agency (EPA), sets forth a set of minimum standards that must be monitored. Each state is charged with the responsibility of establishing its own set of standards that, at a minimum, meet the federal guidelines. Texas works to meet or exceed federal standards through many types of programs, one of which is the Clean Texas program. The Clean Texas program, as well as other environmental quality programs, falls under the general umbrella of advanced environmental programs (AEPs). AEPs are voluntary programs in which entities (private as well as public) may participate. Participation entails a system of tradeoffs: each participating entity agrees to perform at better-than-minimum standards in exchange for less regulation. By less regulation, we mean that cities hope for fewer inspections, less punitive action if and when shortfalls occur, and more leniency overall.

In this project, we investigate some of the underlying reasons that contribute to a local government's willingness to participate in Clean Texas. At the same time, we investigate why the majority of Texas local governments choose not to participate in AEPs. We analyze the results of a survey that was conducted in Fall 2003 by the Earl Survey Research Laboratory (ESL) at Texas Tech University. Data was collected via a computer-based questionnaire. The survey instrument was constructed by Dr. Brian Gerber after discussions and class sessions in which the graduate students of a graduate public administration class, PAUD 5333, Environmental Policy and Administration, and I met to discuss techniques for survey design. The survey was then pre-tested, revised, and administered by the ESL. Data collection was conducted under the direction of Dr. Brian Gerber, Department of Political Science, Texas Tech University, and Mr. Brian Cannon of the ESL. We performed a census of participating local governments and collected a random sample of non-participating local governments.

We conducted a complete statistical analysis of this data set in an attempt to answer key questions about why local governments choose to participate in AEPs. We look for inherent differences between participating and non-participating local governments. We found that revenue capacity, the region within the state where a city is located, the city's TCEQ rating, population density, and a derived variable, MATCHVAL, which is discussed in detail below, are significant predictors of participation in an AEP in our first model. In our second model, a city's LCV score, revenue capacity, and POPVAL (again a derived variable discussed below) are significant predictors of a city's participation in an AEP. Additionally, the principal decision maker on environmental issues within a city differs significantly depending on whether or not that city participates in an AEP.

LIST OF TABLES

3.1 Counts for Policy Maker in Non-Participating Cities
3.2 Counts for Policy Maker in Participating Cities
3.3 Percentages by the Three Occupational Stratum
3.4 Observed and Expected Counts with the Three Occupational Stratum
3.5 Percentages by Two Occupational Stratum
3.6 Observed and Expected Counts with the Two Occupational Stratum
4.1 Groupings by City Size
4.2 POPVAL Values
Percentages by the Three Occupational Stratum
Percentages by the Two occupational Stratum
Sampling Weights by City Size and Region
Table of Proportions for the AEP Data Set

LIST OF FIGURES

6.1 Residual Plots for our First Model
Residual Plots for our Second Model
Plot of Marginal Population Density Model with 95% Confidence Bands
Plot of Marginal TCEQ Rating Model with 95% Confidence Bands
Plot of Marginal LCV Score Model with 95% Confidence Bands
Plot of Marginal Revenue Capacity Model with 95% Confidence Bands

CHAPTER I
INTRODUCTION

My master's thesis research project was designed to address two important research questions. What factors influence a local government to participate in an Advanced Environmental Program (AEP)? What factors likely contribute to the sustainability of such program participation? These questions were posed to a graduate political science class, Environmental Policy and Administration, PAUD 5333, by Dr. Brian Gerber in the fall. The focus of the class was to study environmental issues at the administrative levels of city, state, and/or local governments in the state of Texas. Throughout the semester, the class looked in depth at environmental policies, governmental practices related to environmental mandates, the mandates themselves, the government agencies responsible for drafting and enforcing policies for the protection of our environment, local governmental compliance with environmental policies, and the local governments that participate in AEPs and how that participation plays out. In addition, the group worked on a research project of their own to help them understand the complexities of city government participation in environmental programs. This project involved the collection of the survey data upon which I performed my statistical analysis.

1.1 Overview of Advanced Environmental Programs in Texas

Over the last several years, a number of studies have been done in the area of environmental standards and compliance within the private sector. Within the public sector, however, and specifically among local Texas city governments, reform attempts have been the source of much debate between high-level city officials and the governing agencies. While some have found the changes to be beneficial, others have seen these attempts as infringing on their freedom to address problems such as pollution and other environmental hazards in their own way. Agencies have been moving from a system of policy enforcement that provides very few incentives for compliance to a system where greater incentives are used to encourage cities to comply. For example, the Clean Texas program offers its members a system of incentives that allows them to escape the traditional approach of state enforcement of both state and federal minimum compliance standards. Officials charged with implementation and enforcement are taking greater measures to crack down on cities that escape compliance by way of loopholes and release high levels of pollutants under permits. Officials hope to eliminate a sense of forced compliance by allowing voluntary participation in programs in exchange for less severe inspections and less severe punishments for compliance shortfalls. Drastic changes are often not warmly welcomed, and Texas governing agencies are looking for innovative ways of implementing these new practices with the least amount of conflict and resistance.

1.2 Methodology

There were several major steps taken in answering our research questions:

1. Questionnaire Design: In order to gain a greater understanding of how city governments in the state of Texas feel about environmental regulations, the students of PAUD 5333, Dr. Gerber, and I held class discussions regarding survey writing techniques and wrote a set of possible questions that would elicit the desired information. Following this discussion, Dr. Gerber wrote a complete questionnaire that would collect information and address our two key research questions. The draft went through several revisions and a final instrument was established.
Beyond gaining this greater understanding of a city's opinions on environmental regulation in general, we also wanted to know each city's basic attitude toward the main governing bodies that are responsible for the regulation of environmental practices, as well as the city's interactions with compliance agencies. We also suspected that opinions on environmental issues could differ with departmental duties and interactions. We addressed this by contacting high-level administrators such as city managers and mayors, and by contacting employees and directors of various city departments that may have more direct contact with environmental enforcement agencies.

2. Data Collection: The students of PAUD 5333 and I conducted the telephone survey. We classified cities as participants and non-participants in AEPs. While it is quite possible that many Texas cities choose to follow their own plans for protecting the environment, we made a strict classification based on formal involvement versus self-regulation. We took a census of participating cities since their number is small. Of the non-participating cities, we collected a random sample, since they are far too numerous for a census. We used the census of participating cities as a benchmark for comparison between the two groups. Finally, we set to work collecting data over the phone with the assistance of the computerized adaptation of the questions provided by Mr. Brian Cannon in the Earl Survey Research Laboratory (ESL) at Texas Tech University. The participant responses were recorded by the computer, and the data was extracted from these records. Throughout the survey process, the survey was conducted by the students of PAUD 5333, ESL staff, and me. The students were instructed on survey techniques. The survey incorporated several brief definitions and points of clarification for the respondents. However, these were minimal, so respondents may at times have been unclear about what was being asked. This issue is always present in survey design. One must look for a balance between making the survey too lengthy and not incorporating enough background information. We strove to create a survey that would be quite short and from which we would collect the desired information in an accurate manner.

3. Data Quality and Preliminary Analysis: After collecting the data, we looked at the results for any key components that seemed out of the ordinary and for any general trends.

4. Sampling Scheme: Prior to data collection, cities were categorized into several groups, or strata. Next, a stratified sample of cities was taken in order to yield results that are accurate for the population. Finally, stratified sampling weights were assigned to responding cities.

5. Data Analysis: During this step, we used Wald statistics and logistic regression to find statistically significant differences between the two types of cities: participating and non-participating. We used Wald statistics to determine dependencies, and logistic regression to determine a suitable model as well as correct standard error estimates for the parameters.
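The sampling scheme in step 4 can be illustrated with a short sketch. This is a hedged illustration only, not the procedure used in the thesis: the sampling frame, the two size classes, the equal allocation of 10 cities per stratum, and the function names `stratified_sample` and `relative_weights` are all assumptions made for the example. The weight for stratum h follows the relative-weight form that Chapter IV attributes to Lohr [16] in equation (4.1), namely (N_h / N) / (n_h / n).

```python
import random
from collections import Counter

def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of up to n_per_stratum units in each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for unit in frame:
        by_stratum.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for _, units in sorted(by_stratum.items()):
        k = min(n_per_stratum, len(units))
        sample.extend(rng.sample(units, k))
    return sample

def relative_weights(frame, sample, stratum_of):
    """Weight per stratum h: (N_h / N) / (n_h / n)."""
    N_h = Counter(stratum_of(u) for u in frame)   # stratum sizes in the frame
    n_h = Counter(stratum_of(u) for u in sample)  # stratum sizes in the sample
    N, n = len(frame), len(sample)
    return {h: (N_h[h] / N) / (n_h[h] / n) for h in n_h}

# Hypothetical frame of 100 cities: 80 small and 20 large (not thesis data).
frame = [(f"small{i}", "small") for i in range(80)]
frame += [(f"large{i}", "large") for i in range(20)]

size_class = lambda unit: unit[1]
sample = stratified_sample(frame, size_class, n_per_stratum=10)
weights = relative_weights(frame, sample, size_class)
```

In this toy frame the small cities are under-sampled relative to their share of the population, so they receive a weight above 1 (1.6), while the over-sampled large cities receive a weight below 1 (0.4); the weights sum to the sample size over the sampled units.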

CHAPTER II
SURVEY TOOLS

2.1 Overview of Questionnaire Design

We began with two main research questions and worked to design a survey instrument that would address them. The instrument was written by Dr. Brian Gerber after class discussions involving himself, the students of PAUD 5333, and me. Guidelines for appropriate survey design were followed. Much time was spent creating and revising questions to ensure that they were understandable and gave the necessary background information without being too long and drawn out. Mr. Brian Cannon adapted the final draft to a computer questionnaire. We were then able to use a questionnaire that was understandable and simple to administer.

We decided that questions needed to be fairly short. We wanted them to be clear, concise, and direct. Most questions were designed using a Likert scale that allowed the respondent to make a selection between 1 and 7, together with a non-response option. This allowed for more accuracy in selecting a response to a question while keeping the responses in a fairly uniform format. We also favored closed questions, as many questions pertained to general information about cities and about AEPs. These two types of questions allowed us to get specific information from each respondent in a very organized manner, rather than a wide variety of responses whose exact meaning may be hard to interpret.

Once the AEP survey was finalized, we compiled a list of cities to be contacted. These cities included all participating cities as well as the randomly selected sample of non-participating cities. The survey was administered in the Earl Survey Research Laboratory at Texas Tech University by the staff, the students in PAUD 5333, and me. The finalized survey was prepared as a computerized survey instrument on the laboratory's computers using the computer-assisted telephone interviewing software package. We administered the survey in such a way that the cities were randomly selected and each city was contacted as many times as was necessary to ensure at least one response. We attempted to collect 3 responses from each city but were sometimes unsuccessful. Nonetheless, we took care to make sure that each city on our contact list was represented.

In survey sampling, there is always the issue of non-response. Non-response may occur in two ways:

Item Non-response: respondents may not respond to a particular question, or
Unit Non-response: there may not be a member of the target population available for the survey.

We were careful to eliminate unit non-response by allowing for callbacks until a response from a city official was obtained. However, item non-response did occur, as we did not allow for clarification beyond the survey questions for those who did not understand a question or did not know about that particular component of the city's operations. Our final data set consisted of 164 responses from officials in 79 cities.

2.2 The AEP Survey Instrument

We began by establishing whom we were interviewing in each city and whether their knowledge would be affected by the department in which they worked and/or their position. To this end, the first couple of survey questions collected information about the respondent's department and job title. These questions were used to determine if and how perspectives about environmental issues depend upon the area of local government in which the interviewee works. Next, we asked respondents where policy is initiated within their own governments. Along with this information, we wanted a clearer understanding of whether the initiation of programs begins in a particular department, as well as where the responsibility for enforcement of these policies falls.
After collecting some initial information, we collected information on specific viewpoints about the ways that each city deals with its own environmental problems. We wanted to know if each city believed that it had an environmental or pollution problem, as well as whether it was dealing with these problems or any other potential problems regarding the environment and pollution. In addition to the city employees' opinions about their communities, we wanted to know what concerns their departments had with environmental compliance issues and the degree to which they believed each was a concern. This level of concern was measured on a scale from 1 to 7, where 1 means little concern and 7 means major concern, so that we could separate out the different levels of concern, or lack thereof, in a particular department. Beyond the local concerns with environmental policy and control, we wanted to assess how employees felt their cities compared with other nearby cities of similar size. We wanted to know, in addition to how they rated themselves, the degree to which they were surpassing or trailing their neighboring towns.

Next, we set out to gain a greater understanding of relationships between cities and the governing agencies and program compliance groups, such as the Texas Commission on Environmental Quality and Clean Texas. We provided respondents with a bit of general information about the Clean Texas program so that they would be able to recognize the agencies involved and could tell us how familiar they were with such programs and how involved they were with programs like Clean Texas. We were also interested in the city employee's opinion as to whether or not these programs were effective and necessary or important in the protection of the environment. Again, responses to these questions were measured on a Likert scale rather than as a simple yes or no.
This would let us know whether employees had some experience in dealing with environmental programs, lots of experience, or virtually no experience with these programs, and the extent to which they deem AEPs important in creating a solution to environmental compliance problems.

The next question was designed to assess the underlying reasons why a city chooses to participate in a voluntary program such as Clean Texas. We gave respondents a list of options to choose from, which included many of the most prominent reasons for participating in AEPs. Respondents were given the option to refuse to answer and an option to indicate that they did not know why their city joined these programs.

AEPs offer cities incentives for high levels of voluntary compliance. The first issue relating to voluntary compliance is improved compliance ratings. We again asked respondents to rate the need for an improved compliance rating on a Likert scale. Then, we tackled the issue of greater flexibility in the ways in which cities comply with their environmental programs within the framework of these voluntary programs. Again a Likert scale was used to see whether or not they strongly favored flexibility in their compliance. Another key issue was the connection between higher levels of compliance and greater benefits that would reach far beyond environmental quality, such as bond ratings and insurance rates. Once again we allowed for a wide range of opinions in hopes of more accurately capturing views about how other benefits are linked to environmental compliance issues.

The next set of questions was asked only of respondents from Clean Texas cities. They were asked to provide a list of Clean Texas activities and services. We wanted to assess the direct repercussions of Clean Texas participation and how this participation affects activities that were devised to achieve environmental compliance. Beyond knowing what was taking place under the Clean Texas program for a particular city, we also wanted to know how the city became a member, that is, who was the driving force behind the decision to participate in this program.
Next, since the Texas Commission on Environmental Quality (TCEQ) is at the center of the various environmental programs that are essential to the care and quality of the environment in Texas, we wanted the attitudes of city government officials about TCEQ: the responsiveness of the commission, the helpfulness of the commission and its employees, the information that the commission provides and the accuracy of this information, the programs that it offers, and the standards it upholds. The order of several questions was shuffled so that we could spot whether one question influenced the response to a later question. Again, for this series of questions, we measured attitudes on a seven-point Likert scale.

The next question was meant to assess the overall political ideology of the city. We gave respondents options ranging from very liberal to very conservative. The purpose of this question was to collect data about whether political ideology has any connection to views about AEPs and environmental protection.

At the end of the interview we had a couple of questions regarding public services. We wanted to determine if each city had a means for disposing of environmentally hazardous waste and if that facility was funded and operated by the city alone. If a city did have this service, we asked whether the collection process involved drop-offs, pick-ups, or both. Finally, we asked about budgeting constraints and whether the city had in the recent past suffered, or was currently suffering, from budget problems that prohibited it from providing as many environmental services as it would like. Again, we gave respondents a list of options: extremely limited, somewhat limited, not very limited, or not limited at all. This was meant to assess whether a city wanted to do more than it could or was doing all that it wanted to do at this time with regard to environmental cleanup.

CHAPTER III
EXPLORATORY DATA ANALYSES

Exploratory Data Analysis (EDA) incorporates a wide variety of techniques that are used to extract preliminary ideas and patterns from a data set. It is a set of steps used to gain initial information from a data set that may reveal trends. EDA employs mainly graphical techniques that allow one to easily see inherent differences, abnormalities, and the like within a set of data. EDA is an important way for us to do the following in a clear and concise way: zero in on key variables; target predetermined variables of interest; test assumptions; determine relationships and structures within the data set; uncover abnormal areas that exist within the data; and gain greater insight into the overall data set.

During the EDA process, we mainly used a series of graphical steps and procedures that allowed us to visualize the structure and nature of the data. We present much of this information in tabular form that breaks it down into smaller, more understandable pieces. In each case, we reach a conclusion as to whether that small piece has any underlying properties that stand out and should be further explored.

We performed EDA on responses to many survey questions and divided the data into two groups: participating and non-participating cities. We considered sample proportions or counts by group for each question of interest. Our first question of interest involved the key people in charge of environmental policy making. Question three in our survey is "If you had to choose *one* from the following list, who in general would you say is the most important in making decisions about environmental policy in your community?" The possible responses were: elected officials like the Mayor or the City Council; the City Manager; Department managers or other key staff from relevant departments; a local government advisory board; the TCEQ; and someone else, like local business or environmental groups. Tables 3.1 and 3.2 summarize the counts for each response, and we conclude that there may be a significant difference between the two groups.

Table 3.1: Counts for Policy Maker in Non-Participating Cities
RESPONSE | COUNT

These counts for non-participating cities correspond to the number of responses for each option out of a total of 79 responses, of which two were item non-responses.

Table 3.2: Counts for Policy Maker in Participating Cities
RESPONSE | COUNT

This time, the tally is for the AEP participating cities. We had a total of 64 responses with no non-response for this question.

We began by looking at the counts for each response to question three. After conferring with Dr. Gerber, we observed that several categories are redundant and correspond to essentially the same task within a city government. Therefore, we collapsed categories to obtain tables with three responses instead of six, which was more manageable. The new categories were: City Manager; Mid-Level Manager; Elected Officials.

Next, we attempted to uncover any difference between the proportions for the participating and non-participating cities as to who they feel drives the environmental decision-making process. We used our census of participating cities as a benchmark for comparison with the non-participating cities. We obtained Table 3.3, which summarizes the proportion for each type of official. This allowed us to test the hypothesis that there is not a difference between the observed and expected counts against the alternative that there is a difference.

Table 3.3: Percentages by the Three Occupational Stratum
OCCUPATION | SAMPLE(%) | POPULATION(%)
Rows: Manager, Mid-level, Elected

In performing a χ² goodness-of-fit test we utilize the following setup: H_0: p_i = p_i^{(0)}, where the p_i^{(0)} are prespecified values. Then, we formulate the following equation from Lohr [16]:

\[
\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}
= \sum_{i=1}^{k} \frac{\left(n\hat{p}_i - n p_i^{(0)}\right)^2}{n p_i^{(0)}}
= n \sum_{i=1}^{k} \frac{\left(\hat{p}_i - p_i^{(0)}\right)^2}{p_i^{(0)}}. \tag{3.1}
\]

Our hypotheses for the χ² goodness-of-fit test are:

H_0: The distribution of observed frequencies equals the distribution of expected frequencies under simple random sampling
versus
H_1: The distribution of observed frequencies does not equal the distribution of expected frequencies under simple random sampling.

This chi-square test assumes that the sample behaves as a simple random sample. In other words, we ignore all weighting schemes. We divide our data into two groups. The first consists of a random sample of cities that are not participating in an AEP. The second consists of a census of the cities that are participating in an AEP. This census gave us the true population proportions that we then used to calculate the expected counts for the data. We obtained the following table.

Table 3.4: Observed and Expected Counts with the Three Occupational Stratum
CLASSIFICATION | O | E | O - E | (O - E)^2 | (O - E)^2 / E
Rows: Manager, Mid-Level, Elected, Total

Conclusion: With a chi-square statistic on 2 degrees of freedom, we have a p-value of 0.03, and hence we have sufficient evidence to reject the null hypothesis at a 5% significance level. Therefore, there is a significant difference between the observed and the expected counts for our data set.

We then attempted another grouping of our categories. This time we collapsed our categories into two new categories: City Manager and Not City Manager officials. We obtained the following percentages for the sample and census:

Table 3.5: Percentages by Two Occupational Stratum
OCCUPATION | SAMPLE(%) | POPULATION(%)
Rows: Manager, Non-Manager

Then, we obtained the observed and expected counts for our sample and census data.

Table 3.6: Observed and Expected Counts with the Two Occupational Stratum
CLASSIFICATION | O | E | O - E | (O - E)^2 | (O - E)^2 / E
Rows: Manager, Non-Manager, Total

In contrast to the three-category test, when we worked with only two occupational groupings we lacked sufficient evidence to reject the null hypothesis, so for this test there is no reason to believe there is a difference between the expected and observed counts.

Conclusion: With a chi-square statistic on 1 degree of freedom, we fail to reject the null hypothesis at a 5% significance level.
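The goodness-of-fit computation in equation (3.1) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the observed counts and benchmark proportions below are hypothetical and are not the values from Tables 3.3 through 3.6, and the function names `chi_square_gof` and `p_value` are invented for the example. The p-value helper uses the closed-form upper tail of the chi-square distribution, which is available only for 1 and 2 degrees of freedom, the two cases that arise in this chapter.

```python
import math

def chi_square_gof(observed, expected_props):
    """Return (statistic, degrees of freedom) for a chi-square goodness-of-fit test.

    observed       -- observed counts per category (the sample)
    expected_props -- benchmark proportions per category (here, from a census)
    """
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch handles only df = 1 or df = 2")

# Hypothetical three-category example (not thesis data).
observed = [30, 50, 20]               # sample counts
benchmark = [0.25, 0.50, 0.25]        # census proportions
stat, df = chi_square_gof(observed, benchmark)
p = p_value(stat, df)
print(stat, df, round(p, 4))          # prints 2.0 2 0.3679
```

With these made-up numbers the p-value exceeds 0.05, so the null hypothesis would not be rejected; the same functions apply unchanged to a two-category table, where the degrees of freedom drop to 1.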

23 CHAPTER IV SAMPLING WEIGHTS There are two basic approaches to stratification used in survey sampling. One can stratify the data into natural strata, sample from each of the strata, and then simply assign sampling weights based upon the strata into which a response falls. Another approach is that of collecting a random sample of units from our entire population and then once the sample has been collected, construct our strata and associated poststratified weights based on the stratum constructions after the data has been collected. This second approach is known as poststratification. In our data collection and weight process, we constructed stratum that occur naturally population based stratum and then use our knowledge of the properties of each strata to create sampling weights. 4.1 Overview of Sampling Weights Calculation Consider table 4.1 which shows how our sample is broken down and that a sample weighting scheme is necessary. Table 4.1: Groupings by City Size SIZE NUMBER OF CITIES NUMBER SAMPLED SAMPLED(%) Over Due to the careful design and implementation of our instrument we will not see any problems that arise out of a unit non-response, the refusal of an entire sample unit. This type of non-response is taken into consideration and the problem is resolved by allowing for call backs until a response it achieved. Therefore, the only type of 15

24 non-response that we must work with is that of item non-response. Item non-response is simply a respondent not answering one (or a few) of the questions but on the whole agreeing to participate in the survey. Sampling weights are way that we can adjust for non-response and undercoverage within a sample. Our data was collected as a stratified sample where stratum correspond to one of 48 city size-region combinations. The levels for city size were , , , , ,and over and the eight regions considered where 1 through 8. We are making several assumptions when we construct the sampling weights for our data. These assumptions are as follows: The state was broken into 8 regions; Within each region, the sample was selected by matching a participating city with a non-participating city of similar size; In a few cases, there are not any cities that would match with a participating city based on population. So, these cities were matched with at least 2 smaller cities. Sampling weights are a way that we can adjust a sample using the true population counts when our survey involves undercoverage and/or non-response. In this process, we construct weights that reflect this use of the true population properties. A sample weight, w i, can be thought of as a ratio of the number of units represented in population by the i th sampled unit. Specifically for our survey, we created strata based on the size of the city and separated the cities into two categories: participating or not participating in an AEP. Our sample of cites that are participating in an AEP was a census. Additionally, due to the nature of our population cities within Texas we have at our disposal, a tremendous amount of information for each city. We are able to collect much additional information about each city without additional survey sampling. These additional response variables include the following: Population density a measure of the ratio of persons per unit area, in this 16

case square mile.
LCV Score: a score given by the League of Conservation Voters to members of Congress, which is in turn attributed to each of the cities they represent.
Revenue Capacity: a projected amount of money that a city is expected to receive through various sources, including taxes, lotteries, and the like.
TCEQ Compliance Rating: a rating given by the Texas Commission on Environmental Quality on a scale from 0 upward. Smaller scores are given to more compliant cities.

4.2 Sampling Weights for Non-Participating Cities

In our survey we took a stratified sample of non-participating cities and developed a scheme for calculating sample weights. First, we created two new variables. The first, POPVAL, is a categorical variable that labels each city with a category of A through F based on the population size of the city. The second, MATCHVAL, is also categorical and initially takes the same value as POPVAL. However, when a city is matched with a city outside of its population grouping, MATCHVAL is reassigned to the value of the participating city's MATCHVAL. MATCHVAL thus keeps track of how cities were actually paired. Note that sample weights were computed on the basis of region and MATCHVAL. Table 4.2

below summarizes values of POPVAL.

Table 4.2: POPVAL Values (columns: POPULATION SIZE, POPVAL). The six population ranges, in increasing order, are labelled A, B, C, D, E, and F, with F the open-ended "over" category.

Note: In most cases MATCHVAL and POPVAL take identical values. They differ only when a city is matched with a city outside of its population classification. For example, within the region where El Paso is located there are no other cities of comparably large size, so El Paso is matched with two smaller cities, Marfa and Crane.

Once the POPVALs are determined, we construct weights for each city based on its population size and the region in which it is located. Within a particular region and for each population group, the number of cities that could possibly be sampled is determined and assigned to the variable MATCH. This variable records all cities within that region that are of that particular size. Next, we determine how many of those cities are part of the sample and record this value in the variable SAMP. A final weight is then constructed using the formula given by Lohr [16]:

$$ w_h = \frac{N_h / N}{n_h / n} = \frac{N_h}{N} \cdot \frac{n}{n_h} \qquad (4.1) $$

where
$h$ = stratum number;
$n$ = sample size;
$N$ = population size;

$n_h$ = sample stratum size;
$N_h$ = population stratum size;
$w_h$ = sample stratum weight.

This formula then reduces for our example to the following:

$$ w_h = \frac{\text{MATCH}}{\text{SAMP}}. \qquad (4.2) $$

4.3 Sampling Weights for Participating Cities

For the cities that are participating in an AEP, the weighting scheme is trivial. Again, each city is assigned a POPVAL and a MATCHVAL, as described above, based on the population of the city and the size of the city with which it is matched. We then assign to each MATCH and SAMP a value of 1. As before, we use the variation of Lohr's formula [16] given by (4.2), which reduces for each participating city to

$$ w_h = \frac{\text{MATCH}}{\text{SAMP}} = 1. $$

If one thinks of this in terms of the number of cities in the population represented by each city in the sample, then a sampling weight of 1 makes sense, as a participating city represents only itself and no others.

Once sampling weights were determined, we also constructed additional contingency tables in which the data were grouped by the occupation of the respondent and by whether the city is a participating or non-participating member of an AEP. Table 4.3 presents percentages for occupations in the sample and population. Initially, this was created while breaking

up the occupations into three distinct categories.

Table 4.3: Percentages by the Three Occupational Strata (columns: OCCUPATION, SAMPLE(%), POPULATION(%); rows: Manager, Mid-level, Elected)

After seeing no apparent differences between the two groups (participating and non-participating cities), we created, with the same idea in mind, a structure for the data that involved only two occupation groupings. This again yielded no differences between the two. Table 4.4 below summarizes the two groupings.

Table 4.4: Percentages by the Two Occupational Strata (columns: OCCUPATION, SAMPLE(%), POPULATION(%); rows: Manager, Non-Manager)

4.4 Final Sampling Weights

First, the data are subdivided into eight groups corresponding to the eight regions of Texas. Then, within each region group, we develop six subgroups of cities by population range, from 2000 to 5000 at the smallest up to the open-ended largest category. For the first stratum, where the population is between 2000 and 5000 and the city is in region 1, there are 25 cities that match the criteria of population size and region location, but only 2 were sampled. Using equation (4.2), we obtain the following weight:

$$ w_h = \frac{25}{2} = 12.5 \qquad (4.3) $$

The same method was employed to obtain the sample weight for each of the other

strata. Table 4.5 below summarizes the stratum sampling weights.

Table 4.5: Sampling Weights by City Size and Region (columns: Population, Region $i$, $h$, $w_{ih}$)

Clearly, one can see that population groups are left out in many of the regions. This simply means that there were no cities of that particular size within that region, and hence no sampling weight is needed. These weights are to be interpreted as follows, using the first stratum for illustration: each sampled city that has a population between 2000 and 5000 and is in region 1 represents 12.5 cities of similar size and location in the population (i.e. within that population range and region), itself included. The interpretation is the same for all other strata. Additionally, there are a few cities which fall outside of the norm for their

location and are handled separately as special cases. These occur when a city cannot be matched with one in the same population range within a particular region.
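The weight calculation of equations (4.2) and (4.3) can be sketched in a few lines of Python (the thesis itself used SAS). This is a minimal illustration under stated assumptions, not the author's program: the function name `stratum_weights` and the toy city lists are hypothetical, and only the stratum with 25 cities and 2 sampled is taken from the text.

```python
from collections import Counter


def stratum_weights(population, sample):
    """Return {(region, matchval): MATCH / SAMP} for every sampled stratum.

    population, sample: iterables of (region, matchval) pairs, one per city.
    """
    match = Counter(population)  # MATCH: cities available in each stratum
    samp = Counter(sample)       # SAMP: cities actually sampled in each stratum
    return {stratum: match[stratum] / samp[stratum] for stratum in samp}


# Hypothetical frame: region 1, category "A" has 25 cities of which 2 are
# sampled (the case worked in equation (4.3)); category "B" is made up.
population = [(1, "A")] * 25 + [(1, "B")] * 6
sample = [(1, "A")] * 2 + [(1, "B")] * 3

weights = stratum_weights(population, sample)
print(weights[(1, "A")])  # 12.5
print(weights[(1, "B")])  # 2.0
```

A quick sanity check on any such scheme is that the weights of the sampled cities add up to the number of cities in the sampled strata, here 25 + 6 = 31.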

CHAPTER V
WALD STATISTICS

5.1 Introduction to Wald Statistics

There are numerous ways to perform chi-square tests for homogeneity and independence. One such method is the Wald test, which is used to test for independence between variables in complex survey designs. We first show the formulation of the Wald statistic by way of a Taylor series expansion. We consider a $2 \times 3$ table where $p_{ij}$ denotes the probability that a response falls in row $i$ and column $j$ of the table:

$$ \begin{array}{ccc} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \end{array} $$

Wald's statistic is used to simultaneously test two hypotheses:

$$ H_{01}: \theta_{11} = p_{11} - p_{1+} p_{+1} = 0 $$
$$ H_{02}: \theta_{12} = p_{12} - p_{1+} p_{+2} = 0 $$

where

$$ p_{i+} = \sum_j p_{ij}, \qquad p_{+j} = \sum_i p_{ij}. $$

Note that $\theta_{11}$ and $\theta_{12}$ can be estimated by

$$ \hat\theta_{11} = \hat p_{11} - \hat p_{1+} \hat p_{+1}, \qquad \hat\theta_{12} = \hat p_{12} - \hat p_{1+} \hat p_{+2}. $$

We express $\theta_{11}$ and $\theta_{12}$ as a function $h(p)$ of $p = (p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23})$. A Taylor series expansion about the point $p$ is used to linearize the parameter estimate, $h(\hat p)$, via the following equation:

$$ h(\hat p) \approx h(p) + \nabla h^T(p)(\hat p - p). \qquad (5.1) $$

Therefore, we obtain

$$ V(h(\hat p)) \approx V\left(\nabla h^T(p)(\hat p - p)\right) = \nabla h^T(p)\, V(\hat p)\, \nabla h(p) $$

where

$$ \nabla h(p) = \begin{pmatrix} \partial h(p) / \partial p_{11} \\ \vdots \\ \partial h(p) / \partial p_{23} \end{pmatrix} $$

and

$$ V(\hat p) = \frac{1}{n} \begin{pmatrix}
p_{11}(1 - p_{11}) & -p_{11}p_{12} & -p_{11}p_{13} & -p_{11}p_{21} & -p_{11}p_{22} & -p_{11}p_{23} \\
-p_{11}p_{12} & p_{12}(1 - p_{12}) & -p_{12}p_{13} & -p_{12}p_{21} & -p_{12}p_{22} & -p_{12}p_{23} \\
-p_{11}p_{13} & -p_{12}p_{13} & p_{13}(1 - p_{13}) & -p_{13}p_{21} & -p_{13}p_{22} & -p_{13}p_{23} \\
-p_{11}p_{21} & -p_{12}p_{21} & -p_{13}p_{21} & p_{21}(1 - p_{21}) & -p_{21}p_{22} & -p_{21}p_{23} \\
-p_{11}p_{22} & -p_{12}p_{22} & -p_{13}p_{22} & -p_{21}p_{22} & p_{22}(1 - p_{22}) & -p_{22}p_{23} \\
-p_{11}p_{23} & -p_{12}p_{23} & -p_{13}p_{23} & -p_{21}p_{23} & -p_{22}p_{23} & p_{23}(1 - p_{23})
\end{pmatrix}. $$

Working under the null hypotheses

$$ H_{01}: \theta_{11} = 0, \qquad H_{02}: \theta_{12} = 0, $$

and for a sufficiently large sample size, the Wald statistic

$$ \chi^2_W = \hat\theta^T \hat V(\hat\theta)^{-1} \hat\theta, $$

where $\theta = (\theta_{11}, \theta_{12})^T$ and $V(\hat\theta) = V(h(\hat p))$, has a $\chi^2$ distribution with 2 degrees of freedom.

5.2 Wald Test of Independence Between Occupation and Participation

Using SAS, we obtain estimates for the proportions corresponding to each cell in Table 5.1:

Table 5.1: Table of Proportions for the AEP Data Set (columns: City Manager, Mid-Level Manager, Elected Official; rows: Participating, Non-Participating)

Using Maple, we perform the calculations discussed above to obtain the observed value of the Wald test statistic, $\chi^2_W$.

Conclusion: We reject the null hypothesis and conclude that participation and the occupation of the main decision maker within a city government are not independent of one another, since we obtain a p-value of 0.48, indicating we have exceeded the 95th percentile of the chi-square distribution with two degrees of freedom.
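The Wald calculation of Section 5.1 can be sketched in Python as a check on the algebra. This is a minimal sketch, not the Maple program used in the thesis: the function name `wald_independence_2x3`, the cell proportions, and the sample size are all hypothetical, and the covariance is the plain multinomial form of $V(\hat p)$ shown in Section 5.1, with no adjustment for the sampling weights.

```python
import math


def wald_independence_2x3(p, n):
    """Wald statistic and p-value for independence in a 2x3 table.

    p: estimated cell proportions [p11, p12, p13, p21, p22, p23]
    n: sample size used in the multinomial covariance V(p_hat)
    """
    p11, p12, p13, p21, p22, p23 = p
    row1 = p11 + p12 + p13              # p_{1+}
    col1, col2 = p11 + p21, p12 + p22   # p_{+1}, p_{+2}
    theta = [p11 - row1 * col1, p12 - row1 * col2]

    # Partial derivatives of theta_11 and theta_12 with respect to
    # (p11, p12, p13, p21, p22, p23).
    grad = [
        [1 - col1 - row1, -col1, -col1, -row1, 0.0, 0.0],
        [-col2, 1 - col2 - row1, -col2, 0.0, -row1, 0.0],
    ]

    # Multinomial covariance: p_i(1 - p_i)/n on the diagonal, -p_i p_j/n elsewhere.
    cov = [[(p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) / n
            for j in range(6)] for i in range(6)]

    def quad(a, b):
        # a^T cov b
        return sum(a[i] * cov[i][j] * b[j] for i in range(6) for j in range(6))

    # V(theta_hat) = grad cov grad^T, a symmetric 2x2 matrix.
    v11, v12, v22 = quad(grad[0], grad[0]), quad(grad[0], grad[1]), quad(grad[1], grad[1])

    # theta^T V^{-1} theta via the closed-form inverse of a 2x2 matrix.
    det = v11 * v22 - v12 * v12
    stat = (v22 * theta[0] ** 2 - 2 * v12 * theta[0] * theta[1]
            + v11 * theta[1] ** 2) / det
    p_value = math.exp(-stat / 2)  # chi-square upper tail with 2 degrees of freedom
    return stat, p_value


# Hypothetical proportions (rows: participating, non-participating).
stat, p_value = wald_independence_2x3([0.20, 0.15, 0.15, 0.10, 0.15, 0.25], n=200)
print(stat, p_value)
```

For two degrees of freedom the upper-tail probability is exactly $e^{-x/2}$, so a statistic above about 5.99 (the 95th percentile) corresponds to a p-value below 0.05.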

CHAPTER VI
LOGISTIC REGRESSION

Binary response data are very prevalent in survey data analysis. Binary responses can be modelled in a fashion similar to continuous variables, with some slight modifications. Among these is the need to assign a number to each of the two categories.

6.1 Overview Of Logistic Regression

Logistic regression models the relationship between a binary response variable and a set of predictors, which can be categorical or continuous [10]. Our regressors are all of the continuous type with the exception of POPVAL, which we will discuss later in detail. The logistic regression model describes the logit of the response probability as a linear combination of the regressors [1]. We obtain the following equation, where $x$ is the observed value of the random variable $X$:

$$ \ln\left(\frac{P(Y = 1 \mid X = x)}{1 - P(Y = 1 \mid X = x)}\right) = \operatorname{logit}(P(Y = 1 \mid X = x)) = \beta_0 + \sum_{i=1}^{k} \beta_i x_i. \qquad (6.1) $$

This equation in turn gives us the equation for the logistic model:

$$ p = \frac{1}{1 + e^{-\beta_0 - \sum_{i=1}^{k} \beta_i x_i}} = \frac{e^{\beta_0 + \sum_{i=1}^{k} \beta_i x_i}}{1 + e^{\beta_0 + \sum_{i=1}^{k} \beta_i x_i}} \qquad (6.2) $$

where $p$ is the probability of success.

By default, SAS models the response value of 0. However, our outcome of interest is participation, which corresponds to 1, rather than non-participation, which corresponds to 0. Therefore, we force a logistic model for the response value of 1 through the use of SAS model statement options.
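The link between equations (6.1) and (6.2) can be made concrete with a short Python sketch. The function names and the coefficient values below are hypothetical and chosen only for illustration; they are not estimates from the AEP data.

```python
import math


def logit(p):
    """Log-odds of a probability p, the left-hand side of equation (6.1)."""
    return math.log(p / (1 - p))


def logistic_probability(beta0, betas, xs):
    """Probability of success from equation (6.2) for one observation."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))  # linear predictor
    return 1 / (1 + math.exp(-eta))


# Hypothetical intercept, slopes, and regressor values.
beta0, betas = -1.0, [0.5, 0.2]
x = [2.0, 3.0]

p = logistic_probability(beta0, betas, x)
print(p)         # probability that Y = 1
print(logit(p))  # recovers the linear predictor beta0 + sum(beta_i * x_i)
```

Applying `logit` to the fitted probability returns the linear predictor, which is why each coefficient is read as a change in the log-odds of participation per unit change in its regressor.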

6.2 Model Selection

After the Wald test indicated that participation depends on the occupation of the main decision maker regarding environmental policies, we decided that a logistic regression model might help to uncover more of the differences between cities that participate in AEPs and those that do not. We chose a logistic model since our response variable is dichotomous. In other words, we modelled the classification of a city based on participation as a binary variable called type, which takes the value 1 for cities that participate in AEPs such as Clean Texas and 0 for cities that do not participate.

In this model we added several other variables to our data set for each city: population density, a compliance rating, an LCV score, and a revenue capacity value. Population density is the ratio of people per square mile; it can be thought of as the average number of people living in one square mile of a city. Compliance rating is a numerical value assigned to each city by TCEQ. Its scale begins at zero, which represents high environmental performance standards. A rating between 0.10 and 45 shows average compliance, and a rating over 45 is classified as poor environmental performance [5]. The LCV score is a score on environmental awareness and compliance assigned by the League of Conservation Voters. It is based on a 0 to 100 scale, where the higher the score, the more compliant and environmentally friendly a city is thought to be. Often this score is assigned to the representative of the area based on his or her votes in Congress and is in turn taken as a reflection of the city's views on environmental policy [12]. Revenue capacity is a measure of the potential revenue a city is able to generate from its residents. This money is gained in the form of taxes, lotteries, and the like [20].
For logistic regression modelling we used SAS version 9.1.3, which offers many options that made modelling easy. Several SAS procedures are able to produce logistic models, including PROC LOGISTIC, PROC PROBIT, and PROC SURVEYLOGISTIC. We chose to use PROC LOGISTIC

and PROC SURVEYLOGISTIC to model our data in hopes of gaining new insight into the reasons that a city might choose to participate or not participate. PROC LOGISTIC is a well-known and useful procedure that allows much flexibility in the model selection process. Many of its features let us request additional information that aids us in selecting an appropriate model, such as goodness-of-fit statistics and residual plots. There are also several features for automated model selection: forward selection, backward selection, and stepwise selection. In addition, PROC LOGISTIC allows automatic set-up of all interaction terms between the parameters of interest, so a model can be selected from all possible terms in a concise and systematic manner.

Forward selection starts with an empty model and adds regressors (including interaction regressors) until every significant regressor has been added. By significant, we mean that a predictor has a p-value below the chosen significance level. After a regressor is added, the addition of another regressor is attempted. If the new parameter is significant, it is added to the model as well. When all that remain are insignificant parameters, the selection stops, and a final model is chosen and reported.

Backward selection starts with all parameters and interactions in the model. Parameters are then removed if they are found not to be significant, until the selected model is highly significant and includes only significant parameters. The final selected model is then analyzed.
Stepwise selection is a model selection process that is a bit more complicated because it incorporates techniques of both forward and backward selection. Regressors are added and may later be removed depending on the other parameters already in the model. In this manner, we attempt to find the regressors that are the most highly significant, since we have more opportunities to enter and

remove parameters and so can see whether one variable causes another to become insignificant. Again, a final model is selected and then analyzed.

6.3 Logistic Regression Results

Compliance rating, LCV score, revenue capacity, and population density are all variables that we believe would have an impact on a city's decision to participate or not participate in a formal environmental protection program. We base this on subject matter knowledge and intuition: it makes sense that cities with a wider range of options for financial support would also have funds available to allocate towards an AEP. Additionally, the expertise of Dr. Gerber supported the assumption that these factors, or the majority of them, should be highly significant in our modelling process. These variables are therefore of particular importance in our model selection process. Furthermore, we attempt to use other regressors as well in our model construction. Our list of possible regressors includes the following:

TCEQ Rating;
LCV Score;
Revenue Capacity;
Population Density;
Region;
POPVAL;
MATCHVAL.

6.4 First Model Chosen

Additional regressors were tested and promptly removed, as they were not highly significant in the model alone, much less in conjunction with other regressors of interest. A logistic regression analysis is performed to investigate whether these factors are in fact meaningful predictors of participation in AEPs. Applying logistic regression to our data set, we chose to use all three model

selection processes. In each case, we see that several variables contribute significantly to the logistic regression model. We choose our model based on these selection routines as well as on manual model selection. That is, we also perform the model selection without the use of a pre-made routine, and we again reach the conclusion that the following regressors are of interest and will be entered into our first model choice:

MATCHVAL (A - E), $x_1, x_2, x_3, x_4, x_5$;
Region, $x_6$;
Revenue Capacity, $x_7$;
TCEQ Rating, $x_8$;
Population Density, $x_9$.

MATCHVAL, Region, Revenue Capacity, TCEQ Rating, and Population Density are thus the parameters that we include in our model of city type, that is, whether a city participates or fails to participate in an advanced environmental protection program. We have the following model:

$$ p = \frac{1}{1 + e^{-\beta_0 - \sum_{i=1}^{9} \beta_i x_i}} = \frac{e^{\beta_0 + \sum_{i=1}^{9} \beta_i x_i}}{1 + e^{\beta_0 + \sum_{i=1}^{9} \beta_i x_i}}. \qquad (6.3) $$

From our SAS program output we obtain the parameter estimates $\hat\beta_0, \ldots, \hat\beta_9$, which we substitute into the above formula to yield the fitted model:

$$ \hat p = \frac{e^{\hat\beta_0 + \sum_{i=1}^{9} \hat\beta_i x_i}}{1 + e^{\hat\beta_0 + \sum_{i=1}^{9} \hat\beta_i x_i}}. $$

The intercept and the regressors MATCHVAL, Region, Revenue Capacity, TCEQ Rating, and Population Density are all highly significant, with small p-values. Additionally, our model overall is highly significant, with a very small p-value.

Next and most importantly, we looked at residual plots for our fitted model to see how the model fits the data. These residuals on the whole fall more or less between -3 and 3. However, after discussion, we decided that the fact that there are a few

residuals outside of that range is not a major problem, as they were near our desired values and there did not appear to be a pattern or trend within them. They seemed to be rather randomly scattered. Below are the residual plots for our first model:

Figure 6.1: Residual Plots for our First Model

6.5 Second Model Chosen

We also find that another model is highly significant. This model includes the regressors:

LCV Score;
POPVAL;
Revenue Capacity.

Again, all p-values are small for this model. We can see that replacing MATCHVAL with POPVAL creates a simpler model. This makes sense in that POPVAL helps explain the participation type while making Population Density no longer necessary: typically, cities with higher populations (i.e. a higher POPVAL category) also have a higher density of people. As for Region and TCEQ Rating, it is not obvious how these two depend more on POPVAL than on MATCHVAL. Our regressor list yields:

POPVAL (A - E), $x_1, x_2, x_3, x_4, x_5$;

LCV, $x_6$;
Revenue Capacity, $x_7$.

Again, we get the following base model with which we will work:

$$ p = \frac{1}{1 + e^{-\beta_0 - \sum_{i=1}^{7} \beta_i x_i}} = \frac{e^{\beta_0 + \sum_{i=1}^{7} \beta_i x_i}}{1 + e^{\beta_0 + \sum_{i=1}^{7} \beta_i x_i}}. \qquad (6.4) $$

Substituting the point estimates of the parameters yields the fitted model:

$$ \hat p = \frac{e^{\hat\beta_0 + \sum_{i=1}^{7} \hat\beta_i x_i}}{1 + e^{\hat\beta_0 + \sum_{i=1}^{7} \hat\beta_i x_i}}. $$

Next, we turn our attention to the residual plots for the model. Overall, there does not appear to be a trend within the residuals, and the majority fall within the range of -3 to 3. Although a few exceed 3, we do not see this as a major problem or a reason to abandon the model. Below is the residual plot for the second model.

Figure 6.2: Residual Plots for our Second Model

6.6 Hosmer-Lemeshow Goodness of Fit

Next, we looked into model goodness of fit. SAS provides the Hosmer-Lemeshow statistic, which follows a chi-square distribution with $n - 2$ degrees of freedom, where $n$ is the number of partitions of the data set. SAS partitions the data into 10 groups by default; therefore, we have 8 degrees of freedom for this

distribution. Our result was a value well over 300, with a very small p-value. This, however, has a somewhat unusual interpretation: the smaller the p-value, the worse the fit. Therefore, our model would be viewed as clearly displaying significant lack of fit. However, this did not seem possible. After some investigation, we saw that several conditions necessary for the correct calculation of the Hosmer-Lemeshow statistic were violated. These include the following [7]:

a large sample size, at least n=400, is necessary;
a simple random sample; and,
no weighting scheme.

Clearly, all three of these major assumptions were violated in our data collection, and therefore we will not rely upon the Hosmer-Lemeshow statistic to assess the fit of our model.

6.7 Marginal Response Curves with Confidence Bands

After deciding that both our models are overall good fits to our data, we displayed our data along with our marginal models to get an idea of the model fit. A marginal curve tells us how the proportion of response changes with each regressor. Our marginal curves show that each regressor does indeed help in explaining participation in an AEP. Below are a series of plots with our selected model curve and 95% confidence bands. Each one corresponds to one of the numerical regressors from our full models.
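The Hosmer-Lemeshow calculation discussed in Section 6.6 can be sketched in Python for the unweighted, simple-random-sample case that the statistic assumes. This is an illustrative sketch only: the function name `hosmer_lemeshow` and the toy data are hypothetical, and the equal-count grouping used here is one common choice that need not match the exact partitioning rule SAS applies.

```python
def hosmer_lemeshow(y, p_hat, groups=10):
    """Return (statistic, degrees of freedom) for an unweighted fit.

    y: observed 0/1 responses; p_hat: fitted probabilities.
    Observations are sorted by fitted probability and split into
    `groups` partitions of roughly equal size.
    """
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    stat = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        if not chunk:
            continue  # can only happen when there are fewer cases than groups
        n_k = len(chunk)
        expected = sum(p for p, _ in chunk)     # expected number of successes
        observed = sum(obs for _, obs in chunk)  # observed number of successes
        mean_p = expected / n_k
        # Assumes 0 < mean_p < 1 in every partition; otherwise the term is undefined.
        stat += (observed - expected) ** 2 / (n_k * mean_p * (1 - mean_p))
    return stat, groups - 2


# Hypothetical fitted probabilities and outcomes, two partitions for brevity.
p_hat = [0.25] * 4 + [0.75] * 4
y = [1, 1, 0, 0, 1, 1, 1, 1]
print(hosmer_lemeshow(y, p_hat, groups=2))
```

Large values of the statistic relative to the chi-square reference distribution indicate lack of fit, which is why a small p-value is read as a poor fit rather than a good one.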

Figure 6.3: Plot of Marginal Population Density Model with 95% Confidence Bands

Figure 6.4: Plot of Marginal TCEQ Rating Model with 95% Confidence Bands

Figure 6.5: Plot of Marginal LCV Score Model with 95% Confidence Bands


More information

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study. Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National

More information

Social Survey Methods and Data Collection

Social Survey Methods and Data Collection Social Survey Social Survey Methods and Data Collection Zarina Ali June 2007 Concept of Survey & Social Survey A "survey" can be anything from a short paper- and-pencil feedback form to an intensive one-on

More information

Weight of Evidence Module

Weight of Evidence Module Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

More information

SUGI 29 Statistics and Data Analysis

SUGI 29 Statistics and Data Analysis Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

Data Analysis, Research Study Design and the IRB

Data Analysis, Research Study Design and the IRB Minding the p-values p and Quartiles: Data Analysis, Research Study Design and the IRB Don Allensworth-Davies, MSc Research Manager, Data Coordinating Center Boston University School of Public Health IRB

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

The Margin of Error for Differences in Polls

The Margin of Error for Differences in Polls The Margin of Error for Differences in Polls Charles H. Franklin University of Wisconsin, Madison October 27, 2002 (Revised, February 9, 2007) The margin of error for a poll is routinely reported. 1 But

More information

Topic 8. Chi Square Tests

Topic 8. Chi Square Tests BE540W Chi Square Tests Page 1 of 5 Topic 8 Chi Square Tests Topics 1. Introduction to Contingency Tables. Introduction to the Contingency Table Hypothesis Test of No Association.. 3. The Chi Square Test

More information

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Why Sample? Why not study everyone? Debate about Census vs. sampling

Why Sample? Why not study everyone? Debate about Census vs. sampling Sampling Why Sample? Why not study everyone? Debate about Census vs. sampling Problems in Sampling? What problems do you know about? What issues are you aware of? What questions do you have? Key Sampling

More information

Statistical & Technical Team

Statistical & Technical Team Statistical & Technical Team A Practical Guide to Sampling This guide is brought to you by the Statistical and Technical Team, who form part of the VFM Development Team. They are responsible for advice

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

NON-PROBABILITY SAMPLING TECHNIQUES

NON-PROBABILITY SAMPLING TECHNIQUES NON-PROBABILITY SAMPLING TECHNIQUES PRESENTED BY Name: WINNIE MUGERA Reg No: L50/62004/2013 RESEARCH METHODS LDP 603 UNIVERSITY OF NAIROBI Date: APRIL 2013 SAMPLING Sampling is the use of a subset of the

More information

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations

More information

Chapter 7 Conducting Interviews and Investigations

Chapter 7 Conducting Interviews and Investigations Chapter 7 Conducting Interviews and Investigations Chapter Outline 1. Introduction 2. Planning the Interview 3. Interviewing Skills 4. Interviewing Clients 5. Interviewing Witnesses 6. Planning and Conducting

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Descriptive Methods Ch. 6 and 7

Descriptive Methods Ch. 6 and 7 Descriptive Methods Ch. 6 and 7 Purpose of Descriptive Research Purely descriptive research describes the characteristics or behaviors of a given population in a systematic and accurate fashion. Correlational

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

AP Statistics Chapters 11-12 Practice Problems MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

AP Statistics Chapters 11-12 Practice Problems MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. AP Statistics Chapters 11-12 Practice Problems Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) Criticize the following simulation: A student

More information

Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance. Chapter 2: Statistical models of default Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

More information

Types of Data, Descriptive Statistics, and Statistical Tests for Nominal Data. Patrick F. Smith, Pharm.D. University at Buffalo Buffalo, New York

Types of Data, Descriptive Statistics, and Statistical Tests for Nominal Data. Patrick F. Smith, Pharm.D. University at Buffalo Buffalo, New York Types of Data, Descriptive Statistics, and Statistical Tests for Nominal Data Patrick F. Smith, Pharm.D. University at Buffalo Buffalo, New York . NONPARAMETRIC STATISTICS I. DEFINITIONS A. Parametric

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

ECO 199 B GAMES OF STRATEGY Spring Term 2004 PROBLEM SET 4 B DRAFT ANSWER KEY 100-3 90-99 21 80-89 14 70-79 4 0-69 11

ECO 199 B GAMES OF STRATEGY Spring Term 2004 PROBLEM SET 4 B DRAFT ANSWER KEY 100-3 90-99 21 80-89 14 70-79 4 0-69 11 The distribution of grades was as follows. ECO 199 B GAMES OF STRATEGY Spring Term 2004 PROBLEM SET 4 B DRAFT ANSWER KEY Range Numbers 100-3 90-99 21 80-89 14 70-79 4 0-69 11 Question 1: 30 points Games

More information

Sample Size Issues for Conjoint Analysis

Sample Size Issues for Conjoint Analysis Chapter 7 Sample Size Issues for Conjoint Analysis I m about to conduct a conjoint analysis study. How large a sample size do I need? What will be the margin of error of my estimates if I use a sample

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Chi-square test Fisher s Exact test

Chi-square test Fisher s Exact test Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

More information

American Journal Of Business Education July/August 2012 Volume 5, Number 4

American Journal Of Business Education July/August 2012 Volume 5, Number 4 The Impact Of The Principles Of Accounting Experience On Student Preparation For Intermediate Accounting Linda G. Carrington, Ph.D., Sam Houston State University, USA ABSTRACT Both students and instructors

More information

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

The Open University s repository of research publications and other research outputs

The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs Using LibQUAL+ R to Identify Commonalities in Customer Satisfaction: The Secret to Success? Journal

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Center for Effective Organizations

Center for Effective Organizations Center for Effective Organizations HR METRICS AND ANALYTICS USES AND IMPACTS CEO PUBLICATION G 04-8 (460) EDWARD E. LAWLER III ALEC LEVENSON JOHN BOUDREAU Center for Effective Organizations Marshall School

More information

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Appendix B Data Quality Dimensions

Appendix B Data Quality Dimensions Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Math 251, Review Questions for Test 3 Rough Answers

Math 251, Review Questions for Test 3 Rough Answers Math 251, Review Questions for Test 3 Rough Answers 1. (Review of some terminology from Section 7.1) In a state with 459,341 voters, a poll of 2300 voters finds that 45 percent support the Republican candidate,

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

First-year Statistics for Psychology Students Through Worked Examples

First-year Statistics for Psychology Students Through Worked Examples First-year Statistics for Psychology Students Through Worked Examples 1. THE CHI-SQUARE TEST A test of association between categorical variables by Charles McCreery, D.Phil Formerly Lecturer in Experimental

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

Annex 6 BEST PRACTICE EXAMPLES FOCUSING ON SAMPLE SIZE AND RELIABILITY CALCULATIONS AND SAMPLING FOR VALIDATION/VERIFICATION. (Version 01.

Annex 6 BEST PRACTICE EXAMPLES FOCUSING ON SAMPLE SIZE AND RELIABILITY CALCULATIONS AND SAMPLING FOR VALIDATION/VERIFICATION. (Version 01. Page 1 BEST PRACTICE EXAMPLES FOCUSING ON SAMPLE SIZE AND RELIABILITY CALCULATIONS AND SAMPLING FOR VALIDATION/VERIFICATION (Version 01.1) I. Introduction 1. The clean development mechanism (CDM) Executive

More information

Barriers & Incentives to Obtaining a Bachelor of Science Degree in Nursing

Barriers & Incentives to Obtaining a Bachelor of Science Degree in Nursing Southern Adventist Univeristy KnowledgeExchange@Southern Graduate Research Projects Nursing 4-2011 Barriers & Incentives to Obtaining a Bachelor of Science Degree in Nursing Tiffany Boring Brianna Burnette

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Stats 202 Data Analysis Project Winter 2016

Stats 202 Data Analysis Project Winter 2016 Stats 202 Data Analysis Project Winter 2016 1 Learning Objectives The learning goals of the Stats 202 data analysis project are Formulate clear scientific research questions; Explore public data sources

More information

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

More information

Data Mining Introduction

Data Mining Introduction Data Mining Introduction Bob Stine Dept of Statistics, School University of Pennsylvania www-stat.wharton.upenn.edu/~stine What is data mining? An insult? Predictive modeling Large, wide data sets, often

More information