CRJ Doctoral Comprehensive Exam
Statistics
Friday August 23, 23 2:pm 5:3pm

Instructions: (Answer all questions below)

Question I: Data Collection and Bivariate Hypothesis Testing.
Answer the following questions as they pertain to bivariate statistical approaches to testing for group differences and variable association.
a) The t-test, ANOVA, and chi-square test are all ways of detecting associations between variables by examining group differences. In what instance would you expect each of the three tests to be used?
b) Pertaining to the first two tests listed above, how are the formal null hypotheses stated? What are the meanings of these formal statements?
c) What is sampling theory? How is sampling theory linked to probability, and how does this underlie our ability to produce reliable statistics within reasonable levels of confidence?
d) Suppose you must choose the one- or two-tailed version of certain tests mentioned above. In what cases would a one-tailed test be appropriate? In what cases would a two-tailed test be appropriate? Why?
Question II: Multivariate Regression Analysis

OLS (see attached output)
Familial disruption has been linked to higher levels of social disorganization and crime rates in research in the area of ecological criminology. However, levels of familial disruption have also been shown to be significantly related to regional differences in crime rates. Using county-level data, the attached output has been compiled to test for the potential effects of being a Southern county (south) and the county-level percent divorced (pctdiv) on the index crime rate of the county (indexrt). Interpret the output by detailing the results of the analysis and referring to the appropriate tables in your attempt to answer this question. Be sure to properly, and formally, interpret all appropriate statistics from the output. In doing so, focus on three basic research questions:
1) What are the basic assumptions of the OLS regression approach? How is each tested in this case? Do these data violate any of these assumptions?
2) Interpret all useful statistical output.
3) If we wanted to test whether the relationship between familial disruption and crime rates at the county level is related to the region of the country in which the county is located, how would we do that in both mediation and moderation form?

Logistic (see attached output)
Using survey data associated with conditions, fear, and demographics, suppose an analysis of one's fear of their was conducted. In the dataset there are a series of variables, including a binary indicator of fear (1 = ever feeling unsafe in one's in reference to 0 = never feeling unsafe). For this question, then, we are predicting ever feeling unsafe in one's by race (being white), gender (being male), and age. Interpret the output by detailing the results of the analysis and referring to the appropriate tables in your attempt to answer this question. Be sure to properly, and formally, interpret all appropriate statistics from the output.
In doing so, focus on three basic research questions/directives:
1) What are the basic assumptions of the logistic regression approach? How does this differ from the OLS approach, and what inherent violations of the OLS approach make using the logistic regression approach necessary (hint: refer to violations of OLS assumptions)?
2) What is the nature of the Block 0 and Block 1 portions of the output? What does each section represent?
3) Interpret all useful statistical output.
Regression
Question 2 Part 1, Page 1 of 5

Variables Entered/Removed b
Variables Entered: % of the population divorced, Southern County Indicator
Variables Removed: .
Method: Enter
b. Dependent Variable: County Crime Rate per,

Model Summary b
R: .36 a
R Square: .93
Adjusted R Square: .92
Std. Error of the Estimate: 29.59338
Durbin-Watson: .79
a. Predictors: (Constant), % of the population divorced, Southern County Indicator
b. Dependent Variable: County Crime Rate per,

ANOVA b
              Sum of Squares   df    Mean Square   F        Sig.
Regression    2235.52          2     67.726        69.673   a
Residual      89.26            353   875.768
Total         3699.698         355
a. Predictors: (Constant), % of the population divorced, Southern County Indicator
b. Dependent Variable: County Crime Rate per,

Coefficients a
Rows: (Constant), Southern County Indicator, % of the population divorced
Unstandardized Coefficients (B, Std. Error): 32.689.22 2..953.39.68
Coefficients a (continued)
Rows: (Constant), Southern County Indicator, % of the population divorced
Standardized Coefficients Beta: .297 .23
t: 26.769.85.826
Sig.: .9
Collinearity Statistics, Tolerance: .886 .886
Collinearity Statistics, VIF: .29 .29
a. Dependent Variable: County Crime Rate per,

Collinearity Diagnostics a
Dimension: 2 3
Eigenvalue: 2.223 .529 .28
Condition Index: 2.5 2.997
Variance Proportions, (Constant): .7 .8 .75
Variance Proportions, Southern County Indicator: .8 .88 .
Variance Proportions, % of the population divorced: .7 .6 .88
a. Dependent Variable: County Crime Rate per,

Casewise Diagnostics a
Columns: Case Number, Std. Residual, County Crime Rate per, , Predicted Value, Residual
63 3.5 23.89 3.9662 88.92382 238 3.65 2.55 3.6885 7.8655 292 3.3 8.3 55.75 92.6595 359 5.756 227.27 56.9285 7.38 563.2 332.27 32.7583 299.57 62 6.2 27.2 57.229 89.9977 9 3.62.5 32.9 7.23899 3.27 22.96 33.3693 89.597 3 3.396 5.88 5.373.5655 3.952 52.3 35.3967 6.9335 2 3.832 6.33 32.9388 3.392 5 3.3 33.2 3.2 98.9559 28 3.99 59.66 56.23 3.53688
a. Dependent Variable: County Crime Rate per,
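As a reading aid for the Collinearity Statistics, the relationship between the two columns can be sketched in a few lines. This is a minimal illustration, not part of the exam output: it takes the printed Tolerance of .886 as its only input and relies on the standard definitions VIF = 1 / Tolerance and, for a model with exactly two predictors, Tolerance = 1 - r², where r is the correlation between the two predictors. The variable names are illustrative.

```python
import math

# Tolerance as printed in the Collinearity Statistics (same value for both predictors).
tolerance = 0.886

# VIF is defined as the reciprocal of tolerance.
vif = 1.0 / tolerance

# With exactly two predictors, tolerance = 1 - r^2, so the absolute correlation
# between the predictors can be recovered (its sign cannot).
implied_abs_r = math.sqrt(1.0 - tolerance)

print(f"Tolerance = {tolerance:.3f}")
print(f"VIF = {vif:.3f}")
print(f"Implied |r| between predictors = {implied_abs_r:.3f}")
```

A VIF this close to 1 is usually read as little shared variance between the two predictors, though the cutoff for "too much" collinearity is a matter of convention.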
Residuals Statistics a
Columns: Minimum, Maximum, Mean, Std. Deviation, N
Predicted Value: 32.6888 58.55 38.9 9.96 356
Residual: -57.3899 299.575 29.5753 356
Std. Predicted Value: -.656 2.27 356
Std. Residual: -.937.2.999 356
a. Dependent Variable: County Crime Rate per,

Charts

[Histogram of Regression Standardized Residual. Dependent Variable: County Crime Rate per,. Vertical axis: Frequency. Mean = 2.79E-5, Std. Dev. = .999, N =,356]
[Normal P-P Plot of Regression Standardized Residual. Dependent Variable: County Crime Rate per,. Horizontal axis: Observed Cum Prob. Vertical axis: Expected Cum Prob.]
[Scatterplot. Dependent Variable: County Crime Rate per,. Horizontal axis: Regression Standardized Predicted Value. Vertical axis: Regression Standardized Residual.]
Logistic Regression
Question 2 Part 2, Page 1 of 3

Case Processing Summary
Unweighted Cases a
Rows: Selected Cases (Included in Analysis, Missing Cases, Total), Unselected Cases, Total
N: 52 52 52
Percent: .....
a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding
Original Value, Internal Value: have felt unsafe

Block 0: Beginning Block

Classification Table a,b
Observed (Step 0): have felt unsafe
Predicted, have felt unsafe: 939 63
Percentage Correct: ..
Overall Percentage: 6.9
a. Constant is included in the model.
b. The cut value is .5
Variables in the Equation
Columns: B, S.E., Wald, df, Sig., Exp(B)
Step 0, Constant: -.3.52 72.29.62

Variables not in the Equation
Columns: Score, df, Sig.
Step 0, Variables:
black: .29.39
gender: 26.5
age: 7.97
emp_ft: .25.263
Overall Statistics: 5.865

Block 1: Method = Enter

Omnibus Tests of Model Coefficients
Columns: Chi-square, df, Sig.
Step 1, Step: 5.583
Step 1, Block: 5.583
Step 1, Model: 5.583

Model Summary
Columns: -2 Log likelihood, Cox & Snell R Square, Nagelkerke R Square
Step 1: 22.278 a .33 .5
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than ..

Classification Table a
Observed (Step 1): have felt unsafe
Predicted, have felt unsafe: 85 96 85 7
Classification Table a (continued)
Percentage Correct, have felt unsafe: 9.9 7.7
Overall Percentage: 62.3
a. The cut value is .5

Variables in the Equation
Columns: B, S.E., Wald, df, Sig., Exp(B)
Step 1 a
black: B .2; S.E. .; Wald .533; Sig. .26; Exp(B) .52
gender: -.558. 25.53.573
age: -.5.3 2.666.985
emp_ft: -.8. 2.62.5.835
Constant: .52.96 7.39.8.682
a. Variable(s) entered on step 1: black, gender, age, emp_ft.
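For readers checking the Variables in the Equation table, the link between the B and Exp(B) columns can be sketched as follows. This is an illustration only, using the gender row, whose first and last printed values are -.558 (B) and .573 (Exp(B)). It assumes, following the question text, that gender is coded so that the coefficient refers to being male. The variable names are illustrative.

```python
import math

# B for gender as printed in Variables in the Equation (a log-odds coefficient).
b_gender = -0.558

# Exp(B) is the odds ratio: e raised to the coefficient.
odds_ratio = math.exp(b_gender)

# Percent change in the odds of the outcome for a one-unit increase in the
# predictor (here, male versus female under the assumed coding).
pct_change_in_odds = (odds_ratio - 1.0) * 100.0

print(f"Exp(B) = {odds_ratio:.3f}")
print(f"Percent change in odds = {pct_change_in_odds:.1f}%")
```

The computed odds ratio lands within rounding distance of the printed .573; the small gap is expected because the printed B is itself rounded to three decimals. Under the assumed coding, an odds ratio below 1 means lower odds of ever feeling unsafe for males, holding the other predictors constant.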