Multiple logistic regression analysis of cigarette use among high school students



Similar documents
Multinomial Logistic Regression

LOGISTIC REGRESSION ANALYSIS

Binary Logistic Regression

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

13. Poisson Regression Analysis

Additional sources Compilation of sources:

Multivariate Models of Student Success

Probability of Selecting Response Mode Between Paper- and Web-Based Options: Logistic Model Based on Institutional Demographics

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

Strategies for Identifying Students at Risk for USMLE Step 1 Failure

DISCRIMINANT FUNCTION ANALYSIS (DA)

An Example of SAS Application in Public Health Research --- Predicting Smoking Behavior in Changqiao District, Shanghai, China

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Ordinal Regression. Chapter

Learner Self-efficacy Beliefs in a Computer-intensive Asynchronous College Algebra Course

activity guidelines (59.3 versus 25.9 percent, respectively) and four times as likely to meet muscle-strengthening

The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data ABSTRACT INTRODUCTION SURVEY DESIGN 101 WHY STRATIFY?

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

LOGIT AND PROBIT ANALYSIS

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

PREDICTORS OF DRUG ABUSE TREATMENT UTILIZATION AMONG FREQUENT COCAINE USERS

STATISTICA Formula Guide: Logistic Regression. Table of Contents

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HEALTHY PEOPLE 2020 CORE INDICATORS FOR ADOLESCENT AND YOUNG ADULT HEALTH

Sun Li Centre for Academic Computing

The Probit Link Function in Generalized Linear Models for Data Mining Applications

11. Analysis of Case-control Studies Logistic Regression

POLICY ON COMPREHENSIVE SCHOOL HEALTH EDUCATION

Fleet and Marine Corps Health Risk Assessment, 1 January 31 December, 2014

Multivariate Logistic Regression

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables

Multinomial and Ordinal Logistic Regression

II. DISTRIBUTIONS distribution normal distribution. standard scores

Effect of Anxiety or Depression on Cancer Screening among Hispanic Immigrants

Credit Risk Analysis Using Logistic Regression Modeling

Written Example for Research Question: How is caffeine consumption associated with memory?

Missing data in randomized controlled trials (RCTs) can

Caregiving Impact on Depressive Symptoms for Family Caregivers of Terminally Ill Cancer Patients in Taiwan

Maternal and Child Health Issue Brief

PARALLEL LINES ASSUMPTION IN ORDINAL LOGISTIC REGRESSION AND ANALYSIS APPROACHES

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Health risk assessment: a standardized framework

Mind on Statistics. Chapter 4

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS)

Data Cleaning and Missing Data Analysis

SUGI 29 Statistics and Data Analysis

THE EFFECT OF AGE AND TYPE OF ADVERTISING MEDIA EXPOSURE ON THE LIKELIHOOD OF RETURNING A CENSUS FORM IN THE 1998 CENSUS DRESS REHEARSAL

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Association Between Variables

Moderator and Mediator Analysis

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

MALAWI YOUTH DATA SHEET 2014

TEEN MARIJUANA USE WORSENS DEPRESSION

October 31, The effect of price on youth alcohol. consumption in Canada. Stephenson Strobel & Evelyn Forget. Introduction. Data and Methodology

Huron County Community Health Profile

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

Logistic Regression.

Complementary and alternative medicine use in Chinese women with breast cancer: A Taiwanese survey

Moderation. Moderation

Certified in Public Health (CPH) Exam CONTENT OUTLINE

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

The South African Child Support Grant Impact Assessment. Evidence from a survey of children, adolescents and their households

Module 4 - Multiple Logistic Regression

Introduction to Quantitative Methods

Overview of the Adverse Childhood Experiences (ACE) Study. Robert F. Anda, MD, MS Co-Principal Investigator.

Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015

Developing Risk Adjustment Techniques Using the System for Assessing Health Care Quality in the

End User Satisfaction With a Food Manufacturing ERP

Analysing Questionnaires using Minitab (for SPSS queries contact -)

SPSS Guide: Regression Analysis

Introduction to Statistics and Quantitative Research Methods

Tobacco Questions for Surveys A Subset of Key Questions from the Global Adult Tobacco Survey (GATS) 2 nd Edition GTSS

Best Practices in Using Large, Complex Samples: The Importance of Using Appropriate Weights and Design Effect Compensation

CALCULATIONS & STATISTICS

Durham County Community Health. Assessment? What Is a Community Health

SUMAN DUVVURU STAT 567 PROJECT REPORT

VI. Introduction to Logistic Regression

Logit Models for Binary Data

Generalized Linear Models

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Does smoking impact your mortality?

Multivariate Analysis of Variance (MANOVA)

61 st ANNUAL CARPHA HEALTH RESEARCH CONFERENCE. June 23 25, 2016 Turks and Caicos Islands CALL FOR PAPERS

Impact of Diabetes on Treatment Outcomes among Maryland Tuberculosis Cases, Tania Tang PHASE Symposium May 12, 2007

Behavioral Health Indicators for Tennessee and the United States

Morbidity and Mortality among Adolescents and Young Adults in the United States

Research Methods & Experimental Design

Adolescence (13 19 years)

If You Think Investing is Gambling, You re Doing it Wrong!

Transcription:

Multiple logistic regression analysis of cigarette use among high school students ABSTRACT Joseph Adwere-Boamah Alliant International University A binary logistic regression analysis was performed to predict high school students cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of cocaine use, c) initial cigarette smoking age, d) feeling sad or hopeless, and e) physically inactive behavior. The results of the logistic regression analysis showed that the full model, which considered all the five independent variables together, was statistically significant.. The strongest predictors of youth smoking behavior were race, frequency of cocaine use and physically inactive behavior. For example, the odds of smoking are increased by a factor of 5.0 if the student is White compared to an African American, controlling for other variables in the model. The logistic model employed explained about 31% of the variance in current frequent cigarette use among the high school students. It correctly classified 93% of the cases. The key finding is that the selected variables are important correlates of frequent cigarette use among high school students. Keywords: Logistic regression, CDC, Youth risk behavior surveillance system, cigarette smoking. Multiple logistic regression analysis, Page 1

Tobacco use is the single most preventable cause of disease, disability, and death in the United States. Each year, an estimated 443,000 die prematurely from smoking or exposure to secondhand smoke, and another 8.6 million have a serious illness caused by smoking (CDC, 2010. p.1). INTRODUCTION The above quotation from CDC documents the harmful effects of cigarette smoking. Despite the risks associated with smoking, CDC (2010) estimates 46 million U.S. adults smoke cigarettes. The Department of Health and Human Services Center for Disease Control (CDC) and Prevention maintains cigarette use is the leading preventable cause of death in the United States. The CDC lists tobacco use by young adults as one of its priority health-risk behaviors (CDC, MMWR, 2004). Because most smokers initiate cigarette use during adolescence (Hersch, 1998), the prevalence of both past-year and past-month smoking peeks during smokers' late teens or early twenties. The adverse longterm effects of cigarette smoking as documented above by the CDC are inversely correlated with earlier initial smoking ages. Concomitantly, the CDC has set a national objective to reduce the prevalence of cigarette use among high school students to less than 16% for the year 2010 (CDC, MMWR, 2006). The CDC has developed a Youth Risk Behavior Surveillance System (YRBSS) to monitor the health risk behaviors among American youth. The system monitors six categories of high-risk behaviors that according to the CDC contribute substantially to "the leading causes of death, disability, and social problems among youth and adults in the United States" (CDC, MMWR, 2010 page 1). The categories of high-risk behaviors are: 1. Behaviors that contribute to unintentional injury and violence, 2. Alcohol and other drug use, 3. Sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases including HIV infection, 4. Physical inactivity, 5. Obesity and dietary behaviors, and, most importantly for this study, 6. Tobacco use. The purpose of this study was to assess the impact of a set of predictors on cigarette smoking behavior of high school students. Specifically, the target outcome behavior of interest is Current Frequent Cigarette Use (CFCU) among the youth. This study sought to answer the question: Can CFCU among the youth be accurately predicted from Race, Frequency of Cocaine Use (FCU), age at which the youth smoked a whole cigarette for the first time [Initial Cigarette Smoking Age (ICSA)], feeling of sadness or hopelessness, [Felt Sad or Hopeless (FSH)] and Physically Inactive Behavior (PIB). See Figure 1 for the specific CDC questionnaire items and the predictors or independent variables of this study. The paper is structured as follows: First, I briefly review the methodological strategy employed and the data source for this study. Subsequently, I present the results of the data analysis through descriptive statistics and a logistic regression indicating the significant predictors. Finally, I summarize the study results. METHOD The dependent or the outcome variable of interest, Current Frequent Cigarette Use was constructed as a yes/no dichotomous indicator of smoking status based on the Multiple logistic regression analysis, Page 2

response to CDC (2009) survey questionnaire item: During the past 30 days, on how many days did you smoke cigarettes? Respondents who answered they smoked 20 to 29 days or all the 30 days were classified as current frequent cigarette user (coded yes otherwise no for less frequent or non cigarette users). The categorical dependent variable of the study necessitated the use of multiple logistic regression model for investigating whether the likelihood of current frequent cigarette use among the youth was related to the selected predictors above. (Menard, 2010; Hosmer, & Lemeshow, 2000). The specific logistic regression model fitted to the data was: Logit (CFCU) = bo + b1 (Hispanic) + b2 (White) + b3 (ICSA) + b4 (FSH) + b5 (FCU) + b6 (PIB). (Where b0 is a constant. b1, b2,... b6 are logistic coefficients or estimates for the parameters, β1, β2,... β6). Race was a design variable coded, Hispanic = 1, White = 2, African American was the reference category. PIB and FSH are categorical variables and were dummy coded 0, 1. DATA SOURCE This study drew from the CDC s 2009 National Youth Risk Behavior Survey (YRBS), a questionnaire containing items designed to elicit information from high school students about the fore-mentioned categories of health-related risk behaviors, along with basic demographic information. The sampling frame for the survey consisted of all public and private schools with 9-12 grade students. Representative samples of students were drawn from those grades. All the questionnaires were self-administered. Student response rate for the national survey was 88%, approximately 16460 students. For this study, usable data from 11683 students were analyzed. DATA ANALYSIS. Prior to analysis of the data the dependent variable of interest, current frequent cigarette use which had binary response of Yes/No, was recoded 0, 1 (No/Yes). The coding change was made to reflect the predicted target category, current frequent cigarette use. As suggested by Menard (2010), preliminary analysis of the data was performed to check the assumptions of logistic regression with respect to the selected predictors of the study. ICSA, FSH, FCU, PIB and Race were subjected to Linear regression analysis to evaluate multicollinearity among the predictors or the independent variables. Multicollinearity among predictors in logistic regression creates problems for the validity of the model for the investigation. In particular, it affects the validity of the statistical tests of the regression coefficients by inflating their standard errors. (Garson, 2010). The results of the analysis showed that the data did not violate the multicollinearity assumption. The tolerance value of each independent variable was greater than.720 which exceeded the suggested criteria of below.10. (Pallant, 2007). Lack of multicollinearity among the independent variables was also supported by the obtained variance inflation factor (VIF) values. They were all well below the cut-off value of 10. (Field, 2005). The VIF values of Multiple logistic regression analysis, Page 3

the variables ranged from 1.013 to 1.376. After the preliminary analysis of the data, the binary logistic regression procedure in SPSS was used to perform the analysis to determine whether the likelihood of CFCU could be predicted from the independent variables. Data from 11683 high school students were included in this analysis. RESULTS Sample description: The age distribution of the students ranged from 14 to 18 years old. Among the respondents, approximately 70% were White, 17% African Americans and 13% Hispanics. About 11% of the students smoked a whole cigarette for the first time before age 13. About 46% of the students have ever smoked cigarette. The prevalence of Ever smoked cigarettes was higher among Hispanic students (51.6%) than White (46.1%) and African Americans (43.5). Most Ever smokers first smoked at age 13 or 14. The percentage of Ever smokers that were 13 or 14 years old was 25%. The corresponding percentages for Ever smokers that were 11 or 12, 15 or 16 years old were 12% and 23% respectively. The results of the data analysis showed that the proportion of students who have tried smoking increased with age. For example, by the age of 18, approximately 53% of the youth have tried smoking. In contrast to the relatively high prevalence (51.6%) of Ever Smoked Cigarettes among Hispanic youth, a relatively small percentage of Hispanics have ever smoked cigarettes daily, i.e., had ever smoked at least one cigarette every day for 30 days. For example, the prevalence of ever smoked cigarettes daily was higher among White (13.7%) than African American (4.3%) and Hispanic (6.3%). The results of the logistic regression analysis show that the full model which considered all the five independent variables together was statistically significant, χ2 = 1700.966, df = 6, N = 11424, p <.001.This implies that the odds for a high school student to indicate that he was a current frequent cigarette user were related to the five independent variables, Race, ICSA, FSH, FCU, and PIB. The model correctly classified approximately 93% of the cases. The pseudo R estimates indicate that the model explained between 13% (Cox & Snell R Squared) and 31% (Nagelkerke R Squared) of the variance in current frequent cigarette use. Table 1 presents a summary of the raw score binary logistic regression coefficients, Wald statistics, odds ratios [(Exp (B)] along with a 95% CI. Wald statistics indicate that all the variables significantly predict current frequent cigarette use. The strongest predictor of CFCU was race. In particular, white. The odds ratio for white was 5.0 i.e., the odds of a high school student indicating that he is a current frequent cigarette user are increased by a factor of 5.0 if the student is White compared to African American adjusting for the effects of the other predictors in the model. Other predictors that made significant contribution to the model (CFCU) were frequency of cocaine use, physical inactive behavior, feeling sad or hopeless and initial cigarette smoking age. The older the youth before he smoked a whole cigarette for the first time, the more likely he would report that he is a current frequent cigarette user. The predictor (ICSA) recorded an odds ratio of 1.6. Thus, the odds of smoking frequently compared to not smoking cigarettes frequently increase by a factor of 1.6 for a unit increase in age from when the youth smoked cigarette for the first time. In other words, the odds of Multiple logistic regression analysis, Page 4

current frequent cigarette use increase by 60% for each unit increase in ICSA. (Warner, 2008). Cocaine use and cigarette smoking behavior of young people were strongly related. Frequent cocaine use (FCU) predicts smoking behavior. (p <.001). For a unit increase in the number of times the youth uses any form of cocaine, including powder, crack, or freebase, the odds for smoking cigarettes frequently i.e., smoking 20 to 29 days or 30 days in one month are increased by a factor of 3.5 when all other variables are held constant. Feeling sad or hopeless recorded an odds ratio of 1.7. This indicates that the odds of smoking cigarettes frequently were about 1.7 times higher for high school students who felt sad or hopeless than for those who did not feel that way. As shown in Table 1, physically inactive behavior is implicated in student smoking. It recorded an odds ratio of about 1.7. In other words, students who were physically inactive have higher odds of smoking frequently ( more than 1.7 times as high) compared with students who are physically active, controlling for other variables in the model. SUMMARY Multiple logistic regression was used to jointly examine the influence of race, frequency of cocaine use, physically inactive/active behavior, initial cigarette smoking age and feeling sad or hopeless The key finding is that the selected variables are important correlates of current frequent cigarette use among high school students. The strongest predictors of youth smoking behavior are race, frequency of cocaine use and physically inactive behavior. For example, the odds of smoking are increased by a factor of 5.0 if the student is White compared to an African American. The logistic model employed explained about 31% of the variance in current frequent cigarette use among the high school students. It correctly classified 93% of the cases. Multiple logistic regression analysis, Page 5

Table 1- Logistic regression predicting the likelihood of high school students reporting frequent cigarette use. Predictor B S.E Wald Df P Odds Ratio 95% C.I. Lower 95% C.I Upper Race White 1.62.18 77.75 1.00 5.4 3.52 7.23 Hispanic -.46.27 2.95 1.09.63.38 1.07 FCU 1.26.07 285.62 1.00 3.52 3.04 4.07 FIB.54.08 41.44 1.00 1.71 1.45 2.02 FSH.53.08 43.65 1.00 1.70 1.45 1.98 ICSA.49.02 712.411 1.00 1.64 1.58 1.70 Constant -7.23.22 1050.87 1.00.00 Figure 1- Selected predictors from CDC s 2009 National Youth Risk Behavior Survey (YRBS). CDC questionnaire item Variable name Acronym Q5. What is your race Race Q23. During the past 12 months, did you feel ever so sad of hopeless almost every day for two weeks or more in a row that you stopped doing some usual activities? Q29. How old were you when you smoked a whole cigarette for the first time? Q50. During the past 30 days, how many times did you use any form of cocaine, including powder, crack, or freebase? Q80. During the past 7 days, on how many days were you physically active for a total of at least 60 minutes per day? Felt sad or hopeless Initial cigarette smoking age Frequency of cocaine use Physically inactive/active behavior FSH ICSA FCU PIB Multiple logistic regression analysis, Page 6

REFERENCES Center for Disease Control and Prevention, (2010). Tobacco Use: Targeting the Nation s Leading Killer. Available at: www.cdc.gov/chronicdisease/resources/publications/aag/osh.htm. Accessed on Sept. 2010. Center for Disease Control and Prevention, (2009). Youth Risk Behavior Survey. Available at: www.cdc.gov/yrbs. Accessed on Sept. 2010. CDC, (2004). Methodology of the Youth Risk Behavior Surveillance System. MMWR. Sept. 24, vol.53. RR-12 CDC, (2006). Cigarette Use Among High School Students United States, 1991-2005. MMWR. 55 (26); 724-726. CDC, (2010). Youth Risk Behavior Surveillance United States 2009. MMWR. June 4, vol.59/ss-5. Chao-Ying; Pen, J; & Task-Shing, H. (2002). Logistic Regression Analysis and Reporting: A Primer. Understanding Statistics: statistical Issues in Psychology, Education, and the Social Sciences. 1(1), 31-70. Field, A. (2009). Discovering Statistics Using SPSS. London: Sage. Garson, D. (2010). Logistic Regression: Footnotes, from North Carolina State University. Available at: http:/faculty.chass.ncsu.edu/garson/pa765/logistic.htm. Accessed on June 2010. Hersch, J. (1998). Teen Smoking Behavior and the Regulatory environment. Duke Law Journal, Vol. 47:1143 Hosmer, D.W; & Lemeshow, S. (2000). Applied Logistic Regression, second edition. New York: Wiley Menard, S. (2010). Logistic Regression: From Introductory to Advanced Concepts and Applications. Thousand Oaks, CA: Sage. Pallant, J. (2007). SPSS Survival Manual. Open University Press. England: Berkshire. Pampel, F.C. (2000). Logistic Regression: A Primer. Sage Quantitative Applications in the Social Sciences Series #132. Thousand Oaks, CA. Sage Publications. Tabachnic, B. G; & Fidell, L.S. (1996). Using Multivariate Statistics, 3 rd ed. New York: Harper Collings. Warner, R. (2008). Applied Statistics: From Bivariate Through Multivariate Techniques. Thousand Oaks, CA. Sage Publications. Multiple logistic regression analysis, Page 7

Multiple logistic regression analysis, Page 8

Multiple logistic regression analysis, Page 9