Growth Charts of Body Mass Index (BMI) with Quantile Regression
|
|
|
- Kelley Higgins
- 9 years ago
- Views:
Transcription
1 Growth Charts of Body Mass Index (BMI) with Quantile Regression Colin Chen SAS Institute Inc. Cary, NC, U.S.A. Abstract Growth charts of body mass index (BMI) are constructed from the recent four-year national crosssectional survey data ( ) using parametric quantile regression methods, which are implemented with a newly developed SAS procedure ( and SAS macros. KEY WORDS: Body mass index, growth charts, quantile regression, smoothing algorithm, simplex, interior point. 1 Introduction Overweight has become a common problem in public health, especially for children. Obesity has been related to numerous health risks, both physical and psychological. Body mass index, defined as the ratio of weight (kg) to squared height (m 2 ), has been popularly used as a measure of overweight and obesity. The percentiles of BMI for a specified age is of particular interest in light of public health concerns. Not only are the upper percentiles closely watched for overweight and obesity, the lower percentiles are also observed for underweight. The empirical percentiles with grouped age provide a discrete approximation for the population percentiles. However, continuous percentile curves are both more accurate and attractive. There have been several methods used to construct such age-dependent growth charts. Early methods fit smoothing curves on sample quantiles of segmented age groups. However, such methods are not robust to outliers. Large sample size is needed in order to estimate the percentiles in each age group with appropriate precision. The segmentation may lose information from nearby groups. To avoid segmentation, Cole and Green (1992) developed a Box-Cox transformation-based semiparametric approach from the LMS (Lamda- Mu-Sigma) method introduced by Cole (1988). The semiparametric LMS method solves penalized likelihood equations. Because of the lack of finite expectation for some of the derivatives of the penalized log-likelihood, solutions of these equations could be sensitive to a start point. There is also the question whether the Box- Cox transformation is good enough for the specified distributional assumption, e.g., normality. Quantile regression, which was introduced by Koenker and Bassett (1978), is an alternative way to create growth charts. It does not put any distributional assumption beforehand. It is also relatively easy to accommodate other covariates besides age. Computationally, it is fast and stable. For a random variable Y with probability distribution function the τth quantile of Y 1 is defined as the inverse function where 0 < τ < 1. In particular, the median is Q(1/2). F (y) = Prob (Y y), (1) Q(τ) = inf {y : F (y) τ}, (2) Phone: [email protected] 1 Recall that a student s score on a test is at the τth quantile if his (or her) grade is better than 100τ% of the students who took the test. The score is also said to be at the 100τth percentile. 1
2 The τth sample quantile ˆξ(τ), which is an analogue of Q(τ), may be formulated as the solution of the optimization problem n ρ τ (y i ξ), (3) min ξ R i=1 where ρ τ (z) = z(τ I(z < 0)), 0 < τ < 1, is usually called the check function. When covariates X (e.g., age) are considered, the linear conditional quantile function, Q(τ X = x) = x β(τ), can be estimated by solving ˆβ(τ) = argmin β R p n i=1 ρ τ (y i x iβ) (4) for any quantile τ (0, 1). The quantity ˆβ(τ) is called the regression quantile. The case τ = 1/2, which minimizes the sum of absolute residuals, is usually known as L 1 (median) regression. As the unconditional sample quantile ˆξ(τ), for a given τ, τ percent of the observed values of the continuous response variable Y (i.e., BMI) is expected to fall below the conditional quantile hyperplane x ˆβ(τ). Given X = x, as an unbiased estimator of the 100τth percentile of the conditional distribution of Y, x ˆβ(τ) should be close to the 100τth sample percentile at X = x. This indicates that the percentile curves constructed with the quantile regression method fit the local empirical percentiles well. BMI has been known to be skewed on the right. Departure from the normality assumption after the Box-Cox transformation in the LMS method was reported in Flegal (1999). Departure from normality, especially in the tails, can affect estimates of underweight or overweight. This further promotes the quantile regression method for constructing growth charts of BMI. Historically, quantile regression, which solves the optimization problem of (5) with a general simplex algorithm, was known to be computationally expensive. Barrodale and Roberts (1973) developed a faster simplex algorithm according to the special structure of the design matrix for median regression. Koenker and d Orey (1993) extended this special version of the simplex algorithm to quantile regression for any τ. However, in large statistical applications, the simplex algorithm is regarded as computationally demanding. In theory, the worst-case performance of the simplex algorithm shows an exponentially increasing number of iterations with sample size. Since the general quantile regression fits nicely into the standard primal-dual formulations of linear programming, the powerful interior point algorithm can be applied. The worst-case performance of the interior point algorithm is proven to be better than that of the simplex algorithm. More important, experience has shown that the interior point algorithm is advantageous for larger problems (Portnoy and Koenker 1997). Besides the interior point method, various heuristic approaches have been provided for computing L 1 - type solutions. Among these approaches, the finite smoothing algorithm of Madsen and Nielsen (1993) is the most useful. By approximating the L 1 -type objective function with a smoothing function, the Newton- Raphson algorithm can be used iteratively to obtain the solution after finite loops. The smoothing algorithm extends naturally to general quantile regression (Chen 2003). It turns out to be significantly faster for problems with a large number of covariates. These three algorithms represent the most advanced algorithms for computing regression quantiles. Comparisons of these algorithms and other computational issues for quantile regression are described in Chen and Wei (2005). All these three algorithms have been implemented in the newly developed QUANTREG procedure in SAS, which can be downloaded from The QUANTREG procedure also computes three types of confidence intervals for regression quantiles and conducts diagnostics and other statistical inferences. In the following sections, some usages of the procedure will be demonstrated. More details about this new procedure can be found in Chen (2005) and the documentation from the download site. The purpose of this paper is to show how to use the quantile regression method to construct BMI growth charts, to introduce the new QUANTREG procedure, and to compare the BMI growth charts constructed using the recent four-year national cross-sectional survey data ( ) with the CDC 2000 BMI reference growth charts. 2
3 Table 1: Summary Statistics Variable Q1 Median Q3 Mean SD MAD Men WEIGHT HEIGHT AGE BMI Women WEIGHT HEIGHT AGE BMI Data Since 1999, the National Center for Health Statistics has conducted a national health and nutrition examination (NHANES) survey annually. The survey data are released in a two-year cycle. The recent releases are NHANES and NHANES Each release includes several data files. To construct the growth charts of BMI, two data files are needed. One contains demographic variables, such as age, sex, race, income, etc. The other contains variables related to body measurements, such as height, weight, BMI, head circumference, etc. Each data file includes the respondent sequence number (SEQN), which identifies each individual. Different files in the same survey can be merged by this variable. The data files are in the binary XPT format and can be easily read and edited using the SAS editor. After merging the two data files for each survey, the two merged files are combined to form the four-year ( ) data set. Records for pregnant women are deleted. Then the following variables, WEIGHT(kg), HEIGHT(cm), BMI(kg/m 2 ), AGE(year), GENDER, and SEQN, are kept and the others are dropped. AGE was recorded in the best months for younger respondents (<20 years old) and in the best years for elders. Months are transfered to years in decimal. Records with missing values in any of the six variables are also deleted. The data set bmi contains all the remaining records with ages from 2 years to 80 years. There are 8,250 men and 8,053 women in bmi. Since men and women have different growth patterns, bmi is split into two data sets, bmimen and bmiwomen, by gender. Parallel growth charts of BMI are constructed for men and women with these two data sets. It should be pointed out that the sample weight variable in the demographic data files is not included in this study. The sample weight is used to balance oversampling and undersampling in the surveys to make the data more representative for the national pupulation. This is certainly an important factor when national reference growth charts are constructed. In this paper, individuals in the two surveys are the target population and sample weight is not considered. 3 Preliminary Analysis Table 1 displays the summary statistics of weight, height, age, and BMI for men and women in the two data sets bmimen and bmiwomen, respectively. Q 1 and Q 3 are the first and third quantiles. SD is the standard deviation and MAD is the median absolute deviation, which is a robust measure of the univariate scale. Large differences between SD and MAD indicate that there may be some outliers in that variable. Among the four variables, height shows the greatest differences between SD and MAD for both men and women. These statistics are univariate. To explore the growth pattern of BMI, a preliminary median regression is used. The logarithm of BMI is used as the response. Although the logarithm transformation might not help the quantile regression fit, it might help the statistical inference on regression quantiles. The response is fitted with a parametric model, which involves six powers of AGE. These power terms are tested with the Wald and likelihood ratio tests for significance. The median regression can be done with the QUANTREG procedure: 3
4 Table 2: Parameter Estimates with Median Regression: Men Parameter DF Estimate 95% Confidence Limits Intercept inveage sqrtage age sqrtage*age age*age age*age*age proc quantreg data=bmimen algorithm=interior ci=resampling; model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age / diagnostics cutoff=4.5 quantile=.5; id SEQN age weight height bmi; test_age_cubic: test age*age*age / wald lr; run; where inveage = age 1, sqrtage = age 1 2. Table 2 displays the estimated parameters and their 95% confidence intervals based on the covariance computed by the resampling method. The resampling method implements the Markov chain marginal bootstrap (MCMB) of He and Hu (2002), which greatly promotes the bootstrap method for quantile regression. More details about confidence intervals of regression quantiles by MCMB and other methods can be found in Kocherginsky, He, and Mu (2005) and Chen and Wei (2005). All confidence intervals do not include 0, which indicates that these parameters are significant at 5% test level. Median regression (and general quantile regression) is robust to extremes of the response variable. The QUANTREG procedure provides a diagnostic table of outliers. With CUTOFF=4.5, 22 respondents (0.27% of 8,250 men) are identified as outliers. Median regression with bmiwomen can be done in parallel fashion. The results are similar with bmimen. With CUTOFF=4.5, only 1 female respondent is identified as an outlier. With CUTOFF=3.5, which corresponds to a probability of less than 0.25E 3 under normality, 34 female respondents (0.42% of 8,053), compared with 83 (1.01% of 8,250) male respondents, are identified as outliers. The same six power terms are tested as significant in median regression with bmiwomen. This model used for median regression is used for other quantiles in order to construct the growth charts for both males and females. 4 BMI Growth Charts The parametric method with the six powers in AGE identified in the preliminary analysis is used to construct the growth charts of BMI. This can be done with the following SAS macro. %macro quantiles(nquant, Quantiles); %do i=1 %to &NQuant; proc quantreg data=bmimen ci=none algorithm=interior; model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age / quantile=%scan(&quantiles,&i,, ); output out=outp&i pred=p&i; run; %end; %mend; The macro can be used for any number of quantiles. The following statements request fitted values for 10 quantiles ranging from 0.03 to
5 %let quantiles = %str(.03,.05,.10,.25,.5,.75,.85,.90,.95,.97); %quantiles(10,&quantiles); For each quantile, the regression quantile is computed as in median regression and fitted values of BMI are computed at each observed age and saved in an output data set. These fitted values are plotted against age to create the percentile curves. Figure 1: BMI Growth Percentile Curves with Polynomials The 10 percentile curves together with the scatter plot of BMI for men are shown in Figure 1, and correspondingly for women. For both men and women, starting from age 2, BMI has a small drop followed by a quick growth until age 20. For men, BMI becomes stable at around age 30, while for women the growth continues to around age 40. Beginning from age 10, girls show a faster growth rate of BMI for all upper quantiles and the median. This prevalence continues to around age 40 and results in a larger proportion of women with higher BMI. This is also shown by the stronger concavity of the 95th and 97th percentile curves for women. For both men and women, BMI decreases from age 70. The BMI growth patterns for both men and women suggest that an effective way to control overweight in a population is to start in childhood. The 10 quantiles selected here are the 10 quantiles used in the CDC 2000 BMI reference charts; see Kuczmarski et al. (2002). The CDC 2000 BMI reference charts were constructed from a much larger data pool, which consists of data from five previous national surveys between 1963 to The percentile curves in these reference charts are developed from two stages. In the smoothing stage, smoothed empirical percentiles are computed with a locally weighted regression on the empirical percentiles of segmented age intervals. In the transformation stage, these smoothed empirical quantiles are used to fit the LMS model with a set of 10 equations, which corresponds to the 10 selected quantiles. The three parameters λ, µ, and σ in the LMS model are solved for each age interval. The CDC 2000 BMI reference charts were constructed only for young people of 2 to 20 years old. The age intervals have a length of one month each. Interpolation of these parameters is suggested for finer age intervals. This smoothing-transformation-based method requires large and balanced data, such that there are enough data in each segmented age interval. For the four-year survey data used here, age was recorded in the best months for young people and in the best years for elderly people. To construct BMI growth charts for all ages, such an interval segmentation is hard to achieve. However, quantile regression, which does not involve any age segmentation, uses all data to fit each percentile curve. Although different methods are used for the BMI growth charts with the recent four-year survey data here from the CDC 2000 BMI reference charts, these charts are comparable. This comparison reveals important nationwide health information, especially for children. Growth pattern is more easily changed for young people by growth-related environmental factors, such as nutrition, dietary habits, exercise, and so on. 5
6 The comparison is summarized as follows: (a) For boys of 2 to 20 years old, the upper percentile curves in Figure 1 have large lifts compared to the upper percentile curves in the CDC 2000 BMI reference charts. The 97th percentile for 10-year-old boys in Figure 1 is about 6.4 BMI units higher (an increase of 27%); and the 85th percentile is 4.4 BMI units higher (an increase of 23%), respectively. (b) For girls of 2 to 20 years old, the upper percentile curves (not diaplayed) also show large lifts compared to the upper percentile curves in the CDC 2000 BMI reference charts. The 97th percentile for 10-yearold girls is about 5.5 BMI units higher (an increase of 22.5%); and the 85th percentile is about 4.0 BMI units higher (an increase of 20%), respectively. (c) For boys of 2 to 20 years old, the median and lower percentile curves in Figure 1 do not have too much lift compared to the upper percentile curves in the CDC 2000 BMI reference charts. The median for 10-year-old boys in Figure 1 is about 1.5 BMI units higher (an increase of 9%); and the 3rd percentiles in these two charts are almost equal. (d) Similarly, for girls of 2 to 20 years old, the median and lower percentiles do not show much difference in the two charts. The median for 10-year-old girls is about 1.4 BMI units higher (an increase of 8.5%); and again the 3rd percentiles in these two charts are almost equal. (e) For both boys and girls, the growth pattern for the 95th and 97th percentile curves in Figure 1 looks different from that of the CDC BMI reference charts. The curves in are concave in the neighborhood of age 20, but the corresponding two curves in the CDC 2000 BMI reference charts are not. (f) The percentile curves in Figure 1 have some boundary effect at age 2 because there are not enough data at age 2 or closeby. Results from (a) and (b) can be interpreted as a warning of overweight or obesity for boys and girls in the four-year survey. These results are consistent with the empirical results of Hedley et. al. (2004). Similar warnings have been issued in other countries, e.g., in Switzerland by Zimmermann et al. (2004). (c) and (d) indicate that BMI does not increase for the entire population over time. That indicates that overweight and obesity can be controlled. The cause for (e) is not so clear. It might be due to the different models used for the two sets of charts, or it might be due to the different data used. The boundary effect in (f) might be reduced by using some smoothing techniques in nonparametric quantile regression, which will be discussed in the next section. 5 Discussion We have introduced an approach to construct growth charts of BMI using parametric quantile regression. This approach can also be used to construct growth charts for other age-related (time-related) variables, such as height, weight, head circumference, and other medical measurements. If the target data is used as reference data, such growth charts constructed using quantile regression can be used as reference growth charts. Instead of using polynomials to fit the percentile curves, cubic B-splines based on age can be used in quantile regression as a method of nonparametric quantile regression. Wei et. al. (2004) have used such a nonparametric quantile regression method to construct reference growth charts for Finnish children. For the four-year national survey data, we tried the nonparametric quantile regression method with B-splines. For the BMI data of men, we selected the knots {2.5, 5, 10, 20, 35, 55, 75}. Based on these knots, the basis B-spline functions can be generated. Refer to Chen (2005) for how to generate the basis B-spline functions in SAS. By replacing the powers of age in the macro with these basis variables, percentile curves can be drawn. Compared to the parametric quantile regression method, the nonparametric quantile regression method with B-splines needs extra effort. First, it needs a method to select the knots, or some other methods to fix the basis B-spline functions. Too many knots or too many basis functions provide a good fit of the percentile curves to the data, but make the curves too erratic. Second, a large amount of data is usually needed to make the constructed percentile curves stable to some changes of parameters. 6
7 The nonparametric quantile regression method with B-splines does not reduce the boundary effect. Eilers and Marx (1996) introduced a new smoothing technique using B-splines, which is called the P-spline approach. We are investigating the performance of the P-spline approach in nonparametric quantile regression, especially in constructing growth charts. References [1] Barrodale, I. and Roberts, F.D.K. (1973), An improved algorithm for discrete l 1 linear approximation, SIAM J. Numer. Anal., 10, [2] Chen, C. (2003), A finite smoothing algorithm for quantile regression, Preprint. [3] Chen, C. (2005), An introduction to quantile regression and the QUANTREG procedure, Proceedings of the Thirtieth Annual SAS Users Group International Conference, Cary, NC: SAS Institute Inc. [4] Chen, C. and Wei, Y. (2005), Computational issues on quantile regression, Special Issue on Quantile Regression and Related Methods, Sankhya, forthcoming. [5] Cole, T.J. (1988), Fitting smoothed centile curves to reference data (with discussion), JRSS A, 151, [6] Cole, T.J. and Green, P.J. (1992), Smoothing reference centiles curves: The LMS method and penalized likelihood, Statistics in Medicine, 11, [7] Eilers P.H.C. and Marx, B.D. (1996), Flexible smoothing using B-spline and panelized likelihood, Statistical Science, 11, [8] Flegal, K.M. (1999), Curve smoothing and transformations in the development of growth curves, American Journal of Clinical Nutrition, 70, 163S-5S. [9] He, X. and Hu, F. (2002), Markov chain marginal bootstrap, Journal of the American Statistical Association, 97, [10] Hedley, A.A., Ogden, C.L., Johnson, C.L., Carroll, M.D., Curtin, L.R., Flegal, K.M. (2004), Prevalence of overweight and obesity among US children, adolescents, and adults, JAMA, 291(23), [11] Kocherginsky, M., He, X., and Mu, Y. (2005). Practical confidence intervals for regression quantiles. Journal of Computational and Graphical Statistics, 14, [12] Koenker, R. and Bassett, G. (1978), Regression quantiles, Econometrica, 46, [13] Koenker, R. and d Orey, V. (1993), Computing regression quantiles, Applied Statistics, 43, [14] Kuczmarski R.J., Ogden C.L., Guo S.S., et al. (2002), 2000 CDC growth charts for the United States: Methods and development. Vital Health Stat., 11, 246, [15] Madsen, K. and Nielsen, H.B. (1993), A finite smoothing algorithm for linear L 1 estimation, SIAM J. Optimization, 3, [16] Portnoy, S. and Koenker, R. (1997), The Gaussian Hare and the Laplacian Tortoise: Computation of Squared-error vs. Absolute-error Estimators, Statistical Science, 12, [17] Wei, Y., Pere, A., Koenker, R., and He, X. (2004), Quantile regression methods for reference growth charts. Preprint. [18] Zimmermann, M.B., Gübeli, C., Püntener, C., and Molinari, L. (2004), Overweight and obesity in 6 12 year old children in Switzerland, Swiss Med. Wkly., 134,
Obesity and Socioeconomic Status in Children and Adolescents: United States, 2005 2008
Obesity and Socioeconomic Status in Children and Adolescents: United States, 2005 2008 Cynthia L. Ogden, Ph.D.; Molly M. Lamb, Ph.D.; Margaret D. Carroll, M.S.P.H.; and Katherine M. Flegal, Ph.D. Key findings
A simple guide to classifying body mass index in children. June 2011
A simple guide to classifying body mass index in children June 2011 Delivered by NOO on behalf of the Public Health Observatories in England NOO A simple guide to classifying body mass index in children
Fitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: [email protected] Currie,
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA
Paper P-702 Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA ABSTRACT Individual growth models are designed for exploring longitudinal data on individuals
2016 PQRS OPTIONS FOR INDIVIDUAL MEASURES: CLAIMS, REGISTRY
Measure #128 (NQF 0421): Preventive Care and Screening: Body Mass Index (BMI) Screening and Follow-Up Plan National Quality Strategy Domain: Community/Population Health 2016 PQRS OPTIONS F INDIVIDUAL MEASURES:
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
II. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
USE OF CONSUMER PANEL SURVEY DATA FOR PUBLIC HEALTH COMMUNICATION PLANNING: AN EVALUATION OF SURVEY RESULTS. William E. Pollard
USE OF CONSUMER PANEL SURVEY DATA FOR PUBLIC HEALTH COMMUNICATION PLANNING: AN EVALUATION OF SURVEY RESULTS William E. Pollard Office of Communication Centers for Disease Control and Prevention 1600 Clifton
Defining obesity for children by using Body Mass Index
Department of Economics and Social Sciences Dalarna University D level essay in statistics, 2008 Defining obesity for children by using Body Mass Index Authors Asrar Hussain Mahdy 1 Iram Bilal 2 SUPERVISED
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test
The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation
Statistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
Statistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
Dongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
DATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
PS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng [email protected] The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Elementary Statistics
Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,
Probability Distributions
CHAPTER 5 Probability Distributions CHAPTER OUTLINE 5.1 Probability Distribution of a Discrete Random Variable 5.2 Mean and Standard Deviation of a Probability Distribution 5.3 The Binomial Distribution
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Child Obesity and Socioeconomic Status
NOO data factsheet Child Obesity and Socioeconomic Status September 2012 Key points There are significant inequalities in obesity prevalence for children, both girls and boys, and across different age
SAS Certificate Applied Statistics and SAS Programming
SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine
2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
The zero-adjusted Inverse Gaussian distribution as a model for insurance claims
The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:
Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
Detection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:
SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES
SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Ordinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
Linear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
Chapter 7 Section 7.1: Inference for the Mean of a Population
Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used
A C T R esearcli R e p o rt S eries 2 0 0 5. Using ACT Assessment Scores to Set Benchmarks for College Readiness. IJeff Allen.
A C T R esearcli R e p o rt S eries 2 0 0 5 Using ACT Assessment Scores to Set Benchmarks for College Readiness IJeff Allen Jim Sconing ACT August 2005 For additional copies write: ACT Research Report
Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
Chapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Point and Interval Estimates
Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number
Statistics 104: Section 6!
Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: [email protected] Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC
Algebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
Regression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
Assessing Model Fit and Finding a Fit Model
Paper 214-29 Assessing Model Fit and Finding a Fit Model Pippa Simpson, University of Arkansas for Medical Sciences, Little Rock, AR Robert Hamer, University of North Carolina, Chapel Hill, NC ChanHee
Modeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
Introduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.
The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution
Linear Models and Conjoint Analysis with Nonlinear Spline Transformations
Linear Models and Conjoint Analysis with Nonlinear Spline Transformations Warren F. Kuhfeld Mark Garratt Abstract Many common data analysis models are based on the general linear univariate model, including
Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
Local Quantile Regression
Wolfgang K. Härdle Vladimir Spokoiny Weining Wang Ladislaus von Bortkiewicz Chair of Statistics C.A.S.E. - Center for Applied Statistics and Economics Humboldt-Universität zu Berlin http://lvb.wiwi.hu-berlin.de
Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs
Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)
MBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
Module 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
Lecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry
Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing
Confidence Intervals for the Difference Between Two Means
Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means
Descriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Point Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable
Comparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
Multinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
The Probit Link Function in Generalized Linear Models for Data Mining Applications
Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications
