Categorical Variables




Variables that record a response as one of a set of categories are termed categorical. Such variables fall into three classifications: nominal, ordinal, and interval.

Nominal variables have categories with no natural order. Examples are crops (wheat, barley, and peas) or irrigation methods (flood, furrow, and dry land).

Ordinal variables, on the other hand, do have a natural order. Examples are pesticide levels (high, medium, and low) or an injury scale (0, 1, 2, 3, 4, and 5). Use caution with some tests designed for ordinal variables, because they may assume equal 'distance' between the levels, and such distances may be hard to quantify.

Interval variables, as the name implies, are created from intervals on a continuous scale. An example is a set of categories based on weight: 0-2 gm, 2-10 gm, and >10 gm.
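As a rough sketch of how an interval variable might be built from a continuous measurement, the step below bins a weight into the three classes mentioned above. The data set name SEEDS, the variable names WEIGHT and WTCLASS, and the data values are invented for illustration. The text does not say which class a boundary value such as exactly 2 gm belongs to; assigning it to the lower class is an assumption made here.

```sas
/* Hypothetical data: weights in grams (values invented for illustration) */
data seeds;
   input weight @@;
   length wtclass $ 8;
   /* SAS treats a missing numeric value as smaller than any number, */
   /* so missing weights are handled first.                          */
   if missing(weight) then wtclass = ' ';
   /* Assumption: boundary values (2 and 10) fall in the lower class */
   else if weight <= 2  then wtclass = '0-2 gm';
   else if weight <= 10 then wtclass = '2-10 gm';
   else                      wtclass = '>10 gm';
   datalines;
0.8 1.5 2.4 6.1 9.9 10.2 14.7
;
run;

/* Tabulate the resulting interval categories */
proc freq data=seeds;
   tables wtclass;
run;
```

Handling the missing case explicitly keeps observations with no recorded weight out of the lowest class.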

Categorical Responses

The actual response in any category type is binary, i.e. it records one of two possible conditions or outcomes. In the examples above, this could be the presence or absence of a weed or insect species for the different crops, or a YES/NO answer for each irrigation method.

For SAS, it does not matter whether the variables are numeric or character. A yes/no variable could be entered as "YES"/"NO", 1/2, "Y"/"N", etc. However, it is important to keep the case of character values consistent: the values "N" and "n" are not the same in SAS!

A sample data file could look like the following:

   OBS   VAR1   VAR2   VAR3 ...
    1    YES     1      Y
    2    YES     2      N
    3    NO      1      Y
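One way to guard against mixed-case entries is to standardize the values before tabulating them. The following is a minimal sketch: the data set RESPONSES and the variables ANSWER and ANSWER_STD are hypothetical names, and the data values are invented. UPCASE is a standard SAS character function.

```sas
/* Hypothetical responses entered with inconsistent case */
data responses;
   input answer $ @@;
   answer_std = upcase(answer);   /* 'y' becomes 'Y', 'n' becomes 'N' */
   datalines;
Y n N y Y
;
run;

/* ANSWER shows four levels (N, Y, n, y); ANSWER_STD shows two (N, Y) */
proc freq data=responses;
   tables answer answer_std;
run;
```

Comparing the two tables makes it easy to see whether case differences were inflating the number of categories.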

Although SAS categorical procedures will automatically tabulate the response categories before analysis, this data type can easily be condensed to a more compact form. That is, the variables could record the number of "YES" responses, "NO" responses, etc. The summary may also record the number for combinations of categorical conditions, such as the number of observations with "YES", 1, and "N". This form will be referred to as count data. The following data file example illustrates this:

   VAR1   VAR2   VAR3   COUNT
   YES     1      Y      34
   NO      1      Y      23
   YES     2      Y      12
   NO      2      Y      16

It is possible to get SAS to do this summarization for you. Given data in binary form (one record per observation), several SAS procedures can create a summarized SAS data set of counts. As an illustration, the following SAS code summarizes a binary data file using the MEANS procedure. Other procedures that could be used are SUMMARY, FREQ, and UNIVARIATE.

Example.

   /* Read one record per subject; @@ allows several observations per line */
   DATA BINARY;
      INPUT SUBJECT RESPONSE $ @@;
      CARDS;
   1 Y 2 N 3 N 4 Y 5 N 6 N 7 Y 8 Y
   9 Y 10 Y 11 Y 12 N 13 N 14 Y 15 Y 16 N
   ;
   RUN;

   /* BY-group processing requires data sorted by RESPONSE */
   PROC SORT DATA=BINARY;
      BY RESPONSE;
   RUN;

   /* N=COUNT stores the number of nonmissing SUBJECT values per group */
   PROC MEANS NOPRINT DATA=BINARY;
      BY RESPONSE;
      VAR SUBJECT;
      OUTPUT OUT=COUNTS N=COUNT;
   RUN;

   PROC PRINT DATA=COUNTS;
      VAR RESPONSE COUNT;
   RUN;

Output.

   OBS   RESPONSE   COUNT
    1       N          7
    2       Y          9
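The text names FREQ as one of the other procedures that could produce the same summary. A minimal sketch of that alternative follows, assuming the BINARY data set from the example already exists. The output data set name COUNTS_FREQ is chosen here for illustration only.

```sas
/* Alternative sketch: let PROC FREQ build the count data set.   */
/* No prior sort is needed because no BY statement is used.      */
proc freq data=binary noprint;
   tables response / out=counts_freq;   /* creates COUNT and PERCENT */
run;

proc print data=counts_freq;
   var response count;
run;
```

The OUT= data set holds one row per level of RESPONSE, with the frequency in COUNT and the percentage in PERCENT, so no analysis variable such as SUBJECT has to be named.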

Categorical Structures

A basic structure for categorical data is the one-way frequency, i.e. only one variable is examined at a time. Again using an example from above, a one-way frequency could be the counts for the different crops.

If two categorical variables have been recorded, the cross classification is called a two-way contingency table. This could be a 3 x 3 table of crop types by pesticide levels.

More complex structures involving three or more categorical variables are referred to as multidimensional distributions. A cross classification of crop type by pesticide level by irrigation method would be an example.
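These structures can be requested from count data with PROC FREQ, as in the sketch below. It re-enters the four-row count table shown earlier under the assumed data set name COUNTS, and uses the WEIGHT statement so that each row stands for COUNT observations. The choice of which variables to cross is made here purely for illustration.

```sas
/* Count data from the earlier table (data set name is an assumption) */
data counts;
   input var1 $ var2 var3 $ count;
   datalines;
YES 1 Y 34
NO  1 Y 23
YES 2 Y 12
NO  2 Y 16
;
run;

proc freq data=counts;
   weight count;             /* each row represents COUNT observations */
   tables var1;              /* one-way frequency                      */
   tables var1*var2;         /* two-way contingency table              */
   tables var3*var1*var2;    /* three-way: one VAR1 x VAR2 table per   */
                             /* level of VAR3                          */
run;
```

In a multiway TABLES request, the last two variables form the rows and columns of each table and the variables listed before them define the strata. Because VAR3 has only the value Y in this small table, the three-way request yields a single stratum here.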