DATA CLEANING: LONGITUDINAL STUDY CROSS-VISIT CHECKS
|
|
|
- Georgia Watkins
- 9 years ago
- Views:
Transcription
1 Paper ABSTRACT DATA CLEANING: LONGITUDINAL STUDY CROSS-VISIT CHECKS Lauren Parlett,PhD Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD Cross-visit checks are a vital part of data cleaning for longitudinal studies. The nature of longitudinal studies encourages repeatedly collecting the same information. Sometimes, these variables are expected to remain static, go away, increase, or decrease over time. This presentation reviews the naïve and the better approaches at handling one-variable and two-variable consistency checks. For a single-variable check, the better approach features the new ALLCOMB function, introduced in SAS 9.2. For a two-variable check, the better approach uses a BY PROCESSING variable to flag inconsistencies. This paper will provide you the tools to enhance your longitudinal data cleaning process. INTRODUCTION One of my first tasks as a data analyst was to "clean" the data collected in a longitudinal study that began long before I was hired. Researchers of the longitudinal study collected regular data on the subjects on a yearly basis. The same forms would be used for each visit. Not only did the collected data have to make sense across forms, but also the data had to make sense over time. For example, if the subject had indicated at his first visit that his mother was dead, the data should not show her alive at a later visit. That sort of inconsistency had to be flagged for further investigation. My predecessor had adequately written code for the cross visit checks; however, I could not shake the feeling that the code could be more efficient and more dynamic. In this paper I will provide examples of cross visit checks for one and two variable scenarios. For each, I will go through my initial naïve approach and then come at the problem from a more sophisticated approach. Since the ALLCOMB function is an important part of these methods, SAS 9.2 or higher is required. SETTING UP YOUR DATA The data set used in this paper is completely artificial in order to illustrate concepts and possible pitfalls. This data is for a pretend study about the association between vision and stroke over time. Presented here are 5 individuals (ID) who each had 4 visits (Visit). At each visit there were four pieces of information collected: whether the person accompanying them -- their informant-- was new (NewInfrmnt), the date of birth of the informant (InfrmntDOB), whether the subject's vision was normal (NormVision), and whether a stroke had occurred in the past (StrkHist). Missing or unknown values are denoted by the value 99. Otherwise, variable coding is standard. DATA mydata; INPUT ID Visit NewInfrmnt InfrmntDOB MMDDYY10. NormVision StrkHist; DATALINES; /13/ /13/ /22/ /22/ /04/ /04/ /04/ /04/ /14/ /14/ /14/ /14/ /21/ /16/ /16/
2 /16/ /08/ /09/ /09/ /09/ ; The first step in setting up the data is to convert the missing code to a missing value. Then transpose it so that there is one observation per subject such that each row contains all of the variable values across the visits. Only one variable will be transposed at a time. We will start with vision. DATA mydata (DROP=i); SET mydata; ARRAY vars {*} _NUMERIC_; DO i=1 to HBOUND(vars); IF vars{i} = 99 THEN vars{i}=.; PROC SORT data=mydata; BY ID; PROC TRANSPOSE data=mydata PREFIX=v OUT=vision_trans(DROP=_NAME_); BY ID; *Each subject gets a row; ID Visit; *Names each new variable according to visit number; VAR Normvision; After running that code, the mydata dataset will look like this: ID v1 v2 v3 v Output 1. Study Data after Transposing by Visit Number Each subject is on one row. The vision values for each visit are in separate columns. Now we are ready to begin checking across visits. ONE-VARIABLE CHECK NAÏVE APPROACH My first approach to the problem was, admittedly, simplistic, though it did produce the expected results. Someone could present with normal vision and then subsequently have poor vision, but not vice versa. My approach was to type out logic code (IF/THEN statements) to compare each visit. Ultimately, the code should have identified IDs 10 and 11 for further review. My code looked like this: DATA vision_check; *Compare against Visit 1; IF v1=0 AND v2=1 THEN flag=1; IF v1=0 AND v3=1 THEN flag=1; 2
3 IF v1=0 AND v4=1 THEN flag=1; *Compare against Visit 2; IF v2=0 AND v3=1 THEN flag=1; IF v2=0 AND v4=1 THEN flag=1; *Compare against Visit 3; IF v3=0 AND v4=1 THEN flag=1; The output dataset vision_check: Output 2. One-Variable Naive Approach: Dataset after Flagging Discrepancies Using IF/THEN Statements Nothing wrong there! But things could get a little hairy if the study ended up lasting another 2 years. How many lines of logic code would that be for this one variable? You can calculate on this own using the formula for permutations where n = the total number of visits and r= the number of visits rearranged at one time. n! P(n, r) = (n r)! For example, four visits with two visits evaluated at a time would require 12 lines of IF/THEN code. P(4,2) = 4! ( ) = = 12 (4 2)! (2 1) Six visits with two comparisons would require 30 lines. Seven visits with two comparisons? 42 lines. That is a lot of hand coding. And doing that for each variable would take a very long time. There should be an easier way. That is when I tried to use an array and do loop to cut down on the number of lines. DATA vision_checkv2 (DROP=i); ARRAY visit {*} v:; DO i=2 TO HBOUND(visit); IF visit{i-1} > visit {i} THEN flag=1; That was a lot faster than coding 6 or 30 logic lines. But, there was something wrong with the vision_checkv2 dataset Output 3. One-Variable Naive Approach: Dataset after Flagging Discrepancies Using Array and Do Loop The code caught a missing value for subject 12. In SAS, a missing value (.) is considered less than 1; therefore, that observation was flagged as having a discrepancy where one did not exist. I tried again with another condition on the comparison. If either value is missing, the code will skip to the next number of the do loop. 3
4 DATA vision_checkv3 (DROP = i); ARRAY visit {*} v:; DO i=2 TO HBOUND(visit); IF visit{i} =. OR visit{i-1}=. THEN CONTINUE; IF visit{i} > visit {i-1} THEN flag=1; And what do we get for vision_checkv3? Output 4. One-Variable Naive Approach: Dataset after Flagging Discrepancies Using Array and Do Loop, Controlling for Missing Values That's not right either! Subject 11 is now gone because the code could not compare the second and fourth visits where there was a discrepancy. BETTER APPROACH The answer involves a rarely-used SAS function ALLCOMB that is available in SAS 9.2 and later. The purpose of the ALLCOMB function is to generate combinations of variables in a specific order. Order matters in our checks (we must check earlier visit versus a later visit and in that order), so that is why we use ALLCOMB rather than ALLPERM. This seems at odds with our first naïve approach where permutations dictated the number of lines, but that was because we could interchange the order of the logic statements without a change in the outcome. That is not the case with this one-variable approach. ALLCOMB works on elements in an array. The function parameters are the count (or row), the number of chosen elements, and the source of the elements. The result of the ALLCOMB function is a matrix of all possible combinations in a sorted order. The sorted order algorithm will output the same matrix every time it is run. The process for the code is summarized below. 1. Set the data step with the transposed vision dataset. 2. Put the visits into an explicit array (visit). 3. Calculate the number of visits in the array (n). 4. Declare the number of visits to evaluate at one time (k). 5. Calculate the number of times the DO LOOP will have to run through the rows of the ALLCOMB matrix (ncomb). 6. Set up a DO LOOP where that will run ncomb times to compare k visits at a time. 7. Call the ALLCOMB function to interchange two visits and select a certain row (i). 8. Write a logic statement comparing the early and later visits, accounting for missing values. Here is the code for the one-variable approach: DATA vision_onevar (DROP = n k ncomb i); ARRAY visit {*} v:; n = dim(visit); k = 2; ncomb = comb(n,k); DO i=1 TO ncomb; CALL allcomb(i, k, of visit[*]); IF visit{1} < visit{2} AND visit[1]^=. THEN flag=1; 4
5 The resulting vision_onevar dataset: Output 5. One-Variable Better Approach: Dataset after Flagging Discrepancies Using ALLCOMB Function It flagged the proper IDs, just like the first naïve way, but this code can accommodate many additional visits without adding lines to the data step. Note that the values are no longer with the right visits. They are in reverse order. That can be fixed by copying the visits values to temporary variables before the do loop and then restoring them after the DO loop. There is code in the two-variable better approach that can serve as your template for this. Finally, the less than operator can be changed to greater than or not equal to depending on your particular coding scheme and logic needs. TWO-VARIABLE CHECK What if your consistency check depends on two variables rather than one? The value of one variable is allowed to change if another variable for that same visit also changes? That is where the two-variable approach comes in. NAÏVE APPROACH Our example involves the variables NewInfrmnt and InfrmntDOB. The date of birth of the informant should not change unless the informant is new (NewInfrmnt=1). According to this logic, subjects 10 and 14 should be flagged for further review. My first instinct was to expand the better code from the one-variable check where the ALLCOMB function is used. First, I transposed the two variables and added the prefixes "new" and "dob" to their new columns. Next, I merged the resulting datasets. Before going into the comparison, I stored the original values in temporary variables since the ALLCOMB function ultimately reverses the values. Then, two ALLCOMB function calls are employed rather than just one. The code: DATA inft_check (DROP= n k ncomb i j m temp:); MERGE inft_trans inft_trans2; BY ID; ARRAY new {*} new:; ARRAY dob {*} dob:; ARRAY temp1 {4}; ARRAY temp2 {4}; DO i=1 to HBOUND(new); temp1{i} = new{i}; temp2{i} = dob{i}; n = dim(new); k = 2; ncomb = comb(n,k); DO j=1 TO ncomb; CALL allcomb(j, k, of new[*]); CALL allcomb(j, k, of dob{*}); IF dob{1} ^= dob{2} AND new[2]=0 AND dob{1}^=. AND dob{2}^=. THEN flag=1; 5
6 DO m=1 TO HBOUND(new); new{m} = temp1{m}; dob{m} = temp2{m}; FORMAT dob1-dob4 MMDDYY9.; The resulting inft_check dataset: ID new1 new2 new3 new4 dob1 dob2 dob3 dob /13/70 01/13/70 02/22/68 02/22/ /21/66 06/16/80 06/16/80 06/16/ /08/72 07/09/72 07/09/72 07/09/72 Output 6. Two-Variable Naive Approach: Dataset after Flagging Discrepancies Using Two ALLCOMB Functions That's not right. What went wrong? It turns out that the code compared visit 1 and visit 3 for subject 13 and found that a new informant was not specified even though the dates of birth were different. So, it flagged subject 13. However, the new informant WAS specified for visit 2, so subsequent visits had no need to indicate a new informant was present. The two ALLCOMB calls approach was not the way to go. BETTER APPROACH A more elegant solution requires no function calls. Rather than using the transposed dataset, the two-variable approach makes use of the long dataset. Keep that in mind if your data is structured as wide. The code: PROC SORT data=mydata; BY ID InfrmntDOB; DATA inft_twovar; SET mydata (KEEP= ID Visit InfrmntDOB NewInfrmnt); BY ID InfrmntDOB; IF first.infrmntdob and NewInfrmnt=0 THEN flag=1; FORMAT InfrmntDOB MMDDYY9.; This approach relies on a BY processing variable (first.infrmntdob). When a dataset is SET with one or more BY variables, the processing makes first. and last. variables at the beginning and end of each group. In this better approach, if the observation is the first of a group of date of births for a subject, first.informntdob is equal to 1. If, in that same observation, there is no new informant declared (NewInfrmt=0), then the observation is flagged as having a discrepancy. The resulting inft_twovar dataset: New Infrmnt ID Visit Infrmnt DOB flag /22/ /09/72 1 Output 7. Two-Variable Better Approach: Dataset after Flagging Discrepancies Using BY Variables The dataset has flagged the proper IDs. It shows you the visit where the error occurred. Keep in mind that this method 6
7 will not flag visits where there is a new date of birth and the new informant variable is missing. Those should be flagged and hand-checked using other code. CONCLUSION Checking for cross-visit consistency is an important part of data cleaning. By doing this, you can catch typos, inconsistent assessments, and more. Sometimes the naïve ways of coding work, such as the first naïve way I tried in the one variable example. Unfortunately, those solutions are not always dynamic and can balloon when additional follow up visits are added to the study. Since longitudinal data collection (or any data collection!) is bound to have missing or unknown values, using the newly offered-- though rarely talked about-- function ALLCOMB is essential for checking more than just one visit versus the next visit. When examining two variables, using the BY statement in a DATA step and the resulting BY processing variables (FIRST. or LAST.) is a good way to flag discrepancies when one value relies on another. The examples provided in this paper can be scaled up and supplemented according to your specific data cleaning needs. ACKNOWLEDGEMENTS This work was completed while working on the Johns Hopkins University BIOCARD study. Funding for the BIOCARD study is provided by the National Institute of Aging, NIH, grant U01AG Special thanks to the SAS Support Community for providing ideas about the two-variable cross visit check. Also, thank you to Joseph Guido for reviewing this paper. CONTACT INFORMATION Your comments are questions are valued and encouraged. Contact the author at: Lauren Parlett, PhD Johns Hopkins University Bloomberg School of Public Health Department of Epidemiology 615 N. Wolfe St. Rm W6024 Baltimore, MD (410) (phone) (410) (fax) SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7
Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.
Paper 23-27 Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. ABSTRACT Have you ever had trouble getting a SAS job to complete, although
Using the Magical Keyword "INTO:" in PROC SQL
Using the Magical Keyword "INTO:" in PROC SQL Thiru Satchi Blue Cross and Blue Shield of Massachusetts, Boston, Massachusetts Abstract INTO: host-variable in PROC SQL is a powerful tool. It simplifies
Taming the PROC TRANSPOSE
Taming the PROC TRANSPOSE Matt Taylor, Carolina Analytical Consulting, LLC ABSTRACT The PROC TRANSPOSE is often misunderstood and seldom used. SAS users are unsure of the results it will give and curious
From The Little SAS Book, Fifth Edition. Full book available for purchase here.
From The Little SAS Book, Fifth Edition. Full book available for purchase here. Acknowledgments ix Introducing SAS Software About This Book xi What s New xiv x Chapter 1 Getting Started Using SAS Software
Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1
Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1 Institute for Health Care Research and Improvement, Baylor Health Care System 2 University of North
Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY
Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE
WRITING PROOFS. Christopher Heil Georgia Institute of Technology
WRITING PROOFS Christopher Heil Georgia Institute of Technology A theorem is just a statement of fact A proof of the theorem is a logical explanation of why the theorem is true Many theorems have this
Beyond the Simple SAS Merge. Vanessa L. Cox, MS 1,2, and Kimberly A. Wildes, DrPH, MA, LPC, NCC 3. Cancer Center, Houston, TX. vlcox@mdanderson.
Beyond the Simple SAS Merge Vanessa L. Cox, MS 1,2, and Kimberly A. Wildes, DrPH, MA, LPC, NCC 3 1 General Internal Medicine and Ambulatory Treatment, The University of Texas MD Anderson Cancer Center,
Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois
Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Abstract This paper introduces SAS users with at least a basic understanding of SAS data
The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data
The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data Carter Sevick MS, DoD Center for Deployment Health Research, San Diego, CA ABSTRACT Whether by design or by error there
Parameter Passing in Pascal
Parameter Passing in Pascal Mordechai Ben-Ari Department of Science Teaching Weizmann Institute of Science Rehovot 76100 Israel [email protected] Abstract This paper claims that reference parameters
IBM SPSS Data Preparation 22
IBM SPSS Data Preparation 22 Note Before using this information and the product it supports, read the information in Notices on page 33. Product Information This edition applies to version 22, release
LOCF-Different Approaches, Same Results Using LAG Function, RETAIN Statement, and ARRAY Facility Iuliana Barbalau, ClinOps LLC. San Francisco, CA.
LOCF-Different Approaches, Same Results Using LAG Function, RETAIN Statement, and ARRAY Facility Iuliana Barbalau, ClinOps LLC. San Francisco, CA. ABSTRACT LOCF stands for Last Observation Carried Forward
USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1
USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1 There may be times when you run a SAS procedure when you would like to direct the procedure output to a SAS data
Data-driven Validation Rules: Custom Data Validation Without Custom Programming Don Hopkins, Ursa Logic Corporation, Durham, NC
Data-driven Validation Rules: Custom Data Validation Without Custom Programming Don Hopkins, Ursa Logic Corporation, Durham, NC ABSTRACT One of the most expensive and time-consuming aspects of data management
Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server
Paper 8801-2016 Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server Nicole E. Jagusztyn, Hillsborough Community College ABSTRACT At a community
Comparing Methods to Identify Defect Reports in a Change Management Database
Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com
SAS and Clinical IVRS: Beyond Schedule Creation Gayle Flynn, Cenduit, Durham, NC
Paper SD-001 SAS and Clinical IVRS: Beyond Schedule Creation Gayle Flynn, Cenduit, Durham, NC ABSTRACT SAS is the preferred method for generating randomization and kit schedules used in clinical trials.
Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University ABSTRACT In real world, analysts seldom come across data which is in
Commission Formula. Value If True Parameter Value If False Parameter. Logical Test Parameter
Excel Review This review uses material and questions from Practice Excel Exam 1 found on the Lab Exam 2 Study Guide webpage. Print out a copy of Practice Excel Exam 1. Download the Practice Excel Exam
Dataframes. Lecture 8. Nicholas Christian BIOST 2094 Spring 2011
Dataframes Lecture 8 Nicholas Christian BIOST 2094 Spring 2011 Outline 1. Importing and exporting data 2. Tools for preparing and cleaning datasets Sorting Duplicates First entry Merging Reshaping Missing
Transforming SAS Data Sets Using Arrays. Introduction
Transforming SAS Data Sets Using Arrays Ronald Cody, Ed.D., Robert Wood Johnson Medical School, Piscataway, NJ Introduction This paper describes how to efficiently transform SAS data sets using arrays.
Paper 2917. Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA
Paper 2917 Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA ABSTRACT Creation of variables is one of the most common SAS programming tasks. However, sometimes it produces
Chapter 16: Follow-up
Chapter 16: Follow-up The term follow-up refers to the process whereby a registry continues to monitor the status of a patient's health at periodic intervals. Data fields concerning patient vital status,
Let the CAT Out of the Bag: String Concatenation in SAS 9 Joshua Horstman, Nested Loop Consulting, Indianapolis, IN
Paper S1-08-2013 Let the CAT Out of the Bag: String Concatenation in SAS 9 Joshua Horstman, Nested Loop Consulting, Indianapolis, IN ABSTRACT Are you still using TRIM, LEFT, and vertical bar operators
Changing the Shape of Your Data: PROC TRANSPOSE vs. Arrays
Changing the Shape of Your Data: PROC TRANSPOSE vs. Arrays Bob Virgile Robert Virgile Associates, Inc. Overview To transpose your data (turning variables into observations or turning observations into
Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT
Paper AD01 Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT ABSTRACT The use of EXCEL spreadsheets is very common in SAS applications,
Anyone Can Learn PROC TABULATE
Paper 60-27 Anyone Can Learn PROC TABULATE Lauren Haworth, Genentech, Inc., South San Francisco, CA ABSTRACT SAS Software provides hundreds of ways you can analyze your data. You can use the DATA step
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA
WUSS2015 Paper 84 Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA ABSTRACT Creating your own SAS application to perform CDISC
Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY
Paper FF-007 Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY ABSTRACT SAS datasets include labels as optional variable attributes in the descriptor
... ... Automate your forms assembly process with PlanetPress Suite. Go Beyond Printing :::
Transactional and variable content document printing, output management and automated delivery. FOR INSURANCE Automate your forms assembly process with Suite Decrease your operational costs Increase your
2 SYSTEM DESCRIPTION TECHNIQUES
2 SYSTEM DESCRIPTION TECHNIQUES 2.1 INTRODUCTION Graphical representation of any process is always better and more meaningful than its representation in words. Moreover, it is very difficult to arrange
PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays
Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays, continued SESUG 2012 PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays William E Benjamin
1.2 Solving a System of Linear Equations
1.. SOLVING A SYSTEM OF LINEAR EQUATIONS 1. Solving a System of Linear Equations 1..1 Simple Systems - Basic De nitions As noticed above, the general form of a linear system of m equations in n variables
The Query Builder: The Swiss Army Knife of SAS Enterprise Guide
Paper 1557-2014 The Query Builder: The Swiss Army Knife of SAS Enterprise Guide ABSTRACT Jennifer First-Kluge and Steven First, Systems Seminar Consultants, Inc. The SAS Enterprise Guide Query Builder
NUMBER SYSTEMS APPENDIX D. You will learn about the following in this appendix:
APPENDIX D NUMBER SYSTEMS You will learn about the following in this appendix: The four important number systems in computing binary, octal, decimal, and hexadecimal. A number system converter program
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
Imputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
Laboratory Assignments of OBJECT ORIENTED METHODOLOGY & PROGRAMMING (USING C++) [IT 553]
Laboratory Assignments of OBJECT ORIENTED METHODOLOGY & PROGRAMMING (USING C++) [IT 553] Books: Text Book: 1. Bjarne Stroustrup, The C++ Programming Language, Addison Wesley 2. Robert Lafore, Object-Oriented
B) Mean Function: This function returns the arithmetic mean (average) and ignores the missing value. E.G: Var=MEAN (var1, var2, var3 varn);
SAS-INTERVIEW QUESTIONS 1. What SAS statements would you code to read an external raw data file to a DATA step? Ans: Infile and Input statements are used to read external raw data file to a Data Step.
6 3 4 9 = 6 10 + 3 10 + 4 10 + 9 10
Lesson The Binary Number System. Why Binary? The number system that you are familiar with, that you use every day, is the decimal number system, also commonly referred to as the base- system. When you
Pseudo code Tutorial and Exercises Teacher s Version
Pseudo code Tutorial and Exercises Teacher s Version Pseudo-code is an informal way to express the design of a computer program or an algorithm in 1.45. The aim is to get the idea quickly and also easy
Alex Vidras, David Tysinger. Merkle Inc.
Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT
Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC
Paper CC 14 Counting the Ways to Count in SAS Imelda C. Go, South Carolina Department of Education, Columbia, SC ABSTRACT This paper first takes the reader through a progression of ways to count in SAS.
Subsetting Observations from Large SAS Data Sets
Subsetting Observations from Large SAS Data Sets Christopher J. Bost, MDRC, New York, NY ABSTRACT This paper reviews four techniques to subset observations from large SAS data sets: MERGE, PROC SQL, user-defined
Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001 STATISTICIANS WOULD DO WELL TO USE DATA FLOW DIAGRAMS
Proceedings of the Annual Meeting of the American Association, August 5-9, 2001 STATISTICIANS WOULD DO WELL TO USE DATA FLOW DIAGRAMS Mark A. Martin Bayer Diagnostics, 333 Coney Street, East Walpole MA
Binary Adders: Half Adders and Full Adders
Binary Adders: Half Adders and Full Adders In this set of slides, we present the two basic types of adders: 1. Half adders, and 2. Full adders. Each type of adder functions to add two binary bits. In order
PharmaSUG 2013 - Paper AD08
PharmaSUG 2013 - Paper AD08 Just Press the Button Generation of SAS Code to Create Analysis Datasets directly from an SAP Can it be Done? Endri Endri, Berlin, Germany Rowland Hale, inventiv Health Clinical,
Gantt Chart for Excel
Project Management Templates for Excel http://spreadsheetml.com/projectmanagement/ganttchart.shtml Copyright (c) 2009-2010, ConnectCode All Rights Reserved. ConnectCode accepts no responsibility for any
EXST SAS Lab Lab #4: Data input and dataset modifications
EXST SAS Lab Lab #4: Data input and dataset modifications Objectives 1. Import an EXCEL dataset. 2. Infile an external dataset (CSV file) 3. Concatenate two datasets into one 4. The PLOT statement will
Guide to Performance and Tuning: Query Performance and Sampled Selectivity
Guide to Performance and Tuning: Query Performance and Sampled Selectivity A feature of Oracle Rdb By Claude Proteau Oracle Rdb Relational Technology Group Oracle Corporation 1 Oracle Rdb Journal Sampled
A Process is Not Just a Flowchart (or a BPMN model)
A Process is Not Just a Flowchart (or a BPMN model) The two methods of representing process designs that I see most frequently are process drawings (typically in Microsoft Visio) and BPMN models (and often
Reduced echelon form: Add the following conditions to conditions 1, 2, and 3 above:
Section 1.2: Row Reduction and Echelon Forms Echelon form (or row echelon form): 1. All nonzero rows are above any rows of all zeros. 2. Each leading entry (i.e. left most nonzero entry) of a row is in
SSN validation Virtually at no cost Milorad Stojanovic RTI International Education Surveys Division RTP, North Carolina
Paper PO23 SSN validation Virtually at no cost Milorad Stojanovic RTI International Education Surveys Division RTP, North Carolina ABSTRACT Using SSNs without validation is not the way to ensure quality
SUGI 29 Posters. Mazen Abdellatif, M.S., Hines VA CSPCC, Hines IL, 60141, USA
A SAS Macro for Generating Randomization Lists in Clinical Trials Using Permuted Blocks Randomization Mazen Abdellatif, M.S., Hines VA CSPCC, Hines IL, 60141, USA ABSTRACT We developed a SAS [1] macro
AP Computer Science Java Mr. Clausen Program 9A, 9B
AP Computer Science Java Mr. Clausen Program 9A, 9B PROGRAM 9A I m_sort_of_searching (20 points now, 60 points when all parts are finished) The purpose of this project is to set up a program that will
Using SAS to Build Customer Level Datasets for Predictive Modeling Scott Shockley, Cox Communications, New Orleans, Louisiana
Using SAS to Build Customer Level Datasets for Predictive Modeling Scott Shockley, Cox Communications, New Orleans, Louisiana ABSTRACT If you are using operational data to build datasets at the customer
Solving Linear Systems, Continued and The Inverse of a Matrix
, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing
Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.
Is your database application experiencing poor response time, scalability problems, and too many deadlocks or poor application performance? One or a combination of zparms, database design and application
Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA
PharmaSUG 2014 Paper CC07 Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA ABSTRACT Do you have imported data with line breaks and want to export the data to
Second Degree Price Discrimination - Examples 1
Second Degree Discrimination - Examples 1 Second Degree Discrimination and Tying Tying is when firms link the sale of two individual products. One classic example of tying is printers and ink refills.
Quick Start to Family Tree
Quick Start to Family Tree Family Tree is a new way to organize and record your genealogy online. It is free, is available to everyone, and provides an easy way to discover your place in history with free
COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012
Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about
Lecture 2 Matrix Operations
Lecture 2 Matrix Operations transpose, sum & difference, scalar multiplication matrix multiplication, matrix-vector product matrix inverse 2 1 Matrix transpose transpose of m n matrix A, denoted A T or
Everything you wanted to know about MERGE but were afraid to ask
TS- 644 Janice Bloom Everything you wanted to know about MERGE but were afraid to ask So you have read the documentation in the SAS Language Reference for MERGE and it still does not make sense? Rest assured
Identifying Invalid Social Security Numbers
ABSTRACT Identifying Invalid Social Security Numbers Paulette Staum, Paul Waldron Consulting, West Nyack, NY Sally Dai, MDRC, New York, NY Do you need to check whether Social Security numbers (SSNs) are
Being Legal in the Czech Republic: One American s Bureaucratic Odyssey
February 2008 Being Legal in the Czech Republic: One American s Bureaucratic Odyssey Abstract: This article is the first-person testimony of an American citizen living in the Czech Republic for over ten
Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration
Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration This JavaScript lab (the last of the series) focuses on indexing, arrays, and iteration, but it also provides another context for practicing with
Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison
Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison Assumptions That you have used SAS at least a few times. It doesn t matter whether you run SAS in
Experiences in Using Academic Data for BI Dashboard Development
Paper RIV09 Experiences in Using Academic Data for BI Dashboard Development Evangeline Collado, University of Central Florida; Michelle Parente, University of Central Florida ABSTRACT Business Intelligence
SPC Data Visualization of Seasonal and Financial Data Using JMP WHITE PAPER
SPC Data Visualization of Seasonal and Financial Data Using JMP WHITE PAPER SAS White Paper Table of Contents Abstract.... 1 Background.... 1 Example 1: Telescope Company Monitors Revenue.... 3 Example
Simple File Input & Output
Simple File Input & Output Handout Eight Although we are looking at file I/O (Input/Output) rather late in this course, it is actually one of the most important features of any programming language. The
Adobe Acrobat 6.0 Professional
Adobe Acrobat 6.0 Professional Manual Adobe Acrobat 6.0 Professional Manual Purpose The will teach you to create, edit, save, and print PDF files. You will also learn some of Adobe s collaborative functions,
Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: [email protected]
96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad
White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.
White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access
Paper FF-014. Tips for Moving to SAS Enterprise Guide on Unix Patricia Hettinger, Consultant, Oak Brook, IL
Paper FF-014 Tips for Moving to SAS Enterprise Guide on Unix Patricia Hettinger, Consultant, Oak Brook, IL ABSTRACT Many companies are moving to SAS Enterprise Guide, often with just a Unix server. A surprising
Process Modeling. Chapter 6. (with additions by Yale Braunstein) Slide 1
Process Modeling Chapter 6 (with additions by Yale Braunstein) Slide 1 PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design Copyright 2000 John Wiley & Sons, Inc. All rights reserved.
5 Homogeneous systems
5 Homogeneous systems Definition: A homogeneous (ho-mo-jeen -i-us) system of linear algebraic equations is one in which all the numbers on the right hand side are equal to : a x +... + a n x n =.. a m
Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound
PharmaSUG 2014 - Paper AD21 Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound ABSTRACT Natalie Reynolds, Eli Lilly and Company, Indianapolis, IN Keith
Converting Electronic Medical Records Data into Practical Analysis Dataset
Converting Electronic Medical Records Data into Practical Analysis Dataset Irena S. Cenzer, University of California San Francisco, San Francisco, CA Mladen Stijacic, San Diego, CA See J. Lee, University
CHAPTER 2. Logic. 1. Logic Definitions. Notation: Variables are used to represent propositions. The most common variables used are p, q, and r.
CHAPTER 2 Logic 1. Logic Definitions 1.1. Propositions. Definition 1.1.1. A proposition is a declarative sentence that is either true (denoted either T or 1) or false (denoted either F or 0). Notation:
Step by Step Guide to Importing Genetic Data into JMP Genomics
Step by Step Guide to Importing Genetic Data into JMP Genomics Page 1 Introduction Data for genetic analyses can exist in a variety of formats. Before this data can be analyzed it must imported into one
WORKING WITH SUBQUERY IN THE SQL PROCEDURE
WORKING WITH SUBQUERY IN THE SQL PROCEDURE Lei Zhang, Domain Solutions Corp., Cambridge, MA Danbo Yi, Abt Associates, Cambridge, MA ABSTRACT Proc SQL is a major contribution to the SAS /BASE system. One
Algorithm & Flowchart & Pseudo code. Staff Incharge: S.Sasirekha
Algorithm & Flowchart & Pseudo code Staff Incharge: S.Sasirekha Computer Programming and Languages Computers work on a set of instructions called computer program, which clearly specify the ways to carry
Basics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional
BASIC TECHNIQUES IN USING EXCEL TO ANALYZE ASSESSMENT DATA
1 BASIC TECHNIQUES IN USING EXCEL TO ANALYZE ASSESSMENT DATA University of Hawai i at Mānoa 11/15/12 2 Mission: Improve Student Learning Through Program Assessment 1 3 Workshop outcomes By the end of this
Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD
NESUG 2012 Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD ABSTRACT PROC SQL is a powerful yet still overlooked tool within our SAS arsenal. PROC SQL can create tables, sort and summarize data,
ZIMBABWE SCHOOL EXAMINATIONS COUNCIL. COMPUTER STUDIES 7014/01 PAPER 1 Multiple Choice SPECIMEN PAPER
ZIMBABWE SCHOOL EXAMINATIONS COUNCIL General Certificate of Education Ordinary Level COMPUTER STUDIES 7014/01 PAPER 1 Multiple Choice SPECIMEN PAPER Candidates answer on the question paper Additional materials:
1. ERP integrators contacts. 2. Order export sample XML schema
Scandiweb Case study Dogstrust ERP integration Task description www.dogstrustproducts.com is a company that produces cosmetics for dogs. They have a lot of not very expensive products that they sell a
