Using SQL Joins to Perform Fuzzy Matches on Multiple Identifiers Jedediah J. Teres, MDRC, New York, NY
ABSTRACT

Matching observations from different data sources is problematic without a reliable shared identifier. Using multiple identifiers can be more restrictive, as it requires multiple exact matches. One way around this is to create a match score based on the number of matching identifiers. This score can be weighted to favor certain matches or sets of matches (e.g., first name and last name) over others (e.g., first name and date of birth). This paper builds on a previous paper that described a technique for creating and using a match score in a SQL join to find matches when not all identifiers are exactly the same. It extends that technique with COMPGED, which finds close matches without requiring strict equality.

INTRODUCTION

This paper expands upon a previously described method of combining data sets using multiple variables as matching criteria in PROC SQL by using COMPGED to allow for fuzzy matching. Prior to SAS 9.2, using COMPGED in a SQL JOIN produced a note in the log each time a character was compared to a blank space. With the release of SAS 9.2, this is no longer an issue, and COMPGED can be used to expand the flexibility of JOINs in SQL.

Knowledge of the SELECT statement and JOINs in PROC SQL is assumed. Prior knowledge of COMPGED is helpful but not necessary. An understanding of the PUT function is needed.

SAMPLE DATA SETS

Two SAS data sets are used for illustration purposes in this paper: REF and CHK.

Data set REF

Obs  FNAME   LNAME    SSN  NINO       DOB
  1  John    Baldwin       BM567891E  1/3/1946
  2  Robert  Plant         YZ912345H  8/20/1948
  3  Jimmy   Page          HL234567B  1/9/1944
  4  John    Bonham        YZ891234G  5/31/1948
  5  Ray     Davies        HL456789D  6/21/1944
  6  Dave    Davies        KR789123F  2/3/1947
  7  Peter   Quaife        AA123456A  12/31/1943
  8  Mick    Avory         HL345678C  2/15/1944

Data set REF contains 5 variables and 8 observations. Each observation represents a unique sample member; there are no duplicate observations in the data set REF.

Data set CHK

Obs  FNAME   LNAME   SSN  NINO       DOB         BAND
  1  John    Jones        BM567891E  1/3/1946    Led Zeppelin
  2  Robert  Plant        YZ912235H  8/20/1948   Led Zeppelin
  3  Jimmy   Page         HL234567D  1/9/1944    Led Zeppelin
  4  John    Bonham       YZ891234G  5/31/1948   Led Zeppelin
  5  Ray     Davies       HL456789D  6/21/1944   The Kinks
  6  Dave    Davies       KR789213F  2/3/1947    The Kinks
  7  Pete    Quiafe       AA132456A  12/31/1943  The Kinks
  8  Mick    Avery        HL345678C  2/15/1944   The Kinks
  9  Jay     Davie        HI636789D  6/12/1984   Blue Devils
 10  Dave    David        KR789213F  2/3/1974    The Minks

Data set CHK contains 6 variables and 10 observations. Each observation represents a unique individual; there are no duplicate observations in the data set CHK. The data set CHK contains the same variables as the data set REF, plus one additional variable, BAND. The goal is to merge the data set CHK to the data set REF in order to pick up the BAND variable for the people in the sample.

USING AN INNER JOIN TO COMBINE DATA SETS

In its most basic form, the syntax for combining data sets using an inner join in PROC SQL is as straightforward as a match-merge in a DATA step. Here, the data sets REF and CHK are joined using the National Insurance Number field (NINO) as the common identifier.

   proc sql;
      /* exact match on NINO only */
      create table inner_join1 as
      select ref.*, band
      from ref inner join chk
      on (ref.nino eq chk.nino);
   quit;

The resulting data set contains only 4 observations. The original data set REF had 8 observations, so 4 observations failed to return a match.

Obs  FNAME  LNAME    SSN  NINO       DOB        BAND
  1  John   Baldwin       BM567891E  1/3/1946   Led Zeppelin
  2  John   Bonham        YZ891234G  5/31/1948  Led Zeppelin
  3  Mick   Avory         HL345678C  2/15/1944  The Kinks
  4  Ray    Davies        HL456789D  6/21/1944  The Kinks

Upon further investigation, it becomes clear that several NINOs do not match.

REF                          CHK
John Baldwin   BM567891E     John Jones     BM567891E
Robert Plant   YZ912345H     Robert Plant   YZ912235H
Jimmy Page     HL234567B     Jimmy Page     HL234567D
John Bonham    YZ891234G     John Bonham    YZ891234G
Ray Davies     HL456789D     Ray Davies     HL456789D
Dave Davies    KR789123F     Dave Davies    KR789213F
Peter Quaife   AA123456A     Pete Quiafe    AA132456A
Mick Avory     HL345678C     Mick Avery     HL345678C

These are all fairly minor differences, consisting mostly of the transposition of two numbers or the errant substitution of a letter. Each represents a possible data entry error. With a data set this small, values could be edited so that they match on both files. In a larger data set, that would not be a viable solution, so it is well worth considering what other options are available.

COMPGED

Since NINO is a character variable, we can use the COMPGED function to compute the generalized edit distance between the two character values. The SAS documentation on the COMPGED function states that "Generalized edit distance is a generalization of Levenshtein edit distance, which is a measure of dissimilarity between two strings."
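
As a rough illustration of what the function returns, the following DATA step sketch prints the generalized edit distance for a few of the NINO pairs listed above. The variable names REF_NINO, CHK_NINO, and GED are chosen here for illustration and do not come from the paper. The first pair is identical, the others differ by a substituted letter or by nearby digits.

```sas
/* Sketch only: compute COMPGED for a few NINO pairs taken from */
/* the REF and CHK listings. Variable names are hypothetical.   */
data _null_;
   length ref_nino chk_nino $9;
   input ref_nino $ chk_nino $;
   ged = compged(ref_nino, chk_nino);   /* default costs, no cutoff */
   put ref_nino= chk_nino= ged=;        /* write the result to the log */
   datalines;
BM567891E BM567891E
HL234567B HL234567D
KR789123F KR789213F
YZ912345H YZ912235H
;
run;
```

Inspecting the logged values for pairs that should and should not match is one way to choose a sensible tolerance before putting COMPGED into a join condition.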
The finer points of COMPGED are beyond the scope of this paper, but the general syntax takes the form

   COMPGED(string-1, string-2 <,cutoff> <,modifiers>)

The COMPGED function returns a value based on the difference between the two character strings. Under the default settings, the values returned for minor transpositions and substitutions tend to be 100 or less. Using the COMPGED function as part of the criteria in the JOIN allows for greater flexibility and does not require strict equality, making fuzzier matches possible. Technically, we are still matching on a binary outcome: whether COMPGED returns a value of 100 or less, or not. It is the method of calculating that value that allows for some fuzziness.

   proc sql;
      /* fuzzy match on NINO: accept an edit distance of 100 or less */
      create table inner_join2a as
      select ref.*, band
      from ref inner join chk
      on (compged(ref.nino,chk.nino) le 100);
   quit;

Note that in order to use COMPGED, we must specify the tolerance; that is, what is the maximum level of dissimilarity between two values of NINO that is acceptable? By specifying that the value returned by the COMPGED function must be less than or equal to 100, we are allowing for only very slight differences. COMPGED will return 0 if two strings are equal, so we are allowing strict equality as well.

The resulting data set has 8 observations, but all is not well. Robert Plant from Led Zeppelin is missing, and Dave Davies returned a match for someone in a band called The Minks.

Obs  FNAME  LNAME    SSN  NINO       DOB         BAND
  1  John   Baldwin       BM567891E  1/3/1946    Led Zeppelin
  2  John   Bonham        YZ891234G  5/31/1948   Led Zeppelin
  3  Jimmy  Page          HL234567B  1/9/1944    Led Zeppelin
  4  Mick   Avory         HL345678C  2/15/1944   The Kinks
  5  Dave   Davies        KR789123F  2/3/1947    The Kinks
  6  Ray    Davies        HL456789D  6/21/1944   The Kinks
  7  Peter  Quaife        AA123456A  12/31/1943  The Kinks
  8  Dave   Davies        KR789123F  2/3/1947    The Minks

The members of these bands have been assigned both British National Insurance Numbers (NINOs) and American Social Security Numbers (SSNs) for illustration purposes. While we were unable to match Robert Plant on his NINO, perhaps looking for fuzzy matches on SSN would return the desired match.

The following code:

   proc sql;
      /* fails: SSN is numeric, and COMPGED needs character arguments */
      create table inner_join2b as
      select ref.*, band
      from ref inner join chk
      on (compged(ref.ssn,chk.ssn) le 100);
   quit;

returns the following error messages:

   ERROR: Function COMPGED requires a character expression as argument 1.
   ERROR: Function COMPGED requires a character expression as argument 2.
   ERROR: Expression using less than or equal (<=) has components that are of different data types.

COMPGED requires character expressions, or strings. The PUT function returns a string value and can be nested in the COMPGED function.
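
The nesting can be tried in isolation before it goes into a join. The sketch below is a minimal DATA step using two made-up numeric SSN values (the paper's actual SSN values are not reproduced here); the variable names are hypothetical. PUT with the 9. format converts each number to a 9-character string, which COMPGED then accepts.

```sas
/* Sketch only: the SSN values below are invented for illustration. */
data _null_;
   ssn_ref = 123456789;                 /* hypothetical numeric SSN */
   ssn_chk = 123456798;                 /* hypothetical: last two digits transposed */
   ssn_ref_c = put(ssn_ref, 9.);        /* numeric -> character, width 9 */
   ssn_chk_c = put(ssn_chk, 9.);
   ged = compged(ssn_ref_c, ssn_chk_c); /* now both arguments are strings */
   put ssn_ref_c= ssn_chk_c= ged=;
run;
```

One design point to keep in mind: the 9. format right-aligns shorter numbers and pads them with leading blanks, so an SSN stored numerically with leading zeros would lose those zeros in the string. If that matters in a given data set, the Z9. format, which pads with leading zeros, may be a better fit, provided it is applied to both sides of the comparison.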

The following code:

   proc sql;
      /* fuzzy match on SSN, converted to character with PUT */
      create table inner_join2c as
      select ref.*, band
      from ref inner join chk
      on (compged(put(ref.ssn,9.),put(chk.ssn,9.)) le 100);
   quit;

creates the following data set:

Obs  FNAME   LNAME    SSN  NINO       DOB         BAND
  1  Ray     Davies        HL456789D  6/21/1944   Blue Devils
  2  John    Baldwin       BM567891E  1/3/1946    Led Zeppelin
  3  John    Bonham        YZ891234G  5/31/1948   Led Zeppelin
  4  Jimmy   Page          HL234567B  1/9/1944    Led Zeppelin
  5  Robert  Plant         YZ912345H  8/20/1948   Led Zeppelin
  6  Mick    Avory         HL345678C  2/15/1944   The Kinks
  7  Dave    Davies        KR789123F  2/3/1947    The Kinks
  8  Ray     Davies        HL456789D  6/21/1944   The Kinks
  9  Peter   Quaife        AA123456A  12/31/1943  The Kinks
 10  Dave    Davies        KR789123F  2/3/1947    The Minks

We have successfully rescued Robert Plant from limbo, but at the cost of exacerbating the over-match issue from the previous JOIN. Now we have Dave Davies from The Minks as well as Ray Davies from Blue Devils. Clearly, we must refine the match criteria.

USING A WEIGHTED MATCH SCORE

Until now, we have allowed for fuzzy matches, but we have treated them the same as exact matches. What would happen if we assigned more weight to exact matches while still allowing for fuzzy matches? The code below creates a weighted match score (see Teres 2009) that gives twice as much weight to exact matches as to fuzzy matches. This match score is then used to select the observation with the closest match, as defined by the HAVING clause.

The following query:

   proc sql ;
      create table inner_join3b as
      select ref.*, band,
             /* 2 points for an exact match plus 1 point for a fuzzy match */
             ((2*(ref.ssn eq chk.ssn)) +
              (compged(put(ref.ssn,9.),put(chk.ssn,9.)) le 100)) as wms
      from ref inner join chk
      on (compged(put(ref.ssn,9.),put(chk.ssn,9.)) le 100)
      group by ref.ssn
      /* keep only the best-scoring candidate for each REF record */
      having calculated wms eq max(calculated wms)
      ;
   quit;

creates the following data set:

Obs  FNAME   LNAME    SSN  NINO       DOB         BAND          WMS
  1  John    Baldwin       BM567891E  1/3/1946    Led Zeppelin    3
  2  John    Bonham        YZ891234G  5/31/1948   Led Zeppelin    3
  3  Jimmy   Page          HL234567B  1/9/1944    Led Zeppelin    1
  4  Robert  Plant         YZ912345H  8/20/1948   Led Zeppelin    3
  5  Mick    Avory         HL345678C  2/15/1944   The Kinks       1
  6  Dave    Davies        KR789123F  2/3/1947    The Kinks       3
  7  Ray     Davies        HL456789D  6/21/1944   The Kinks       3
  8  Peter   Quaife        AA123456A  12/31/1943  The Kinks       1

At this point, we have met the goal of combining the two data sets. However, the weighted match score (WMS) takes on only 2 values, 1 and 3. It may be desirable to include all available identifying information in the creation of the match score to allow for greater variability in the quality of the matches. The following code expands the idea of including both the exact and fuzzy matches to all identifiers. Note the use of the nested PUT function with the format MMDDYY10. for DOB.
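
To see why a single identifier contributes only 0, 1, or 3 points to the score, the scoring term can be evaluated on its own. The DATA step below is a sketch that applies the same expression to two NINO pairs from the sample listings (NINO is used here because it is already a character variable; the variable names A, B, EXACT, FUZZY, and WMS_TERM are chosen for illustration). In SAS, a comparison evaluates to 1 when true and 0 when false, so an exact match earns 2 + 1 = 3, while a match that is only fuzzy earns 0 + 1 = 1.

```sas
/* Sketch only: one term of the weighted match score, evaluated */
/* for an identical pair and for a pair differing by one letter. */
data _null_;
   length a b $9;
   input a $ b $;
   exact = (a eq b);                   /* 1 if the strings are identical, else 0 */
   fuzzy = (compged(a, b) le 100);     /* 1 if the edit distance is within tolerance */
   wms_term = 2*exact + fuzzy;         /* same weighting as the query's WMS */
   put a= b= exact= fuzzy= wms_term=;
   datalines;
YZ891234G YZ891234G
HL234567B HL234567D
;
run;
```

Because an exact match always satisfies the fuzzy condition as well, a value of 2 never occurs for a single identifier. Summing this term over several identifiers is what produces the wider range of scores.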

The following query:

   proc sql;
      create table inner_join5 as
      select ref.*, band,
             /* for each identifier: 2 points if exact, plus 1 point if fuzzy */
             ((2*(ref.ssn eq chk.ssn)) +
              (compged(put(ref.ssn,9.),put(chk.ssn,9.)) le 100) +
              (2*(ref.nino eq chk.nino)) +
              (compged(ref.nino, chk.nino) le 100) +
              (2*(ref.fname eq chk.fname)) +
              (compged(ref.fname, chk.fname) le 100) +
              (2*(ref.lname eq chk.lname)) +
              (compged(ref.lname, chk.lname) le 100) +
              (2*(ref.dob eq chk.dob)) +
              (compged(put(ref.dob,mmddyy10.), put(chk.dob,mmddyy10.)) le 100)) as wms
      from ref inner join chk
      /* a pair is a candidate if any one identifier is a fuzzy match */
      on ( (compged(put(ref.ssn,9.),put(chk.ssn,9.)) le 100) or
           (compged(ref.nino, chk.nino) le 100) or
           (compged(ref.fname, chk.fname) le 100) or
           (compged(ref.lname, chk.lname) le 100) or
           (compged(put(ref.dob,mmddyy10.), put(chk.dob,mmddyy10.)) le 100) )
      group by ref.ssn
      having ((calculated wms eq max(calculated wms)))
      order by band, ref.lname, ref.fname
      ;
   quit;

produces the following data set:

Obs  FNAME   LNAME    SSN  NINO       DOB         BAND          WMS
  1  John    Baldwin       BM567891E  1/3/1946    Led Zeppelin   12
  2  John    Bonham        YZ891234G  5/31/1948   Led Zeppelin   15
  3  Jimmy   Page          HL234567B  1/9/1944    Led Zeppelin   11
  4  Robert  Plant         YZ912345H  8/20/1948   Led Zeppelin   12
  5  Mick    Avory         HL345678C  2/15/1944   The Kinks      11
  6  Dave    Davies        KR789123F  2/3/1947    The Kinks      13
  7  Ray     Davies        HL456789D  6/21/1944   The Kinks      15
  8  Peter   Quaife        AA123456A  12/31/1943  The Kinks       7

Here the match score is weighted so that it favors exact matches, assigning them twice the weight of fuzzier matches. It becomes much clearer that some matches are better than others. For example, Peter Quaife had a WMS value of 7 because, while his entries in REF and CHK matched exactly on DOB (3 points), there were slight discrepancies in FNAME, LNAME, NINO, and SSN, each of which counted only as a fuzzy match (1 point apiece). On the other hand, John Bonham and Ray Davies had much stronger matches, with WMS values of 15, indicating exact matches on all five identifiers. Because different identifiers might be more or less important, however, it is hard to say whether a match with a weighted match score of 15 is twice as good as a match with a value of 7.

This approach could be taken even further. Matches on SSN, fuzzy or otherwise, could be given more weight than matches on first name, for example. COMPGED options can also be refined to change the penalties associated with certain types of mismatches between character expressions to best fit any given context.

CONCLUSIONS

The use of the COMPGED function in joins offers a great deal of flexibility when combining data sets.
Allowing for fuzzy matches opens the door to over-matching a data set by including spurious matches, however. Creating a weighted match score to favor exact matches is a useful tool for refining results when combining data sets.

REFERENCES

SAS Institute Inc. SAS 9.2 Language Reference: Dictionary, Fourth Edition. Cary, NC: SAS Institute Inc. Functions and CALL Routines: COMPGED.

Staum, Paulette (2007). Fuzzy Matching using the COMPGED Function. Proceedings of the 2007 NorthEast SAS Users Group Conference.

Teres, Jedediah J. (2009). Using SQL Joins to Perform Weighted Matches on Multiple Identifiers. Proceedings of the 2009 NorthEast SAS Users Group Conference.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

The author wishes to thank Paulette Staum for her encouragement, and Chris Bost and Aaron Hill for reviewing this paper.

CONTACT INFORMATION

Jedediah Teres
MDRC
16 East 34th St, 19th Floor
New York, NY
(212) telephone
(212) fax
A PROC SQL Primer Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC ABSTRACT Most SAS programmers utilize the power of the DATA step to manipulate their datasets. However, unless they pull
Selecting Features by Attributes in ArcGIS Using the Query Builder
Helping Organizations Succeed with GIS www.junipergis.com Bend, OR 97702 Ph: 541-389-6225 Fax: 541-389-6263 Selecting Features by Attributes in ArcGIS Using the Query Builder ESRI provides an easy to use
Internet/Intranet, the Web & SAS. II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA
II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA Abstract Web based reporting has enhanced the ability of management to interface with data in a
Access Tutorial 2: Tables
Access Tutorial 2: Tables 2.1 Introduction: The importance of good table design Tables are where data in a database is stored; consequently, tables form the core of any database application. In addition
Configuring an Alternative Database for SAS Web Infrastructure Platform Services
Configuration Guide Configuring an Alternative Database for SAS Web Infrastructure Platform Services By default, SAS Web Infrastructure Platform Services is configured to use SAS Framework Data Server.
Fine Grained Auditing In Oracle 10G
Fine Grained Auditing In Oracle 10G Authored by: Meenakshi Srivastava ([email protected]) 2 Abstract The purpose of this document is to develop an understanding of Fine Grained Auditing(FGA)
SONA SYSTEMS RESEARCHER DOCUMENTATION
SONA SYSTEMS RESEARCHER DOCUMENTATION Introduction Sona Systems is used for the scheduling and management of research participants and the studies they participate in. Participants, researchers, principal
Database Query 1: SQL Basics
Database Query 1: SQL Basics CIS 3730 Designing and Managing Data J.G. Zheng Fall 2010 1 Overview Using Structured Query Language (SQL) to get the data you want from relational databases Learning basic
Time needed. Before the lesson Assessment task:
Formative Assessment Lesson Materials Alpha Version Beads Under the Cloud Mathematical goals This lesson unit is intended to help you assess how well students are able to identify patterns (both linear
STIR Education Micro-Innovations that raise results STUDENT ATTENDANCE SCANNER
STIR Education Micro-Innovations that raise results STUDENT ATTENDANCE SCANNER FOCUS: DISCILINE STUDENT ATTENDANCE SCANNER STIR EDUCATION At STIR Education, our belief is that the best way to improve the
Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC
Using SAS With a SQL Server Database M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC ABSTRACT Many operations now store data in relational databases. You may want to use SAS
Advanced Subqueries In PROC SQL
Advanced Subqueries In PROC SQL PROC SQL; SELECT STATE, AVG(SALES) AS AVGSALES FROM USSALES GROUP BY STATE HAVING AVG(SALES) > (SELECT AVG(SALES) FROM USSALES); QUIT; STATE AVGSALES --------------- IL
SAS Enterprise Guide A Quick Overview of Developing, Creating, and Successfully Delivering a Simple Project
Paper 156-29 SAS Enterprise Guide A Quick Overview of Developing, Creating, and Successfully Delivering a Simple Project Ajaz (A.J.) Farooqi, Walt Disney Parks and Resorts, Lake Buena Vista, FL ABSTRACT
