SSN validation Virtually at no cost Milorad Stojanovic RTI International Education Surveys Division RTP, North Carolina



Similar documents
Identifying Invalid Social Security Numbers

SOUTH TEXAS COLLEGE. Identity Theft Prevention Program and Guidelines. FTC Red Flags Rule

Remove Voided Claims for Insurance Data Qiling Shi

Remove Orphan Claims and Third party Claims for Insurance Data Qiling Shi, NCI Information Systems, Inc., Nashville, Tennessee

Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison

A Macro to Create Data Definition Documents

Model Identity Theft Policy and Adopting Resolution

MCPHS IDENTITY THEFT POLICY

We begin by defining a few user-supplied parameters, to make the code transferable between various projects.

STATE OF NORTH CAROLINA

Permuted-block randomization with varying block sizes using SAS Proc Plan Lei Li, RTI International, RTP, North Carolina

Central Oregon Community College. Identity Theft Prevention Program

PharmaSUG Paper AD11

Order from Chaos: Using the Power of SAS to Transform Audit Trail Data Yun Mai, Susan Myers, Nanthini Ganapathi, Vorapranee Wickelgren

Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX

COUNCIL POLICY STATEMENT

Spotting ID Theft Red Flags A Guide for FACTA Compliance. An IDology, Inc. Whitepaper

SUGI 29 Posters. Mazen Abdellatif, M.S., Hines VA CSPCC, Hines IL, 60141, USA

THE UNIVERSITY OF NORTH CAROLINA AT GREENSBORO IDENTITY THEFT PREVENTION PROGRAM

A Guide to Benedictine College and Identity Theft

Project Request and Tracking Using SAS/IntrNet Software Steven Beakley, LabOne, Inc., Lenexa, Kansas

Red Flag Identity Theft Financial Policy 1.10

Search and Replace in SAS Data Sets thru GUI

Identity Theft Policy Created: June 10, 2009 Author: Financial Services and Information Technology Services Version: 1.0

An macro: Exploring metadata EG and user credentials in Linux to automate notifications Jason Baucom, Ateb Inc.

SOCIAL SECURITY NUMBER VERIFICATION BEST PRACTICES

An Approach to Creating Archives That Minimizes Storage Requirements

SAS Macros as File Management Utility Programs

SUGI 29 Coders' Corner

31-R-11 A RESOLUTION ADOPTING THE CITY OF EVANSTON IDENTITY PROTECTION POLICY. WHEREAS, The Fair and Accurate Credit Transactions Act of 2003,

Demand for Analysis-Ready Data Sets. An Introduction to Banking and Credit Card Analytics

Portfolio Construction with OPTMODEL

Building a Customized Data Entry System with SAS/IntrNet

Identity Protection and You. Customer Update

These rules became effective August 1, 2009, and require certain agencies to implement an identity theft program and policy.

ing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA

CIBC Student Want to Win $5,000 Contest (the Contest )

IRS Data Retrieval Tool (DRT) Checklist

Florida International University. Identity Theft Prevention Program. Effective beginning August 1, 2009

SAS Programming Tips, Tricks, and Techniques

Identity Theft Prevention Program

Power Against Identity Theft

Post Processing Macro in Clinical Data Reporting Niraj J. Pandya

CDW DATA QUALITY INITIATIVE

SAS UNIX-Space Analyzer A handy tool for UNIX SAS Administrators Airaha Chelvakkanthan Manickam, Cognizant Technology Solutions, Teaneck, NJ

THE INTRICATE ORGANIZATION AND MANAGEMENT OF CLINICAL RESEARCH LABORATORY SAMPLES USING SAS/AF"

Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.

CHAPTER 12 IDENTITY PROTECTION AND IDENTITY THEFT PREVENTION POLICIES

Identity Theft and Fraud

The University of North Carolina at Charlotte Identity Theft Prevention Program

IDENTITY THEFT PREVENTION PROGRAM

Overview. NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT

RESOLUTION NO A RESOLUTION ADOPTING THE "CITY OF CHATTANOOGA IDENTITY THEFT POLICY" ATTACHED HERETO AND MADE A PART HEREOF BY REFERENCE.

Child Identity Theft. Warning Signs

Applications Development

APPENDIX B DATA BASE MATCHES AND MATCH FLAGS

THE LUTHERAN UNIVERSITY ASSOCIATION, INC. d/b/a Valparaiso University IDENTITY THEFT PREVENTION PROGRAM

If it sounds too good to be true it probably is! 90% of victims of consumer fraud never report it

Introduction to Criteria-based Deduplication of Records, continued SESUG 2012

University of Tennessee's Identity Theft Prevention Program

Session 16 Standard Student Identification Method

Standards for Social Security Numbers in CU*BASE

DMACC IDENTITY THEFT- RED FLAGS PROCEDURES

The SET Statement and Beyond: Uses and Abuses of the SET Statement. S. David Riba, JADE Tech, Inc., Clearwater, FL

LexisNexis InstantID. Technical White Paper. Analyzing Results. For the following: LexisNexis Bridger Insight XG. Contact LexisNexis Sales:

Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached.

Using Wharton's FDIC Research Database

Application for Bank of Pontiac NetTeller Services Internet Banking and Bill Pay

Building A SAS Application to Manage SAS Code Phillip Michaels, P&Ls, Saint Louis, MO

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

Protect Your Personal Information. Tips and tools to help safeguard you against identity theft

A Microsoft Access Based System, Using SAS as a Background Number Cruncher David Kiasi, Applications Alternatives, Upper Marlboro, MD

Paper PO03. A Case of Online Data Processing and Statistical Analysis via SAS/IntrNet. Sijian Zhang University of Alabama at Birmingham

Managing Data Issues Identified During Programming

USF System & Preventing Identity Fraud

A Unique Perspective into the World of Identity Fraud

Health Services Research Utilizing Electronic Health Record Data: A Grad Student How-To Paper

Paper TT09 Using SAS to Send Bulk s With Attachments

Protect Your Personal Information. Tips and tools to help safeguard you against identity theft

Deterring Identity Theft. The Federal Trade Commission estimates that as many as 9 million Americans have their identities stolen each year.

Using DDE and SAS/Macro for Automated Excel Report Consolidation and Generation

Personal Security Check List

Meeting Identity Theft Red Flags Regulations with IBM Fraud, Risk & Compliance Solutions

IDENTITY THEFT PREVENTION PROGRAM (RED FLAGS)

A Method for Cleaning Clinical Trial Analysis Data Sets

Fraud Prevention Tips

Double-Key Data EntryNerification in SASe Software Part of Downsizing a Clinical Data System to the OS/2" Operating System

NORTHEAST COMMUNITY COLLEGE ADMINISTRATIVE PROCEDURE NUMBER: AP FOR POLICY NUMBER: BP 3250 IDENITY THEFT PREVENTION PROGRAM PROCEDURES

How To Write A Clinical Trial In Sas

Analyzing the Server Log

WOW! YOU DID THAT WITH SAS STORED PROCESSES? Dave Mitchell, Solution Design Team, Littleton, Colorado

Xavier University. Fair & Accurate Credit Transactions Act (Red Flags Rule) Policy and Procedures

Deer Park Independent School District. Identity Theft Policy and Board of Trustees Resolution

IDENTITY THEFT PREVENTION (Red Flag) POLICY

Social Security Numbers. Fermilab s International Services Office August 2011

Instant Interactive SAS Log Window Analyzer

Converting Electronic Medical Records Data into Practical Analysis Dataset

Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1

SUGI 29 Data Presentation

Transcription:

Paper PO23 SSN validation Virtually at no cost Milorad Stojanovic RTI International Education Surveys Division RTP, North Carolina ABSTRACT Using SSNs without validation is not the way to ensure quality of the data you collect, merge in, or use from secondary sources. Verification of SSNs is out of the scope of this paper. The author s goal was to find an easy but still powerful way to validate a given SSN. He used a publicly available source (site of the Social Security Administration) to get current high group numbers. He developed a program / macro which can be easily used by each user. Also, he points out scenarios in which validating SSNs is not advantageous. INTRODUCTION The usage of Social Security Numbers (SSNs) in society is very broad. For example, when seeking to obtain some state and federal documents, enrolling in a post secondary education system, applying for loans, opening bank accounts, and perhaps more importantly, when applying for a job, we are asked to present our SSN. That said those examples are just the tip of iceberg called SSN. From the very beginning the usage of SSNs has been abused to some degree (either intentionally or unintentionally). In recent years such abuse has even lead to 'Identity Theft' in some cases, though the legal aspect of that is quite out of the scope of this paper. The author doesn't have such intention and moreover doesn't have any formal law education. Mainly the author is working with surveys which are in the area of education and Medicare and, in some of the surveys, the respondents were asked to present their SSN. In some of the aforementioned studies, SSNs in conjunction with other identifying information were used as a key for matching survey data with other previously collected data (in our arrangement or by other independent organizations). In theory, a SSN is a very useful piece of data for merging together data from various sources. It is unique for citizens and permanent residents of the US, it is relatively compact, and it is managed by one government organization; the Social Security Administration (SSA). ABOUT SSN AND VALIDATION VALIDATION: Validation means the Social Security Number (SSN) is a valid number. Validation can be, and usually is, determined by a mathematical calculation that determines that the number MAY BE A VALID NUMBER, along with the state and year in which that the number MAY HAVE BEEN ISSUED. This process in no way ensures that the number: has actually been issued, does not belong to a deceased person, or even that number actually was issued by the Government to your candidate. In this type of process, there is no actual access to real data from the Social Security Administration (attempted or implied). In an employment situation, an SSN Validation will not provide any useful information about the actual status of an applicant s Social Security Number. VERIFICATION: Verification means that the Social Security Administration (SSA) actually confirms that the number has been issued to a particular person, the state of issue has been provided by the SSA, and the year of issue has been provided by the SSA. While the SSA will not confirm to anyone whom the SSN is actually issued to, the top of your report will be imprinted with a starred box declaring possible fraud if the name on file with the SSA is not the exact name presented in your request. The report is also imprinted if, for example, the SSN has already had death benefits paid on it or if the person is drawing SSI disability payments. There are a total of 17 classes of fraud. SSNs consist of nine digits organized in three groups usually divided by a hyphen ("-"). 1

Let say SSN = 123-45-6789 as an example. The first three digits define AREA, the middle two digits define GROUP, and the last four digits define SERIAL NUMBER. Area code ranges from 001 to 732 as of the date this paper was written. An area code of 000 is invalid. The following sequence is used by the SSA to assign Group numbers: Odd numbers, 01 to 09 Even numbers, 10 to 98 Even numbers, 02 to 08 Odd numbers, 11 to 99 Group codes of "00" aren't assigned The serial number can t be 0000. When you get an SSN from a respondent, how should you go about verifying whether the SSN is valid or invalid? There are, in the author s opinion, three possible levels of verification (including validation). 1. Full checking which requests access to SSA databases (with active and inactive SSNs). In that case you are able to check more data on a person in addition to the SSN. 2. Validation which indicates whether an SSN is formally correct. You will be able to determine if the respondent gave an invalid or still non-issued SSN. 3. A third quick and 'dirty' check will give results less accurate than those obtained using option 2 but will require less effort. It is a simple program which will look for obvious inconsistencies in content of the SSN. There is of course a fourth way of verifying and/or validating an SSN, however it does cost some money. In the US there are several tens of commercial organizations or companies that would do it for you but with a price tag of $0.25 (one quarter) per case or higher. Some companies charge more and some less but there is no one company that will do it for free (at least the author is not aware of such a company). If you need to check data for let s say 100,000 cases, it could be significant financial burden (the total cost would be at least $25,000 US). On the other side if you have survey data (data collected based on interviewing respondents and not on official documents) you have the potential of loosing some cases if you are attempting to do a complete check of the SSN. Moreover you would still be in position to collect perfectly valid SSNs that may belong to one of the respondent s parents (if, for example, the respondent is a High School student) or to someone in his or her family. Should we collect SSNs at all? The answer should be NO if we do not plan to link data collected in the survey with some other sources of data that are already available for the same person (e.g. FAFSA data etc.). If we need to match our data with some other sources of data then we must collect information from respondents that will allow us to do it later and, as a rule, it is the SSN. From the author's experience he would like to point out that some 'financial' institutions that lend money to students accept, to certain degree, SSNs that are obviously invalid or even bogus. One potential source of problems is using Tax Identification Numbers (TIN) as a surrogate for SSN. Many foreign students get TINs because they are not eligible for a real SSN (the author s own children serve as an example). The future of using SSNs is not quite clear to the author. He thinks that respondents will become more and more reluctant to provide SSNs as more and more cases of Identity Theft" surface (1). On the other side, a recent report on identity theft gave some optimistic signs by identifying less cases where "Identity theft" resulted from using electronic devices (computers, Internet, etc.) than from classic methods (using discarded but not shredded bank statements, cancelled checks etc.). Also some SSNs were reserved for display in wallets or purses (2). The most misused SSN of all time is (078-05-1120). In 1938, wallet manufacturer the E. H. Ferree Company in Lockport, New York decided to promote its product by showing how a Social Security card would fit into its wallets. A sample card, used for display purposes, was inserted in each wallet. In the peak year of 1943, 5,755 people were using that number. In all, over 40,000 people reported this as their SSN. 2

In 1940 the Board published a pamphlet explaining a new program which showed a facsimile of a social security card on the cover. The card in the illustration used a made-up number of 219-09-9999. Let s go back to the primary topic of this paper which is how to perform some level of validation of SSNs while not spending a lot of money and still ending up with acceptable results? 1. Comparing SSNs and other respondent's data with SSA database information. 2. Utilizing a more complex macro that uses live data from the SSA s website (public data). 3. Utilizing a simple macro for formal validation of SSNs. The first option is out of the scope of this paper. If you are able to spend a quarter ($0.25) per case or so, you will find a number of commercial firms who will complete the task for you. Again, you should have a guarantee that your data will not be lost or disseminated without your knowledge by the company you choose to validate your SSNs. To perform validation at a certain level of quality the author developed a program (macro) in SAS which draws in a reputable source of public information website of the Social Security Administration. PROGRAM / MACRO FOR SSN VALIDATION %macro Validation_SSN (Input_DataSet=, SSN=, SSN_Flag=, Final_Check=) ; %local Input_DataSet SSN SSN_Flag Final_Check ; * Connection to SSA site with current High group data. This information is updated once per month. ; FILENAME ssa_hgrp URL 'http://www.ssa.gov/employer/highgroup.txt' ; proc format ; value flg 0 = 'Invalid' 1 = 'Valid'; DATA high_group(keep=area group seqno) ; INFILE ssa_hgrp lrecl=256 pad ; INPUT @1 text1 $char256. ; if substr(text1,1,1) in (' ','A','T','t','N','09'x) then delete ; text1 = compress(translate(text1," ","092A"x)); length area group seqno 3 ; do i = 1 to 6 ; area = input(substr(text2,(i-1)*5 + 1, 3), 3.) ; group = input(substr(text2,(i-1)*5 + 4, 2), 2.) ; odd = mod(group, 2) ; if odd = 1 and group <= 9 then seqno = (group + 1) / 2 ; else if odd = 0 and (group >= 10 and group <= 98) then seqno = group / 2 + 1 ; else if odd = 0 and (group >= 2 and group <= 8) then seqno = group / 2 + 50 ; else if odd = 1 and (group >= 11 and group <= 99) then seqno = (group - 1) / 2 + 50 ; if area > 0 then output ; end; data Area_Group_Seqno ; do i=0 to 999; area = i ; group = 0 ; seqno = 0 ; output ; end ; drop i ; data High_Group_All ; merge Area_Group_Seqno(in=in1) high_group ; by area ; if in1 ; data & Final_Check.(COMPRESS=YES) ; /* Repository of the max issued group number converted to seqno */ array Agrp{0:999} 3 _temporary_ ; do until(allareas) ; 3

set High_Group_All end = allareas ; Agrp(area) = seqno ; end ; do until(allssns) ; set &Input_DataSet. end = allssns ; area1 = input(substr(&ssn., 1, 3), 3.) ; group1 = input(substr(&ssn., 4, 2), 2.) ; odd1 = mod(group1, 2) ; if group1 = 0 then &SSN_Flag. = 0 ; else do ; if odd1 = 1 and group1 <= 9 then seqno1 = (group1 + 1) / 2 ; else if odd1 = 0 and (group1 >= 10 and group1 <= 98) then seqno1 = group1 / 2 + 1; else if odd1 = 0 and (group1 >= 2 and group1 <= 8) then seqno1 = group1 / 2 + 50 ; else if odd1 = 1 and (group1 >= 11 and group1 <= 99) then seqno1=(group1-1)/2+50; if Agrp{area1} >= seqno1 then &SSN_Flag. = 1; /* Valid SSN */ else &SSN_Flag. = 0; /* Invalid SSN */ end ; length first_digit $ 1 ; first_digit = &ssn. ; output & Final_Check. ; end ; stop ; drop area group seqno ; options nocenter nonumber nodate ; ods rtf file="c:\sesug07\report.rtf" style=default ; proc freq data=& Final_Check. ; title "& Final_Check. - freqs on valid and invalid SSNs" ; title2 "&sysdate." ; tables first_digit * &SSN_Flag. / missing list nocum ; format &SSN_Flag. flg. ; ods rtf close ; proc datasets library = work ; * Removing unnecessary data sets. ; delete High_Group_All Area_Group_Seqno high_group ; %mend Validation_SSN ; For testing purposes, the RANUNI function was used and 250,000 SSN numbers were generated. Those numbers were submitted to Validation_SSN macro and the results are presented in Tab 1. *Test run of macro ; *Input data set ; data ssn_chk ; length SSN $ 9 ; do i = 1 to 250000 ; rn_num = ceil(999999999*ranuni(i)) ; ssn = put(rn_num, Z9.) ; output ; end ; drop i ; %Validation_SSN(Input_DataSet=ssn_chk, SSN=ssn, SSN_Flag=ssn_flag, _Final_Check=Out_File) Result from testing is in Tab 1. 4

Tab 1. first_digit ssn_flag Frequency Percent 0 Invalid 13119 5.25 0 Valid 12133 4.85 1 Invalid 13072 5.23 1 Valid 11806 4.72 2 Invalid 6170 2.47 2 Valid 18687 7.47 3 Invalid 10269 4.11 3 Valid 14838 5.94 4 Invalid 4359 1.74 4 Valid 20466 8.19 5 Invalid 3662 1.46 5 Valid 21415 8.57 6 Invalid 15269 6.11 6 Valid 9851 3.94 7 Invalid 23564 9.43 7 Valid 1559 0.62 8 Invalid 25006 10.00 9 Invalid 24755 9.90 When you are reading results from the Tab1 please keep in mind few things. First, those results are from sample based on random number generator which simulates real SSNs. Second there is no guarantee sample SSNs covers uniformly whole spectrum of area codes starting with 0, 1, etc. Word Invalid means real SSN was not issued yet. When you are looking at the very last two rows it is obvious that all SSNs are invalid. No one SSN was issued with first digit 8 or 9 yet. Higher percentage of Invalid SSNs in the row with 7 as a first digit means that smaller portion of SSNs for all area codes starting with 7 was issued so far. Author likes to add few words on applied algorithm and program. Applied algorithm is relatively simple but on the other side it is very efficient. On the stand alone PC with Windows XP and SAS V 9.1.3 author checked up to 60,000 cases per second. Result could vary from PC to PC. CONCLUSION Validation of SSNs is possible with virtually no cost. You don t need to pay unnecessarily for a service that provides additional data about an individual that you will not use at all. The SAS program or macro is straightforward and could be done in a relatively small amount of time. The base source of data (high group numbers) is official and reliable. FURTHER READING Author recommends for further reading paper (7). It is an excellent paper from Carnegie Mellon University. 5

ACKNOWLEDGEMENTS I would like to thank Christopher Alexander and Robert Lee from the Education Studies Division (RTI) for all their help, comments and support in producing this paper. REFERENCES: 1. Speech by SEC Staff: Compliance Professionals versus Identity Thieves by http://www.sec.gov/news/speech/2006/spch100506jhw.htm 2. Social Security Cards Issued by Woolworth, http://www.ssa.gov/history/ssn/misused.html 3. Your Social Security Number And Card, SSA Publication No. 05-10002, October 2006 (Recycle prior editions), ICN 451384, http://www.pueblo.gsa.gov/cic_text/fed_prog/ssnumber/ssnumber.htm 4. SSN Validation Web Service http://www.codeproject.com/aspnet/ssnvalidator.asp 5. Social Security Number Identification Verification Search (40 free searches) http://www.usinfosearch.com/free_ssn_search.htm 6. United States Social Security Number Validation (using other programming languages than SAS ) http://www.quentinsagerconsulting.com/documents/10013.htm 7. SSN Validation as a Way to Combat Identity Theft http://privacy.cs.cmu.edu/dataprivacy/projects/ssnwatch/identitytheft.html 8. Validation (SSN), JavaScript, http://webscripts.softpedia.com/script/forms-and-controls-c-c/validation- SSN--21924.html 9. SSN VALIDATION PROCEDURES, State of Pennsylvania http://www.dpw.state.pa.us/oimpolicymanuals/manuals/bop/supp/950/950_c.htm DISCLAIMER All opinions and suggestions stated in the paper on how to validate SSNs do not necessarily reflect the opinions and suggestions of the Education Survey Studies Division (RTI). Using any of commercial websites or services mentioned in references for checking SSNs is at your fully responsibility. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Milorad Stojanovic Education Surveys Division RTI International 3040 Cornwallis Rd RTP, NC, 27909 Work Phone : (919) 541-7376 E-mail: milorad@rti.org TRADEMARK INFORMATION SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 6