Understanding Unduplicated Count and Data Integration
Presenters:
  Loren Hoffmann, System Administrator, WI Statewide HMIS
  Ray Allen, Executive Director, Community Technology Alliance
September 14th and 15th, 2004, Chicago, IL
Sponsored by the U.S. Department of Housing and Urban Development
Topics to be covered:
  Types of fields and data quality
  Statistical considerations
  An unduplicated count
    Overcounts
    Undercounts
Data Quality: General tests
  Completeness
    NULL vs. something
  Validity
    Is the data valid?
    Is the data reasonable?
Data Quality: Types of data fields
  Picklists; yes/no
  Text: numeric, alphanumeric
  Date
Data Quality: Picklists
  Validity: must be an item from the picklist
  Completeness: response/no response
  Who updates the list? System administrator or user?
  What happens to deleted/inactivated items?
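The picklist checks can be sketched in a few lines. This is only an illustration: the item names, the `ACTIVE_ITEMS` and `INACTIVE_ITEMS` sets, and the function name are hypothetical and do not come from any particular HMIS product.

```python
# Hypothetical picklist for one field; a real list would come from the
# HMIS configuration maintained by the system administrator.
ACTIVE_ITEMS = {"Emergency shelter", "Transitional housing"}
# Items that were deleted/inactivated but may still appear on old records.
INACTIVE_ITEMS = {"Hotel/motel"}


def classify_picklist_value(value):
    """Return 'missing', 'valid', 'inactive' or 'invalid' for one stored value."""
    if value is None or value.strip() == "":
        return "missing"      # completeness: no response
    if value in ACTIVE_ITEMS:
        return "valid"        # validity: item is on the current picklist
    if value in INACTIVE_ITEMS:
        return "inactive"     # was valid once; needs a policy decision
    return "invalid"          # never on the list (typo, free text, etc.)


values = ["Emergency shelter", None, "Hotel/motel", "Emergncy shelter"]
summary = [classify_picklist_value(v) for v in values]
print(summary)
```

Keeping inactivated items in a separate set, rather than deleting them, is one way to keep old records countable without treating them as errors.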
Data Quality: Date field
  Valid date; NULL value
  Determining validity:
    Control by format on entry
      mmddyyyy    10152003
      mmmddyyyy   Oct152004
    Is the date a valid date?
    Is it reasonable?
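A minimal sketch of the two date tests, assuming the mmddyyyy entry format shown above. The function names and the 110-year age ceiling are assumptions chosen for illustration, not part of any HMIS standard.

```python
from datetime import date, datetime


def parse_entry_date(text):
    """Return a date for an 8-digit mmddyyyy string, or None if it is not valid."""
    if not text or len(text) != 8 or not text.isdigit():
        return None                      # wrong format or NULL
    try:
        return datetime.strptime(text, "%m%d%Y").date()
    except ValueError:
        return None                      # e.g. month 13 or February 30


def is_reasonable_dob(dob, as_of, max_age_years=110):
    """Reject DOBs in the future or implying an implausible age (assumed limit)."""
    if dob is None or dob > as_of:
        return False
    return as_of.year - dob.year <= max_age_years


print(parse_entry_date("10152003"))   # a valid calendar date
print(parse_entry_date("02302003"))   # February 30 does not exist
```

The first function answers "is the date a valid date?"; the second answers "is it reasonable?", which always depends on the field and on a threshold someone has to choose.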
Data Quality: Text
  Little that can be easily validated on a large scale
Data Quality: Completeness
  For a given field, how many NULLs are there?
    For the entire database
    For a specified period of time
    For a given agency/user
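Counting NULLs for one field, broken out by agency, might look like the sketch below. The record layout (`agency`, `dob`) and the function name are hypothetical; the same grouping could be done by user or by time period.

```python
from collections import defaultdict


def null_rate_by_group(records, field, group_key):
    """Return {group: (null_count, total)} for one field.

    None and the empty string are both treated as NULL in this sketch.
    """
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        bucket = counts[rec[group_key]]
        bucket[1] += 1                       # total records in the group
        if rec.get(field) in (None, ""):
            bucket[0] += 1                   # NULLs in the group
    return {group: tuple(pair) for group, pair in counts.items()}


records = [
    {"agency": "A", "dob": "10151973"},
    {"agency": "A", "dob": None},
    {"agency": "B", "dob": ""},
    {"agency": "B", "dob": "01011980"},
    {"agency": "B", "dob": "03051965"},
]
result = null_rate_by_group(records, "dob", "agency")
print(result)
```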
Unduplicated Count and the Client Identifier
  To generate an unduplicated count (or to merge systems), most HMIS systems generate a common client identifier.
(Un)duplicated Counts
  How does your system manage the unique client or unduplicated client count?
    You need to know the algorithm used
    Evaluate the data elements that are used to generate the count
Unduplicated Counts
  It is not magic or foolproof. Two possible errors:
    Undercount the number of clients: the system counts two client record entries as a single client when they really are two clients
    Overcount the number of clients: the system counts two client record entries as two clients when they really are the same client
Unique Client Count
  Two possibilities:
Unique Client Count
  Put all the clients in the same room and count them
  Make a list of all the clients that you know
Unique Client Count
  Using:
    Client first name
    Client last name
    Client date of birth
    Client gender
Unduplicated Counts
  An example: how many clients?
  Using first name, last name, gender, date of birth:
    William Smith, male, 10-15-1973
    Bill Smith, male, 10-15-1973
    William Smith, male, no DOB
  Consider: address, race, HH members, etc.
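The three example records can be run through two matching rules to show how the count moves. Both rules are illustrative assumptions, not the algorithm of any specific HMIS: `match_key` is a strict exact match on the four fields, and `loose_key` adds a hypothetical nickname table.

```python
def match_key(rec):
    """Exact-match key on the four fields; a missing DOB is its own value."""
    return (
        rec["first"].strip().lower(),
        rec["last"].strip().lower(),
        rec["gender"].strip().lower(),
        rec["dob"] or "",
    )


# Hypothetical nickname table; a real one would be much larger.
NICKNAMES = {"bill": "william"}


def loose_key(rec):
    """Same key, but nicknames are mapped to a canonical first name."""
    first, last, gender, dob = match_key(rec)
    return (NICKNAMES.get(first, first), last, gender, dob)


records = [
    {"first": "William", "last": "Smith", "gender": "male", "dob": "10-15-1973"},
    {"first": "Bill", "last": "Smith", "gender": "male", "dob": "10-15-1973"},
    {"first": "William", "last": "Smith", "gender": "male", "dob": None},
]

exact_count = len({match_key(r) for r in records})   # every record differs
loose_count = len({loose_key(r) for r in records})   # Bill merges with William
print(exact_count, loose_count)
```

Neither answer is certainly right: the strict rule may overcount one person as three, while any rule loose enough to absorb the record with no DOB risks merging two different William Smiths.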
Statistical Considerations
  Defining the universe:
    Number of client records in the system
    vs. number of ACTIVE client records
    vs. number of UNDUPLICATED client records
    vs. number of valid responses for a given data element
Defining the Universe:
  Example:
    1200 client records
    1100 ACTIVE client records
    1000 UNDUPLICATED clients
    980 DOB fields have data
    500 Marital Status fields have data
Defining the Universe:
  DOB example: 980 of 1000 had a valid date, therefore:
    If 70% of the 980 records are 18+ (adults), then the actual share of adults on the system is between about 68.6% and 70.6%, depending on how many of the 20 records without a DOB are adults (a spread of 2 percentage points)
  Marital status, with only 50% having information, has a margin of error of up to +/- 25 percentage points
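The effect of missing responses can be worked out directly by asking what happens if none, or all, of the unanswered records have the trait. The sketch below makes that worst-case assumption explicit; the function name is hypothetical, and the 50% share used for marital status is an illustrative value that the slide does not give.

```python
def share_bounds(universe, answered, share_of_answered):
    """Worst-case bounds on the share of the whole universe with a trait.

    Low bound: no unanswered record has the trait.
    High bound: every unanswered record has the trait.
    """
    known_yes = share_of_answered * answered
    missing = universe - answered
    low = known_yes / universe
    high = (known_yes + missing) / universe
    return low, high


# DOB: 980 of 1000 answered, 70% of those are adults.
dob_low, dob_high = share_bounds(1000, 980, 0.70)

# Marital status: 500 of 1000 answered; 50% is an assumed share.
ms_low, ms_high = share_bounds(1000, 500, 0.50)

print(round(dob_low, 3), round(dob_high, 3))
print(round(ms_low, 3), round(ms_high, 3))
```

The width of the interval always equals the fraction of records with no answer, which is why a field that is half empty supports only very weak statements about the whole population.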
Issue:
  What do I do with conflicting answers for the same client?
    e.g. a different race or DOB, or both a yes and a no in response to a question like "Is client homeless?"
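Before deciding how to resolve conflicts, it helps to list them. The sketch below flags every client whose records disagree on a field; the record layout and names are hypothetical, and it only detects conflicts, leaving open the policy question of which answer to keep.

```python
from collections import defaultdict


def find_conflicts(records, id_field, fields):
    """Return {client_id: {field: sorted distinct values}} where values disagree.

    NULLs (None or '') are ignored: a blank is missing data, not a conflict.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value not in (None, ""):
                seen[rec[id_field]][field].add(value)
    conflicts = {}
    for client_id, by_field in seen.items():
        bad = {f: sorted(vals) for f, vals in by_field.items() if len(vals) > 1}
        if bad:
            conflicts[client_id] = bad
    return conflicts


records = [
    {"client_id": "c1", "homeless": "yes", "dob": "10-15-1973"},
    {"client_id": "c1", "homeless": "no", "dob": "10-15-1973"},
    {"client_id": "c2", "homeless": "yes", "dob": None},
]
conflicts = find_conflicts(records, "client_id", ["homeless", "dob"])
print(conflicts)
```

Some conflicts are legitimate changes over time (homeless status), while others (DOB) can only be data entry errors, so the two kinds usually need different rules.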
Coverage of Data
  Database statistics vs. the universe
  Determine the relevant universe
    Ex. Emergency shelters
      Men's shelters
      Women's shelters
      Family units
      DV, etc.
Merging Databases
  Most HMIS systems are decentralized and will require some form of systems integration and/or data migration to obtain unduplicated counts, service utilization patterns and characteristics of homeless persons served.
11 County Region of Northern California
  Population = 7,512,499
  Geographic area = 10,691 sq. miles
  Equivalent in size to the state of Maryland
BACHIC
  Bay Area Counties Homeless Information Collaborative
  Mission: To better enable policy makers, service agencies, and funders to understand and service the needs of the homeless within the community
  Goals:
    Obtain unduplicated regional count of homeless persons
    Identify prevalence of cross-county chronic homelessness
    Understand client movement across continuum boundaries
    Analyze service usage across continuums
    Inform funders about effectiveness of sponsored programs in the region
    Leverage HMIS learning and expertise across multiple communities; increase success factors, reduce risk factors
BACHIC
  Product: Regional HMIS Data Warehouse
  Outcomes:
    Better planning and resource management
    Clearer vision of the present and future needs of the homeless
  Sponsored by the Charles and Helen Schwab Foundation
HMIS Implementations by County
  Legacy System
  MS Access
  Metsys
  Deloitte
  ServicePoint - all locally hosted except for Contra Costa and Monterey
RHINO Data Collection
  Regional Homeless Information Network
  BACHIC group agreed to the collection of:
    All Universal Data Elements
    All Program-Specific Data Elements
  What each county has agreed to forward to RHINO:
    All Universal & Program Elements except for Protected Personal Information (PPI)
    Exception: Year of Birth, Program Entry & Exit Dates and ZIP Code
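The forwarding rule (drop PPI, but keep year of birth, entry and exit dates, and ZIP code) can be sketched as a small transformation. The field names, the `PPI_FIELDS` set and the `YYYY-MM-DD` date layout are all assumptions for illustration; the deck does not list the actual columns.

```python
# Hypothetical PPI columns that stay in the county system.
PPI_FIELDS = {"first_name", "last_name", "ssn", "date_of_birth"}


def prepare_for_rhino(record):
    """Drop PPI fields, keeping year of birth, entry/exit dates and ZIP code."""
    out = {k: v for k, v in record.items() if k not in PPI_FIELDS}
    dob = record.get("date_of_birth")        # assumed 'YYYY-MM-DD' text
    out["year_of_birth"] = int(dob[:4]) if dob else None
    return out


local = {
    "client_id": "c1",
    "first_name": "William",
    "last_name": "Smith",
    "ssn": "000-00-0000",
    "date_of_birth": "1973-10-15",
    "zip_code": "95112",
    "entry_date": "2004-09-01",
    "exit_date": None,
}
forwarded = prepare_for_rhino(local)
print(forwarded)
```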
Data Warehouse and Counties
  Diagram labels:
    Regional HMIS Data Warehouse
    San Francisco Metsys System
    San Mateo Daisy/HOPE System
    ServicePoint Hosted Systems
    ServicePoint Stand-alone Locally Hosted System
    Other Local Systems
RHINO Design
  Diagram labels: Data Entry; Extraction; Transformation; Encryption (SSH2); County HMIS System; Standard Format; Internet; Regional HMIS Data Warehouse; BACHIC Reports
Project Phases
  Diagram labels:
    Growth
    Hardware Software Testing; Validation of Design
    Pilot of Santa Clara County
    Implementation of Select Diverse Counties
    All other Counties
    Phase I; Phase II; Phase III; Phase IV
  Vision: Automated, Real-Time, Flexible, Accurate
Design of System
  Diagram labels: Regional HMIS Data Warehouse; Linux Server; SQL Server; DAT72 Backup
  Minimize counties' efforts
    Especially ongoing duties/obligations
  Security, privacy
  Multiple diverse HMIS systems
    Different stages of implementation
    Different data formats
  Reporting
    Flexibility so as not to limit future reporting choices
  Work flow
    Processes and procedures for resolving exceptions
Transference of Data
  Data from counties will be in CSV format (min. requirement)
  Minimum encryption (128 bit) using SSH2 (Secure Shell Version 2)
  Double firewall for increased security
  Future use of OpenSSL (open source software)
  Diagram labels: County HMIS Systems; Regional HMIS Data Warehouse
Regional Unique Identifier
  Required for de-duplication of customers within the 11 counties.
  Derived from personally identifiable data elements.
  Uses a hash algorithm to encode the ID.
  Key is created by the 11 counties and is unknown to the data warehouse team.
  Cannot be reverse engineered; the hash is one-way.
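One common way to build such an identifier is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; that choice, the input fields, the normalization and the example key are all assumptions, since the deck does not name the algorithm or the elements it uses.

```python
import hashlib
import hmac


def regional_id(first, last, dob, gender, shared_key):
    """Derive a one-way identifier from normalized identifying fields.

    shared_key is held by the counties only; without it the warehouse
    cannot recompute IDs from guessed names and birth dates.
    """
    normalized = "|".join(p.strip().lower() for p in (first, last, dob, gender))
    digest = hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


key = b"example-key-held-by-counties"   # placeholder, not a real key
a = regional_id("William", "Smith", "1973-10-15", "male", key)
b = regional_id(" william ", "SMITH", "1973-10-15", "Male", key)
c = regional_id("Bill", "Smith", "1973-10-15", "male", key)
print(a == b, a == c)
```

Normalizing before hashing matters because the hash gives no partial credit: records that differ only in case or spacing match, but "Bill" and "William" produce unrelated IDs and would be counted as two people.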
Data Integrity
  Before data is merged, it will be checked for the following:
    Each record/entity ID is unique
    Required data elements have some value
    Date formats are correct and values are reasonable
    Code values conform to the HMIS Standard
      Ex. Gender: Male=0, Female=1
    All data elements can be linked back to a unique person identifier within the submitted data set
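The pre-merge checks can be sketched as one function that returns a list of problems for a submitted data set. The record layout, field names and `YYYY-MM-DD` date format are hypothetical; the gender codes are the ones given in the slide's example.

```python
from datetime import datetime

GENDER_CODES = {0, 1}   # from the slide's example: Male=0, Female=1


def check_submission(clients, services):
    """Return a list of problem descriptions for one submitted data set."""
    problems = []
    ids = [c["id"] for c in clients]
    if len(ids) != len(set(ids)):
        problems.append("duplicate client id")          # IDs must be unique
    for c in clients:
        if c.get("gender") is None:
            problems.append(f"{c['id']}: gender missing")   # required value
        elif c["gender"] not in GENDER_CODES:
            problems.append(f"{c['id']}: gender code {c['gender']} not in standard")
        try:
            datetime.strptime(c.get("entry_date") or "", "%Y-%m-%d")
        except ValueError:
            problems.append(f"{c['id']}: bad entry date")   # format or calendar
    known = set(ids)
    for s in services:
        if s["client_id"] not in known:                  # must link to a person
            problems.append(f"service {s['id']}: unknown client {s['client_id']}")
    return problems


clients = [
    {"id": "c1", "gender": 0, "entry_date": "2004-09-14"},
    {"id": "c2", "gender": 5, "entry_date": "2004-02-30"},
]
services = [{"id": "s1", "client_id": "c1"}, {"id": "s2", "client_id": "c9"}]
problems = check_submission(clients, services)
print(problems)
```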
Reporting
  Demographics
    Total client population:
      Age
      Race
      Ethnicity
    Adult client population
      Gender
      Income Sources
      Disabilities
    Family income group
    Top 10 last permanent zip codes
  Scope Status Grid
    Item                 As of 12/04
    Counties (CoCs)      11
    Agencies             85
    Emergency Shelters   47
    Emergency Beds       8600
    Etc.
  Regional Homeless Population by County
    Solano           7%
    Santa Cruz       8%
    Santa Clara     11%
    Sonoma           9%
    Alameda         15%
    Contra Costa     3%
    Marin            2%
    Monterey         3%
    Napa            10%
    San Mateo       13%
    San Francisco   19%
Reporting
  Demographics
    Families w/ children
    Single
    Veterans
    Chronically Homeless
  Migration and Service Access
    Last permanent zip outside county
    Clients receiving shelter or other services in multiple counties
  Program Effectiveness
    Reason for leaving
    Destination
  % Veteran Status
    Chronically Homeless   Yes    No     Grand Total
    Yes                    63%    45%    49%
    No                     38%    55%    51%
    Grand Total            100%   100%   100%
  15% of Adult Client Population are Veterans
  Age Group        % of Total Client Population
    17 and under   27%
    18-30          17%
    31-50          40%
    51-61           8%
    62 and over     2%
    Unknown         7%
Contact Information
  Community Technology Alliance
  115 East Gish Road, Suite 222
  San José, CA 95112
  Ray Allen, Executive Director
  (408) 437-8800
  (408) 437-9169 (fax)
  e-mail: admin@ctagroup.org
Questions and Answers