Purpose of Today's Lecture. Secondary Data Analysis. What is Secondary Data?




Secondary Data Analysis: The Good, the Bad, and the Ugly
Linda Simoni-Wastila, PhD

Purpose of Today's Lecture
  What is secondary data?
  What are the different types of secondary data?
  What are some of the analytic issues specific to secondary data?
  How do you validate secondary data?
  What types of studies can you do with secondary data?
  Where can you get some secondary data?

A blessing! In a nutshell:
  Secondary data is any data that has been collected and aggregated by anyone other than the user.
  This differs from primary data, which is data that the user collects, such as:
    Survey
    Focus group
    Medical records
    Interviews

Secondary data has been around about as long as we have had written records.
Secondary data for research purposes, however, is a much more recent advance.
Pharmaceutical data is an even more recent advance.
In the past 15 years the number of sources of data with pharmaceutical information has exploded from a few sources to many sources.

Before you google possible data sources, download a database, push a button, and complete your dissertation, there are a few things you need to consider:
  Does it make sense for you to collect your own data?
  If no, what data sources are available to you that will answer your research question(s)?
  What type of data can you hope and dream to procure?
  And where can you find it?

To collect or not to collect? That is the question.
When you might want to collect primary data:
  Question is qualitative in nature
  Study design is observational
  Development/assessment of a data collection instrument
  Question has no available secondary data (or at least data that is available to you)

Pros:
  Cost to procure < cost to collect
  Large sample size/nationally representative
  Has population of interest
  Data are current
  Standard indices and measures
  Good data quality
  Available now
  Has documentation

Cons:
  Cost to procure > cost to collect
  Does not have population of interest
  Data are obsolete
  Questionable data quality
  Poor documentation
  May not have specific variable(s) or population(s) of interest

There are three types of secondary data:
  Public Use
  Administrative Claims
  Proprietary

Public-Use Data
  Secondary data that has been made available to the public (usually by a Federal agency)
  Cost usually minimal to user (though very expensive to collect)
  Made public because there is a Federal mandate to make federally funded data collection efforts available to any researcher
  Is often claims-based

Administrative Claims Data
  Data that captures transactions involving (usually) billed services.
  These data can be public sector (e.g., Medicaid or Medicare) or private sector (e.g., managed care, employer-sponsored insurance) or a combination (e.g., compilation of various payor sources)

Administrative Claims Data
  Administrative claims files are generally very large and difficult for a first-time user to navigate
  HIPAA has introduced new patient and privacy rules that may make data more difficult to obtain, especially sensitive data (e.g., substance abuse/mental health services)
  Data quality varies by source and over time
  Data availability from Federal agencies is very good; you may need connections to obtain private sector data

Proprietary Data
  Data that is collected and owned by another party who makes it available to those with the resources and/or the connections
  Generally quite expensive (there are often academic rates, and some data may be available for free for dissertation work)
  These data are usually claims data, though some are not (e.g., Tufts University CSDD Drugs In Development data)

Once you have determined you will use secondary data, how do you determine the type of data you need?
Two key considerations:
  File structure
  Source of the data

File Structure: are the data static in time (cross-sectional) or continuous in time (longitudinal)?
  If your research question involves determining outcomes associated with drug exposure, your data will need to allow assessment of temporality (i.e., you will need longitudinal data)
  If your research question involves examination of associations, cross-sectional data may be sufficient

Analytic Considerations
  Is the sample size sufficient?
  Is your population of interest included? In sufficient numbers? Is everyone included in the data you received?
  Are the outcomes of your research question available in your data? Are they measured sufficiently?
  What explanatory variables and covariates are included? How are they measured?
  What is the data quality? Are there missing variables? Values? How are missing data coded? Imputation?
  Is there documentation?
  You WILL need to conduct data validation and cleaning steps with your data; there is no such thing as perfect data.

Public Use, Claims, and Proprietary data compared on:
  Cost to obtain: inexpensive (cost usually much less than to collect yourself)
  Cost to process/store
  Recency of data
  Has data variable of interest (no control over what is asked and how it is asked)
  Has population of interest (if nationally representative, it may, but samples may still be too small for analyses with adequate power, especially if you are interested in rare events)
  Standard indices and measures
  Data quality (may be excellent, but you have no control over how data are collected, recorded, coded, entered, aggregated, etc.; you don't know where the errors are)
  National representativeness
  Waiting time to obtain (generally faster than own collection; however, you may wait to get recent data; claims are often difficult to process and clean)
  Documentation: N/A, Unknown, Unknown

Are you able to do the analyses you want with the data?
  For example, if the study question involves determining outcomes associated with drug use, does your data allow you to assess causality?
  Cross-sectional data generally will not allow you to assess causality, only the association between variables (exception: respondents may be asked when an event happened, so temporality can be assessed).
  Is your sample size sufficient?
  Does your research require national representativeness?
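The temporality requirement above can be illustrated with a minimal Python sketch. It assumes hypothetical longitudinal claims records; the names first_fill, person_id, and event_date are invented for illustration and do not come from any data set named in this lecture. The sketch keeps only outcome events dated after a person's first drug fill, which is the ordering that a single cross-sectional file usually cannot establish.

```python
from datetime import date

# Hypothetical exposure data: first drug fill date per person.
first_fill = {
    "A": date(2004, 3, 1),
    "B": date(2004, 6, 15),
}

# Hypothetical outcome events (for example, a hospitalization) with dates.
events = [
    {"person_id": "A", "event_date": date(2004, 1, 10)},  # before exposure
    {"person_id": "A", "event_date": date(2004, 5, 2)},   # after exposure
    {"person_id": "B", "event_date": date(2004, 6, 15)},  # same day as fill
    {"person_id": "C", "event_date": date(2004, 7, 1)},   # never exposed
]


def events_after_exposure(events, first_fill):
    """Return events that occur strictly after the person's first fill."""
    kept = []
    for ev in events:
        fill = first_fill.get(ev["person_id"])
        if fill is not None and ev["event_date"] > fill:
            kept.append(ev)
    return kept


after = events_after_exposure(events, first_fill)
print([(ev["person_id"], ev["event_date"].isoformat()) for ev in after])
```

Whether a same-day event counts as "after" exposure is a design decision; this sketch excludes it, and a real study would state and justify its own rule.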

Assessing quality of secondary data
  Secondary data varies in cost, timeliness, scope, available measures, and other domains
  How to know if it's good data:
    Research literature that has used the data
    Examine data dictionary and other documentation
    Ask for a sample or demo data
    Discuss with users
    Have vendor do preliminary search for variables, power

How to validate secondary data
  Reliability varies across fields, respondents, and years
  Coding among institutions may vary
  Numbers can be miscoded
  If you have outliers, look at data problems first
  Do interim analyses very early on; validate every field in some way
  Calculate mean, median, mins and maxs, skewness, kurtosis
  If data are missing, try to determine why. Skip patterns? Coded as 9 or 0? Or is it random?

Your study design will be largely dependent upon the type and quality of data and its intended use
Uses of secondary data:
  Exploratory/hypothesis generating
  Combine with primary data collection
  Hypothesis testing

Exploratory/hypothesis generating: can use secondary data to explore research questions prior to fielding your own primary data collection effort
  Can use to determine whether you will need to oversample specific populations, to refine variable measurement and associations, and to revise hypotheses

Combine with primary data: dynamic analysis that allows supplementation with in-depth interviews, focus groups, and surveys to provide context for quantitative analyses of secondary data

Hypothesis testing: most research in our field falls into this domain

Cross-sectional data are most useful for exploratory studies and for combining with primary data, but can be used for hypothesis testing if the data are congruent with the research questions and hypotheses
  Particularly case-control studies
  Many cross-sectional data sets are collected annually, so you can pool them to increase sample size, and conduct trend analyses if you look at cross-sections over time
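The checks listed under "How to validate secondary data" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the age values are invented, the field is assumed to be age for an elderly population (so that 9 and 0 are implausible real values and can be treated as missing codes), and skewness and kurtosis use simple moment-based population formulas.

```python
import statistics


def describe(values, missing_codes):
    """Summarize one numeric field, treating the given codes as missing."""
    valid = [v for v in values if v is not None and v not in missing_codes]
    n = len(valid)
    summary = {"n": n, "n_missing": len(values) - n}
    if n == 0:
        return summary
    mean = statistics.fmean(valid)
    sd = statistics.pstdev(valid)
    summary.update(
        mean=mean,
        median=statistics.median(valid),
        min=min(valid),
        max=max(valid),
    )
    if sd > 0:
        # Moment-based skewness and excess kurtosis (population formulas).
        summary["skewness"] = sum((v - mean) ** 3 for v in valid) / (n * sd ** 3)
        summary["kurtosis"] = sum((v - mean) ** 4 for v in valid) / (n * sd ** 4) - 3
    return summary


# Hypothetical age field; 9 and 0 are assumed to be missing-value codes.
ages = [67, 72, 9, 81, 0, 70, 75, None, 68, 90]
summary = describe(ages, missing_codes={9, 0})
for name, value in summary.items():
    print(name, value)
```

Running the same summary with and without the suspected missing codes is a quick way to see how much they distort the mean and the minimum.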

Case-Control Studies
  Rare outcomes
  If money were not an issue, we'd prefer prospective cohort studies
  Case-control studies are efficient cohort studies with a ready-made population from which to draw study subjects

Case-Control Studies: Pros and Cons
  Strengths:
    Relatively inexpensive
    Good design for chronic conditions with long latency periods
    Rare diseases
    Can examine multiple factors
  Weaknesses:
    Inefficient for evaluating rare exposures
    Cannot directly estimate incidence rates
    Can be more difficult to control for biases and confounding

Designs for Temporal Analysis
  If your data are longitudinal, then you have more sophisticated options and can do robust hypothesis-testing studies
  Cohort: examine characteristics of cohorts at 2 or more points in time. Cohort = any group that experiences a major life event at the same time (e.g., birth cohort). Especially suited to studying aging and social, political, or cultural change.
  Panel: data collected at 2 or more points in time for the same persons. Only panel designs allow study of changes among respondents rather than simply populations. Other types: event history design.
  Time series: used to describe changing patterns of phenomena, explain sources of changes, and make predictions about future changes. Can often use cross-sectional data on the same measures but different respondents. Need many time periods (at least 30).
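The point that only panel designs show changes among respondents, rather than simply populations, can be made concrete with a small Python sketch. The two waves below are invented for illustration (1 means the person uses a drug, 0 means not); the function names are hypothetical as well.

```python
# Hypothetical panel: the same four persons observed at two waves.
wave1 = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}
wave2 = {"p1": 0, "p2": 1, "p3": 1, "p4": 0}


def prevalence(wave):
    """Population-level share of users in one wave."""
    return sum(wave.values()) / len(wave)


def transitions(first, second):
    """Count person-level changes among persons present in both waves."""
    counts = {"started": 0, "stopped": 0, "unchanged": 0}
    for pid in first.keys() & second.keys():
        if first[pid] == 0 and second[pid] == 1:
            counts["started"] += 1
        elif first[pid] == 1 and second[pid] == 0:
            counts["stopped"] += 1
        else:
            counts["unchanged"] += 1
    return counts


prev1, prev2 = prevalence(wave1), prevalence(wave2)
changes = transitions(wave1, wave2)
print("prevalence:", prev1, prev2)
print("transitions:", changes)
```

In this toy example the prevalence is identical in both waves, so repeated cross-sections would show no change, while the panel view reveals that one person started and one stopped.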
Where to Get Secondary Data
  See Handout
  ICPSR: InterUniversity Consortium for Political and Social Research, www.icpsr.umich.edu
  www.cdc.gov/nchs
  www.samhsa.gov and www.icpsr.umich.edu/sahda/
  www.cms.hhs.gov

Columns: Database; Public Use; Proprietary; Claims/Encounter; Source(s)
  MCBS: CMS
  MEPS/NMES: AHRQ
  NHANES: CDC
  NAMCS/NHAMCS: CDC
  NHSDA/NSDUH: SAMHSA; SAMHDA
  MTF: NIDA; SAMHDA
  Marketscan: Medstat
  PBM: Various
  Medicaid Claims: CMS; also individual states
  NPA: IMS
  Medicare Claims: CMS

SOURCE: AHRQ, Agency for Healthcare Research and Quality
  http://www.ahrq.gov/
  http://www.meps.ahrq.gov/
  http://www.meps.ahrq.gov/epset/hc/epsethc.asp (for online analysis)
  http://www.ahrq.gov/data/hcsusix.htm

  DATA SET: MEPS, Medical Expenditure Panel Survey
  GENERAL DESCRIPTION: The Medical Expenditure Panel Survey (MEPS) is designed to continually provide policymakers, health care administrators, businesses, and others with timely, comprehensive information about health care use and costs in the United States, and to improve the accuracy of their economic projections. MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of private health insurance held by and available to the U.S. population.

  DATA SET: HCSUS, HIV Cost and Services Utilization Study
  GENERAL DESCRIPTION: HCSUS is the first major research effort to collect information on a nationally representative sample of people in care for HIV infection. HCSUS is examining costs of care, utilization of a wide array of services, access to care, quality of care, quality of life, unmet needs for medical and nonmedical services, social support, satisfaction with medical care, and knowledge of HIV therapies.

SOURCE: CMS, Centers for Medicare and Medicaid Services
  http://www.cms.hhs.gov/
  http://www.cms.hhs.gov/apps/mcbs/

  DATA SET: MCBS, Medicare Current Beneficiary Survey
  GENERAL DESCRIPTION: The Medicare Current Beneficiary Survey (MCBS) is a continuous, multipurpose survey of a nationally representative sample of aged, disabled, and institutionalized Medicare beneficiaries. MCBS, which is sponsored by CMS, is the only comprehensive source of information on the health status, health care use and expenditures, health insurance coverage, and socioeconomic and demographic characteristics of the entire spectrum of Medicare beneficiaries.

SOURCE: NCHS, National Center for Health Statistics
  http://www.cdc.gov/nchs/express.htm
  http://www.cdc.gov/nchs/about/major/nhis/hisdesc.htm
  http://www.cdc.gov/nchs/nhanes.htm
  http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm
  http://www.cdc.gov/nchs/lsoa.htm

  DATA SET: NHIS, National Health Interview Survey
  GENERAL DESCRIPTION: The National Health Interview Survey is a cross-sectional household interview survey. NHIS collects data each year in three areas: demographics, health status, and health care utilization. Data may be used to provide national estimates on the incidence of acute illness and injuries, prevalence of chronic conditions and impairments, the extent of disability, utilization of health care services, and other health-related topics.

  DATA SET: NHANES III, National Health and Nutrition Examination Survey
  GENERAL DESCRIPTION: Data focus is on chronic diseases and risk factors such as heart disease, diabetes, arthritis, infectious diseases, immunization status, growth and development of children, overweight, dental health, respiratory disease, osteoporosis, mental health, and others. The data can be used to provide national prevalence estimates.

  DATA SET: NAMCS, National Ambulatory Medical Care Survey
  GENERAL DESCRIPTION: The purpose of NAMCS is to gather and disseminate statistical data about the medical care provided by office-based physicians in the US.

  DATA SET: NHAMCS, National Hospital Ambulatory Medical Care Survey
  GENERAL DESCRIPTION: The purpose of NHAMCS is to produce statistics that are representative of the experience of the US population in hospital emergency departments (EDs) and outpatient departments (OPDs). Contains online drug database search engine: http://www2.cdc.gov/drugs/

  DATA SET: LSOA, Longitudinal Studies of Aging
  GENERAL DESCRIPTION: LSOA is a multicohort study of persons 70 years of age and over designed primarily to measure changes in the health, functional status, living arrangements, and health services utilization of two cohorts of Americans as they move into and through the oldest ages.