Data Management and Analysis for Successful Clinical Research. Lily Wang, PhD Department of Biostatistics Vanderbilt University

Goals of This Presentation
- Provide an overview of the data management and analysis aspects of clinical research
- Minimize errors in datasets
- Ensure that statistical software packages recognize the data correctly
- Facilitate efficient data analysis for projects

An Overview of the Process
1. Write the protocol: consult mentors and colleagues, and visit us to finalize the specific aims, testable hypotheses and study design
2. Create a Data Dictionary
3. Create a Patient Directory
4. Prepare datasets for statistical analysis

An Overview of the Process (continued)
5. The statisticians will assist with the statistical tests
6. Review the results and start thinking about writing the paper
7. Prepare additional tables and figures
8. Write the paper/abstract

Timeline
- For an abstract, please send us the datasets at least 4 weeks in advance
- Please contact us even if you don't have the dataset ready yet, so that we can schedule other projects and leave room for yours

1. Writing the Proposal
Background
- Why this research is important
- Be concise
Specific Aims, Testable Hypotheses
- Be focused, clearly conceptualized and feasible
- The most important section of the proposal
- Consult mentors and colleagues, and visit us

1. Writing the Proposal
Methods/Experimental Design
- Participants
- Inclusion/exclusion criteria
- Recruiting process
- How the measurements will be made

1. Writing the Proposal
Challenges/Potential Problems
- Loss to follow-up
- Bias: confounding variables and other sources
Human Subjects Protection Plan
- Informed consent
- Adverse events
- Privacy and confidentiality issues

Bias
Definition: any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure's effect on the risk of disease

Confounding: Definition
In a study of whether factor A is a cause of disease B, a third factor X is a confounder if
- factor X is a known risk factor for disease B, and
- factor X is associated with factor A but is not a result of factor A


Confounding: An Example (Coffee Drinking and Pancreatic Cancer)
If an association is observed between coffee drinking and pancreatic cancer, then either
- coffee drinking causes pancreatic cancer, or
- smoking is a confounder: smoking is a risk factor for pancreatic cancer, and smoking is associated with coffee drinking

1. Writing the Proposal
Confounding: ways to deal with it
- In the design phase: match cases to controls on the confounding variables
- In the analysis phase: stratification; adjustment
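The analysis-phase remedies can be illustrated with the coffee example. What follows is a minimal Python sketch, assuming a case-control design summarized as 2x2 tables. The counts are hypothetical, constructed so that coffee has no effect within either smoking stratum. The Mantel-Haenszel summary odds ratio is one standard choice of stratified adjustment; the slides do not prescribe a particular method, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Counts for one 2x2 table of exposure (coffee) by disease status."""
    a: int  # coffee drinkers who are cases
    b: int  # coffee drinkers who are controls
    c: int  # non-drinkers who are cases
    d: int  # non-drinkers who are controls

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def odds_ratio(t: Table2x2) -> float:
    """Odds ratio ad / bc for a single table."""
    return (t.a * t.d) / (t.b * t.c)


def pooled(strata: list) -> Table2x2:
    """Collapse the strata into one table, i.e. ignore the confounder."""
    return Table2x2(
        a=sum(t.a for t in strata),
        b=sum(t.b for t in strata),
        c=sum(t.c for t in strata),
        d=sum(t.d for t in strata),
    )


def mantel_haenszel_or(strata: list) -> float:
    """Mantel-Haenszel odds ratio: a weighted combination of the stratum-specific tables."""
    numerator = sum(t.a * t.d / t.n for t in strata)
    denominator = sum(t.b * t.c / t.n for t in strata)
    return numerator / denominator


# Hypothetical counts, stratified by smoking status (not data from any study).
smokers = Table2x2(a=40, b=40, c=10, d=10)
nonsmokers = Table2x2(a=10, b=40, c=20, d=80)

crude = odds_ratio(pooled([smokers, nonsmokers]))
adjusted = mantel_haenszel_or([smokers, nonsmokers])
print(f"crude OR = {crude:.3f}, smoking-adjusted OR = {adjusted:.3f}")
```

Both stratum-specific odds ratios are exactly 1, yet pooling gives a crude odds ratio of 1.875. In these made-up counts, smokers are both more likely to drink coffee (80 of 100 vs 50 of 150) and more likely to be cases (50 of 100 vs 30 of 150).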

1. Writing the Proposal
Statistical Analysis (provided by the statisticians)
- Sample size/power calculations
- Analysis plan
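The sample size calculation itself is the statisticians' job, but it helps to know which inputs they will ask for. Below is a minimal Python sketch of the common normal-approximation formula for comparing two group means. It assumes a two-sided test, equal group sizes and a common standard deviation. The function name and the planning values (a 10 mmHg difference with an SD of 20 mmHg) are hypothetical, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects per group for a two-sided two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    where d = (difference in means) / (common standard deviation).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical planning values: detect a 10 mmHg difference in bp_sys when the SD is 20 mmHg (d = 0.5).
print(n_per_group(10 / 20))              # 80% power
print(n_per_group(10 / 20, power=0.90))  # 90% power
```

Because n scales with 1/d², halving the detectable difference roughly quadruples the number of subjects needed.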

1. Writing the Proposal
A good example: Dr. Malow's template

2. Create a Data Dictionary

Name     Description                   Units  Type        Values (permissible ranges)
group    treatment group                      discrete    1 = placebo, 2 = trt
age      age in years                  year   continuous  10-79
bp_sys   systolic blood pressure       mmHg   continuous  100-160
bp_dias  diastolic blood pressure      mmHg   continuous  80-150
date0    date for baseline assessment         date        mm/dd/yyyy
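Keeping the data dictionary in a machine-readable form lets you check entries against it as you go. What follows is a minimal Python sketch, assuming each subject's data is held as one dict keyed by variable name. The `Variable` class and the `check_value`/`check_record` functions are illustrative names, not part of any particular package.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Variable:
    name: str
    description: str
    units: Optional[str]
    vtype: str                       # "discrete", "continuous" or "date"
    codes: Optional[set] = None      # permitted codes (discrete variables)
    low: Optional[float] = None      # permissible range (continuous variables)
    high: Optional[float] = None
    date_format: str = "%m/%d/%Y"    # mm/dd/yyyy


# The example data dictionary from the slide above.
DICTIONARY = {
    v.name: v
    for v in [
        Variable("group", "treatment group (1 = placebo, 2 = trt)", None, "discrete", codes={1, 2}),
        Variable("age", "age in years", "year", "continuous", low=10, high=79),
        Variable("bp_sys", "systolic blood pressure", "mmHg", "continuous", low=100, high=160),
        Variable("bp_dias", "diastolic blood pressure", "mmHg", "continuous", low=80, high=150),
        Variable("date0", "date for baseline assessment", None, "date"),
    ]
}


def check_value(name: str, value) -> bool:
    """True if value is permissible; None (a blank cell) is treated as missing, not as an error."""
    var = DICTIONARY[name]
    if value is None:
        return True
    try:
        if var.vtype == "discrete":
            return value in var.codes
        if var.vtype == "continuous":
            return var.low <= float(value) <= var.high
        if var.vtype == "date":
            datetime.strptime(value, var.date_format)  # raises ValueError for impossible dates
            return True
    except (TypeError, ValueError):
        return False
    raise ValueError(f"unknown type {var.vtype!r} for {name}")


def check_record(record: dict) -> list:
    """Names of dictionary variables in one subject's record whose values are not permissible."""
    return [k for k, v in record.items() if k in DICTIONARY and not check_value(k, v)]


print(check_record({"group": 1, "age": 0.5, "bp_sys": 120, "bp_dias": 80, "date0": "2/30/1999"}))
```

The example record is flagged for `age` (0.5 is below 10) and `date0` (February 30 does not exist).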

3. Create a Patient Directory

ID  FirstName  LastName  Address  Phone  ...
1   John       Smith
2   Mary       Ann
3   Joe        Kim

- Include any other information you would like to record for reference
- Keep this file to yourself, and don't send it to us
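If names and measurements were entered together, they can be separated before anything is shared. What follows is a minimal Python sketch, assuming records are dicts with an `ID` key. The set of identifying columns and the example records are hypothetical; the ages and sex codes are borrowed from the example dataset only for illustration.

```python
# Columns treated as identifying information (from the directory above); adjust to your own sheet.
IDENTIFIERS = {"FirstName", "LastName", "Address", "Phone"}


def split_record(record: dict) -> tuple:
    """Split one raw record into (directory entry, analysis row); both keep the study ID as the only link."""
    directory = {"ID": record["ID"]}
    analysis = {"ID": record["ID"]}
    for key, value in record.items():
        if key != "ID":
            (directory if key in IDENTIFIERS else analysis)[key] = value
    return directory, analysis


# Hypothetical raw records that mix the directory with the measurements.
raw = [
    {"ID": 1, "FirstName": "John", "LastName": "Smith", "age": 25, "sex": 1},
    {"ID": 2, "FirstName": "Mary", "LastName": "Ann", "age": 65, "sex": 2},
]
pairs = [split_record(r) for r in raw]
directory = [d for d, _ in pairs]  # keep this file to yourself
dataset = [a for _, a in pairs]    # only this goes to the statisticians
print(dataset)
```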

4. Prepare Datasets for Statistical Analysis
A good example:

ID  group  age  sex  ht  wt   bp_sys  bp_dias  stage  race  date0       complic
1   1      25   1    61  350  120     80       3      3.0   1/15/1999   0
2   1      65   2    68  161  140     90       2      1.0   2/5/1999    1
3   1      25   1    47  150  160     110      4      2.0   1/15/1998   1
4   1      31   1    66  161  140     105      2      2.0   4/1/1999    0
5   1      42   2    72  177  130     70       2      1.0   2/15/1999   0
6   1      45   2    67  160  120     80       1      2.0   3/6/1999    0
7   1      44   1    72  145  120     80       1      1.0   2/28/1999   0
8   1      55   1    72  161  120     95       4      2.0   6/15/2000   1
9   1      0.5  2    66  174  160     110      3      4.0   12/14/2000  1
10  1      21   2    60  155  190     120      2      2.0   11/14/2000  0
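The ranges in the data dictionary also work as an automated check on a finished sheet. What follows is a minimal Python sketch using only the standard library. It parses the example sheet, confirms that every row has one value per column and that the IDs are unique, and then flags values outside the dictionary's permissible ranges. The function names are illustrative. On this sheet the flags are subject 5 (bp_dias 70), subject 9 (age 0.5, i.e. six months) and subject 10 (bp_sys 190). Whether these are entry errors or genuine values is for the investigator to confirm.

```python
# The example sheet above, copied as whitespace-separated text.
SHEET = """\
ID group age sex ht wt bp_sys bp_dias stage race date0 complic
1 1 25 1 61 350 120 80 3 3.0 1/15/1999 0
2 1 65 2 68 161 140 90 2 1.0 2/5/1999 1
3 1 25 1 47 150 160 110 4 2.0 1/15/1998 1
4 1 31 1 66 161 140 105 2 2.0 4/1/1999 0
5 1 42 2 72 177 130 70 2 1.0 2/15/1999 0
6 1 45 2 67 160 120 80 1 2.0 3/6/1999 0
7 1 44 1 72 145 120 80 1 1.0 2/28/1999 0
8 1 55 1 72 161 120 95 4 2.0 6/15/2000 1
9 1 0.5 2 66 174 160 110 3 4.0 12/14/2000 1
10 1 21 2 60 155 190 120 2 2.0 11/14/2000 0
"""

# Permissible ranges copied from the data dictionary slide.
RANGES = {"age": (10, 79), "bp_sys": (100, 160), "bp_dias": (80, 150)}


def load(text: str) -> list:
    """Parse the sheet, insisting on one value per column and one row per subject."""
    header, *lines = text.strip().splitlines()
    columns = header.split()
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(columns):
            raise ValueError(f"row {fields[0]!r}: {len(fields)} values for {len(columns)} columns")
        rows.append(dict(zip(columns, fields)))
    ids = [row["ID"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("subject IDs are not unique")
    return rows


def out_of_range(rows: list) -> list:
    """(ID, variable, value) for every value outside its permissible range."""
    flags = []
    for row in rows:
        for name, (low, high) in RANGES.items():
            value = float(row[name])
            if not low <= value <= high:
                flags.append((row["ID"], name, value))
    return flags


for flag in out_of_range(load(SHEET)):
    print(flag)
```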

4. Prepare Datasets for Statistical Analysis
- First, strip off any confidential information (name, address, phone number)
- Rows: one per subject (sample, observation)
- Columns: one per measurement (variable)

4. Preparing Datasets
Variable names (column labels)
- No special characters (such as <) except _
- Start with a letter, not a number
- Fewer than 8 characters
- Unique
- No spaces
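Column labels can be checked against these rules before the sheet is sent. What follows is a minimal Python sketch. The regular expression is one reading of the rules above, `max_len=7` encodes "fewer than 8 characters", and treating `Age` and `age` as duplicates is an assumption (some packages ignore case in variable names). The "bad" labels are adapted from the messy sheet at the end of these slides.

```python
import re

# Letters, digits and underscore only, starting with a letter (which also rules out spaces).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def name_problems(names: list, max_len: int = 7) -> list:
    """(name, problem) pairs for column labels that break the naming rules.

    Duplicates are compared case-insensitively (an assumption, see the text).
    """
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "use only letters, digits and _, starting with a letter"))
        if len(name) > max_len:
            problems.append((name, f"longer than {max_len} characters"))
        if name.lower() in seen:
            problems.append((name, "duplicate name"))
        seen.add(name.lower())
    return problems


good = ["ID", "group", "age", "sex", "ht", "wt", "bp_sys", "bp_dias", "stage", "race", "date0", "complic"]
bad = ["24hrhct", "blood pressure", "Age", "age", "complications"]
print(name_problems(good))  # prints []
for problem in name_problems(bad):
    print(problem)
```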

4. Preparing Datasets
Data values
- Be consistent: M is not the same as m; use one date format and one choice of upper/lower case
- No spaces
- No embedded formulas: use Paste Special, then Paste Values
- Missing data: leave the cell blank, unless there are different reasons for missingness, in which case code each reason as a different value
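Consistency can be enforced when free-text entries are converted. What follows is a minimal Python sketch. The sex coding (1 = male, 2 = female) matches the example dataset. The set of entries treated as missing and the accepted input date layouts are assumptions, and anything unrecognized raises an error so that someone goes back to the source record instead of guessing.

```python
from datetime import datetime
from typing import Optional

# Coding used in the example dataset: sex 1 = male, 2 = female.
SEX_CODES = {"m": 1, "male": 1, "f": 2, "female": 2}
# Entries stored as a blank (missing) cell. This assumes the reason for missingness
# doesn't matter here; otherwise give each reason its own code.
MISSING = {"", "na", "unknown"}
# Date layouts accepted on input; output is always mm/dd/yyyy.
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%m-%d-%Y", "%m-%d-%y")


def normalize_sex(raw: str) -> Optional[int]:
    """Map entries such as 'Male', 'm' or ' F ' to a single numeric code."""
    text = raw.strip().lower()
    if text in MISSING:
        return None
    if text not in SEX_CODES:
        raise ValueError(f"unrecognized sex entry {raw!r}: check the source record")
    return SEX_CODES[text]


def normalize_date(raw: str) -> Optional[str]:
    """Rewrite a date as mm/dd/yyyy; impossible or unrecognized dates raise ValueError."""
    text = raw.strip()
    if text.lower() in MISSING:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized or impossible date {raw!r}: check the source record")


print(normalize_sex("Male"), normalize_sex(" f "), normalize_sex("unknown"))
print(normalize_date("2/05/1999"), normalize_date("6-15-00"))
```

Two-digit years are a trap. Python maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so an entry such as 01/21/58 becomes 01/21/2058. Entering four-digit years avoids the problem.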

4. Preparing Datasets
- Only one variable in each column; use separate columns for values that are not mutually exclusive
- Derived variables: the statisticians can create those
- Keep all information as continuous variables; once a variable has been categorized, the original information can't be recovered
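Two common cases of "one variable per column" are a blood pressure typed as "120/80" and a complications cell holding several events. What follows is a minimal Python sketch of both splits. The list of complication types and the short column names (kept under 8 characters) are hypothetical.

```python
from typing import Iterable

# Hypothetical complication types, each mapped to its own 0/1 column name.
COMPLICATIONS = {"pneumonia": "pneum", "sepsis": "sepsis", "death": "died"}


def split_bp(raw: str) -> tuple:
    """Split a '120/80' reading into separate (bp_sys, bp_dias) values."""
    sys_text, sep, dias_text = raw.strip().partition("/")
    if not (sep and sys_text.isdigit() and dias_text.isdigit()):
        raise ValueError(f"cannot split blood pressure {raw!r}: check the source record")
    return int(sys_text), int(dias_text)


def complication_columns(present: Iterable[str]) -> dict:
    """One indicator column per complication, since a patient can have several at once."""
    found = {c.strip().lower() for c in present}
    unlisted = found - set(COMPLICATIONS)
    if unlisted:
        raise ValueError(f"complications not in the list: {sorted(unlisted)}")
    return {column: int(label in found) for label, column in COMPLICATIONS.items()}


print(split_bp("140/90"))
print(complication_columns(["sepsis", "death"]))
```

Note that `split_bp("80/120")` succeeds even though the two numbers are reversed; a range check on bp_sys and bp_dias is what catches that kind of error.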

4. Preparing Datasets
- It's OK to have separate data sheets for demographic information and clinical measurements, as long as there is a unique identifier (ID) that links all the data sheets
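The ID link only works if every ID is unique within a sheet and present in every sheet. What follows is a minimal Python sketch of joining two sheets held as lists of dicts. It refuses duplicate IDs and reports IDs that appear in only one sheet. The function name and the example rows (values borrowed from the example dataset) are illustrative.

```python
def link_sheets(demographics: list, clinical: list, key: str = "ID") -> tuple:
    """Join two sheets on the subject ID.

    Returns (merged rows for IDs found in both sheets, IDs found in only one sheet).
    Duplicate IDs within a sheet are an error, because the link would be ambiguous.
    """
    def index(rows: list, sheet: str) -> dict:
        by_id = {}
        for row in rows:
            if row[key] in by_id:
                raise ValueError(f"duplicate {key} {row[key]!r} in the {sheet} sheet")
            by_id[row[key]] = row
        return by_id

    demo = index(demographics, "demographics")
    clin = index(clinical, "clinical")
    # If both sheets share a non-ID column, the clinical value wins in the merged row.
    merged = [{**demo[i], **clin[i]} for i in sorted(demo.keys() & clin.keys())]
    unmatched = sorted(demo.keys() ^ clin.keys())
    return merged, unmatched


demographics = [{"ID": 1, "age": 25, "sex": 1}, {"ID": 2, "age": 65, "sex": 2}]
clinical = [
    {"ID": 2, "bp_sys": 140, "bp_dias": 90},
    {"ID": 1, "bp_sys": 120, "bp_dias": 80},
    {"ID": 3, "bp_sys": 160, "bp_dias": 110},
]
merged, unmatched = link_sheets(demographics, clinical)
print(merged)
print(unmatched)  # IDs to chase up
```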

4. Preparing Datasets
If you are in a hurry:
- Record the data in a file and call it Raw_xxx.xls
- Later, transform it into the desired format
- It's OK to format only the variables needed for the analysis and send only these variables to the statisticians
Good idea: visit us after you've entered the first 5 patients and completed the data dictionary
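The raw file can stay as it is while a separate extract carries only the variables needed for the analysis. What follows is a minimal Python sketch using the standard `csv` module. It assumes the raw workbook has been exported to CSV, since reading .xls directly would need a third-party library. The function name and the example raw data are hypothetical.

```python
import csv
import io


def analysis_extract(raw_csv: str, keep: list) -> str:
    """Return CSV text with the ID plus only the requested variables; the raw text is not modified."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    wanted = ["ID", *keep]
    absent = [name for name in wanted if name not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"not found in the raw file: {absent}")
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `wanted`.
    writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical raw export: identifiers and working notes mixed in with the measurements.
raw = "ID,FirstName,age,bp_sys,notes\n1,John,25,120,called back\n2,Mary,65,140,\n"
print(analysis_extract(raw, ["age", "bp_sys"]), end="")
```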

What's wrong with this data sheet?
Comparison of Drug A and Drug B
Columns: Age of Patient, Patient Gender, Height (inches), Weight (pound), 24hrhct, blood pressure, tumor stage, Race, Date enrolled, complications
Drug A
1 25 Male 61" >350 38% 120/80 2-3 Hipanic 1/15/99 no
2 65+ female 5'8" 161 32 140/90 II White 2/05/1999 yes
3? Male 120cm 12 >160/110 IV Black Jan 98 yes, pneumonia
4 31 m 5'6" obse 40 140 sys 105 dias? ican-americ?
5 42 f >6 ft normal 39 missing =>2 W Feb 99
6 45 f 5.7 160 29 80/120 NA B last fall n
7 unknown? 6 145 35 normal 1 W 2/30/99 n
8 55 m 72 161.45 12/39 120/95 4 ican-americ 6-15-00 y
9 6 months f 66 174 38 160/110 3 Asian 14/12/00 y
10 21 f 5'
Drug B
1 55 m 61 145 normal 120/80 120/90 IV ative Americ 6/20/ 3
2 45 f 4"11 166? 135/95 2b none 7/14/99 n
3 32 male 5'13" 171 38 140/80 not staged NA 8/30/99 n
4 44 na 65? 40 120/80 2? 09/01/00 n
5 66 fem 71 0 41 140/90 4 w Sep 14th y, sepsis
6 71 unknown 172 199 38 >160/110 3 b unknown y, died
7 45 m? 204 32 140 sys 105 dias 1 b 12/25/00 n
8 34 m NA 145 36 130 3 w July 97 n
9 13 m 66 161 39 166/115 2a w 06/06/99 n
10 66 m 68 176 41 1120/80 3 w 01/21/58 n
Average 45 65 155 38
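Some of these problems can be caught mechanically. What follows is a minimal Python sketch that audits one numeric column. It flags rows whose label is not a plain subject number (such as the Average row) and values that are not plain numbers (qualifiers like ">", words, question marks). The (patient, weight) pairs are read off the flattened Drug A rows above, so treat the pairing as approximate. The function name is illustrative.

```python
import re

# A plain number: digits with an optional decimal part, nothing else.
PLAIN_NUMBER = re.compile(r"\d+(\.\d+)?")


def audit_column(rows: list, column: str) -> list:
    """(row label, problem) for rows that are not subjects and for values that are not plain numbers."""
    problems = []
    for row_id, value in rows:
        if not row_id.isdigit():
            problems.append((row_id, "not a subject ID (summary or annotated row?)"))
        elif not PLAIN_NUMBER.fullmatch(value.strip()):
            problems.append((row_id, f"{column} {value!r} is not a plain number"))
    return problems


# A few (patient, weight) entries from the Drug A rows above, plus the Average row.
weights = [("1", ">350"), ("2", "161"), ("4", "obse"), ("5", "normal"), ("8", "161.45"), ("Average", "155")]
for problem in audit_column(weights, "weight"):
    print(problem)
```

A check like this only finds values that are malformed; implausible but well-formed numbers still need a range check against the data dictionary.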

Acknowledgement
- Guideline for data collection and data entry: http://biostat.mc.vanderbilt.edu/wiki/main/theresascott
- 10 Data Entry Commandments; Spreadsheet from Heaven/Hell: http://biostat.mc.vanderbilt.edu/wiki/main/danielbyrne