Creating a Database. Frank Friedenberg, MD



Similar documents
Creating a Database. Frank Friedenberg, MD

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 22

IBM SPSS Statistics for Beginners for Windows

Chapter 2 Introduction to SPSS

Data analysis process

Analyzing Research Data Using Excel

IBM SPSS Direct Marketing 19

Mail Merge Using Thunderbird. Bob Booth February 2009 AP-Tbird2

GETTING YOUR DATA INTO SPSS


2: Entering Data. Open SPSS and follow along as your read this description.

Pay Equity Software Instructions

Data Presentation. Paper Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs

Introduction Course in SPSS - Evening 1

TEKScore: Scanning & Scoring

FAO Standard Seed Security Assessment CREATING MS EXCEL DATABASE

Results CRM 2012 User Manual

Introduction to PASW Statistics

This book serves as a guide for those interested in using IBM SPSS

PowerSchool Parent Portal User Guide. PowerSchool 7.x Student Information System

Data exploration with Microsoft Excel: analysing more than one variable

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide

Welcome to the Data Analytics Toolkit PowerPoint presentation on EHR architecture and meaningful use.

Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets

SPSS (Statistical Package for the Social Sciences)

A. Grouping to Obtain Counts

Steps to Data Analysis

SPSS Manual for Introductory Applied Statistics: A Variable Approach

2.1 Data Collection Techniques

IBM SPSS Direct Marketing 20

Chapter 7 Section 7.1: Inference for the Mean of a Population

SPSS Workbook 1 Data Entry : Questionnaire Data

4. Descriptive Statistics: Measures of Variability and Central Tendency

Learning Management System (LMS) Guide for Administrators

NDSR Utilities. Creating Backup Files. Chapter 9

Using SPSS, Chapter 2: Descriptive Statistics

January 26, 2009 The Faculty Center for Teaching and Learning

Introduction to SPSS 16.0

Improving Productivity using IT - Level 3 Scenario Assignment Sample Test 4 Version SampleMQTB/1.0/IP3/v1.0. Part 1 Performance

Introduction to Microsoft Access 2003

Creating an online survey using Sona Systems

How To Edit An Absence Record On A School Website

Creating an Excel Database for a Mail Merge on a PC. Excel Spreadsheet Mail Merge. 0 of 8 Mail merge (PC)

Importing and Exporting With SPSS for Windows 17 TUT 117

Sunrise Acute Care (SAC) Module 1 New Provider Basic Course

Estimating and Vendor Quotes An Estimating and Vendor Quote Workflow Guide

DIBELS Data System Data Management for Literacy and Math Assessments Contents

Microsoft Access 2010 Overview of Basics

Market Pricing Override

Introduction to IBM SPSS Statistics

SPSS The Basics. Jennifer Thach RHS Assessment Office March 3 rd, 2014

Events Forensic Tools for Microsoft Windows

Word processing software

Setting up a basic database in Access 2003

Excel Add-ins Quick Start Guide

ADP Workforce Now V3.0

Guidelines for Data Collection & Data Entry

Access Database Design

MySQL for Beginners Ed 3

BEST PRACTICES Implementing the Paperless Office

Critical Care EEG Database Public Edition. User Manual

Information Technology

Client Marketing: Sets

SPSS: Getting Started. For Windows

SAS Marketing Automation 5.1. User s Guide

Appendix III: SPSS Preliminary

Easily Identify the Right Customers

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Best Practices in Data Visualizations. Vihao Pham January 29, 2014

Best Practices in Data Visualizations. Vihao Pham 2014

Comparing Excel, Access and REDCap as Data Management Tools for Human Health Research Data

Instructions for data-entry and data-analysis using Epi Info

Clinical Data Management (Process and practical guide) Nguyen Thi My Huong, MD. PhD WHO/RHR/SIS

Working with SPSS. A Step-by-Step Guide For Prof PJ s ComS 171 students

LS-DYNA Material Table. HyperMesh 7.0

Understanding Data De-duplication. Data Quality Automation - Providing a cornerstone to your Master Data Management (MDM) strategy

Winzer Corporation 1 Revision: 4.0

SPSS Step-by-Step Tutorial: Part 1

H-ITT CRS V2 Quick Start Guide. Install the software and test the hardware

How To Understand The Basic Concepts Of A Database And Data Science

Construct the MDM file for a HLM2 model using SPSS file input

Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

Converting Excel Spreadsheets or Comma Separated Values files into Database File or Geodatabases for use in the USGS Metadata Wizard

The Chi-Square Test. STAT E-50 Introduction to Statistics

Analyzing and interpreting data Evaluation resources from Wilder Research

Best Practices for REDCap Database Creation

PORTFOLIOCENTER USING DATA MANAGEMENT TOOLS TO WORK MORE EFFICIENTLY

SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Data Analysis and Statistical Software Workshop. Ted Kasha, B.S. Kimberly Galt, Pharm.D., Ph.D.(c) May 14, 2009

Transcription:

Creating a Database Frank Friedenberg, MD

Classic Data Management Flow for Clinical Research Scientific Hypotheses Identify Specific Data Elements Required to Test Hypotheses Data Acquisition Instruments (optional) (paper forms) Computer Program for Data Collection and Organization (e.g. Access) Output to Analytical Software (e.g. SPSS)

Data Management Process Define data set to acquire, create codes (e.g. F=0, M=1) Paper Data forms Enter data into computer Accumulate in database Data quality control (check for missing, out of range and illogical responses) Transfer to statistical and presentation software Do statistical analysis Prepare graphs and tables

Differences between a clinical and research database Clinical database (e.g. PACS ) Report oriented so data is displayed for clinical decision making (e.g. abdominal X-ray report) Emphasis on displaying or reporting of individual data rather than accumulating multiple records Not a lot of variables recorded Research database Table oriented so that data is accumulated for eventual export to a statistical package for data analysis and reporting Less emphasis on individual records Create Forms to represent tables that are storing information

Example of a Form

Here is the Table Storing Results From the Form

Databases versus Excel Excel has some capabilities to sort data but its primary function is to create financial spreadsheets Can be used for small and limited research data sets and simple lists Databases (e.g. ACCESS) are designed to collect, sort, and manipulate data Can create Data Quality Control features that ensure valid data is entered and missing data is eliminated A relational database allows for linking of an unlimited number of tables and therefore an unlimited number of variables

Relational Database Example Subject Info Id Name Age 10 Smith 50 11 Jones 55 12 Doe 60 Anthropometrics ID Weight (lb) Weight (kg) 10 230 104.5 11 212 96.4 12 199 90.4 Physical Activity ID KCAL KCAL/kg 10 2400 23.1 11 2652 27.5 12 2350 25.9 Treadmill Performance ID V02 V02/kg 10 2.8 26.7 11 3.2 33.1 12 2.1 23.2

Relationships Between Tables Subject ID Links All the Tables

Databases versus Excel Databases are more suitable for importing data from multiple sources Imports of different data types (e.g. SAS files and Dbase files) into different tables can be linked via common identifiers such as subject ID Merging multiple data sources into Excel can be a challenge

How is a database organized? Multiple linked tables Tables store records (rows in the database) Tables have a collection of fields (these are the columns) Patient identifiers Name, DOB, address,..are stored in separate fields

Table = Records and Fields Fields Records

Need to Code Data for Each Field

Advantages of a Database Collection of data in a centralized location Data stored so as to appear to users in one location A relational database brings it all together

Sharing and Exchanging Data Multiple users can access the same database via a network Can be local or over the internet Access via a client application or web interface Server allows remote access over the internet from anywhere

Database Design Considerations What to collect What questions are to be answered? Think of the data tables in your future publications Focus on the key data elements rather than collect as much as possible What statistical package will be used? May need to format the data to accommodate program it will be exported to For example, gender can be recorded in the database as M or F but statistical package may require 0 and 1

What needs to be in the research database? Research variables directly related to the hypotheses being tested-yes Clinical measures used for screening-maybe Blood work, ECG, medical history Administrative data-no Contact information Scheduling

Examples of Data Collection Forms Paper then into Excel; tedious, mistakes common Scantron - Must scan in; many opportunities for mistakes Directly into Excel -tedious Direct computer entry via forms Allow real-time validity/completeness checks, prompts Inexpensive but requires expertise and maintenance

Designing the Questions Try to collect continuous data convert to categorical during analysis period. (e.g. age) Use validated scales/instruments Don t build your own unless unavoidable Consider asking in more than one way a question concerning a critical variable under study. (allows for validity assessment) Avoid measurements that cluster at one position or one end of a scale E.g. measuring body temperature on healthy outpatients Pilot the form for 10-20 patients, then revise

What is Data Coding? A group of letters, numbers, or symbols and the rules that form a link to a specific terminology (e.g. male =1, female =2) Coded references should be incorporated into a data dictionary Dictionaries should be based on standardized terminology e.g. if doing a study on cirrhotics categorize patient on MELD score or Child-Pugh classification

Why code? Forces data into analyzable structure, format Speeds data input/transcription Vastly simplifies data analysis/reporting

Messed Up Coding

Rules for Data Entry Decide which variables require a number or string code e.g. code subject s name or MRN as a string variable Continuous values are entered directly Missing values must be different values from a real response Common formats are 99 or bullets Don t use 99 if the variable is a continuous data field! Don t know is a response do not leave blank. Have a code for don t know Don t use 0 for missing, just leave blank Coding instructions should be in coding dictionary

Avoid open-ended questions in subject/patient surveys What is your gender? -correct responses could be man, woman, male, female What is your level of education? -the answers 9 th grade, did not finish high school, and no college education all are correct. -provide specific choices for reply. -electronic surveys avoid these problems

Use Pre-Coded Response Forms When Possible

Be Careful With Scales Subject forgot to put tick mark

Data Validation

Data in Spreadsheet Finding Errors Subject ID Gender Age 1001 Male 52 1002 Male 54 103 Mael 65 1004 Female 54 5 Female 52 1006 Female 52 1007 Femele 75 1008 Male 48 1009 M 37 1010 Female 73 11 F 54

Database Entry Parameters Avoid These Errors

Exploratory Data Analysis Explore why there is missing data Is this a software or coding glitch? Did you delete by accident? (always back-up) Examine for unusual consistency /inconsistency issues Is every patient measured one month 6 10 Good time to use histograms and calculate skewness Find inadmissible ranges and codes Patient coded as age 205 instead of 25.

Attributes of Successful Data Management Attention to detail Explicit structure and process Robust designs Anticipate failures, lapses and mistakes Design systems that identify and correct them