Creating a Database. Frank Friedenberg, MD




Objectives
- Understand the flow of data in a research project
- Introduce a software-based database (Access)
- Tips for avoiding common coding and data entry mistakes
- Introduce the concept of Exploratory Data Analysis

Clinical and Research Databases
- Research database
  - Table oriented, so that data are accumulated for eventual export to a statistical package for data analysis and reporting
  - Create forms to represent the tables that store the information
- Different from a clinical database (e.g. EPIC)
  - Emphasis on displaying results for an individual rather than accumulating multiple records

Data Management Flow for Clinical Research
- Scientific hypotheses
- Identify the specific data elements required to test the hypotheses (e.g. patient age, BMI, etc.)
- Data acquisition (paper forms)
- Computer program for data acquisition/entry/organization (e.g. ACCESS)
- Computer program for data entry and organization (e.g. EXCEL)
- Output to analytical software (e.g. SPSS)

Data Entry
- Paper
- Tedious: sometimes 100s of columns
- Mistakes are common and they can be hard to find
  - Subject's BMI = 24.6 and you typed in 26.4 (you'll never know you made this mistake)
  - Fill in data on the wrong row
- Hard to remember codes in the coding dictionary

Database Software versus Excel
- Excel has some capabilities to sort data, but its primary function is to create financial spreadsheets
  - Can be used for small research data sets
  - Becomes unusable once the number of columns exceeds 50-100
- Database software (e.g. ACCESS) is designed to collect, sort, and manipulate data
  - Can create data quality control features that ensure valid data are entered and missing data are eliminated
- A relational database allows linking of an unlimited number of tables, and therefore an unlimited number of variables

Example of a Form

Here is the Table Storing Results From the Form

How is a database organized?
- Multiple linked tables
- Tables store records (the rows in the database)
- Tables have a collection of fields (the columns)
  - e.g. patient identifiers such as name, DOB, and address are stored in separate fields

Table = Records and Fields
[Figure labels: Fields (columns), Records (rows)]

Need to Code Data for Each Field
- Example: Audit questions
  - Format = Scientific Numbers
  - Minimum = 0
  - Maximum = 4
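
A minimum/maximum rule like the one on this slide can be attached to the field itself, so that out-of-range entries are refused at entry time. The sketch below uses Python's built-in sqlite3 module rather than Access, purely for illustration; the table name `audit_responses` and the column `q1` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per subject, one column per questionnaire item.
# The CHECK constraint encodes "Minimum = 0, Maximum = 4" for the field.
conn.execute(
    """
    CREATE TABLE audit_responses (
        subject_id INTEGER PRIMARY KEY,
        q1 INTEGER CHECK (q1 BETWEEN 0 AND 4)
    )
    """
)


def try_insert(subject_id, q1):
    """Return True if the row is accepted, False if the range rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO audit_responses (subject_id, q1) VALUES (?, ?)",
                (subject_id, q1),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(10, 3)  # inside the 0-4 range
rejected = try_insert(11, 7)  # outside the 0-4 range, refused by the CHECK
print(accepted, rejected)
```

Access offers a comparable per-field validation rule; the point of the sketch is only that the range lives in the table definition, not in the memory of the person typing.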

Relational Database: Linking Tables

Subject Info
ID   Name    Age
10   Smith   50
11   Jones   55
12   Doe     60

Anthropometrics
ID   Weight (lb)   Weight (kg)
10   230           104.5
11   212           96.4
12   199           90.4

Physical Activity
ID   KCAL   KCAL/kg
10   2400   23.1
11   2652   27.5
12   2350   25.9

Treadmill Performance
ID   VO2   VO2/kg
10   2.8   26.7
11   3.2   33.1
12   2.1   23.2
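
As a rough illustration of how the shared ID links these tables, the sketch below loads three of them into an in-memory SQLite database (standing in for Access) and joins them on the subject ID. The table and column names are assumptions chosen for the example; the values are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subject_info (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE anthropometrics (
        id INTEGER REFERENCES subject_info(id), weight_lb REAL, weight_kg REAL);
    CREATE TABLE treadmill (
        id INTEGER REFERENCES subject_info(id), vo2 REAL, vo2_per_kg REAL);
    """
)
conn.executemany(
    "INSERT INTO subject_info VALUES (?, ?, ?)",
    [(10, "Smith", 50), (11, "Jones", 55), (12, "Doe", 60)],
)
conn.executemany(
    "INSERT INTO anthropometrics VALUES (?, ?, ?)",
    [(10, 230, 104.5), (11, 212, 96.4), (12, 199, 90.4)],
)
conn.executemany(
    "INSERT INTO treadmill VALUES (?, ?, ?)",
    [(10, 2.8, 26.7), (11, 3.2, 33.1), (12, 2.1, 23.2)],
)

# One row per subject, pulling fields from three tables via the common ID.
rows = conn.execute(
    """
    SELECT s.id, s.name, s.age, a.weight_kg, t.vo2
    FROM subject_info AS s
    JOIN anthropometrics AS a ON a.id = s.id
    JOIN treadmill AS t ON t.id = s.id
    ORDER BY s.id
    """
).fetchall()

for row in rows:
    print(row)
```

Each table stays narrow, and the combined view is assembled only when it is needed for export or analysis.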

Database Software versus Excel
- Databases are also more user friendly for importing data from multiple sources
- Imports of different data types (e.g. SAS files and dBase files) into different tables can be linked via common identifiers such as subject ID
  - Actually, SPSS does this very well
- Merging multiple data sources into Excel can be a challenge

Sharing and Exchanging Data
- Multiple users can access the same database via a network (true for Excel also)
  - Can be local or over the internet
- Indirect access to the database via a client application (e.g. Survey Monkey)

Database Design Considerations

What to collect
- What questions are to be answered?
- Think of the data tables in your future publications
- Focus on the key data elements rather than collecting as much as possible
- Variables will often evolve: stop early and often and assess what you have

What needs to be in the research database?
- Research variables directly related to the hypotheses being tested: YES
- Clinical measures used for screening: MAYBE
  - Blood work, ECG, medical history
- Administrative data: NO
  - Contact information
  - Scheduling

Designing the Questions
- Try to collect continuous data and convert to categorical during the analysis period (e.g. age)
- Use validated scales/instruments
  - Don't build your own unless unavoidable
- Consider asking about a critical variable under study in more than one way (allows for validity assessment)
  - Question 10: Do you have diarrhea?
  - Question 23: Are your stools sometimes very loose?
- Consider reverse questioning
- Avoid measurements that cluster at one position or one end of a scale
  - E.g. measuring body temperature in healthy outpatients
- Pilot the form on 2-5 patients, then revise
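
The first point, storing the continuous value and categorizing only at analysis time, might look like the following sketch. The cut points and labels are hypothetical; because the raw ages are kept, they can be changed later without re-entering any data.

```python
from bisect import bisect_right

# Hypothetical cut points, chosen at analysis time and easy to revise.
AGE_CUTS = [40, 65]
AGE_LABELS = ["<40", "40-64", "65+"]


def age_group(age):
    """Map a continuous age onto a category using the current cut points."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


ages = [50, 55, 60, 37, 73]  # raw values as they were entered
groups = [age_group(a) for a in ages]
print(groups)
```

Had only the category been recorded on the form, a later decision to split at 50 instead of 40 would be impossible to apply.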

Use Standard Terminology and Scales
- Example 1: categorize patients as febrile or afebrile, dichotomized at a measurement of 100.4 °F
- Example 2: if doing a study on cirrhosis, categorize on the subject's MELD score or Child-Pugh classification
- Example 3: severity of illness: APACHE, etc.

Data Coding
- Use numbers and link them to a specific characteristic (e.g. male = 1, female = 2)
  - Other examples: race, type of insurance
- Required for summary and inferential statistical analysis
- Speeds data input/transcription
- Avoids the problem of entering erroneous scripts
- Coded references should be incorporated into a data dictionary
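
A data dictionary can be kept as a simple lookup that both the entry step and the analysis step share. In this minimal sketch the gender codes are the ones from the slide; the insurance codes and the helper names `encode` and `decode` are hypothetical.

```python
# Code -> meaning, one entry per coded field.
DATA_DICTIONARY = {
    "gender": {1: "male", 2: "female"},                    # from the slide
    "insurance": {1: "private", 2: "public", 3: "none"},   # hypothetical codes
}


def decode(field, code):
    """Translate a stored numeric code back into its label."""
    return DATA_DICTIONARY[field][code]


def encode(field, label):
    """Translate a label into its code; unknown labels raise KeyError."""
    reverse = {text: code for code, text in DATA_DICTIONARY[field].items()}
    return reverse[label.strip().lower()]


print(decode("gender", 1))
print(encode("gender", " Female "))
```

Because `encode` refuses anything not in the dictionary, a misspelling such as "Femele" is caught at entry instead of surfacing as an extra category during analysis.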

Erroneous Scripts

Rules for Data Entry
- Decide which variables require a number or string code
  - e.g. code the subject's name or MRN as a string variable
- Continuous values are entered directly
- Missing values must be different from any real possible response
  - Don't use 0 or 99 if the variable is a continuous data field: just leave it blank!
- "Don't know" is a response: do not leave it blank. Have a code for "don't know"
- Coding instructions should be in the coding dictionary
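
The sketch below illustrates why a blank, and not 0, should stand for a missing continuous value, and how a "don't know" code differs from a blank. The BMI values, the education codes, and the `DONT_KNOW` code of 8 are all made up for the example.

```python
DONT_KNOW = 8  # hypothetical code, outside the real answer codes 1-4


def parse_continuous(cell):
    """Blank cell -> None (missing); anything else -> float."""
    cell = cell.strip()
    return float(cell) if cell else None


bmi_cells = ["24.6", "", "31.2"]  # the middle subject has no BMI recorded
bmi = [parse_continuous(c) for c in bmi_cells]

# Missing values are skipped, so the mean uses the two real measurements.
observed = [v for v in bmi if v is not None]
mean_bmi = sum(observed) / len(observed)

# Had the blank been typed as 0, it would count as a real (impossible) BMI.
mean_if_zero = sum(v if v is not None else 0.0 for v in bmi) / len(bmi)

# "Don't know" is an answer; None means the item was left unanswered.
education = [2, DONT_KNOW, None, 4]
n_dont_know = education.count(DONT_KNOW)
n_missing = education.count(None)

print(round(mean_bmi, 1), round(mean_if_zero, 1), n_dont_know, n_missing)
```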

Avoid open-ended questions in subject/patient surveys
- "What is your gender?"
  - Correct responses could be man, woman, male, female
- "What is your level of education?"
  - The answers "9th grade", "did not finish high school", and "no college education" are all correct
- Provide specific choices for the reply

Use Pre-Coded Response Forms When Possible

Be Careful With Scales
- Subject forgot to put a tick mark

Data Validation

Data in Spreadsheet: Finding Errors

Subject ID   Gender   Age
1001         Male     52
1002         Male     54
103          Mael     65
1004         Female   54
5            Female   52
1006         Female   52
1007         Femele   75
1008         Male     48
1009         M        37
1010         Female   73
11           F        54
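
Errors like the ones in this spreadsheet can be found mechanically once the rules are written down. The sketch below runs two checks over the rows shown on the slide. The rules themselves (subject IDs are four digits, gender is spelled out as "Male" or "Female") are assumptions inferred from the clean rows, not stated in the source.

```python
# Rows exactly as they appear in the example spreadsheet.
rows = [
    ("1001", "Male", "52"), ("1002", "Male", "54"), ("103", "Mael", "65"),
    ("1004", "Female", "54"), ("5", "Female", "52"), ("1006", "Female", "52"),
    ("1007", "Femele", "75"), ("1008", "Male", "48"), ("1009", "M", "37"),
    ("1010", "Female", "73"), ("11", "F", "54"),
]

VALID_GENDERS = {"Male", "Female"}  # assumed set of allowed entries


def find_problems(rows):
    """Return (row number, field, offending value) for every failed check."""
    problems = []
    for line, (subject_id, gender, _age) in enumerate(rows, start=1):
        # Assumed rule: subject IDs are exactly four digits.
        if not (subject_id.isdigit() and len(subject_id) == 4):
            problems.append((line, "subject_id", subject_id))
        if gender not in VALID_GENDERS:
            problems.append((line, "gender", gender))
    return problems


problems = find_problems(rows)
for problem in problems:
    print(problem)
```

A check like this finds malformed IDs and misspelled categories, but not a plausible-looking wrong value such as a transposed BMI, which is why entry-time validation still matters.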

Database Entry Forms Avoid These Errors

Exploratory Data Analysis
- Explore why there is missing data
  - Is there a pattern to it? E.g. embarrassing question, poorly worded question
  - Did you delete by accident?
- Examine for unusual consistency/inconsistency issues
  - Is every patient measured one month 6 1
- Good time to use histograms and calculate skewness
- Find inadmissible ranges and codes
  - Patient coded as age 205 instead of 25
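
Two of these checks, inadmissible ranges and skewness, are easy to sketch with the standard library. The ages below are illustrative: the first ten echo the earlier spreadsheet example, and the final 205 is a planted entry error of the kind the slide describes. The admissible range of 18 to 100 is an assumption that a real study would take from its own protocol.

```python
import statistics

ages = [52, 54, 65, 54, 52, 52, 75, 48, 37, 73, 205]  # last value is a typo


def out_of_range(values, low, high):
    """Return (position, value) for every value outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


def skewness(values):
    """Population skewness: the mean cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


flagged = out_of_range(ages, 18, 100)  # assumed admissible age range
clean = [v for v in ages if 18 <= v <= 100]

print(flagged)
print(round(skewness(ages), 2), round(skewness(clean), 2))
```

A single impossible value is enough to produce a strongly right-skewed distribution, so a surprising skewness is often the first visible sign of an entry error.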

Attributes of Successful Data Management
- Attention to detail
- Explicit structure and process
- Robust designs: plan, plan, plan