Pre-entry Review. Industry Applications. NESUG '96 Proceedings 330

Industry Applications

THE ROLE OF SAS PROGRAMMERS IN CLINICAL TRIAL DATA ANALYSIS
Ming Wang, Independent SAS Consultant

Abstract
This article shows in depth the role of SAS programmers in clinical trial data analysis. It outlines the task flow of clinical trial data analysis, discusses project programming planning and documenting procedures, describes SAS programmers' tasks and programming skills, and provides insight on how to work with people in your team: clinical data managers, statisticians, the programming manager and the project manager. This paper is a valuable overview for programmers who are starting out or plan on being involved with clinical trials, and for data managers and statisticians who work with SAS programmers in a clinical trial team. It is also a valuable management guideline for experienced SAS programmers.

Introduction
SAS is widely used in clinical trial data analysis in pharmaceutical, biotech and clinical research companies, and SAS programmers play an important role in that analysis. In addition to the doctors and clinicians who collect clinical trial data, the group conducting data analysis includes statisticians, clinical data managers (CDMs) and SAS programmers. Statisticians provide the ideas and methods of the data analysis; clinical data managers manage the collected data and control the data quality. In between, SAS programmers implement the analysis methods on the collected data and provide the study summary tables, data listings and graphs to the statisticians and/or clinicians, who write the study report. SAS programmers work closely with statisticians and data managers; they provide the link between the raw data and the analysis. This paper discusses the SAS programmers' roles in the clinical trial data analysis task flow, describes the SAS programmers' tasks and skills, and provides insight on how to work with people in the team.

Task Flow of Clinical Trial Data Analysis
A Case Report Form (CRF) designed for a study is used to collect clinical trial data. The collected data are stored in a corresponding database. These data will be analyzed and the results will be included in the study report.
An example task flow of clinical data analysis is shown in Figure 1. The detailed steps may vary from company to company.

Final Blank CRF
After the CRF is designed by clinicians, the data analysis group should review the CRF and make sure that all the fields for analysis can be computerized. The final CRF will be used to design the database and will be distributed to the site to collect data. In the real world, different versions of a CRF are sometimes used to collect data, which causes extra work in data analysis.

Annotate CRF
For each section of the CRF, for instance the demographic section, the adverse event (AE) section, etc., a database table is usually designed. For each question and/or answer on the CRF, a field in the database table is designed. The attributes of the fields, such as field type (numeric or character), length, format, etc., are also considered. To annotate the CRF is to design the database on paper by writing down the name of the table, whether there are single or multiple records per patient and/or visit, and each field's name, type, length, and associated format. The database tables will eventually be converted to SAS data sets, and an annotated CRF with the data set name, variable name, type, length, and SAS format is very helpful for programming and data analysis. Though CDMs may write an annotation for the purpose of database design, it is suggested that SAS programmers write another one for the purpose of SAS programming.

Database Design and Testing
In some companies SAS programmers also design the database, while in other companies only CDMs design the database. A well-designed database makes SAS programming more convenient. Before real data are entered into a database, the designed database needs to be tested using test data and real data to make sure it works the way you expected. The test result should be documented and approved by managers. A data entry instruction may be written along with the designed database to specify general data entry rules (e.g. all character fields are capitalized) and specific rules for each file if needed.
This will help data entry people to have a better understanding of the database and will result in higher quality data entry.

Pre-entry Review

[Figure 1: Example task flow of clinical trial data analysis. Steps shown: Final Blank CRF; Annotate CRF; Database Design; Test Database; Data Entry of Working CRFs from the Clinical Site; Pre-entry Review; SAS Input Programs; Test Programs; SAS Datasets; Data Cleaning Programs; Data Clarification Items sent to the Clinical Site; Edit Database; Clean Database; Database Audit; Database Lockup; Final SAS Datasets; Analysis Plan/Table Shells; Programs for Tables/Listings/Graphs; Report Tables/Listings/Graphs; Final Review by QC Group; Study Report.]

When CDMs get the working CRFs from clinicians, they need to make sure that the data from the CRFs can be entered into the database. This procedure may provide input to database modification and generate manual data clarification items.

Data Entry
Usually the data entry group has a standard operating procedure to ensure data entry quality.

SAS Input Programs
SAS programmers need to write programs to read the database data into SAS data sets and also generate a SAS format library. A program to print out the data contents and a data book is also useful.

Data Cleaning System
A SAS programmer will write SAS programs to clean the database data according to the edit specifications provided by CDMs. This data cleaning system can be designed to run at night. Data clarification items will be generated by this data cleaning system and will be sent to the clinical site for resolution.

Edit Database
CDMs will edit the database according to the resolution of the data clarification items. The data cleaning cycle will repeat until all of the data are entered and all of the queries are resolved.

Report Table, Listing, and Graph Programs
In the meantime, while data are being re-entered and cleaned, SAS programmers may start to write SAS programs to generate report tables, listings, and graphs according to the statisticians' analysis plan and table shells. Drafts of tables, listings and graphs will be reviewed by the statisticians and CDMs, and more data clarification items may be generated in this process.

Database Lockup and Final Report Tables, Listings, and Graphs
When the database is clean, it will be locked and the final SAS data sets will be generated. An audit of the entire database or of randomly selected points may be performed to ensure the quality of the database. If the audit is passed, the SAS programs will be rerun to generate tables, listings, and graphs, which will be reviewed by the quality control group; the final tables, listings and graphs will then be generated and included in the study report.
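The audit of randomly selected points can be pictured as drawing a simple random sample of data points from the locked database and comparing each one with the CRF. The Python sketch below only illustrates that idea; it is not the author's SAS code, and the record layout (patient number and visit as the key), the field names SBP and DBP, the sample size and the seed are all hypothetical. How large the sample is and how discrepancies are judged would come from your own company's procedures.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: the locked database maps (patient number, visit)
# to a record of field values; one "data point" is one field of one record.
RecordKey = Tuple[str, int]
DataPoint = Tuple[str, int, str]


def select_audit_points(
    database: Dict[RecordKey, Dict[str, object]],
    n_points: int,
    seed: int = 1996,
) -> List[DataPoint]:
    """Draw a reproducible simple random sample of data points to audit."""
    all_points = [
        (patient, visit, field)
        for (patient, visit) in sorted(database)
        for field in sorted(database[(patient, visit)])
    ]
    rng = random.Random(seed)  # fixed seed: the same sample can be re-drawn later
    return rng.sample(all_points, min(n_points, len(all_points)))


def audit(
    database: Dict[RecordKey, Dict[str, object]],
    crf_values: Dict[DataPoint, object],
    points: List[DataPoint],
) -> List[DataPoint]:
    """Return the sampled points whose database value differs from the CRF."""
    return [
        (patient, visit, field)
        for (patient, visit, field) in points
        if database[(patient, visit)][field] != crf_values[(patient, visit, field)]
    ]


if __name__ == "__main__":
    db = {("001", 1): {"SBP": 120, "DBP": 80}, ("002", 1): {"SBP": 135, "DBP": 88}}
    crf = {
        ("001", 1, "SBP"): 120, ("001", 1, "DBP"): 80,
        ("002", 1, "SBP"): 153, ("002", 1, "DBP"): 88,
    }
    sample = select_audit_points(db, n_points=4)
    print(audit(db, crf, sample))  # [('002', 1, 'SBP')]: 153 on the CRF was keyed as 135
```

Fixing the seed is a deliberate choice in this sketch: the list of audited points can be regenerated exactly, which makes the audit itself easy to document and re-check.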
In some companies there is an SOP (Standard Operating Procedure) for each box in the diagram, describing each of the detailed steps in the procedure.

SAS Programmers' Tasks in Clinical Trial Data Analysis

Project Directory Setup
For each project, create a project directory. In each project directory, the basic subdirectories are the following, though you may not need all of them depending on the system design in your company:
* raw data (database data or ASCII files)
* SAS input programs (SAS programs to convert database data to SAS data sets)
* SAS format libraries
* SAS data sets
* data cleaning programs
* SAS macro library
* listing programs/output
* table programs/output
* graph programs/output
* miscellaneous programs
* documentation/memos
In each directory, it is convenient to create an index file to indicate program names, authors, table titles, etc. You may put the above directories in a development directory and a production directory. Programs are developed and tested under the development directory. They can be copied to the production directory after they are fully tested and reviewed by a programming manager, and can be run under the production directory to produce the final tables. There may also be a group-wide macro library and SAS format libraries. You can specify your macro libraries, format libraries, system options, etc. in your autoexec.sas file.

Writing Programs to Convert or Read in Data from Database to SAS Data Sets
There are many ways to convert data from a database to SAS data sets, depending on the system setup in your company. For example:
* Use SAS/ACCESS to access database data directly and save them to temporary or permanent SAS data sets.
* Output database data to ASCII files, and write SAS programs to read the data from the ASCII files into SAS data sets.
* Use DBMSCOPY, software which can convert database data to SAS data sets, or vice versa.
In addition to data sets, you also need to create a format program to generate a format library. You can create more than one SAS format library if necessary.

Writing Programs to Conduct Data Cleaning
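The edit checks described in this section (range, missing-value and logic checks, with each failure written out together with key variables such as patient number and visit date) can be sketched in a few lines. The sketch is in Python rather than SAS and is not the author's data cleaning system; the variable names PATIENT, VISDATE, SBP and DBP, the three rules, and the use of a set of already-reviewed keys to stand in for the identifier CDMs enter on reviewed records are all hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, NamedTuple, Set, Tuple

Record = Dict[str, object]


class Query(NamedTuple):
    """One data clarification item: key variables plus the failing value."""
    patient: str
    visit_date: date
    variable: str
    value: object
    rule: str


# (rule name, variable to report, predicate that is True when the record passes)
Rule = Tuple[str, str, Callable[[Record], bool]]

# Hypothetical edit specification for a vital-signs data set.
RULES: List[Rule] = [
    ("missing", "SBP", lambda r: r.get("SBP") is not None),
    ("range 60-250", "SBP", lambda r: r.get("SBP") is None or 60 <= r["SBP"] <= 250),
    ("DBP < SBP", "DBP", lambda r: r.get("SBP") is None or r.get("DBP") is None
        or r["DBP"] < r["SBP"]),
]

# Key of an item a CDM has already reviewed: (patient, visit date, variable, rule)
ReviewedKey = Tuple[str, date, str, str]


def run_checks(records: List[Record], rules: List[Rule],
               reviewed: Set[ReviewedKey]) -> List[Query]:
    """Apply every rule to every record; skip items a CDM has already reviewed."""
    queries: List[Query] = []
    for rec in records:
        for name, var, passes in rules:
            key = (rec["PATIENT"], rec["VISDATE"], var, name)
            if not passes(rec) and key not in reviewed:
                queries.append(
                    Query(rec["PATIENT"], rec["VISDATE"], var, rec.get(var), name))
    return queries
```

In this sketch, a record with SBP recorded as 300 yields one range query; once a CDM has reviewed it and its key is added to `reviewed`, the next (for example, nightly) run no longer regenerates that clarification item.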

In many companies, an automated data cleaning system has been created, but it still needs to be updated or revised for new projects. A data cleaning specification is usually written by CDMs to specify what to check on which data set and which item. The checking is performed within a data set or across data sets. It includes range checking, missing value checking, logic checking, protocol violation checking, etc. If a variable fails to pass the checking rules, the name and value of this variable will be put into a data set along with the key variables in the same record, such as patient number, visit date, etc. CDMs will review and edit this data set and check whether this "error" was caused by data entry or was originally recorded on the CRF. If it is a data entry error, it needs to be corrected in the database. If it was originally recorded on the CRF, CDMs need to send a data clarification form to the clinical site. CDMs also need to enter an identifier on the records they reviewed so that the system won't generate the same data clarification item again in the next run.

Write Programs for Study Tables, Listings, and Graphs
Programming report tables/listings/graphs (T/L/G) is the major task of SAS programmers in a clinical trial study. Appendix listings are lists of variables by patient; report tables contain summary information across patients, usually counts and statistical results; graphs can present information for a single patient or summary information across patients. T/L/G programs usually have two parts. The first part prepares the data set through SAS DATA steps and SAS procedures to get the data set and variables for the output. The second part, in table and listing programs, is usually a DATA _NULL_ step that puts the values of the output variables into each column and row along with table titles, column titles, etc.; in graph programs, the second part uses SAS/GRAPH procedures.

- Prepare Data Set
Common programming strategies in this step include subsetting data sets; combining data sets; reshaping data sets, such as changing multiple records per patient to
single record per patient or vice versa; selecting statistical parameters from statistical procedures, such as the mean, median, minimum, maximum, standard deviation and p-value, and saving them in a SAS data set; etc. This requires both programming and data analysis skills. You need to:
* Understand the data structure, such as single or multiple records per patient, single or multiple records per visit, etc.
* Understand the rules in the study, for instance the algorithm to collapse AE records. The algorithms for collapsing AE records in AE listings and summary tables can be very complex, depending on how the data were collected. In the listings, the variation in severity, study drug relatedness, etc. of a continuous AE may be shown. In the summary tables, a patient may be counted only once, for his or her most severe AE.
* Understand the methods of calculation. For instance, to calculate the mean drug dose over all patients when each patient was dosed in many study periods, the mean dose for each patient may be calculated first, and then the mean of these individual means is calculated. Another example is the percentage of the counts in each cell of a table: the denominator could be the overall total, the row total, or the column total. You need to confirm every detail of the calculation methods with statisticians.
* Understand how SAS works.

- Output Listings and Tables Using DATA _NULL_
Using DATA _NULL_ and PUT statements, you can output variable values along with table titles and column titles to an output file. The following items are common issues in this step:
* Formatted output. Character variables may have right- or left-justified output; numeric variables need to be lined up by decimal points; use formatted values for coded variables.
* If the output data set is sorted by patient and visit date (vs_date), output the patient number only at first.patient and the visit date only at first.vs_date.
* Page break point. A page break can be designed for each new patient number or visit number, or at any point when the page is full. For a patient continued from the previous page, print the patient number followed by "(Cont'd)".
* Calculate the position of each column. Some SAS programmers have written macros and tools to make this step more flexible, which makes it easier to delete or add columns later on.
* Print program names, generation date, database lockup date, variable values, etc. in the header or footnote. Make these into macro variables for portability.
* Wrap long text fields. For instance, a 200-character comment field may need to be wrapped to 30 characters and put under the comment column. SAS programmers have developed macros to handle this issue; they also work for wrapping more than one long text field.
* Print "Page x of n."

- Common SAS Graph Procedures
PROC GPLOT, PROC GCHART and SAS/GRAPH statements are commonly used. PROC GREPLAY can present more than one graph on one page.

Miscellaneous
Database auditing, spell checking, data book programming, etc.

Create Tools
Write SAS macros or programs to create tools that enhance efficiency and standardization for your project and/or the entire programming group.

Study Documentation
The following items could be included in the study documentation:
* Backup and retrieval information on archived files.
* Study directories.
* Study data sets, programs, output files, and report table/listing/graph titles.
* Rules and algorithms used in the study data analysis.

Backup Data Sets and Programs
At the end of each study, archive all study-related data sets, programs, output files, memos, etc. on tapes or diskettes. They will be put into storage along with other paper documents, such as CRFs.

Basic Skills SAS Programmers Should Have
* Base SAS, SAS procedures, common statistical procedures, SAS/GRAPH, SAS macro processing.
* Operating systems and editors.
* File transfer between different operating systems, such as between UNIX and PC, including text files and SAS data sets.
* Converting data sets from spreadsheets or other databases to SAS, or vice versa.
* Experience in programming in C, Fortran, etc.
* Database skills.

Work with People in Your Team
SAS programmers are the link between the data and the report tables, so you are also the link between statisticians and CDMs. It is your responsibility to identify inconsistencies between the report table specification and the data in the database, report these inconsistencies, and find a solution through team discussion. A regular team meeting is very helpful to summarize, discuss and solve these kinds of problems.
Through team meetings you can also confirm with both CDMs and statisticians the algorithms or rules you use to subset data sets. These rules may be related to the protocol, the CRF design and the way the data are recorded. You also need to make sure that the database is consistent enough for you to apply these rules.

Working with Clinical Data Managers (CDMs)
You are going to work with CDMs on the data and the database. You may or may not get involved in database design and the data cleaning process, depending on each company's setup. CDMs will provide you with a database containing data. While generating report listings, tables and graphs, report any data problems to CDMs (such as duplicate records, missing codes for AEs, invalid data, inconsistent coding methods, etc.) through e-mail or memo, and keep a copy for yourself for tracking and documentation purposes. When you print records with data problems for CDMs, include key variables such as patient number, visit date, etc., so that CDMs can easily locate the records in the database. Print a title or write a memo on the printout to indicate the problems it shows.

Working with Statisticians
You work with statisticians on report listings, tables and graphs. First, discuss with the statistician the specifications he or she provides to you. A specification may be very detail-oriented, with variable names and algorithms, or just a table shell. One very efficient approach is to study the table shell and write down the data sets, variables, and possible algorithms you need for each listing and table, then discuss your understanding and plan with the statistician before you start to program. This process gives you an opportunity to understand the database and the tables. Often you may find an inconsistency between the database and the table specifications, and statisticians may change the specifications based on your input. Keep copies of the confirmed specifications.

After you have finished a table or listing, the statistician will review it to make sure that it is correct and accurate. Before you give the tables and listings to the statistician, some QC steps are necessary:
* Check the SAS log for any errors, warnings, "uninitialized" messages, etc.
* Make sure that there are no bugs in your program. Check your programming logic and accuracy.
* Check listing results against the database printout (data book).
* Check tables against listings.
* Check statistical numbers against the statistical output.
* Check format and layout. Make sure variable formats are used correctly. Numbers should be lined up by decimal points; calculated numbers, such as means, are rounded first and then formatted for output, etc. If a calculated p-value is 0.0000, present it as <0.0001. Also check spelling, justification, etc.
The fewer errors a statistician finds in your listings and tables, the higher the grade you may get from him or her at the annual evaluation.

Working with The Project Manager
The project manager can be the statistician or a clinical scientist. In discussion with the programming manager, the project manager will design time lines and deadlines, so following time lines and meeting deadlines are the most important things the project manager expects from you. In some companies, you also need to be aware of the budget; if you follow time lines and meet deadlines, you usually help managers to control the budget. Writing down your own plan for each section of work is very helpful for meeting the "big deadline". Keep records that explain any difficulties in meeting a scheduled deadline, for example because the database is not clean, the specifications changed, or a SAS programming issue was tough. Time lines are usually developed by experienced programmers. In addition to planning the time for studying specifications, writing programs and checking results, you also need to make sure that you have time for revisions. Designing a precise time line is difficult, so good communication in the group is the most important thing for adjusting internal deadlines in order to meet the big deadline.

Work with The Programming Manager
The programming manager is usually your supervisor, and he or she may or may not be on the project you are working on. The programming manager will help you solve your SAS programming or computer problems. Keep the programming manager informed of communications within your project team, your deadlines, and your difficulties in meeting these deadlines. The programming manager also expects you to develop macros, tools or new programming methods for the whole group to enhance efficiency and standardization. In some companies, the programming manager is the primary reviewer of your performance in your annual evaluation. It is important to show that you can work independently on a project as a programmer, and at the same time to keep the programming manager regularly informed about your performance.

Summary
Enhancing SAS programming skills, accumulating clinical trial data analysis experience, and keeping a positive attitude in a teamwork environment are the keys to success.