Computer-assisted Auditing for High-Volume Medical Coding

by Daniel T. Heinze, PhD; Peter Feller, MS; Jerry McCorkle, BA; and Mark Morsch, MS

Abstract

The volume of documents being processed by computer-assisted coding (CAC) has raised the bar regarding the need for audit methods suitable for production control and quality assurance. In this high-volume production environment, it becomes vitally important to adapt and implement techniques that have become a fundamental requirement for production operations management (POM). We present techniques and statistical methods that are developed and implemented for auditing the medical coding process and producing scores that accurately reflect the quality of the coding work, are comparable across time and between coders and auditors, and employ statistical methods for production control. The techniques and methods here described are patent pending and are implemented in the A-Life Medical, Inc. CoAudit system that is commercially available for auditing both computerized and human coding.

Introduction

The advent of CAC in high-volume environments demands the use of modern statistical production control and quality assurance (QA) methods. Traditionally, coding has been done manually by human coders. Because the volume of medical documents being manually coded at any one location has been relatively small, QA has primarily depended on the individual skills, training, and continuing education of the coders. In the field of medical coding, QA methods historically consist of an ad hoc review of some fixed number or percentage of the coder's work product with ad hoc or subjective scoring and evaluation of audit results. Audit results across time and between coders are, therefore, not mathematically comparable. Additionally, these methods do not scale to high-volume processing. Although some aspects of QA and production control can be handled automatically, there is still the need for human audit of the coding work product.
However, coding is a complex matter, and for some significant percentage of medical documents there will be a measurable diversity of opinion as to how they ought correctly to be coded. Further, the process is sufficiently complex that even the auditors are expected to make errors, though presumably at a lower error level than the coders or processes that are being audited. Consideration for both matters of opinion (subjective judgment) and error must be taken into account when devising a medical coding audit methodology.

Background

Faced with the problem of providing clients using the Actus CAC application with a means to audit codes and perform production control, we embarked on a process of first soliciting client input and then developing consensus on a specification limit methodology for scoring, around which we developed and/or applied the necessary methodologies for the following research issues.
Perspectives in Health Information Management, CAC Proceedings; Fall 2006

Research Questions

Beginning from the description of how coding professionals audit one another, we address the following issues:

1. Sample Selection: Calculating the sample size for audits
2. Specification and Control Limits: Establishing and interpreting
   a. Specification limits that measure the acceptability of individual coded documents, with a method for audit scoring that produces results suitable for incorporation in statistical QA and production control, but which are also designed so that the composite sample scores track the subjective judgment of human auditors when evaluating a computerized or human coding process to be acceptable, marginally acceptable, or unacceptable
   b. Control limits that measure the acceptability of a computerized or human coder
3. Calibrating for Auditor Variability: Adjusting the statistical methods to calibrate for auditor subjectivity and error so audit results can be meaningfully compared across time and between auditors, coders, and CAC

Methods

Corresponding to the research questions, the following methods are employed.

Sample Selection. Sample selection is governed first by identifying the population from which the sample will be drawn, second by applying some statistical method to determine the sample size, and third by selecting a random sample of the determined size from the population. For purposes of code auditing, the population must be selected in accord with the objectives of the audit. Audit objectives should first be specified in terms of the target computerized or human coder to be audited, with the population then being limited to codes produced by the target. The population may further be stratified according to subcharacteristics such as particular providers, procedures, or diagnoses. It may further be necessary to temporally limit the population to some period when the codes and guidelines for the audit objective were uniform. Given a target population, the sample size must be calculated.
We accept the sample size calculation employed by the Office of the Inspector General (OIG), as described and implemented in the audit tool Rat-Stats, as canonical for code audit purposes [1]. Considering the Rat-Stats function for Attribute Sample Size selection, we note that although the calculated sample size is guaranteed to be minimal, the confidence interval may be asymmetrical around the point estimate. With no harm to the accuracy or validity of the audit, we use a more basic calculation of sample size as given by many introductory statistics texts and also at the National Institute of Standards and Technology [2], resulting in a symmetric confidence interval at the possible cost of a slightly larger sample size.

Specification and Control Limits. Two sets of performance limits are defined, the specification limits and the control limits. The specification limits are with respect to individual components of the production items under test. These can be judged as either correct or incorrect (pass/fail), and if incorrect (fail) then, optionally, as either of consequence or not of consequence. The control limits are the statistically defined limits that indicate whether the overall coding process under audit is in control or not. For the medical coding application, only the upper control limit is of true interest, in that there is no adverse consequence if the process, as measured in terms of proportion of errors, falls below the lower control limit (in fact that is a good thing and indicates that the process is performing better than required or expected).

Formulas: Sample Size and Control Limits

The following defines the parameters and formulas for selecting an unrestricted random sample of size $\mathit{fpc} \cdot n$ from a population of size $N$. Defect number $x$ is recalculated to provide $X$, which is the defect number modified to account for the expected subjectivity and error of the auditor according to the formula $X = x - (\mathit{CV} \cdot P \cdot \mathit{fpc} \cdot n)$. The rationale for this formula is that if the error level of the auditor is $\mathit{CV}$ and the auditee is expected to make proportion $P$ errors, then the number of correct auditee codes that were incorrectly judged to be errors by the auditor is $\mathit{CV} \cdot P \cdot \mathit{fpc} \cdot n$, which should be subtracted from the raw defect number $x$.

$\mathit{CV}$ is the expected or observed judgment subjectivity/error proportion of the auditor.
$\mathit{CL}$ is the desired confidence level as a percent, where $\mathit{CL} \le 100 \cdot (1 - \mathit{CV})$ is preferred.
$Z$ is the standard normal critical value corresponding to the area under the tails of the distribution for the desired $\mathit{CL}$.
$H$ is the half-width of the desired confidence interval, where $H \ge (\mathit{CV}/2)$. $H \ge (\mathit{CV}/2) + 0.005$ is preferred.
$P$ is the expected auditee proportion of errors.
$N$ is the size of the population of documents to be sampled.
$n$ is the unadjusted sample size, where $n = (Z^2 \cdot P \cdot (1 - P)) / H^2$.
$\mathit{fpc}$ is the finite population correction factor, where $\mathit{fpc} = \sqrt{(N - n)/(N - 1)}$.
$\mathit{fpc} \cdot n$ is the finite population adjusted sample size.
$x$ is the observed defect/error number.
$X$ is the defect/error number adjusted for the auditor error rate, where $X = x - (\mathit{CV} \cdot P \cdot \mathit{fpc} \cdot n)$.
$e$ is the sample proportion of defects, where $e = x / (\mathit{fpc} \cdot n)$.
$E$ is the adjusted sample proportion of defects, where $E = X / (\mathit{fpc} \cdot n)$.
$\mathit{UCL}$ is the upper control limit, where $\mathit{UCL} = P + Z \sqrt{P(1 - P)/(\mathit{fpc} \cdot n)}$.
$\mathit{LCL}$ is the lower control limit, where $\mathit{LCL} = P - Z \sqrt{P(1 - P)/(\mathit{fpc} \cdot n)}$.

Specific matters of guidance in using the formulae are:

1. $\mathit{CV}$, the expected or observed auditor subjectivity and error, conforms to $\mathit{CV}$ [operator lost] $0.03$.
2. The half-width $H$ of the desired confidence interval should be greater than $\mathit{CV}/2$, the error proportion of the auditor; i.e., no matter how large the sample, we cannot be more confident of our audit results than we are of our auditor. Increasing the sample size, which is the practical effect of decreasing $H$, will not truly improve precision once $H = (\mathit{CV}/2)$. $H \ge (\mathit{CV}/2) + 0.005$ is recommended.
3. $\mathit{CL} \le 100 \cdot (1 - \mathit{CV})$ because, similar to $H$, we cannot expect to achieve a confidence level in the audit that is greater than the maximum accuracy that the auditor can achieve.
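The sample-size and control-limit formulas can be sketched in a few lines of Python. This is only an illustrative sketch, not the CoAudit implementation. It assumes that $Z$ is the two-sided standard normal critical value, that the finite population correction takes the conventional square-root form, and that $\mathit{fpc} \cdot n$ is rounded up to a whole number of documents (called `m` here). The names `AuditPlan`, `audit_plan`, `adjust_defects`, and `draw_sample` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class AuditPlan:
    Z: float      # critical value for the chosen confidence level
    n: float      # unadjusted sample size
    fpc: float    # finite population correction factor
    m: int        # fpc * n, rounded up to whole documents (assumption)
    ucl: float    # upper control limit on the error proportion
    lcl: float    # lower control limit on the error proportion


def audit_plan(N, P=0.10, CV=0.03, H=None, CL=None):
    """Sample size and control limits for a population of N documents."""
    H = CV / 2 + 0.005 if H is None else H       # preferred half-width
    CL = 100 * (1 - CV) if CL is None else CL    # preferred confidence level, in percent
    # Assumption: Z is the two-sided standard normal critical value for CL.
    Z = NormalDist().inv_cdf(1 - (1 - CL / 100) / 2)
    n = Z ** 2 * P * (1 - P) / H ** 2
    if n >= N:
        raise ValueError("population too small to sample; audit every document")
    fpc = math.sqrt((N - n) / (N - 1))           # assumed square-root form
    m = math.ceil(fpc * n)
    half = Z * math.sqrt(P * (1 - P) / m)
    return AuditPlan(Z, n, fpc, m, P + half, P - half)


def adjust_defects(x, m, P=0.10, CV=0.03):
    """Return (X, e, E): adjusted defect number, raw and adjusted proportions."""
    X = x - CV * P * m
    return X, x / m, X / m


def draw_sample(population, m, seed=None):
    """Unrestricted random sample of m items, drawn without replacement."""
    return random.Random(seed).sample(list(population), m)


plan = audit_plan(N=5000)
X, e, E = adjust_defects(x=100, m=plan.m)
documents = draw_sample(range(5000), plan.m, seed=0)
print(plan.m, round(plan.ucl, 4), round(E, 4), E <= plan.ucl)
```

Only the upper limit is compared against $E$ in the last line, since a proportion below the lower control limit is not a problem for this application.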
Formulas: Specification Limits

In the CoAudit implementation, diagnoses and findings are coded using the U.S. Department of Health and Human Services International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM), and procedures and level of service are coded using the American Medical Association Current Procedural Terminology (CPT). Other coding systems may be substituted. The following defines the core scoring method for the audit coding of individual documents:

1. Diagnosis and findings codes each receive a weight of 1 and are judged as correct or incorrect (pass/fail).
2. Diagnosis and findings codes may further be judged as of consequence or of no consequence.
3. Procedure and level of service codes each receive a weight of 2 and are judged as correct or incorrect (pass/fail).
4. The modifier codes associated with a procedure or level of service code each have a weight of 1 and are judged as correct or incorrect (pass/fail). Note that modifier codes are optional, and so there may correctly be no modifier codes and so no modifier code score for any given procedure or level of service code.
5. All modifier code scores are considered to be of consequence.
6. The relational links between diagnosis or findings codes and procedure or level of service codes, whereby a particular diagnosis or findings code is indicated as the support for a particular procedure or level of service code, each receive a weight of 1 and are judged as correct or incorrect (pass/fail).
7. All procedure and level of service codes must be linked to at least one diagnosis or findings code.
8. Links are all judged to be of consequence.
9. The ranked order in which procedure and level of service codes appear relative to other procedure codes and/or the level of service code receives a weight of 1 and is judged as correct or incorrect (pass/fail).
10. In the preferred implementation, ranked order of the procedure and level of service codes is always judged to be of consequence.
11. The unit value of a procedure code receives a weight of 1 and is judged correct or incorrect (pass/fail).
12. The unit value of a procedure code is always judged to be of consequence.
13. The document score $d = 100 - (\mathit{ModCnt} / \mathit{TotCnt}) \cdot 100$, where:

$$\mathit{ModCnt} = \sum_{i=1}^{\max(yc,\,yo)} \left( \mathit{ECPTpos}_i + \mathit{ECPTcode}_i + \mathit{ECPTu}_i + \mathit{ECPTm}_i + \mathit{ECPTl}_i \right) + \sum_{j=1}^{\max(zc,\,zo)} \mathit{EICDcode}_j \cdot \mathit{ICDc}_j$$

$$\mathit{TotCnt} = \sum_{i=1}^{\max(yc,\,yo)} \left( \mathit{wcptpos} + \mathit{wcptcode} + (\mathit{wcptu} \cdot \mathit{CPTu}_i) + (\mathit{wcptm} \cdot \mathit{CPTm}_i) + (\mathit{wcptl} \cdot \max(\mathit{CPTlc}_i, \mathit{CPTlo}_i)) \right) + \sum_{j=1}^{\max(zc,\,zo)} \mathit{wicdcode} \cdot \mathit{ICDc}_j$$

$yc$ is the number of post-audit procedure and/or level of service codes in the document
$zc$ is the number of post-audit diagnosis and/or findings codes in the document
$yo$ is the number of pre-audit procedure and/or level of service codes in the document
$zo$ is the number of pre-audit diagnosis and/or findings codes in the document
$\mathit{CPTu} = 1$ if procedure code has units, else 0
$\mathit{CPTm} = 1$ if procedure code has modifier, else 0
$\mathit{CPTlc}$ = the post-audit number of links for the procedure code
$\mathit{CPTlo}$ = the pre-audit number of links for the procedure code
$\mathit{ECPTl}$ is the difference between $\max(\mathit{CPTlc}, \mathit{CPTlo})$ and the number of links that are identical (link to the same ICD-9 code) both pre-audit and post-audit
$\mathit{ECPTpos} = \mathit{wcptpos}$ if post-audit rank order position of procedure code $\ne$ pre-audit position, else 0
$\mathit{ECPTcode} = \mathit{wcptcode}$ if post-audit code $\ne$ pre-audit code, else 0
$\mathit{ECPTu} = \mathit{wcptu}$ if post-audit unit $\ne$ pre-audit unit, else 0
$\mathit{ECPTm} = \mathit{wcptm}$ if post-audit modifier $\ne$ pre-audit modifier, else 0
$\mathit{EICDcode} = \mathit{wicdcode}$ if post-audit code $\ne$ pre-audit code, else 0
$\mathit{wcptpos} = 1$, the weight for procedure rank order
$\mathit{wcptcode} = 2$, the weight for a procedure or level of service code
$\mathit{wcptu} = 1$, the weight for a procedure unit
$\mathit{wcptm} = 1$, the weight for a procedure modifier
$\mathit{wcptl} = 1$, the weight for a procedure link
$\mathit{wicdcode} = 1$, the weight for a diagnosis or findings code
$\mathit{ICDc} = 1$ if the diagnosis or findings code audit change is of consequence, else 0

1. The sample score $s = \left( \sum_{i=1}^{\mathit{fpc} \cdot n} d_i \right) / (\mathit{fpc} \cdot n)$
2. The defect level $x = (s \cdot \mathit{fpc} \cdot n) / 100$

Calibrating for Auditor Variability

An initial CV can be established by making an educated estimate of the auditor's accuracy, but auditors should be periodically tested to provide a benchmark CV value. Without this calibration, audit results across time and between auditors will not be meaningfully comparable. The objective of the testing is to track the CV value of each auditor across time using standardized benchmark tests. The benchmark test consists of a set of coded documents for the auditor to audit. The benchmark test must conform to three principles:

1. From one test session to the next, a significant portion of the test (at least 50 percent in the preferred implementation) must consist of the same documents with the same codes as were present on the previous test. The remaining documents will be new. In the preferred implementation, the order of the documents from test to test will be randomized.
2. Over time, the test documents must be selected so as to reflect the distribution of encounter and document types that coders would be expected to work with under actual production conditions.
3. Test sessions must be separated by sufficient time, and test size must be sufficiently large, that auditors would not reasonably be expected to remember a significant percentage of their edits from one test session to the next.

Auditor scores on the benchmark tests consist of two parts. First, determine CV as calculated on the recurring documents from one test session to the next. Second, the relative variances between auditors who take the same test are calculated and may be used as a cross-check on the intra-auditor CV variance.
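As an illustration of the document score and sample score from the specification-limit formulas, the following Python sketch scores one audited document. It is a sketch under stated assumptions rather than the authors' implementation: the classes `Proc` and `Diag` and their field names are hypothetical, a procedure is treated as "having" units or a modifier if either the pre-audit or the post-audit version has one, and the of-consequence flag ICDc is taken to apply to the diagnosis term in both ModCnt and TotCnt and to default to 1. The codes in the example are sample values only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Weights as listed in the text.
W_POS, W_CODE, W_UNIT, W_MOD, W_LINK, W_ICD = 1, 2, 1, 1, 1, 1


@dataclass
class Proc:
    """Pre-audit and post-audit values for one procedure or level-of-service slot."""
    pre_code: Optional[str]
    post_code: Optional[str]
    pre_pos: Optional[int] = None
    post_pos: Optional[int] = None
    pre_units: Optional[int] = None
    post_units: Optional[int] = None
    pre_mod: Optional[str] = None
    post_mod: Optional[str] = None
    pre_links: FrozenSet[str] = frozenset()    # linked diagnosis codes, pre-audit
    post_links: FrozenSet[str] = frozenset()   # linked diagnosis codes, post-audit


@dataclass
class Diag:
    """Pre-audit and post-audit values for one diagnosis or findings slot."""
    pre_code: Optional[str]
    post_code: Optional[str]
    of_consequence: bool = True   # ICDc; assumed to default to 1


def document_score(procs, diags):
    """d = 100 - (ModCnt / TotCnt) * 100 under the assumptions stated above."""
    mod_cnt = tot_cnt = 0
    for p in procs:
        has_units = p.pre_units is not None or p.post_units is not None   # CPTu
        has_mod = p.pre_mod is not None or p.post_mod is not None         # CPTm
        max_links = max(len(p.pre_links), len(p.post_links))
        tot_cnt += (W_POS + W_CODE + W_UNIT * has_units
                    + W_MOD * has_mod + W_LINK * max_links)
        mod_cnt += W_POS * (p.post_pos != p.pre_pos)
        mod_cnt += W_CODE * (p.post_code != p.pre_code)
        mod_cnt += W_UNIT * (p.post_units != p.pre_units)
        mod_cnt += W_MOD * (p.post_mod != p.pre_mod)
        mod_cnt += W_LINK * (max_links - len(p.pre_links & p.post_links))
    for d in diags:
        tot_cnt += W_ICD * d.of_consequence
        mod_cnt += W_ICD * (d.post_code != d.pre_code) * d.of_consequence
    if tot_cnt == 0:
        raise ValueError("document has nothing to score")
    return 100.0 - (mod_cnt / tot_cnt) * 100.0


def sample_score(doc_scores):
    """Mean of the document scores in the audited sample."""
    return sum(doc_scores) / len(doc_scores)


procs = [
    Proc("99213", "99214", 1, 1,
         pre_links=frozenset({"250.00"}), post_links=frozenset({"250.00"})),
    Proc("93000", "93000", 2, 2, pre_units=1, post_units=1,
         pre_links=frozenset({"401.9"}), post_links=frozenset({"786.50"})),
]
diags = [Diag("250.00", "250.00"), Diag("401.9", "786.50")]
print(round(document_score(procs, diags), 2))
```

In this example the auditor changed one level of service code (weight 2), one link, and one diagnosis code, so 4 of 11 weighted points were modified.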
Results and Discussion

The CoAudit methodology is designed to correlate to the qualitative judgment of human auditors who may judge a coding process, as defined by the sample selection parameters, to be acceptable, marginally acceptable, or unacceptable. As such, the results of an audit are meaningful primarily when represented as a time series against the system control limits, as in an X-bar chart. Because standards of acceptability may vary with time and between organizations, empirical tests must be performed periodically for calibration purposes, but P = 0.1, H = 0.02, and CV = 0.03 are recommended starting parameters. A process may have sample scores that are consistently in control (acceptable), occasionally out of control (marginally acceptable), or consistently out of control (unacceptable). As a starting point, monthly audits are recommended, with more than two sample scores out of control in a year being considered unacceptable and requiring intervention to bring the system back in control. Statistical significance tests (e.g., chi-square) can be used to measure the effectiveness of interventions. At the time of writing, alpha tests on six coders indicate that the initial objectives have been met, but minor adjustments are expected as a result of beta testing.
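The monthly review described above can be sketched as a simple classification of a year of adjusted defect proportions against the upper control limit. The mapping used here is an assumed reading of the text: no out-of-control months is "acceptable", one or two is "marginally acceptable", and more than two is "unacceptable". The function names and the sample figures are hypothetical.

```python
def out_of_control_months(E_values, ucl):
    """Indices of audits whose adjusted defect proportion exceeds the UCL."""
    return [i for i, E in enumerate(E_values) if E > ucl]


def classify_process(E_values, ucl, tolerated=2):
    """Classify a year of audits; `tolerated` is the assumed yearly allowance."""
    k = len(out_of_control_months(E_values, ucl))
    if k == 0:
        return "acceptable"
    if k <= tolerated:
        return "marginally acceptable"
    return "unacceptable"


# Twelve monthly adjusted defect proportions (made-up values) and an example UCL.
year = [0.09, 0.10, 0.11, 0.13, 0.10, 0.08, 0.09, 0.10, 0.11, 0.10, 0.09, 0.10]
print(out_of_control_months(year, ucl=0.12), classify_process(year, ucl=0.12))
```

Only values above the upper limit are flagged, since falling below the lower control limit indicates better-than-expected performance.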
Daniel Heinze, PhD, is the Chief Technology Officer of A-Life Medical, Inc. in San Diego, CA.
Peter Feller, MS, is a Senior Software Engineer at A-Life Medical, Inc. in San Diego, CA.
Jerry McCorkle, BA, is the Vice President of Client Services & Operations of A-Life Medical, Inc. in San Diego, CA.
Mark Morsch, MS, is the Vice President of NLP & Software Engineering at A-Life Medical, Inc. in San Diego, CA.

Notes

1. Department of Health and Human Services Office of Inspector General Office of Audit Services. Rat-Stats Companion Manual. September 2001. Available online at http://oig.hhs.gov/organization/oas/ratstats/ratstatsmanual.pdf, last accessed July 8, 2006.
2. National Institute of Standards and Technology. Engineering Statistics Handbook. June 2003. Available online at http://www.itl.nist.gov/div898/handbook/, last accessed July 8, 2006.