Detecting Fraud in Health Insurance Data: Learning to Model Incomplete Benford s Law Distributions
|
|
|
- Nathan Beasley
- 10 years ago
- Views:
Transcription
1 Detecting Fraud in Health Insurance Data: Learning to Model Incomplete Benford s Law Distributions Fletcher Lu 1 and J. Efrim Boritz 2 1 School of Computer Science, University of Waterloo & Canadian Institute of Chartered Accountants, 66 Grace Street, Scarborough, Ontario, Canada, M1J 3K9 [email protected] 2 School of Accountancy, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada, N2L 3G1 [email protected] Abstract. Benford s Law [1] specifies the probabilistic distribution of digits for many commonly occurring phenomena, ideally when we have complete data of the phenomena. We enhance this digital analysis technique with an unsupervised learning method to handle situations where data is incomplete. We apply this method to the detection of fraud and abuse in health insurance claims using real health insurance data. We demonstrate improved precision over the traditional Benford approach in detecting anomalous data indicative of fraud and illustrate some of the challenges to the analysis of healthcare claims fraud. 1 Introduction In this paper we explore a new approach to detecting fraud and abuse by using a digital analysis technique that utilizes an unsupervised learning approach to handle incomplete data. We apply the technique to the application area of healthcare insurance claims. We utilize real health insurance claims data, provided by Manulife Financial, to test our new technique and demonstrate improved precision for detecting possible fraudulent insurance claims. A variety of techniques for detecting fraud have been developed. The most common are supervised learning methods, which train systems on known instances of fraud patterns to then detect these patterns in test data. A less common approach is to have a pattern for non-fraudulent data and then compare the test data to this pattern. Any data that deviates significantly from the non-fraudulent pattern could be indicative of possible fraud [2]. The difficulty with the latter approach for fraud detection is in obtaining a pattern that one is confident is free of fraud. Digital analysis is an approach which addresses this difficulty. Benford s Law [1] is one digital analysis technique that specifies a model of non-fraudulent data that test data may be compared against. In 1938 Frank Benford demonstrated that for many naturally occurring phenomena, the frequency of occurrences of digits within recorded data follows a certain logarithmic probability distribution (a Benford distribution). This Benford s Law can only be
2 applied to complete recorded data. However, incomplete records are very common. We introduce an algorithm to detect and adjust the distribution to take into account missing data. By doing so, we allow for true anomalies such as those due to fraud and abuse to be more accurately detected. In this paper, we consider the situation where data is contiguously recorded so that the only missing data are due to cutoffs below and/or above some thresholds. Our algorithm, which we will call Adaptive Benford, adjusts its distribution of digit frequencies to account for any missing data cutoffs and produces a threshold cutoff for various ranges of digits. Our algorithm then uses those learned values to analyse test data. We return any digits exceeding a learned set of threshold bounds. We apply our Adaptive Benford algorithm to the analysis of real healthcare insurance claims data provided by Manulife Financial. The data is a list of health, dental and drug insurance reimbursement claims for a three year period covering a single company s group benefits plan with all personal information removed. 2 Background 2.1 Benford s Law and Fraud Detection As Benford s Law is a probability distribution with strong relevance to accounting fraud, much of the research on Benford s Law has been in areas of statistics [3, 4] as well as auditing [5, 2]. The first machine learning related implementation was done by Bruce Busta & Randy Weinberg [6]. The significant advantage of using the digital analysis approach over previous supervised learning methods for fraud detection is that we are not restricted to already known instances of fraud [7, 8]. By looking for anomalies that deviate from the expected Benford distribution that a data set should follow, we may discover possible new fraud cases. 2.2 Digit Probabilities Benford s Law is a mathematical formula that specifies the probability of leading digit sequences appearing in a set of data. What we mean by leading digit sequences is best illustrated through an example. Consider the set of data S = {231, 432, 1, 23, 634, 23, 1, 634, 2, 23, 34, 1232}. There are twelve data entries in set S. The digit sequence 23 appears as a leading digit sequence (i.e. in the first and second position) 4 times. Therefore, the probability of the first two digits being 23 is The probability is computed out of 9 because only 9 entries have at least 2 digit positions. Entries with less than the number of digits being analysed are not included in the probability computation. The actual mathematical formula of Benford s law is: P(D = d) = log 10 (1 + 1 ), (1) d
3 where P(D = d) is the probability of observing the digit sequence d in the first y digits and where d is a sequence of y digits. For instance, Benford s Law would state that the probability that the first digit in a data set is 3 would be log 10 ( ). Similarly, the probability that the first 3 digits of the data set are 238, would be log 10 ( ). The numbers 238 and would be instances of the first three digits being 238. However this probability would not include the occurrence 3238, as 238 is not the first three digits in this instance. 2.3 Benford s Law Requirements In order to apply equation 1 as a test for a data set s digit frequencies, Benford s Law requires that: 1. The entries in a data set should record values of similar phenomena. In other words, the recorded data cannot include entries from two different phenomena such as both census population records and dental measurements. 2. There should be no built-in minimum or maximum values in the data set. In other words, the records for the phenomena must be complete, with no artificial start value or ending cutoff value. 3. The data set should not be made up of assigned numbers, such as phone numbers. 4. The data set should have more small value entries than large value entries. Further details on these rules may be found in [9]. Under these conditions, Benford noted that the data for such sets, when placed in ascending order, often follows a geometric growth pattern. 3 Under such a situation, equation 1 specifies the probability of observing specific leading digit sequences for such a data set. The intuitive reasoning behind the geometric growth of Benford s Law is based on the notion that for low values it takes more time for some event to increase by 100% from 1 to 2 than it does to increase by 50% from 2 to 3. Thus, when recording numerical information at regular intervals, one often observes low digits much more frequently than higher digits, usually decreasing geometrically. 3 Adaptive Benford As section 2.3 specifies, one of the requirements to be able to apply Benford s law is that there are no built-in minimum or maximum values. However, often data is only partially observed, such as when only a single month of expenses are reported. Adaptive Benford s Law has been designed to handle such missing data situations. 3.1 Missing Data Inflating The problem with traditional Benford s Law and incomplete data is that the frequency of the digits that are observed become inflated when computed as a probability. For instance, Benford s Law states that in a data set, a first digit of 4 should occur with probability log 10 ( ) Suppose with complete data, out of 100 observations, 4 3 Note: The actual data does not have to be recorded in ascending order. This ordering is merely an illustrative tool to understand the intuitive reasoning for Benford s law.
4 appeared as a first digit 10 times, which closely approximates the Benford probability. However if the data set is incomplete with only 50 observations recorded, but all 10 occurrences of first digit 4 are still recorded, then we get a probability of 10/50 = 0.20, essentially inflating the probability of digits that are observed higher due to the missing digits not being included in the total count for the probability computation. 3.2 Algorithm Under the condition that we are aware that the observed data follows a Benford distribution and is contiguous, if we are missing data only above or below an observed cutoff, we can use this knowledge to artificially build the missing data. First, let d be a leading digit sequence of length i. f d,observed be the frequency that the leading digit sequence d occurs in the data set, and P(D i ) be the Benford probability for digit sequence D i, where D i is any digit sequence of length i. Now, consider that to compare the actual frequency of occurrence of a leading digit sequence d to the actual Benford probability we would compute the ratio: f d,observed D i f Di,observed P(D i = d), (2) where the denominator is summed over all digit sequences of the same length as digit sequence d, in other words over all digits of length i. Let C i = D i f Di,observed. (3) Then equation 2 can be rearranged to f d,observed P(D i = d) D i f Di,observed = C i, (4) C i is essentially a constant scaling factor for all digit sequences of the same length i. If there are missing digit sequences in our observed data due to cutoff thresholds, we can compute C i using the observed digit sequences, since those sequences that do still appear should still follow the Benford s Law probabilities. In order to produce a best fit for the missing data, we average over all possible C i values for a given digit sequence length i. Therefore, let C i = f d,observed /P(D i = d) digit sequences of length i. (5) This scaling factor C i will be used to fill-in the missing data of our Benford distribution.
5 As an example, C 2 would be the averaged constant scaling factor over all first two digit frequencies. We use C i to multiply our Benford probabilities for the digit sequences of length i and use that as a benchmark to compare the frequencies of the actual observed data against. Appendix A illustrates the Adaptive Benford algorithm. The major steps of the Adaptive Benford algorithm are: 1. Compute the C i constant values for various leading digit sequence lengths. 2. Compute artificial Benford frequencies for the digit sequence lengths. 3. Compute a standard deviation for each of the sequence lengths. 4. Flag any digit sequences in the recorded data that deviate more than an upper bound number of standard deviations from the artificial Benford frequencies. We compute the artificial Benford frequencies as follows: f d,expected = C i log 10 (1 + 1 ). (6) d We scale up to actual frequencies in contrast to dividing by a sum total of observed instances that would produce probabilities. By doing so, we avoid the inflating effect we noted in section 3.1. We may compute a variance against observed data by: σ 2 expected,i = 1 n i D i (f Di,observed f Di,expected) 2, (7) where n i is the number of different digit sequences of size i. We compute an upper bound U i based on a number of standard deviations from the artificial Benford frequencies. We use this upper bound to determine if the observed data deviates enough to be considered anomalous and potentially indicative of fraud or abuse. 4 Experiments For the purposes of our experiments, we will analyse up to the first 3 leading digit sequences. The choice of digit sequence lengths to be analysed is dependent on the data set s entries (the digit lengths of the data set s elements as well as the number of elements in the data set). For a further discussion on choice of digit sequence length see [10]. 4.1 Census Data Tests As an initial test of our system we use as a test database the year 1990 population census data for municipalities in the United States, which has been analysed previously by Nigrini [9] and been verified to follow a Benford Law distribution. Table 1 records the amount of conformity for complete and incomplete census data whereby we measure conformity as the percentage of digit sequences that fall within ±2 standard deviations of the Benford estimate value out of the total number of digit
6 Table 1. Census Data: Percentage of Digits within ± two standard deviations of Benford and Adaptive Benford distributions. Range of Values x 10 4 Data Classic Adaptive Size Benford Benford Complete Census % 94.9% 100,000-1,100, % 89.4% 200,000-1,200, % 96.9% 300,000-1,300, % 94.4% 400,000-1,400, % 91.7% 500,000-1,500, % 87.2% 600,000-1,600, % 84.0% 700,000-1,700, % 82.4% 800,000-1,800, % 82.7% 900,000-1,900, % 73.5% sequences that had non-zero frequency. 4 We modified the 1990 census data to include sets of various population ranges of municipalities. Notice that for the range 100,000-1,100,000 all leading digit sequences may start with any of 1,2,...,9. However, for the range 900,000-1,900,000, the only possible leading digits start with 9 or 1. With fewer possible leading digit sequences, the inflating effect mentioned in section 3.1 becomes more likely, resulting in lower conformity as the population range shifts higher. The Adaptive Benford, which compensates for the cutoff data, produces higher conformity values, ranging from 73.5% to 96.9%. 4.2 Health Insurance Data Tests We now analyse health insurance claims data covering general health, dental and drug claims for financial reimbursement to Manulife Financial covering a single company s group benefits plan for its employees from 2003 to With recorded data before 2003 cutoff, we expect inflated percentages of anomalous digits with traditional Benford compared with our Adaptive Benford method. Our goal is to detect anomalies in our data sets that may be indicative of fraud activity. Table 2 reports the percentage of anomalous digit sequences for the insurance database for various data sets. As expected, by handling missing data ranges due to cutoff levels, Adaptive Benford can be more precise, reporting fewer actual anomalous digit sequences then traditional Benford. We used a 95% upper bound confidence interval as our anomaly threshold. The main advantage of our Adaptive Benford algorithm over traditional Benford for fraud detection is its improved precision for detecting anomalies. The goal, once anomalous digit sequences have been identified, is then to determine the data entries that are causing the high amounts of anomalous digit sequences. A forensic auditor 4 The two standard deviations should cover approximately 95% of the digit sequences if the data conforms to Benford s distributions. The standard deviation used for all tests of table 1 are computed using the complete 1990 census data.
7 Table 2. Health Insurance Fraud Detection: Comparing Traditional and Adaptive Benford against percentage of anomalous digit sequences. Data Set Description Data Size Classic Adaptive Benford Benford 1 Misc. Dental Charges % 11.48% 2 Expenses Submitted by Provided 31, % 3.44% 3 Submitted Drug Costs 6, % 10.40% 4 Submitted Dispensing Fee 7, % 9.09% 5 Excluded Expenses 1, % 5.19% 6 Expenses after deductibles 29, % 3.13% 7 Coinsurance reductions 3, % 6.19% 8 Benefit Coordination Reductions % 6.06% 9 Net Amount Reimbursed 28, % 3.70% may then, for instance, decide whether these entries are likely cases of fraud or abuse. Making such decisions is often a qualitative judgement call dependent often on factors related to the specific application area. Anomalies, in some cases, may be due to odd accounting or data entry practices that are not actual instances of fraud. We therefore have avoided here labeling our reported anomalies as actual fraud. Instead, we emphasize that these digit sequence anomalies are to be used as a tool to indicate possible fraud. 5 Discussion & Conclusions In contrast to typical supervised learning methods which will train on known fraud instances, our Adaptive Benford algorithm models the data to an expected non-fraudulent Benford data pattern and any large anomalies are reported as possible fraud. 5 Our Adaptive Benford algorithm allows us to analyse data even when the data is partially incomplete. We made such an analysis with incomplete health insurance data, which only included data for a three year period. Our Adaptive Benford algorithm reports fewer anomalous digit sequences, avoiding the transient effect due to artificial cutoff start and end points for recorded data. This produces a more precise set of anomalous leading digit sequences than traditional Benford for forensic auditors to analyse for fraud. In effect, our Adaptive Benford algorithm removes requirement 2 of the rules specified in section 2.3 needed for Benford s Law to be applied. Our Adaptive Benford algorithm therefore expands the areas where Benford s Law may be applied. 5 This modeling is under the pre-condition that, if we had a complete data set without fraud activity, it should follow a Benford distribution. (i.e. The complete, non-fraudulent, data set satisfies the requirements of section 2.3.
8 Acknowledgments We would like to thank Manulife Financial for providing the insurance data and Mark Nigrini for providing the census data. We would also like to thank the Canadian Institute of Chartered Accountants and the Natural Sciences and Engineer Research Council (NSERC) for providing funding. A Adaptive Benford Algorithm Let S be the observed testing data Let U i be an upper bound on the number of standard deviations For digit sequences d = 1,2,3,...,Upperbound: f d,observed = number of times digit sequence d appears as a leading digit sequence in S For i = 1...Upperbound on digit length: Let n i be number of digits of length i that appeared at least once in S Compute over all digit sequences D i of length i: Ĉ i = 1 f Di,observed n i D i log 10 (1+ D 1 ) i Compute for each digit sequence d of length i: f d,expected = Ĉi log 10 (1 + 1 ) d Compute over all digit sequences D i of length i: ˆσ i 2 = 1 n i (f D Di,observed f Di i,expected) 2 Compute for each digit sequence d of length i: if U i < f d,observed f d,expected σ i then store d as anomalous References 1. Benford, F.: The Law of Anomalous Numbers. In: Proceedings of the American Philosophical Society. (1938) Crowder, N.: Fraud Detection Techniques. Internal Auditor April (1997) Pinkham, R.S.: On the Distribution of First Significant Digits. Annals of Mathematical Statistics 32 (1961) Hill, T.P.: A Statistical Derivation of the Significant-Digit Law. Statistical Science 4 (1996) Carslaw, C.A.: Anomalies in Income Numbers: Evidence of Goal Oriented Behaviour. The Accounting Review 63 (1988) Busta, B., Weinberg, R.: Using Benford s Law and neural networks as a review procedure. In: Managerial Auditing Journal. (1998) Fawcett, T.: AI Approaches to Fraud Detection & Risk Management. Technical Report WS-97-07, AAAI Workshop: Technical Report (1997) 8. Bolton, R.J., Hand, D.J.: Statistical Fraud Detection: A Review. Statistical Science 17(3) (1999) Nigrini, M.J.: Digital Analysis Using Benford s Law. Global Audit Publications, Vancouver, B.C., Canada (2000) 10. Nigrini, M.J., Mittermaier, L.J.: The Use of Benford s Law as an Aid in Analytical Procedures. In: Auditing: A Journal of Practice and Theory. Volume 16(2). (1997) 52 67
Adaptive Fraud Detection Using Benford s Law
Adaptive Fraud Detection Using Benford s Law Fletcher Lu 1, J. Efrim Boritz 2, and Dominic Covvey 2 1 Canadian Institute of Chartered Accountants, 66 Grace Street, Scarborough, Ontario, M1J 3K9 [email protected]
A Guide to Benford s Law. A CaseWare IDEA Research Department Document
A Guide to Benford s Law A CaseWare IDEA Research Department Document January 30, 2007 CaseWare IDEA Inc. is a privately-held software development and marketing company, with offices in Toronto and Ottawa,
Benford s Law and Digital Frequency Analysis
Get M.A.D. with the Numbers! Moving Benford s Law from Art to Science BY DAVID G. BANKS, CFE, CIA September/October 2000 Until recently, using Benford s Law was as much of an art as a science. Fraud examiners
The Effective Use of Benford s Law to Assist in Detecting Fraud in Accounting Data
Journal of Forensic Accounting 1524-5586/Vol.V(2004), pp. 17-34 2004 R.T. Edwards, Inc. Printed in U.S.A. The Effective Use of Benford s Law to Assist in Detecting Fraud in Accounting Data Cindy Durtschi
Invoice Number Vendor Number Amount 129304 A543891 $1,035.71 129304 A543891 $1,035.71
Fraud Detection: Using Data Analysis Techniques A new approach being used for fraud prevention and detection involves the examination of patterns in the actual data. The rationale is that unexpected patterns
2.1 The Present Value of an Annuity
2.1 The Present Value of an Annuity One example of a fixed annuity is an agreement to pay someone a fixed amount x for N periods (commonly months or years), e.g. a fixed pension It is assumed that the
THE IMPACT AND REALITY OF FRAUD AUDITING BENFORD S LAW: WHY AND HOW TO USE IT
THE IMPACT AND REALITY OF FRAUD AUDITING BENFORD S LAW: WHY AND HOW TO USE IT Benford's Law is used to find abnormalities in large data sets. Examples of the diversity of data sets include currency amounts,
THE FIRST-DIGIT PHENOMENON
THE FIRST-DIGIT PHENOMENON T. P. Hill A century-old observation concerning an unexpected pattern in the first digits of many tables of numerical data has recently been discovered to also hold for the stock
Performing Audit Procedures in Response to Assessed Risks and Evaluating the Audit Evidence Obtained
Performing Audit Procedures in Response to Assessed Risks 1781 AU Section 318 Performing Audit Procedures in Response to Assessed Risks and Evaluating the Audit Evidence Obtained (Supersedes SAS No. 55.)
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
Credit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www. irphouse.com Credit Card Fraud
Utility-Based Fraud Detection
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Utility-Based Fraud Detection Luis Torgo and Elsa Lopes Fac. of Sciences / LIAAD-INESC Porto LA University of
Qi Liu Rutgers Business School ISACA New York 2013
Qi Liu Rutgers Business School ISACA New York 2013 1 What is Audit Analytics The use of data analysis technology in Auditing. Audit analytics is the process of identifying, gathering, validating, analyzing,
Benford s Law and Fraud Detection, or: Why the IRS Should Care About Number Theory!
Benford s Law and Fraud Detection, or: Why the IRS Should Care About Number Theory! Steven J Miller Williams College [email protected] http://www.williams.edu/go/math/sjmiller/ Bronfman Science
Cheating behavior and the Benford s Law Professor Dominique Geyer, (E-mail : [email protected]) Nantes Graduate School of Management, France
Cheating behavior and the Benford s Law Professor Dominique Geyer, (Email : [email protected]) Nantes Graduate School of Management, France Abstract Previous studies (Carslaw, ; Thomas, ; Niskanen &
EC3070 FINANCIAL DERIVATIVES. Exercise 1
EC3070 FINANCIAL DERIVATIVES Exercise 1 1. A credit card company charges an annual interest rate of 15%, which is effective only if the interest on the outstanding debts is paid in monthly instalments.
Statistical estimation using confidence intervals
0894PP_ch06 15/3/02 11:02 am Page 135 6 Statistical estimation using confidence intervals In Chapter 2, the concept of the central nature and variability of data and the methods by which these two phenomena
APPLICATIONS AND MODELING WITH QUADRATIC EQUATIONS
APPLICATIONS AND MODELING WITH QUADRATIC EQUATIONS Now that we are starting to feel comfortable with the factoring process, the question becomes what do we use factoring to do? There are a variety of classic
BINOMIAL OPTIONS PRICING MODEL. Mark Ioffe. Abstract
BINOMIAL OPTIONS PRICING MODEL Mark Ioffe Abstract Binomial option pricing model is a widespread numerical method of calculating price of American options. In terms of applied mathematics this is simple
Dan French Founder & CEO, Consider Solutions
Dan French Founder & CEO, Consider Solutions CONSIDER SOLUTIONS Mission Solutions for World Class Finance Footprint Financial Control & Compliance Risk Assurance Process Optimization CLIENTS CONTEXT The
Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector
Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication
Forensic Analytics and Employee Fraud
Chapter Meeting Announcement Forensic Analytics and Employee Fraud Presented by Mark J. Nigrini, Ph.D. DATE: Friday, April 24, 2015 PLACE: Education Service Center Region 19 6611 Boeing Drive El Paso,
What s Wrong with Multiplying by the Square Root of Twelve
What s Wrong with Multiplying by the Square Root of Twelve Paul D. Kaplan, Ph.D., CFA Director of Research Morningstar Canada January 2013 2013 Morningstar, Inc. All rights reserved. The information in
Datamining. Gabriel Bacq CNAMTS
Datamining Gabriel Bacq CNAMTS In a few words DCCRF uses two ways to detect fraud cases: one which is fully implemented and another one which is experimented: 1. Database queries (fully implemented) Example:
Ohio Medicaid Program
Ohio Medicaid Program A Compliance Audit by the: Medicaid/Contract Audit Section September 2011 AOS/MCA-12-005C September 29, 2011 Michael Linville, LPN 4932 Lebanon Rd. South Lebanon, OH 45065 Dear Mr.
Forensic Analytics and Employee Fraud. Presented by. Mark J. Nigrini Ph.D. Professor, West Virginia University
The Charleston Chapter of the ACFE 2015 Training Day Date: May 8, 2015 Time: 8:00 4:30 Forensic Analytics and Employee Fraud Presented by Mark J. Nigrini Ph.D Professor, West Virginia University Location:
Benford s Law: Tables of Logarithms, Tax Cheats, and The Leading Digit Phenomenon
Benford s Law: Tables of Logarithms, Tax Cheats, and The Leading Digit Phenomenon Michelle Manes ([email protected]) 9 September, 2008 History (1881) Simon Newcomb publishes Note on the frequency
INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS
INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING (Effective for audits of financial statements for periods beginning on or after December 15, 2004) CONTENTS Paragraph Introduction...
Means, standard deviations and. and standard errors
CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard
EDUCATION AND EXAMINATION COMMITTEE SOCIETY OF ACTUARIES RISK AND INSURANCE. Copyright 2005 by the Society of Actuaries
EDUCATION AND EXAMINATION COMMITTEE OF THE SOCIET OF ACTUARIES RISK AND INSURANCE by Judy Feldman Anderson, FSA and Robert L. Brown, FSA Copyright 25 by the Society of Actuaries The Education and Examination
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Digital analysis: Theory and applications in auditing
Digital analysis: Theory and applications in auditing Tamás Lolbert, auditor, State Audit Office of Hungary E-mail: [email protected] Newcomb in 1881 and Benford in 1938 independently introduced a phenomenon,
A Study of a Health Enterprise Information System, Part 6 - Coalescing the Analysis of the ER Diagrams, Relational Schemata and Data Tables
A Study of a Health Enterprise Information System, Part 6 - Coalescing the Analysis of the ER Diagrams, Relational Schemata and Data Tables Jon Patrick, PhD Health Information Technology Research Laboratory
Northumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
U S I N G D A T A A N A L Y S I S T O M E E T T H E R E Q U I R E M E N T S O F R I S K B A S E D A U D I T I N G S T A N D A R D S
U S I N G D A T A A N A L Y S I S T O M E E T T H E R E Q U I R E M E N T S O F R I S K B A S E D A U D I T I N G S T A N D A R D S A C a s e W a r e I D E A R e s e a r c h R e p o r t CaseWare IDEA Inc.
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com [email protected]
Review of Basic Options Concepts and Terminology
Review of Basic Options Concepts and Terminology March 24, 2005 1 Introduction The purchase of an options contract gives the buyer the right to buy call options contract or sell put options contract some
Triage in Forensic Accounting using Zipf s Law
Triage in Forensic Accounting using Zipf s Law Adeola Odueke & George R. S. Weir 1 Department of Computer and Information Sciences, University of Strathclyde, Glasgow G1 1 XH, UK [email protected]
Audit Evidence. AU Section 326. Introduction. Concept of Audit Evidence AU 326.03
Audit Evidence 1859 AU Section 326 Audit Evidence (Supersedes SAS No. 31.) Source: SAS No. 106. See section 9326 for interpretations of this section. Effective for audits of financial statements for periods
CHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
Safety Analysis for Nuclear Power Plants
Regulatory Document Safety Analysis for Nuclear Power Plants February 2008 CNSC REGULATORY DOCUMENTS The Canadian Nuclear Safety Commission (CNSC) develops regulatory documents under the authority of paragraphs
Analysis of Fire Statistics of China: Fire Frequency and Fatalities in Fires
Analysis of Fire Statistics of China: Fire Frequency and Fatalities in Fires FULIANG WANG, SHOUXIANG LU, and CHANGHAI LI State Key Laboratory of Fire Science University of Science and Technology of China
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Management/Non-union Employees, Accountability Officers and Elected Officials Compensation and Benefits
EX14.7 STAFF REPORT ACTION REQUIRED Management/Non-union Employees, Accountability Officers and Elected Officials Compensation and Benefits Date: March 31, 2016 To: From: Wards: Employee & Labour Relations
Table of Contents. Data Analysis Then & Now 1. Changing of the Guard 2. New Generation 4. Core Data Analysis Tasks 6
Table of Contents Data Analysis Then & Now 1 Changing of the Guard 2 New Generation 4 Core Data Analysis Tasks 6 Data Analysis Then & Now Spreadsheets remain one of the most popular applications for auditing
A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment
6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
GUIDE TO THE HEALTH FACILITY DATA QUALITY REPORT CARD
GUIDE TO THE HEALTH FACILITY DATA QUALITY REPORT CARD Introduction No health data is perfect and there is no one definition of data quality.. No health data from any source can be considered perfect: all
FRAUD PREVENTION STRATEGY FOR UGU DISTRICT MUNICIPALITY (UGU)
FRAUD PREVENTION STRATEGY FOR UGU DISTRICT MUNICIPALITY (UGU) CONTENTS 1. Introduction.. 3 2. Characteristics of Fraud.. 5 3. Fraud Strategy..... 6 4. Building the Fraud Prevention Plan........ 8 Fraud
Life Table Analysis using Weighted Survey Data
Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using
Data classification methods in GIS The most common methods
University of Thessaly, Department of Planning and Regional Development Master Franco Hellenique POpulation, DEveloppement, PROspective Volos, 2013 Data classification methods in GIS The most common methods
Offline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: [email protected] 2 IBM India Research Lab, New Delhi. email: [email protected]
Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 [email protected].
Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 [email protected] This paper contains a collection of 31 theorems, lemmas,
Standard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
INTERNATIONAL STANDARD ON AUDITING (UK AND IRELAND) 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS
INTERNATIONAL STANDARD ON AUDITING (UK AND IRELAND) 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS Paragraph Introduction... 1-2 Definitions... 3-12 Audit Evidence... 13-17 Risk Considerations
Using Data Analytics to Detect Fraud
Using Data Analytics to Detect Fraud Fundamental Data Analysis Techniques 2016 Association of Certified Fraud Examiners, Inc. Discussion Question For each data analysis technique discussed in this section,
Accounting, M.S. About the Program. Admission Requirements and Deadlines. Program Requirements. Contacts. Department Web Address:
Accounting, M.S. 1 Accounting, M.S. FOX SCHOOL OF BUSINESS AND MANAGEMENT (http://www.fox.temple.edu) About the Program Areas of Specialization: An optional concentration in Corporate Accounting is offered.
The Difficulty of Faking Data Theodore P. Hill
A century-old observation concerning the distribution of significant digits is now being used to help detect fraud. The Difficulty of Faking Data Theodore P. Hill Most people have preconceived notions
UNIVERSITY OF BOLTON SCHOOL OF ENGINEERING MS SYSTEMS ENGINEERING AND ENGINEERING MANAGEMENT SEMESTER 1 EXAMINATION 2015/2016 INTELLIGENT SYSTEMS
TW72 UNIVERSITY OF BOLTON SCHOOL OF ENGINEERING MS SYSTEMS ENGINEERING AND ENGINEERING MANAGEMENT SEMESTER 1 EXAMINATION 2015/2016 INTELLIGENT SYSTEMS MODULE NO: EEM7010 Date: Monday 11 th January 2016
Package benford.analysis
Type Package Package benford.analysis November 17, 2015 Title Benford Analysis for Data Validation and Forensic Analytics Version 0.1.3 Author Carlos Cinelli Maintainer Carlos Cinelli
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
4/1/2017. PS. Sequences and Series FROM 9.2 AND 9.3 IN THE BOOK AS WELL AS FROM OTHER SOURCES. TODAY IS NATIONAL MANATEE APPRECIATION DAY
PS. Sequences and Series FROM 9.2 AND 9.3 IN THE BOOK AS WELL AS FROM OTHER SOURCES. TODAY IS NATIONAL MANATEE APPRECIATION DAY 1 Oh the things you should learn How to recognize and write arithmetic sequences
Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG
Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG MACPA Government & Non Profit Conference April 26, 2013 Isaiah Goodall, Director of Business
Data analysis for Internal Audit
Data analysis for Internal Audit What is Data Analytics Analytics is the process of obtaining an optimal or realistic decision based on existing data. Wikipedia Data analytics is the science of examining
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Introductions, Course Outline, and Other Administration Issues. Ed Ferrara, MSIA, CISSP [email protected]. Copyright 2015 Edward S.
MIS 520 Week 2 Fraud Detection & Prevention Introductions, Course Outline, and Other Administration Issues Ed Ferrara, MSIA, CISSP [email protected] Fraud Awareness & Internal Controls Awareness Internal
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Webinar: PCAOB Inspections of Small Firm Broker-Dealer Auditors. January 15, 2015
Webinar: PCAOB Inspections of Small Firm Broker-Dealer Auditors January 15, 2015 Introductory Remarks Mary Sjoquist, Director Office of Outreach and Small Business Liaison Caveat The views we express today
How to Calculate your Portfolio s Rate of Return A guide to better understanding and calculating various portfolio returns
How to Calculate your Portfolio s Rate of Return A guide to better understanding and calculating various portfolio returns Justin Bender, CFP Investment Advisor PWL CAPITAL Toronto, Ontario April 2011
1.2. Successive Differences
1. An Application of Inductive Reasoning: Number Patterns In the previous section we introduced inductive reasoning, and we showed how it can be applied in predicting what comes next in a list of numbers
Using Technology to Automate Fraud Detection Within Key Business Process Areas
Using Technology to Automate Fraud Detection Within Key Business Process Areas 2013 ACFE Canadian Fraud Conference September 10, 2013 John Verver, CA, CISA, CMA Vice President, Strategy ACL Services Ltd
5178 EOI Direct Digital Controls Systems Strategic Plan Schedule A. Technical Submission. City of Richmond
Schedule A Technical Submission City of Richmond TABLE OF CONTENTS... 3 1 Company information... 4 1.1 Service Organization... 4 1.2 Construction Organization... 4 2 Programming Language... 5 2.1 Example
Benford s Law: Can It Be Used to Detect Irregularities in First Party Automobile Insurance Claims?
Benford s Law: Can It Be Used to Detect Irregularities in First Party Automobile Insurance Claims? Scott Daniel Petucci Utica Mutual Insurance Group Abstract The research described in this article demonstrates
INTERNATIONAL STANDARD ON AUDITING 320 AUDIT MATERIALITY CONTENTS
INTERNATIONAL STANDARD ON AUDITING 320 AUDIT MATERIALITY (This Standard is effective, but contains conforming amendments that become effective at a future date) * CONTENTS Paragraph Introduction... 1-3
In this chapter, you will learn improvement curve concepts and their application to cost and price analysis.
7.0 - Chapter Introduction In this chapter, you will learn improvement curve concepts and their application to cost and price analysis. Basic Improvement Curve Concept. You may have learned about improvement
Anomaly detection. Problem motivation. Machine Learning
Anomaly detection Problem motivation Machine Learning Anomaly detection example Aircraft engine features: = heat generated = vibration intensity Dataset: New engine: (vibration) (heat) Density estimation
The Optimality of Naive Bayes
The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most
IMPLEMENTATION NOTE. Validating Risk Rating Systems at IRB Institutions
IMPLEMENTATION NOTE Subject: Category: Capital No: A-1 Date: January 2006 I. Introduction The term rating system comprises all of the methods, processes, controls, data collection and IT systems that support
Lesson 4 Annuities: The Mathematics of Regular Payments
Lesson 4 Annuities: The Mathematics of Regular Payments Introduction An annuity is a sequence of equal, periodic payments where each payment receives compound interest. One example of an annuity is a Christmas
An Auditor s Guide to Data Analytics
An Auditor s Guide to Data Analytics Natasha DeKroon, Duke University Health System Brian Karp Services Experis, Risk Advisory May 11, 2013 1 Today s Agenda Data Analytics the Basics Tools of the Trade
CHAPTER 8 SPECIALIZED AUDIT TOOLS: SAMPLING AND GENERALIZED AUDIT SOFTWARE
A U D I T I N G A RISK-BASED APPROACH TO CONDUCTING A QUALITY AUDIT 9 th Edition Karla M. Johnstone Audrey A. Gramling Larry E. Rittenberg CHAPTER 8 SPECIALIZED AUDIT TOOLS: SAMPLING AND GENERALIZED AUDIT
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Audit Sampling. AU Section 350 AU 350.05
Audit Sampling 2067 AU Section 350 Audit Sampling (Supersedes SAS No. 1, sections 320A and 320B.) Source: SAS No. 39; SAS No. 43; SAS No. 45; SAS No. 111. See section 9350 for interpretations of this section.
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization Jean- Damien Villiers ESSEC Business School Master of Sciences in Management Grande Ecole September 2013 1 Non Linear
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
I remember that when I
8. Airthmetic and Geometric Sequences 45 8. ARITHMETIC AND GEOMETRIC SEQUENCES Whenever you tell me that mathematics is just a human invention like the game of chess I would like to believe you. But I
Chapter 14 Managing Operational Risks with Bayesian Networks
Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian
