TECHNOLOGY YOU CAN USE AGAINST THOSE WHO USE TECHNOLOGY

FRAUD ANALYTICS: TAKING DATA ANALYSIS TO THE NEXT LEVEL

With the large volumes of data handled by organizations today, the ability to analyze this data as part of fraud investigations has become essential. Gain better knowledge of the various analysis techniques that can be applied to these datasets and how to leverage the data available to you in conducting your investigation.

JANET MCHARD, CFE, CPA, CFFA, CFF
Founding Partner
McHard Accounting Consulting, LLC
Albuquerque, New Mexico

Janet M. McHard is the Founding Partner of McHard Accounting Consulting, LLC, a firm specializing in forensic accounting, fraud prevention, and accounting reconstruction. Ms. McHard is a Certified Fraud Examiner (CFE), a certification bestowed upon examination by the Association of Certified Fraud Examiners (ACFE). She holds a CPA in the State of New Mexico. She is also certified by the National Association of Certified Valuation Analysts (NACVA) as a Certified Forensic Financial Analyst (CFFA), and she is certified in Financial Forensics by the American Institute of Certified Public Accountants, a designation awarded based on education and experience. Ms. McHard has received special training in fraud prevention and investigation from the ACFE, NACVA, and the University of New Mexico's Financial Investigators Certificate Program.

Ms. McHard provides assistance, including expert testimony, in the areas of fraud and forensic accounting. She also has experience in database management and class action administration. Previously, she was a senior manager at Meyners + Company, a regional full-service accounting firm located in Albuquerque, and a staff accountant and litigation support specialist with an international accounting and consulting firm.
Her background also includes work as a legal secretary and administrative assistant for a law firm and as business manager for a medical office. Janet M. McHard holds a Bachelor of Arts from the University of New Mexico as well as an M.B.A. from the University of New Mexico's Robert O. Anderson Graduate School of Management. She is a member of the National Board of Advisors and past president of the Board of Directors of Keshet Dance Company. She was previously a board member of the Albuquerque Softball/Baseball Hall of Fame and has volunteered her time with the American Softball Association and the New Mexico United States Specialty Sports Association. She is a member of Women in Leadership of the United Way of Central New Mexico.

"Association of Certified Fraud Examiners," "Certified Fraud Examiner," "CFE," "ACFE," and the ACFE Logo are trademarks owned by the Association of Certified Fraud Examiners, Inc. The contents of this paper may not be transmitted, re-published, modified, reproduced, distributed, copied, or sold without the prior consent of the author. 2012
The Need for a Formal Process

Although the crux of data analysis involves running targeted tests on data to identify anomalies, the ability of such tests to help detect fraud depends greatly on what the examiner does before and after actually performing the data analysis techniques. Consequently, to ensure the most accurate and meaningful results, a formal data analysis process should be applied that begins several steps before the tests are run and concludes with active and ongoing review of the data. While the specific process will vary based on the realities and needs of the organization, the following approach contains steps that should be considered and implemented, to the appropriate extent, in each data analysis engagement:

1. Planning phase:
   a. Understand the data.
   b. Articulate examination objectives.
   c. Build a profile of potential frauds.
2. Preparation phase:
   a. Identify the relevant data.
   b. Obtain the data.
   c. Verify the data.
   d. Cleanse and normalize the data.
3. Testing and interpretation phase:
   a. Analyze the data.
4. Post-analysis phase:
   a. Respond to the analysis findings.
   b. Monitor the data.

Planning Phase

As with most tasks, proper planning is essential in a data analysis engagement. Without sufficient time and attention devoted to planning early on, the examiner risks analyzing the data inefficiently, lacking focus or direction for the
engagement, running into avoidable technical difficulties, and possibly overlooking key areas for exploration.

Understanding the Data

As a first step, long before determining which tests to run, the examiner must know what data is available to be analyzed and how that data is structured. This might mean reviewing the database schema and technical documentation and consulting with the data administrator to learn what fields and records exist and in which tables the information is stored. The examiner will also need to know how the tables are linked together.

Helpful tools in getting a full view of the organization's data include a data inventory and a data map. Such documents can be used to identify and record information about the data source, file formats, business process owners, and key systems used as they relate to specific projects. This documentation provides a good starting point for documenting the engagement history and for managing the data analytics process. Understanding the structure of the existing data will not only help ensure that the examiner builds workable tests to be run on the data, but might also help identify additional areas for exploration that might otherwise have been overlooked.

Articulate Examination Objectives and Scope

The amount of data housed by most organizations is extremely voluminous; many companies process millions of transactions every single day. Going through every piece of data would be impossible. And, although data analysis techniques can greatly increase investigative efficiency, performing every possible
analysis of the data would be prohibitively time-consuming. Consequently, at the outset, the examiner must define and articulate the objectives and scope of the expected analysis. This includes considering:

- The impetus for the data analysis engagement
- The structure and size of the business
- The target area of examination, if restricted
- The resources (time, personnel, etc.) available for the engagement
- Whether any predication exists that a particular fraud is occurring
- Any existing thresholds or preferences for frauds to be considered material

Build a Profile of Potential Frauds

To maximize the potential success of detecting fraud through data analysis, the analyses performed should be based on an understanding of the entity's existing fraud risks. To do so, the examiner must first build a profile of potential frauds by identifying the organization's risk areas, the types of frauds possible in those risk areas, and the resulting exposure to those frauds. In organizations that have a formal fraud risk assessment process, this step should simply involve referring to the outcomes of that process and forming a fraud detection approach based on the risks identified. In organizations lacking such a process, however, this step can be quite time-consuming, as it involves gaining a sufficient understanding of organizational operations to identify the full spectrum of fraud scenarios possible within the company.

Preparation Phase

The results of a data analysis test will only be as good as the data used for the analysis. Thus, before running tests on
the data, the fraud examiner must make certain the data being analyzed are relevant and reliable for the objective of the engagement.

Identify the Relevant Data

Using the profile of potential frauds as a guide, the fraud examiner must identify the target data for analysis. Specifically, for each fraud scenario assessed to be a high risk to the organization, the examiner should determine which data fields and records would be affected by such a scheme. The examiner must then identify the logistics involved with obtaining this information, including:

- What specific data (i.e., fields, records) are available
- Who generates and maintains the data
- Where the data are stored
- Timing of the data extraction (e.g., date range, cutoff dates/times)
- How the examiner will receive and store the data
- Data format
- Storage/transfer mechanism
- Control totals needed for verification
- Potential corroborating sources of data

Obtain the Data

The examiner should prepare and submit a formal request for the desired data, outlining the specifics determined in the previous step. Depending on the objectives of the engagement and the operations of the organization, the examiner might be given a file containing the data to work with, or he might be provided read-only access to the data within the organization's information system.
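One practical way to organize the logistics checklist above is to capture a single structured record per fraud scenario, so the formal data request and the later verification step draw from the same source. The Python sketch below shows one such structure; the class, field names, and sample values are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class DataRequest:
    # One formal data request, built from the logistics checklist above.
    # All names and values here are illustrative assumptions.
    fraud_scenario: str     # high-risk scheme driving the request
    fields_needed: list     # specific fields/records affected by the scheme
    data_owner: str         # who generates and maintains the data
    source_system: str      # where the data are stored
    date_range: tuple       # extraction period (start, end)
    file_format: str = "CSV"  # how the data will be delivered
    control_totals: dict = field(default_factory=dict)  # kept for verification

# Hypothetical request for a duplicate vendor payments review
req = DataRequest(
    fraud_scenario="Duplicate vendor payments",
    fields_needed=["vendor_id", "invoice_no", "invoice_date", "amount"],
    data_owner="Accounts Payable",
    source_system="AP subledger",
    date_range=("2011-01-01", "2011-12-31"),
    control_totals={"record_count": 48210, "amount_sum": 1523887.42},
)
```

Carrying the control totals on the request itself means the verification step can later check the extract against exactly what was asked for.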
This step in the process can be particularly challenging. In some cases, obtaining the data involves working with overloaded or uncooperative data managers, IT departments, or other parties. Other potential obstacles at this stage include data that are housed in different systems or on different platforms and data that are maintained manually.

Verify the Data

Once the data have been received by the fraud examiner, they must be verified. The first step is simply to ensure that the data analysis software that will be used is able to open and read the data as provided. The examiner should then validate that the data received contain all requested fields and records. Once the success of the transfer process has been confirmed, the examiner should perform the following tests to verify the integrity of the data:

- Confirm control totals.
- Confirm the time period covered by the data is appropriate and as requested.
- Sort the file in ascending or descending order to test for leading or lagging errors.
- Check for gaps in applicable fields, including the system-assigned identifying record number, to identify missing records.
- Confirm the format of data in format-specific fields, such as date fields and numeric fields.
- Check for blank fields where information should be entered.
- Check for inappropriate duplicate fields or records.

The examiner can also use tests for reasonableness or logical relationships to verify data integrity. Examples of such tests include:
- If transactions average X per day, verify that a monthly file includes approximately X times the number of business days in the month.
- Divide the total transaction amount in the file by the number of transactions and confirm that the resulting average transaction size is reasonable.
- Compare fields such as items ordered to items shipped to confirm that the amount shipped is equal to or less than the amount ordered.

Cleanse and Normalize the Data

Depending on how the data were collected and processed, as well as the results of the data verification process, the examiner might need to cleanse and convert the data to a format suitable for analysis before executing any data analysis tests. For example, certain field formats (e.g., date, time, currency) might need to be modified to make the information consistent and ready for testing.

The data must also be normalized so that all data being imported for analysis can be analyzed consistently. Common data fields from multiple systems must be identified, and the data must be standardized. In normalizing the data for analysis, table layout, fields/records, data length, data format, and table relationships are all important considerations. Additionally, the following inconsistencies in the data must be addressed:

- Known errors
- Blanks or missing data
- Duplicated data
- Special/unreadable characters in the data
- Other unusable entries
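Several of the integrity checks and inconsistency screens described above (control totals, gaps in record numbers, blank fields, and duplicates) can be sketched in plain Python. The three records and the control totals below are fabricated solely for illustration:

```python
from collections import Counter
from datetime import date

# Toy extract: each record is one AP transaction (illustrative data only).
records = [
    {"rec_no": 1001, "vendor": "ACME", "amount": 250.00, "inv_date": date(2011, 3, 1)},
    {"rec_no": 1002, "vendor": "ACME", "amount": 250.00, "inv_date": date(2011, 3, 1)},
    {"rec_no": 1004, "vendor": "",     "amount": 975.50, "inv_date": date(2011, 3, 2)},
]

# 1. Confirm control totals against what the data owner reported.
control = {"record_count": 3, "amount_sum": 1475.50}
assert len(records) == control["record_count"]
assert abs(sum(r["amount"] for r in records) - control["amount_sum"]) < 0.01

# 2. Check for gaps in the system-assigned record number (missing records).
nums = sorted(r["rec_no"] for r in records)
gaps = [n for n in range(nums[0], nums[-1] + 1) if n not in set(nums)]

# 3. Check for blank fields where information should be entered.
blanks = [r["rec_no"] for r in records if not r["vendor"]]

# 4. Check for inappropriate duplicates (same vendor, amount, and date).
keys = Counter((r["vendor"], r["amount"], r["inv_date"]) for r in records)
dupes = [k for k, n in keys.items() if n > 1]
```

In practice these checks would run over the full extract, and any hits would be fixed, isolated, or eliminated before the testing phase begins.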
When possible, such situations should be addressed by fixing, isolating, or eliminating them. Any issues that cannot be cleaned up will require special consideration during the testing and interpretation phase.

Testing and Interpretation Phase

If the planning and preparation stages have been conducted effectively, the testing and interpretation phase should yield results that are helpful in uncovering red flags of fraud in the data.

Analyze the Data

When using data analysis to detect fraud, the examiner should organize and analyze the data in a way designed to uncover patterns that are consistent with the specific fraud scenarios identified during the planning stage. Grouping the data into homogeneous categories can facilitate the examiner's ability to spot outliers. The specific groupings should be based on the target fraud scenario and the data being analyzed, but such categorization might include grouping data by:

- Geographical location
- Business division or unit
- Time period
- Dollar value
- Salesperson

Another useful step involves running some high-level tests, such as summarization or statistical analysis, to get an overview of the data and provide some context for any outliers identified during later testing. From here, the examiner should run tests to uncover possible indicators of fraud, including unusual trends, data anomalies, and control breakdowns. The data
analysis techniques used should be chosen based on the high-risk fraud scenarios being considered. Tests designed to detect specific fraud schemes will be covered throughout the later sections of this course. In analyzing the results of the tests run, the following issues merit special consideration.

The Role of Concealment

The fraud examiner must remember that the goal of the data analysis engagement is to identify fraudulent transactions: transactions that, by their very nature, involve deception and attempted concealment. The examiner is looking for data that have been intentionally manipulated, rather than just erroneously recorded. Consequently, he should focus his analysis on data that are strategically missing or altered, such as records lacking contact information, or transactions that narrowly circumvent organizational policies. Additionally, because fraudsters can be creative in concealing their schemes, fraud examiners must be creative in their searches, often combining several data analysis techniques and tests.

Addressing False Positives

A false positive is a transaction identified by a data analysis test as an anomaly within the data set even though it is not actually a fraudulent transaction. Such results can occur for a number of reasons, including:

- Data validity/integrity issues
- Data merging difficulties
- Legitimate data that fall outside the field norm (e.g., entries for non-U.S. addresses within a state field)

The goal in running data analysis tests is to identify anomalies without generating too many false positive results. Sorting through and analyzing false positives takes time and resources that could be spent further
investigating other potentially fraudulent transactions. However, proper planning and preparation should help minimize the occurrence of false positives in the test results.

Post-Analysis Phase

The data analysis tests will likely reveal many potential areas of exploration. After executing the desired tests, the fraud examiner will need to determine how to respond to the findings and how to watch for future anomalies.

Respond to the Analysis Findings

The findings of the data analysis engagement will rarely be sufficient on their own to conclude that fraud is or is not occurring. Any anomalies identified will require follow-up and further examination to determine the reason for the discrepancy in the data. Often, this will include investigatory procedures such as document examination and interviews. The specific resultant actions should be based on the fraud scenarios in question and the particular red flags identified in the data. Additionally, the terms and objectives of the engagement will dictate whether a formal report should be issued and, if so, the form and audience of the resulting report. Such a report should typically be easy to understand, summarize the analysis findings, and provide insightful and applicable information for the relevant readers.

Monitor the Data

Based on the results of the analysis and the needs of the organization, the examiner should work with management to determine whether any of the tests performed should be repeated on a periodic or
continuous basis. In areas deemed to be at heightened risk for fraud, continuous monitoring can provide increased assurance of early detection of any fraud schemes and can even serve as a deterrent to misconduct.
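One way a repeated test might be structured for ongoing monitoring is to wrap it so that each run surfaces only anomalies that have not already been reviewed and cleared in a prior period. The sketch below reuses a hypothetical duplicate-payments test; the function names and record layout are assumptions for illustration, not part of the original text.

```python
from collections import Counter

def duplicate_payment_test(records):
    """Flag vendor/amount/date combinations that appear more than once."""
    keys = Counter((r["vendor"], r["amount"], r["date"]) for r in records)
    return {k for k, n in keys.items() if n > 1}

def monitor(new_batch, cleared_exceptions):
    """Continuous-monitoring wrapper: surface only anomalies that were not
    already reviewed and cleared as legitimate in a prior period."""
    return duplicate_payment_test(new_batch) - cleared_exceptions

# Hypothetical January batch: the repeated ACME invoice should be flagged.
january = [
    {"vendor": "ACME", "amount": 99.00, "date": "2012-01-05"},
    {"vendor": "ACME", "amount": 99.00, "date": "2012-01-05"},
    {"vendor": "Beta", "amount": 42.50, "date": "2012-01-09"},
]
alerts = monitor(january, cleared_exceptions=set())
# alerts -> {("ACME", 99.0, "2012-01-05")}
```

Keeping a set of cleared exceptions is one simple way to reduce repeated false positives as the same test is run period after period.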