Why is Internal Audit so Hard?



4 Why is Internal Audit so Hard? Waste, Abuse, Fraud

5 Waves of Change
1st Wave: Personal Computers
o Electronic spreadsheets
o The end of hand calculation

6 2nd Wave: ERPs
ERPs
o All our data in one place
o Database analysis
o Opens the Age of Rules
Personal Computers
o Electronic spreadsheets
o The end of hand calculation

7 2nd Wave Also Opens the Age of CAATs
Beginner's CAATs:
o Basic database manipulation: join, summarize, append, stratify, sample, extract
o Basic testing: duplicates, gaps
Intermediate CAATs: automate our rules and (limited) automated testing, for example in purchase-to-pay:
o P.O. with blank / zero amount
o Split P.O.s
o Duplicate invoices
o Invoice amount paid > goods received
o Invoices with no matching receiving report
o Multiple invoices for same P.O. and date
o Pattern of sequential invoices from a vendor
o Non-approved vendors
o Employee and vendor with same name, address, bank, etc.
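The two basic tests named on this slide (duplicates and gaps) can be sketched in a few lines of Python. This is a minimal illustration, not any CAAT product's implementation; the field names `vendor`, `invoice_no`, and `amount` and the sample records are assumptions made for the example.

```python
from collections import Counter

def find_duplicates(invoices):
    """Return (vendor, invoice_no, amount) keys that occur more than once."""
    keys = Counter(
        (inv["vendor"], inv["invoice_no"], inv["amount"]) for inv in invoices
    )
    return sorted(key for key, count in keys.items() if count > 1)

def find_gaps(numbers):
    """Return the integers missing from an otherwise sequential run."""
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]

# Hypothetical sample data, for illustration only.
invoices = [
    {"vendor": "ACME", "invoice_no": "1001", "amount": 250.00},
    {"vendor": "ACME", "invoice_no": "1002", "amount": 975.50},
    {"vendor": "ACME", "invoice_no": "1001", "amount": 250.00},
    {"vendor": "BOLT", "invoice_no": "77", "amount": 40.00},
]
duplicates = find_duplicates(invoices)
gaps = find_gaps([101, 102, 104, 105, 108])
print(duplicates)
print(gaps)
```

Both tests are exact-match rules: they only catch a duplicate when every key field is identical.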

8 3rd Wave: Predictive Analytics
Predictive Analytics focuses our attention on important / suspect transactions.
Comes in many different flavors:
o Each somewhat more sophisticated
o Each making audit work more accurate and our lives easier
(GTAG 16, 2011: "The use of data analysis can significantly reduce audit risk by honing the risk assessment and stratifying the population.")
Waves of Change:
o Personal Computers: electronic spreadsheets, the end of hand calculation
o ERPs: data in one place, database analysis, Age of Rules
o Predictive Analytics: sophisticated statistical insights, true predictive & continuous audit

9 5 Levels of Predictive Analytics
1. Statistical Insights
2. Fuzzy Logic
3. Clustering
4. Predictive Modeling
5. Big Data Analytics

10 Statistical Insights: Benford's Law
The most famous name in forensic accounting does not belong to an accountant: Frank Benford ( ), an American physicist. In 1938, at the age of 55, he published a paper titled "The Law of Anomalous Numbers."
o Benford's Law is a statement about the occurrence of digits in lists of data.
o Useful in detecting fraudulent invoices or other numbered documents.

11 Benford's Law: Distribution of 1st Digits
[Chart: Benford's distribution vs. observed distribution]

12 Which to Investigate?
For distributions that appear to be anomalous:
1. Calculate the Kolmogorov-Smirnov distance between the vendor's first-digit distribution and the ideal Benford distribution.
2. Investigate those with the largest numerical scores.
Benford's Law of first-digit distribution follows a logarithmic pattern and applies to a surprisingly large number of datasets, including country populations, Twitter users by follower count, and many more. See testingbenefordslaw.com for more examples.
Kolmogorov-Smirnov distance is the greatest absolute difference between the two cumulative distribution functions (CDFs).
Source: Graph: Pivotal, Inc., Machine Learning for Forensic Accounting,
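As a rough sketch of the two steps above: Benford's expected share of leading digit d is log10(1 + 1/d), and the score is the largest gap between the observed and expected cumulative shares over digits 1 to 9. The vendor names and amounts below are invented for illustration, and this is not the Pivotal implementation.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(amount):
    """Leading non-zero digit, read from scientific notation."""
    return int(f"{abs(amount):.15e}"[0])

def ks_distance(amounts):
    """Greatest absolute gap between observed and Benford first-digit CDFs."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    observed_cdf = expected_cdf = worst = 0.0
    for d in range(1, 10):
        observed_cdf += counts[d] / n
        expected_cdf += BENFORD[d - 1]
        worst = max(worst, abs(observed_cdf - expected_cdf))
    return worst

# Hypothetical vendors: powers of two track Benford closely, while the
# second vendor bills only amounts that start with the digit 4.
vendors = {
    "vendor_a": [2 ** n for n in range(100)],
    "vendor_b": [4100, 4250, 4400, 4550, 4700, 4850, 4990, 4300, 4600, 4900],
}
scores = {name: ks_distance(amounts) for name, amounts in vendors.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Ranking vendors by score puts the ones to investigate first; what counts as a large score is a judgment call that depends on how many invoices each vendor has.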

13 Fuzzy Logic: Duplicate Invoice Detection
Problem: Deterministic rules expect key information to be exactly the same:
o Vendor name
o Address
o Phone
o Invoice amount
o Date
o Bank account
o TIN
If the criteria are kept tight: too many false negatives (missed duplicates).
If the criteria are made loose: too many false positives, resulting in too many items to investigate.

14 Fuzzy Matching Using Natural Language Processing
Vendors are considered close matches when the following are identical or sufficiently similar:
o Vendor names
o Remit vendor
o Address & phone
o Other text-based fields of your choosing
Steps in Natural Language Processing (NLP):
1. Tokenize the vendor names.
2. Remove stop words and special characters (of, and, the, etc.).
3. Process synonyms and abbreviations.
4. Calculate the tf-idf (term frequency-inverse document frequency) for each word.
5. Calculate the cosine similarity between documents to identify close matches.
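The five NLP steps can be sketched with the Python standard library alone. The stop-word list, the abbreviation table, the smoothed idf formula (one common variant), the 0.6 threshold, and the vendor names are all assumptions chosen for the example; a production tool would tune each of them.

```python
import math
import re
from collections import Counter
from itertools import combinations

STOP_WORDS = {"of", "and", "the"}
# Hypothetical abbreviation table; a real one would be much larger.
SYNONYMS = {"inc": "incorporated", "corp": "corporation", "co": "company"}

def tokenize(name):
    """Steps 1-3: split into words, drop stop words, expand abbreviations."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [SYNONYMS.get(w, w) for w in words if w not in STOP_WORDS]

def tfidf_vectors(names):
    """Step 4: one sparse {word: weight} vector per vendor name."""
    docs = [Counter(tokenize(n)) for n in names]
    doc_freq = Counter(word for doc in docs for word in doc)
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            w: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, count in doc.items()
        })
    return vectors

def cosine(a, b):
    """Step 5: cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

names = [
    "Acme Supply Co.",
    "ACME Supply Company",
    "The Acme Supply Company, Inc.",
    "Bolt Hardware Corp",
]
vectors = tfidf_vectors(names)
THRESHOLD = 0.6  # assumed cut-off for "sufficiently similar"
close_pairs = [
    (i, j)
    for i, j in combinations(range(len(names)), 2)
    if cosine(vectors[i], vectors[j]) >= THRESHOLD
]
print(close_pairs)
```

The first three names collapse to nearly the same token set, so they pair up with each other, while the unrelated vendor shares no tokens and scores zero against all of them.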

15 Fuzzy Matching in Numerical Strings
Numerical values (strings) are considered close when:
o Invoice IDs: edit distance is small
o Dates: are the same, are within 7 days of each other, or are inverted (3/11/14 vs 11/3/14)
o Payments: amounts are identical, or edit distances are small
o TINs, bank accounts, other numerics: edit distances are small
Edit distance counts substitutions, additions, deletions, and transpositions, and is calculated as the Damerau-Levenshtein distance.
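A minimal sketch of these checks follows. The edit distance here is the optimal string alignment variant, a restricted form of Damerau-Levenshtein that counts each substitution, addition, deletion, or adjacent transposition as one edit. The function names and sample values are hypothetical.

```python
from datetime import date

def osa_distance(a, b):
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # addition
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transposition
    return dist[-1][-1]

def dates_close(d1, d2, window_days=7):
    """Same date, within the window, or day and month swapped."""
    if abs((d1 - d2).days) <= window_days:
        return True
    return d1.year == d2.year and d1.month == d2.day and d1.day == d2.month

print(osa_distance("INV-10423", "INV-10243"))
print(dates_close(date(2014, 3, 11), date(2014, 11, 3)))
```

Counting a transposition as a single edit matters for keyed-in numbers, where swapped adjacent digits are a common typing error.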

16 Fuzzy Matching
Use as many features of the invoice as desired (not limited to 3 dimensions).
1. Determine the best distance metric for each dimension:
   o Some are text-based
   o Others are numerical strings
2. Calculate the distance between invoices.
3. Adjust the measurement values to yield the best true positive result.
4. Investigate any pair of invoices where the distance is within your threshold.
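One way to combine per-dimension distances into a single invoice-to-invoice score is a weighted sum, sketched below. Everything specific here is an assumption: the three fields, the weights, the 0.15 threshold, and the use of `difflib.SequenceMatcher` as a stand-in for a tuned text or edit-distance metric.

```python
from difflib import SequenceMatcher
from itertools import combinations

def text_distance(a, b):
    """0.0 for identical strings, approaching 1.0 for unrelated ones."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def amount_distance(a, b):
    """Relative difference between two amounts, between 0.0 and 1.0."""
    largest = max(abs(a), abs(b))
    return abs(a - b) / largest if largest else 0.0

# Hypothetical weights; step 3 on the slide is about tuning values like these.
WEIGHTS = {"vendor": 0.4, "invoice_no": 0.3, "amount": 0.3}

def invoice_distance(x, y):
    parts = {
        "vendor": text_distance(x["vendor"], y["vendor"]),
        "invoice_no": text_distance(x["invoice_no"], y["invoice_no"]),
        "amount": amount_distance(x["amount"], y["amount"]),
    }
    return sum(WEIGHTS[field] * parts[field] for field in WEIGHTS)

def candidate_pairs(invoices, threshold):
    """Index pairs whose combined distance is within the threshold."""
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(invoices), 2)
        if invoice_distance(x, y) <= threshold
    ]

invoices = [
    {"vendor": "Acme Supply Co", "invoice_no": "INV-10423", "amount": 1250.00},
    {"vendor": "ACME Supply Co.", "invoice_no": "INV-10243", "amount": 1250.00},
    {"vendor": "Bolt Hardware", "invoice_no": "B-77", "amount": 310.40},
]
pairs = candidate_pairs(invoices, threshold=0.15)
print(pairs)
```

Raising the threshold trades missed duplicates for more pairs to review, which is the tight-versus-loose criteria trade-off described for deterministic rules.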

17 Clustering: Identify Invoice Anomalies with Vendor Baselining
Vendors will tend to have patterns in their billing but may have more than one pattern based on service, ordering business unit, specific users, delivery address, etc. There may be multiple normal behaviors.
Identify the true outliers for investigation by:
o Featurizing the invoices (see fuzzy logic)
o Running a clustering algorithm such as K-Means
o Identifying clusters with low populations and low density as potential anomalies
Example clusters for Vendor A:
o Payments ~$1,000 to $5,000; Bus Unit: Bldg Maintenance; Users: Loc 1, Loc 2, Loc 3; paid by ACH; to address ABC
o Payments <$700; Bus Unit: Security; Users: Loc Z; paid by check; to address GHI
o Payments >$100,000; Bus Unit: Construction; Users: Loc 4; paid by ACH; to address DEF
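The baselining idea can be sketched with a small K-Means written in plain Python. Several choices are assumptions made to keep the example short and deterministic: two features only (log of amount, paid by check or not), a farthest-point start instead of random initialization, k = 4, and flagging by cluster population alone rather than population and density. The invoice data is invented.

```python
import math
from collections import Counter

def featurize(inv):
    """Two illustrative features: log10 of the amount and a payment-method flag."""
    return (math.log10(inv["amount"]), 1.0 if inv["paid_by"] == "check" else 0.0)

def farthest_point_init(points, k):
    """Deterministic start: first point, then the point farthest from chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def k_means(points, k, iterations=20):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def small_clusters(invoices, k, min_size):
    """Invoices that land in a cluster with fewer than min_size members."""
    labels = k_means([featurize(inv) for inv in invoices], k)
    sizes = Counter(labels)
    return [inv for inv, lab in zip(invoices, labels) if sizes[lab] < min_size]

# Hypothetical invoices for one vendor: three normal patterns plus one oddity.
invoices = (
    [{"amount": a, "paid_by": "ACH"} for a in (1200, 2500, 3100, 4800, 1900, 2750)]
    + [{"amount": a, "paid_by": "check"} for a in (450, 520, 610, 680, 390)]
    + [{"amount": a, "paid_by": "ACH"} for a in (120000, 150000, 135000, 110000)]
    + [{"amount": 2500000, "paid_by": "check"}]
)
flagged = small_clusters(invoices, k=4, min_size=2)
print(flagged)
```

The three billing patterns each form a well-populated cluster, so only the invoice that fits none of them is returned for review.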

18 Predictive Modeling: Time Travel in the 21st Century

19 Type 1: Prediction by Scoring
ML continuously monitors and scores from 1 to 100; examine only the high-scoring items.
o Do this once: ML learns what is FWA (fraud, waste, and abuse).
o Examine lots of possible FWA invoices every month.
[Diagram labels: Your Financial System, Machine Learning System, Current You, Future You]

20 Type 2: Prediction by Actual Value (Example from Insurance)
Historical data from many sources is combined to train the ML system to predict the correct $ premium.
[Diagram: input records N 1 to N 100 with fields $ Premium, SIC code, # employees, Address, $ Sales; Claim File records N 1 to N 100; Machine Learning System]

Predicted Premium   Actual Premium Paid   Variance
$ 10,254            $ 9,946               -3%
$ 25,687            $ 26,971              5%
$ 5,621             $ 5,452               -3%
$ 96,321            $ 98,247              2%
$ 85,741            $ 72,880              -18%

Investigate the outliers.
Accuracy can be very high, in the range of 90% to 98%, based on the historical data used.
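The slide does not state how the variance column is computed. The five figures are consistent with (actual - predicted) / actual, rounded to a whole percent, so the sketch below assumes that formula. The 10% outlier threshold is a hypothetical choice for illustration.

```python
# (predicted premium, actual premium paid), taken from the table on the slide.
rows = [
    (10254, 9946),
    (25687, 26971),
    (5621, 5452),
    (96321, 98247),
    (85741, 72880),
]

def variance_pct(predicted, actual):
    """Assumed formula: signed gap as a percentage of the actual premium."""
    return (actual - predicted) / actual * 100

OUTLIER_THRESHOLD = 10  # percent; an assumed cut-off, not from the slide

variances = [round(variance_pct(p, a)) for p, a in rows]
outliers = [
    (p, a, round(variance_pct(p, a)))
    for p, a in rows
    if abs(variance_pct(p, a)) > OUTLIER_THRESHOLD
]
print(variances)
print(outliers)
```

Only the last row exceeds the assumed threshold, which matches the one large variance in the table.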

21 So What is a Machine Learning System?
ML mathematical cores:
o Regression
o K-Means
o Bayesian Classifiers
o Decision Trees: CART / CHAID
o Support Vector Machines
o Artificial Neural Nets (ANN)
o Genetic Programs
Systems (very partial list):
o Advanced CAATs: Pivotal Oversight (as a service), EMC
o Proprietary general purpose: SAS, IBM SPSS, RapidMiner
o Open source / do it yourself: PSPP, Weka, R, Python

22 4th Wave: Big Data Analytics
Big Data Analytics:
o Addresses new concerns regarding social media and other risks from text- and image-based sources.
o Continues to improve the accuracy of predictive analytics, further reducing false positives and false negatives.
o Allows true continuous audit of even the largest enterprises as computation costs drop to fractions of previous investments.
Waves of Change:
o Personal Computers: electronic spreadsheets, the end of hand calculation
o ERPs: data in one place, database analysis, Age of Rules
o Predictive Analytics: statistical insights, true predictive & continuous audit

23 Got Big Data?
o Volume: high (terabytes or petabytes); very long retrieval and processing times
o Variety: structured, unstructured, semi-structured, all at once
o Velocity: batch, near time, real time, streams

24 It's Really About Big Data Technology
[Diagram: Search & Retrieve; The database. Source: EMC]

25 What are Big Data Analytics?
1st: The haystack gets a lot bigger
o Traditional structured data
o Unstructured data: documents, web content, social media
2nd: Thanks to Hadoop and Massive Parallel Processing
o Query and retrieval times are short
o Cost of even massive storage is very low
3rd: Many predictive modeling techniques can also be applied to structured and unstructured data
o Models become more accurate
4th: New techniques for unstructured data based on NLP
o Sentiment analysis

26 Focus on Social Media Risks*
*Risk also arises from other types of unstructured and semi-structured data:
o Internal documents
o Images stored centrally or on users' machines

27 Social Media Risks
o "They gave me financial aid then I cancelled all my classes and kept the money"
o "Sit in at the Chancellor's Office at 3:00"
o "Joe sold me the answers to tomorrow's test"
o "Can't believe how much I made on eBay today"
o "I'll fix them. I put a virus on the lab computer."
o "Professor X is such a perv"
o "The instructor said I could make money after school fixing cars in the auto shop"
o "I just downloaded a bunch of student financial data from the finance system"
o "I found out they're cutting my budget. I'm going to the union before this gets out"
o "Did you hear we're losing accreditation. Don't sign up next term"
Source: 2014 Internal Audit Capabilities and Needs Survey Report, Protiviti

28 You Don't Need to be a Data Scientist, Just a Smart Tool User
The Age of Smart CAATs
o Personal Computers: electronic spreadsheets, the end of hand calculation
o ERPs: data in one place, database analysis, Age of Rules
o Predictive Analytics: statistical insights, true predictive & continuous audit
o Social media, text, image; improved accuracy; cost-effective continuous audit

29 Questions
Contact Information: Bill Vorhies, President & Chief Data Scientist, Data-Magnum, Big Data & Predictive Analytics
"I shall find a way or make one." Admiral Robert Peary

Dan French Founder & CEO, Consider Solutions

Dan French Founder & CEO, Consider Solutions Dan French Founder & CEO, Consider Solutions CONSIDER SOLUTIONS Mission Solutions for World Class Finance Footprint Financial Control & Compliance Risk Assurance Process Optimization CLIENTS CONTEXT The

More information

2/5/2013. Session Objectives. Higher Education Headlines. Getting Started with Data Analytics. Higher Education Headlines.

2/5/2013. Session Objectives. Higher Education Headlines. Getting Started with Data Analytics. Higher Education Headlines. + Getting Started with Data Analytics Prepared for the UCOP Auditor s Symposium January 30, 2013 and February 14, 2013 Session Objectives 2 Higher Education Headlines New IIA Guidance Visual Risk IQ s

More information

Using Technology to Automate Fraud Detection Within Key Business Process Areas

Using Technology to Automate Fraud Detection Within Key Business Process Areas Using Technology to Automate Fraud Detection Within Key Business Process Areas 2013 ACFE Canadian Fraud Conference September 10, 2013 John Verver, CA, CISA, CMA Vice President, Strategy ACL Services Ltd

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

An Auditor s Guide to Data Analytics

An Auditor s Guide to Data Analytics An Auditor s Guide to Data Analytics Natasha DeKroon, Duke University Health System Brian Karp Services Experis, Risk Advisory May 11, 2013 1 Today s Agenda Data Analytics the Basics Tools of the Trade

More information

Forensic Audit and Automated Oversight Federal Audit Executive Council September 24, 2009

Forensic Audit and Automated Oversight Federal Audit Executive Council September 24, 2009 Forensic Audit and Automated Oversight Federal Audit Executive Council September 24, 2009 Dr. Brett Baker, CPA, CISA Assistant Inspector General for Audit U.S. Department of Commerce OIG Overview Forensic

More information

Qi Liu Rutgers Business School ISACA New York 2013

Qi Liu Rutgers Business School ISACA New York 2013 Qi Liu Rutgers Business School ISACA New York 2013 1 What is Audit Analytics The use of data analysis technology in Auditing. Audit analytics is the process of identifying, gathering, validating, analyzing,

More information

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III www.cognitro.com/training Predicitve DATA EMPOWERING DECISIONS Data Mining & Predicitve Training (DMPA) is a set of multi-level intensive courses and workshops developed by Cognitro team. it is designed

More information

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst ESG Brief IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst Abstract: Many enterprise organizations claim that they already

More information

ACL WHITEPAPER. Automating Fraud Detection: The Essential Guide. John Verver, CA, CISA, CMC, Vice President, Product Strategy & Alliances

ACL WHITEPAPER. Automating Fraud Detection: The Essential Guide. John Verver, CA, CISA, CMC, Vice President, Product Strategy & Alliances ACL WHITEPAPER Automating Fraud Detection: The Essential Guide John Verver, CA, CISA, CMC, Vice President, Product Strategy & Alliances Contents EXECUTIVE SUMMARY..................................................................3

More information

LEVERAGING BIG DATA & ANALYTICS TO IMPROVE EFFICIENCY. Bill Franks Chief Analytics Officer Teradata July 2013

LEVERAGING BIG DATA & ANALYTICS TO IMPROVE EFFICIENCY. Bill Franks Chief Analytics Officer Teradata July 2013 LEVERAGING BIG DATA & ANALYTICS TO IMPROVE EFFICIENCY Bill Franks Chief Analytics Officer Teradata July 2013 Agenda Defining The Problem Defining The Opportunity Analytics For Compliance Analytics For

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users. Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Fraud Workshop Finding the truth in the transactions

Fraud Workshop Finding the truth in the transactions Your Trusted Partner for Audit Analytics Fraud Workshop Finding the truth in the transactions Copyright 2011 ACL Services Ltd. Robin Clough, ACDA ACL Certified Trainer Copyright 2011 ACL Services Ltd.

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

SURVEY REPORT DATA SCIENCE SOCIETY 2014

SURVEY REPORT DATA SCIENCE SOCIETY 2014 SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses

More information

The Big Data Paradigm Shift. Insight Through Automation

The Big Data Paradigm Shift. Insight Through Automation The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Using Data Mining to Detect Insurance Fraud

Using Data Mining to Detect Insurance Fraud IBM SPSS Modeler Using Data Mining to Detect Insurance Fraud Improve accuracy and minimize loss Highlights: combines powerful analytical techniques with existing fraud detection and prevention efforts

More information

DEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you.

DEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you. DEMYSTIFYING BIG DATA What it is, what it isn t, and what it can do for you. JAMES LUCK BIO James Luck is a Data Scientist with AT&T Consulting. He has 25+ years of experience in data analytics, in addition

More information

Making critical connections: predictive analytics in government

Making critical connections: predictive analytics in government Making critical connections: predictive analytics in government Improve strategic and tactical decision-making Highlights: Support data-driven decisions using IBM SPSS Modeler Reduce fraud, waste and abuse

More information

Improve Model Accuracy with Unstructured Data

Improve Model Accuracy with Unstructured Data IBM SPSS Modeler Premium Improve Model Accuracy with Unstructured Data Highlights Easily access, prepare and integrate structured data and text, Web and survey data Support the entire data mining process

More information

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address

More information

Big Data. Fast Forward. Putting data to productive use

Big Data. Fast Forward. Putting data to productive use Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Auditing Applications. ISACA Seminar: February 10, 2012

Auditing Applications. ISACA Seminar: February 10, 2012 Auditing Applications ISACA Seminar: February 10, 2012 Planning Objectives Mapping Controls Functionality Tests Complications Financial Assertions Tools Reporting AGENDA 2 PLANNING Consideration / understanding

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Chapter 1. Contrasting traditional and visual analytics approaches

Chapter 1. Contrasting traditional and visual analytics approaches Chapter 1 Understanding Big Data Analytics In This Chapter Defining Big Data Understanding Big Data Analytics Contrasting traditional and visual analytics approaches The era of Big Data is upon us. The

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

Using Data Analytics to Detect Fraud

Using Data Analytics to Detect Fraud Using Data Analytics to Detect Fraud Fundamental Data Analysis Techniques 2016 Association of Certified Fraud Examiners, Inc. Discussion Question For each data analysis technique discussed in this section,

More information

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)

More information

Maximizing Return and Minimizing Cost with the Decision Management Systems

Maximizing Return and Minimizing Cost with the Decision Management Systems KDD 2012: Beijing 18 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics Maximizing Return and Minimizing Cost with the Decision Management

More information

MACHINE LEARNING BASICS WITH R

MACHINE LEARNING BASICS WITH R MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Description The Helzberg School of Management has launched two graduate-level certificates: one in Data

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath)

Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath) Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath) Alex Pinto Chief Data Scientist Niddel / MLSec Project @alexcpsec @MLSecProject @NiddelCorp MLSec Project / Niddel MLSec

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

IPT 2015 Sales & Use Tax Symposium

IPT 2015 Sales & Use Tax Symposium IPT 2015 Sales & Use Tax Symposium Data Analytics Sell Side 1.00 pm 2.15 pm Tuesday, September 29, 2015 Agenda 2 Introductions Session Description Learning Objectives Survey Questions Define Data Analytics

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
