Risk Analysis Approaches to Rank Outliers in Trade Data


Vytis Kopustinskas and Spyros Arsenis

V. Kopustinskas, S. Arsenis: European Commission, Joint Research Center, Institute for the Protection and Security of the Citizen, Via E. Fermi 2748, Ispra (VA), Italy; e-mail: vytis.kopustinskas@jrc.ec.europa.eu

In: A. Di Ciaccio et al. (eds.), Advanced Statistical Methods for the Analysis of Large Data-Sets, Studies in Theoretical and Applied Statistics, Springer-Verlag Berlin Heidelberg

Abstract

The paper discusses methods for ranking outliers in trade data on the basis of statistical information, with the objective of prioritizing anti-fraud investigation activities. It presents a ranking method based on the risk analysis framework and discusses a comprehensive trade fraud indicator that aggregates a number of individual numerical criteria.

1 Introduction

The detection of outliers in trade data can be important for various practical applications, in particular for the prevention of customs fraud and for data quality control. From the point of view of a customs inspector, trade transactions detected as outliers may be of interest because they can point to ongoing fraud. For example, low price outliers might indicate that a transaction is undervalued to evade import duties. Low and high price outliers may also be indicators of other frauds, such as VAT fraud or trade-based money laundering.

The statistical algorithms used to detect outliers in large trade datasets typically classify a large number of transactions as outliers (Perrotta et al. 2009). So many transactions flagged as suspicious are difficult to handle, and the detected outliers must therefore be ranked according to certain criteria in order to prioritize investigation actions. Different criteria could be used for ranking, and they derive from at least two very distinct information sources: statistical information from the outlier diagnostics and customs in-house information systems.

This paper discusses methods for ranking low price outliers based on statistical information, with the objective of prioritizing anti-fraud investigation actions. The presented methodology for ranking low price outliers is not generic, but it can be extended to other types of fraud patterns by using the same principles.

2 Risk Analysis Framework

The risk analysis framework is applicable to the problem of ranking outliers in trade data. The fundamental questions in quantitative risk analysis are the following:

1. What can go wrong?
2. How likely is it to happen?
3. If it happens, what consequences are expected?

To answer question 1, a list of initiating events should be defined. The likelihood of the events should then be estimated and the consequences of each scenario assessed. Quantitatively, risk can therefore be defined as the following set of triplets (Kaplan and Garrick 1981):

$$ R = \langle S_i, P_i, C_i \rangle, \quad i = 1, \dots, n, \tag{1} $$

where $S_i$ is the $i$th scenario of the initiating events; $P_i$ is the likelihood (probability or frequency) of scenario $i$; $C_i$ is the consequence of scenario $i$; and $n$ is the number of scenarios.

In the case of outliers in trade data, the triplet can be interpreted as follows: $P$ is the likelihood that an outlier is a real fraudulent transaction; $C$ is the consequence of the fraudulent trade transaction (e.g. unpaid taxes or duties). The interpretation of $S$ becomes important only if several methods are used to detect outliers or if more than one fraud pattern lies behind the observed outlier.

3 Approaches to Rank Low Price Outliers in Trade Data

There are several approaches to obtaining a numerical ranking of low price outliers in trade data based on the risk analysis framework. The most suitable method is to multiply $P$ and $C$, where $C$ is an estimate of the loss to the budget (unpaid duties) and $P$ is the probability that the specific transaction is fraudulent.
The product $R = P \cdot C$ provides an estimate of the average fraud-related damage caused by a specific trade activity and allows such activities to be ranked according to their severity. In order to use the risk analysis framework (1), we have to estimate the probability ($P$) of fraud in the trade transaction. This quantity is not easy to estimate, but we assume that the p-value produced by statistical tests for outliers could be a suitable measure: the lower the p-value, the more likely the transaction is fraudulent. In practice, an outlier can also be a data typing error and not a fraud, but in general low price outliers are suspicious and should be investigated. As the relationship between $P$ and the p-value is inverse, the following transformation is used:

$$ P = \begin{cases} -\log_{10}(p\text{-value})/10, & \text{if } p\text{-value} \ge 10^{-10} \\ 1, & \text{if } p\text{-value} < 10^{-10} \end{cases} \tag{2} $$

Transformation (2) maps the p-value onto the scale [0, 1]. The scale is arbitrary and was chosen mainly for convenience, given that extremely low p-values carry no additional information for ranking purposes.

The consequence part ($C$) of (1) can be estimated by multiplying the traded quantity ($Q$) by the difference ($\Delta U$) between the estimated fair unit price and the recorded unit price:

$$ C = Q \cdot \Delta U = Q \cdot (U_F - U), $$

where $U_F$ is the estimated fair transaction unit price, determined by the regression after the outliers have been removed; $U = V/Q$ is the unit price as recorded; and $V$ is the value of the transaction as recorded. $C$ can be interpreted as the average loss to the budget if the underlying transaction is fraudulent. In fact, the value of $C$ alone already provides a ranking of outliers, and such a ranking has been applied. The fraud risk indicator (RI) can be computed as follows: $RI = P \cdot Q \cdot \Delta U$. The indicator can also be transformed onto the [0, 10] scale to make its use more standard for investigators. The investigator should start the investigation from the outlying trade transactions with the highest values of RI.

RI is a simple and easy-to-understand indicator; however, the dataset of detected outliers contains additional statistical information. Table 1 lists a number of criteria that could be used to develop a comprehensive ranking structure for low price outliers. The criteria listed in Table 1 are all numerical, and a higher value is associated with a higher impact on the trade fraud risk indicator (FI).
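The RI computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the three example trade flows are hypothetical, and the fair unit price $U_F$ is assumed to be supplied by a regression fitted elsewhere.

```python
import math

P_VALUE_FLOOR = 1e-10  # below this, transformation (2) saturates at P = 1


def fraud_likelihood(p_value: float) -> float:
    """Map an outlier-test p-value to P in [0, 1] following transformation (2)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < P_VALUE_FLOOR:
        return 1.0
    return -math.log10(p_value) / 10.0


def average_loss(quantity: float, value: float, fair_unit_price: float) -> float:
    """C = Q * (U_F - U), with the recorded unit price U = V / Q."""
    recorded_unit_price = value / quantity
    return quantity * (fair_unit_price - recorded_unit_price)


def risk_indicator(p_value: float, quantity: float, value: float,
                   fair_unit_price: float) -> float:
    """RI = P * Q * (U_F - U)."""
    return fraud_likelihood(p_value) * average_loss(quantity, value, fair_unit_price)


# Hypothetical outliers: (label, p-value, quantity, recorded value, fair unit price).
outliers = [
    ("flow A", 1e-4, 100.0, 5000.0, 80.0),
    ("flow B", 1e-12, 10.0, 900.0, 100.0),
    ("flow C", 1e-2, 500.0, 45000.0, 95.0),
]

# Investigation order: highest RI first.
ranked = sorted(outliers, key=lambda row: risk_indicator(*row[1:]), reverse=True)
for label, *args in ranked:
    print(label, round(risk_indicator(*args), 1))
```

Note that flow B has the most extreme p-value but a small potential loss, so it ranks below flows with a weaker statistical signal and a larger $Q \cdot \Delta U$. This is the intended effect of multiplying likelihood by consequence.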
Most of the criteria ($I_1$ to $I_7$) are easy to understand and compute, as they reflect basic statistical information about the dataset. The criterion $I_8$ reflects inferential statistics from the method that was used to detect outliers. The ranking structure can be adapted to other price outlier detection methods by using the same principles. The criteria $I_9$ and $I_{10}$ take into account the model goodness of fit by using the coefficient of determination of the linear regression between the $Q$ and $V$ variables. The initial $R^2$ is computed on all the trade data in a particular trade flow, assuming linear regression as the base model, while the final $R^2$ is computed on the data points remaining after removal of the outliers. Both the initial and the final $R^2$ are used because it might be important to reflect the change in the model goodness of fit after removal of the outliers.

Table 1 Numerical criteria for the development of the ranking structure, with their original scale and rescaling method. $V_{PO}$: trade value aggregated over all destinations; $Q_{PO}$: trade quantity aggregated over all destinations; MaxN: maximum number of non-zero trade transactions

No    Criterion                          Original scale   Rescaling
I1    Quantity traded, $Q$               [0, ∞]           log and min-max transformation to [0, 1]
I2    Value traded, $V$                  [0, ∞]           log and min-max transformation to [0, 1]
I3    Average loss, $Q \cdot \Delta U$   [0, ∞]           log and min-max transformation to [0, 1]
I4    Ratio $U_F/U$                      [0, ∞]           log and min-max transformation to [0, 1]
I5    Ratio $V/V_{PO}$                   [0, 1]           No
I6    Ratio $Q/Q_{PO}$                   [0, 1]           No
I7    Number of obs./MaxN                [0, 1]           No
I8    P-value                            [0, 0.1]         log transformation to [0, 1] as in (2)
I9    Final goodness of fit $R^2$        [0, 1]           No
I10   Final $R^2$/initial $R^2$          [0, ∞]           log and min-max transformation to [0, 1]

After rescaling as shown in Table 1, the individual criteria ($I_i$) are comparable among themselves. The log transformation was used for a number of criteria to make the ranking smoother, because for many real data cases it is rather stair-like. The conditions under which a transformation is actually needed could be elaborated further in the future. Specific weights ($w_i$) must be assigned to each criterion to determine its relative impact on the final indicator score. The most popular method of combining different criteria into a single numerical indicator is to compute a weighted sum: $FI = \sum_{i=1}^{m} w_i I_i$, where $m$ is the number of individual criteria. A complication arises from the fact that some criteria could be highly correlated; their correlation matrix must therefore be examined before weights are assigned to each criterion. Without prior knowledge, equal weights could be assigned to non-correlated criteria. However, correlation matrix analysis is not enough: weights cannot be derived from statistical considerations only, but must be defined by subject matter experts and be closely related to the specific type of fraud in mind.
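The rescaling and the weighted sum can be sketched as follows. The sketch rests on assumptions that the text does not state: base-10 logarithms, strictly positive input values (the logarithm of zero is undefined), and hypothetical criterion values and weights chosen only for illustration.

```python
import math


def log_min_max(values):
    """Rescale positive values to [0, 1]: take log10, then min-max normalise."""
    logs = [math.log10(v) for v in values]  # assumes every value is > 0
    lo, hi = min(logs), max(logs)
    if hi == lo:  # all values equal: nothing to spread out
        return [0.0] * len(logs)
    return [(x - lo) / (hi - lo) for x in logs]


def fraud_indicator(criteria, weights, normalise=False):
    """Weighted sum FI = sum(w_i * I_i) over the criteria named in `weights`.

    With normalise=True the sum is divided by the sum of the weights,
    which keeps FI in [0, 1] when every criterion is in [0, 1].
    """
    total = sum(weights[name] * criteria[name] for name in weights)
    if normalise:
        total /= sum(weights.values())
    return total


# Hypothetical traded quantities for three outliers, rescaled as criterion I1.
quantities = [1.0, 10.0, 100.0]
i1_scores = log_min_max(quantities)

# Hypothetical rescaled criteria and weights for a single outlier.
criteria = {"I1": 0.5, "I6": 0.2, "I8": 0.4}
weights = {"I1": 0.5, "I6": 1.0, "I8": 1.0}
fi_raw = fraud_indicator(criteria, weights)
fi_unit = fraud_indicator(criteria, weights, normalise=True)
```

Keeping the weights in a separate mapping makes it easy to swap a statistically motivated weighting for one supplied by subject matter experts without touching the rescaling step.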
One possible method for eliciting such weights is the analytic hierarchy process (AHP), which provides a rational framework for integrating the opinions of many subject matter experts into a single quantitative estimate (Zio 1996).

The list of possible criteria presented in Table 1 is not complete and could be modified in future applications. One type of analysis that could improve the ranking structure is correspondence analysis. Various associations between trade flow variables could be important, and quantitative information about the existing links in the data could be integrated into the ranking: for example, the quantitative relationship between products and exporting countries (for import datasets) among all the outliers could help to determine whether fraudulent activities might be linked to specific origin countries or products.

The presented ranking methodology was developed for the situation in which trade data represent a single population and low price outliers are detected within it, assuming linear regression as the model of the data. However, the ranking problem becomes more interesting when outliers are detected in mixtures of populations (Atkinson et al. 2004), that is, when several populations with different price levels exist and it is not obvious from which population the outliers are detected. This is an important problem in fraud detection, where fraudulent transactions are hidden within a mixture of several populations. For example, continuous systematic underpricing of selected imports into one country could not be detected by a single-country analysis. In the case of mixed populations, the ranking structure needs to be developed further.

4 Application of the Ranking Criteria

The ranking was applied to the low price outliers detected in the monthly aggregated trade data of agricultural product imports into the EU 27 member states during the period studied (dataset containing: product, reporting countries, volume and value). In total, 1,109 low price outliers were detected by using a backward search based outlier detection method. The numerical criteria shown in Table 1 were computed, and their mutual correlations are presented in Table 2. As is evident from Table 2, several pairs are highly correlated (correlation higher than 0.6). It is not surprising that the quantity- and value-based numerical criteria ($I_1$, $I_2$, $I_5$ and $I_6$) are highly correlated, because larger-quantity trade transactions are associated with larger-value transactions. Including all these criteria in the ranking at equal weights would double count their effect on the total score. In customs fraud, misdeclaration of value happens to be much more frequent than misdeclaration of quantity. Considering this, the numerical criteria $I_2$ (value traded) and $I_5$ (ratio of value) should be eliminated from the ranking structure. In fact, the decision to eliminate them could have been made before the computations, following the same reasoning. The high correlation of quantity ($I_1$) and average loss ($I_3$) is also expected, as average loss is a function of quantity. In this case the weight can be divided equally between the two criteria. The same approach can be used for the remaining two highly correlated numerical criteria, $I_4$ and $I_{10}$. This correlation is very interesting: the ratio of fair price to recorded price gives similar information to the ratio of the $R^2$ of the final model (without outliers) to the $R^2$ of the initial model (all the data). In this case, equal weights of 0.5 were applied.
Table 2 Correlation matrix of the numerical criteria I1 to I10 for the selected application

        I1     I2     I3     I4     I5     I6     I7     I8     I9     I10
I1     1.00   0.79   0.70   0.07   0.20   0.13   0.12   0.13   0.05   0.02
I2     0.79   1.00   0.74   0.34   0.22   0.01   0.22   0.05   0.06   0.20
I3     0.70   0.74   1.00   0.33   0.05   0.23   0.20   0.19   0.15   0.25
I4     0.07   0.34   0.33   1.00   0.20   0.37   0.06   0.19   0.27   0.69
I5     0.20   0.22   0.05   0.20   1.00   0.76   0.38   0.06   0.12   0.23
I6     0.13   0.01   0.23   0.37   0.76   1.00   0.38   0.05   0.07   0.22
I7     0.12   0.22   0.20   0.06   0.38   0.38   1.00   0.22   0.26   0.06
I8     0.13   0.05   0.19   0.19   0.06   0.05   0.22   1.00   0.07   0.16
I9     0.05   0.06   0.15   0.27   0.12   0.07   0.26   0.07   1.00   0.24
I10    0.02   0.20   0.25   0.69   0.23   0.22   0.06   0.16   0.24   1.00
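The screening of Table 2 and the weighting rules described in this section can be sketched as follows. The matrix holds the values as printed in Table 2 and the 0.6 threshold comes from the text. The function name is hypothetical, and the choice of which criteria to drop or split is hard-coded from the reasoning above rather than derived automatically.

```python
LABELS = ["I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9", "I10"]

# Correlation values as printed in Table 2.
CORR = [
    [1.00, 0.79, 0.70, 0.07, 0.20, 0.13, 0.12, 0.13, 0.05, 0.02],
    [0.79, 1.00, 0.74, 0.34, 0.22, 0.01, 0.22, 0.05, 0.06, 0.20],
    [0.70, 0.74, 1.00, 0.33, 0.05, 0.23, 0.20, 0.19, 0.15, 0.25],
    [0.07, 0.34, 0.33, 1.00, 0.20, 0.37, 0.06, 0.19, 0.27, 0.69],
    [0.20, 0.22, 0.05, 0.20, 1.00, 0.76, 0.38, 0.06, 0.12, 0.23],
    [0.13, 0.01, 0.23, 0.37, 0.76, 1.00, 0.38, 0.05, 0.07, 0.22],
    [0.12, 0.22, 0.20, 0.06, 0.38, 0.38, 1.00, 0.22, 0.26, 0.06],
    [0.13, 0.05, 0.19, 0.19, 0.06, 0.05, 0.22, 1.00, 0.07, 0.16],
    [0.05, 0.06, 0.15, 0.27, 0.12, 0.07, 0.26, 0.07, 1.00, 0.24],
    [0.02, 0.20, 0.25, 0.69, 0.23, 0.22, 0.06, 0.16, 0.24, 1.00],
]


def highly_correlated_pairs(corr, labels, threshold=0.6):
    """Return label pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # upper triangle only
            if abs(corr[i][j]) > threshold:
                pairs.append((labels[i], labels[j]))
    return pairs


pairs = highly_correlated_pairs(CORR, LABELS)

# Weighting rules from the text: start from equal weights, eliminate the
# value-based criteria I2 and I5, and split the weight within (I1, I3)
# and (I4, I10).
weights = {label: 1.0 for label in LABELS}
for dropped in ("I2", "I5"):
    weights[dropped] = 0.0
for first, second in (("I1", "I3"), ("I4", "I10")):
    weights[first] = weights[second] = 0.5
```

Under these rules the resulting weights coincide with those listed in Table 3, and their sum (6.0) is the divisor needed to normalize FI to [0, 1].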

Fig. 1 Ranking of the detected outliers in the EU import trade data, sorted in decreasing order (x-axis: trade flow number; y-axis: fraud indicator value)

Table 3 The weighting of the ranking structure

No    Criterion                             Weight $w_i$
I1    Quantity traded, $Q$                  0.5
I2    Value traded, $V$                     0
I3    Average loss, $Q \cdot \Delta U$      0.5
I4    Ratio $U_F/U$                         0.5
I5    Ratio $V/V_{PO}$                      0
I6    Ratio $Q/Q_{PO}$                      1
I7    Number of obs./(MaxN = 36)            1
I8    P-value                               1
I9    Coefficient of determination $R^2$    1
I10   Final $R^2$/initial $R^2$             0.5

The weights applied in the ranking procedure are shown in Table 3. The ranking indicator value can be further normalized to the scale [0, 1] by dividing it by the sum of the weights. The computed FI values are shown in Fig. 1. They follow the Pareto distribution typical of risk rankings, in which the highest risk is associated with a small number of outliers, while the risk of the rest is distributed more smoothly. The highest and the lowest ranked trade outliers are shown in Figs. 2 and 3. The results of the ranking are as expected: the highest ranked outliers are severe outliers, with a low price lying far from the regression line, while the lowest ranked outliers lie close to it.

Fig. 2 The highest ranked low price outlier (EU import trade dataset). Axes: quantity (tons) and value (thousands of euro); series: outliers, non-outliers, linear trend (all data), linear trend (no outliers)

Fig. 3 The lowest ranked low price outlier (EU import trade dataset). Axes: quantity (tons) and value (thousands of euro); series: outliers, non-outliers, linear trend (all data), linear trend (no outliers)

The next step in improving the ranking procedure would be to validate the ranking against real fraud cases and to involve fraud experts in the process of weight estimation. Both options require a lot of resources for implementation, and especially feedback for the ranking validation. Preliminary validation information suggests that severe price outliers could be linked more to data errors than to fraudulent activities. Further development of the ranking structure by adding other indicators could address the data quality issues.

5 Final Remarks

The paper discusses methods for ranking outliers in trade data with the objective of prioritizing anti-fraud investigation actions. The risk analysis framework was used to develop a ranking structure based only on the statistical information available in the trade dataset. A comprehensive trade fraud risk indicator is discussed that combines a number of individual numerical criteria. An application study is presented that produced a ranking of the outliers detected in the aggregated European import trade data for the period studied. The ranking produced cannot be considered final because arbitrary weights were used in the computations. Derivation of the weights is an important part of the ranking methodology; however, it cannot be based on statistical considerations alone. Subject matter expert opinions would be valuable for defining the weights according to the type of fraud under investigation. The results of the test study show that even arbitrary weights can produce reasonable results. Further fine-tuning of the methodology depends on feedback and judgments from fraud experts.

References

Atkinson, A. C., Riani, M., Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer, New York.
Perrotta, D., Riani, M., Torti, F. (2009) New robust dynamic plots for regression mixture detection. Advances in Data Analysis and Classification, 3(3).
Kaplan, S., Garrick, B. J. (1981) On the quantitative definition of risk. Risk Analysis, 1(1).
Zio, E. (1996) On the use of the analytic hierarchy process in the aggregation of expert judgments. Reliability Engineering and System Safety, 53(2).


More information

Using Analytic Hierarchy Process (AHP) Method to Prioritise Human Resources in Substitution Problem

Using Analytic Hierarchy Process (AHP) Method to Prioritise Human Resources in Substitution Problem Using Analytic Hierarchy Process (AHP) Method to Raymond Ho-Leung TSOI Software Quality Institute Griffith University *Email:hltsoi@hotmail.com Abstract In general, software project development is often

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Marketing: it s the marketing portion of a CRM like Salesforce.com. This database comes with the following tables

Marketing: it s the marketing portion of a CRM like Salesforce.com. This database comes with the following tables Database Structure This demo dataset is based in a CRM standard structure on a B2B company in the computer components and software development industry. It s a snapshot of that CRM on July 5th, 2013, and

More information

Applying the Analytic Hierarchy Process to Health Decision Making: Deriving Priority Weights

Applying the Analytic Hierarchy Process to Health Decision Making: Deriving Priority Weights Applying the to Health Decision Making: Deriving Priority Weights Tomás Aragón, MD, DrPH Principal Investigator, Cal PREPARE,. CIDER UC Berkeley School of Public Health Health Officer, City & County of

More information

Analytic Hierarchy Process, a Psychometric Approach. 1 Introduction and Presentation of the Experiment

Analytic Hierarchy Process, a Psychometric Approach. 1 Introduction and Presentation of the Experiment Analytic Hierarchy Process, a Psychometric Approach Christine Choirat and Raffaello Seri Dept. of Economics, Università dell Insubria Varese, Italy Abstract. The Analytic Hierarchy Process, or AHP for

More information

Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts

Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Page 1 of 20 ISF 2008 Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Andrey Davydenko, Professor Robert Fildes a.davydenko@lancaster.ac.uk Lancaster

More information

Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

More information

Advanced Forecasting Techniques and Models: ARIMA

Advanced Forecasting Techniques and Models: ARIMA Advanced Forecasting Techniques and Models: ARIMA Short Examples Series using Risk Simulator For more information please visit: www.realoptionsvaluation.com or contact us at: admin@realoptionsvaluation.com

More information

A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.

A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. Introduction: The Basel Capital Accord, ready for implementation in force around 2006, sets out

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

More information

CANNEX Payout Annuity Yield (PAY) Index TM USA

CANNEX Payout Annuity Yield (PAY) Index TM USA CANNEX Payout Annuity Yield (PAY) Index TM USA June 2015 Copyright CANNEX Financial Exchanges Limited, 2015. All rights reserved. No part of this publication may be reproduced, transmitted, transcribed,

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

Analytic hierarchy process (AHP)

Analytic hierarchy process (AHP) Table of Contents The Analytic Hierarchy Process (AHP)...1/6 1 Introduction...1/6 2 Methodology...1/6 3 Process...1/6 4 Review...2/6 4.1 Evaluation of results...2/6 4.2 Experiences...3/6 4.3 Combinations...3/6

More information

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Engineering, Business and Enterprise

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Benchmarking Residential Energy Use

Benchmarking Residential Energy Use Benchmarking Residential Energy Use Michael MacDonald, Oak Ridge National Laboratory Sherry Livengood, Oak Ridge National Laboratory ABSTRACT Interest in rating the real-life energy performance of buildings

More information

Industry Environment and Concepts for Forecasting 1

Industry Environment and Concepts for Forecasting 1 Table of Contents Industry Environment and Concepts for Forecasting 1 Forecasting Methods Overview...2 Multilevel Forecasting...3 Demand Forecasting...4 Integrating Information...5 Simplifying the Forecast...6

More information

GLOBAL TELECOMMUNICATIONS LEADER IMPROVES SALES CONVERSION WITH ROBUST QUALITY ASSURANCE SOLUTION

GLOBAL TELECOMMUNICATIONS LEADER IMPROVES SALES CONVERSION WITH ROBUST QUALITY ASSURANCE SOLUTION GLOBAL TELECOMMUNICATIONS LEADER IMPROVES SALES CONVERSION WITH ROBUST QUALITY ASSURANCE SOLUTION SUCCESS AT A GLANCE CHALLENGE An ecommerce company that invested in an online chat sales channel knew it

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Capacity planning for fossil fuel and renewable energy resources power plants

Capacity planning for fossil fuel and renewable energy resources power plants Capacity planning for fossil fuel and renewable energy resources power plants S. F. Ghaderi *,Reza Tanha ** Ahmad Karimi *** *,** Research Institute of Energy Management and Planning and Department of

More information

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification Gold Standard for Quantitative Data Processing Because of the sensitivity, selectivity, speed and throughput at which MRM assays can

More information

Archetypal Analysis in Marketing Research: A New Way of Understanding Consumer Heterogeneity

Archetypal Analysis in Marketing Research: A New Way of Understanding Consumer Heterogeneity Archetypal Analysis in Marketing Research: A New Way of Understanding Consumer Heterogeneity Dr. Paul Riedesel President, Action Marketing Research Introduction Archetypal analysis is a method for analyzing

More information

An Evaluation Model for Determining Insurance Policy Using AHP and Fuzzy Logic: Case Studies of Life and Annuity Insurances

An Evaluation Model for Determining Insurance Policy Using AHP and Fuzzy Logic: Case Studies of Life and Annuity Insurances Proceedings of the 8th WSEAS International Conference on Fuzzy Systems, Vancouver, British Columbia, Canada, June 19-21, 2007 126 An Evaluation Model for Determining Insurance Policy Using AHP and Fuzzy

More information

Business Rules Data Validation -and- Data Quality

Business Rules Data Validation -and- Data Quality Business Rules Data Validation -and- Data Quality National Association of State EMS Officials 2012 Annual Meeting Boise Centre Boise, Idaho Tuesday, September 25, 2012 Presented to the Data Managers Council

More information

Landslide hazard zonation using MR and AHP methods and GIS techniques in Langan watershed, Ardabil, Iran

Landslide hazard zonation using MR and AHP methods and GIS techniques in Langan watershed, Ardabil, Iran Landslide hazard zonation using MR and AHP methods and GIS techniques in Langan watershed, Ardabil, Iran A. Esmali Ouri 1* S. Amirian 2 1 Assistant Professor, Faculty of Agriculture, University of Mohaghegh

More information

Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information

Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi 15/07/2015 IEEE IJCNN

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Export Pricing and Credit Constraints: Theory and Evidence from Greek Firms. Online Data Appendix (not intended for publication) Elias Dinopoulos

Export Pricing and Credit Constraints: Theory and Evidence from Greek Firms. Online Data Appendix (not intended for publication) Elias Dinopoulos Export Pricing and Credit Constraints: Theory and Evidence from Greek Firms Online Data Appendix (not intended for publication) Elias Dinopoulos University of Florida Sarantis Kalyvitis Athens University

More information

Supplier Performance Evaluation and Selection in the Herbal Industry

Supplier Performance Evaluation and Selection in the Herbal Industry Supplier Performance Evaluation and Selection in the Herbal Industry Rashmi Kulshrestha Research Scholar Ranbaxy Research Laboratories Ltd. Gurgaon (Haryana), India E-mail : rashmi.kulshreshtha@ranbaxy.com

More information

Strategically Detecting And Mitigating Employee Fraud

Strategically Detecting And Mitigating Employee Fraud A Custom Technology Adoption Profile Commissioned By SAP and Deloitte March 2014 Strategically Detecting And Mitigating Employee Fraud Executive Summary Employee fraud is a universal concern, with detection

More information

Forschungskolleg Data Analytics Methods and Techniques

Forschungskolleg Data Analytics Methods and Techniques Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!

More information

Data Preparation and Statistical Displays

Data Preparation and Statistical Displays Reservoir Modeling with GSLIB Data Preparation and Statistical Displays Data Cleaning / Quality Control Statistics as Parameters for Random Function Models Univariate Statistics Histograms and Probability

More information

6 Analytic Hierarchy Process (AHP)

6 Analytic Hierarchy Process (AHP) 6 Analytic Hierarchy Process (AHP) 6.1 Introduction to Analytic Hierarchy Process The AHP (Analytic Hierarchy Process) was developed by Thomas L. Saaty (1980) and is the well-known and useful method to

More information

DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

Analyzing Big Data: The Path to Competitive Advantage

Analyzing Big Data: The Path to Competitive Advantage White Paper Analyzing Big Data: The Path to Competitive Advantage by Marcia Kaplan Contents Introduction....2 How Big is Big Data?................................................................................

More information

THE IMPACT OF MACROECONOMIC FACTORS ON NON-PERFORMING LOANS IN THE REPUBLIC OF MOLDOVA

THE IMPACT OF MACROECONOMIC FACTORS ON NON-PERFORMING LOANS IN THE REPUBLIC OF MOLDOVA Abstract THE IMPACT OF MACROECONOMIC FACTORS ON NON-PERFORMING LOANS IN THE REPUBLIC OF MOLDOVA Dorina CLICHICI 44 Tatiana COLESNICOVA 45 The purpose of this research is to estimate the impact of several

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

College Readiness LINKING STUDY

College Readiness LINKING STUDY College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods Enrique Navarrete 1 Abstract: This paper surveys the main difficulties involved with the quantitative measurement

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

Towards running complex models on big data

Towards running complex models on big data Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

More information

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

More information

Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information