Applying Data Mining Technique to Sales Forecast




Erkin Guler¹, Taner Ersoz² and Filiz Ersoz¹
¹ Karabuk University, Department of Industrial Engineering, Karabuk, Turkey, erkn.gler@yahoo.com, fersoz@karabuk.edu.tr
² Karabuk University, Department of Actuarial and Risk Management, Karabuk, Turkey, tanerersoz@karabuk.edu.tr

Abstract

This paper addresses agricultural machinery sales and the sale prices of the machines using data mining techniques. Data mining means the efficient discovery of previously unknown patterns in large databases. It is an interactive information discovery process that includes data acquisition, data preprocessing, data exploration and model building, and interpretation and evaluation. In this study, data mining methods were applied to sales data for agricultural machinery products obtained from the CANSA company for 2011-2013. The CHAID algorithm was applied to the data. Classification-based analyses were used to build a decision model of the sales amount and the variables affecting sales. According to the results, the amount of agricultural machinery sales increases as R&D spending increases.

Keywords: Agricultural Machinery, Sales, Data Mining, Decision Trees, CHAID Algorithm

1 Introduction

Turkey is one of the few countries with some of the most fertile land in the world. Many great civilizations were founded in Anatolia, and agriculture has been the source of livelihood for the inhabitants of these lands for centuries, allowing them to develop their culture. Agriculture is the process of obtaining products from the soil. It remains a primary source of income in many countries and supports development indirectly by allowing a country to produce its own food and to increase its exports.
However, the importance of agriculture decreased after the industrial revolution in Europe, when it was displaced by mechanization. With the beginning of industrial mass production, the need for manpower was reduced. As the mechanization of agriculture continued, it reduced manpower in the fields, but it also became a new source of employment for people in factories. The agricultural machinery industry is becoming more important every year, alongside the development of agriculture worldwide, as a means of producing more efficiently and in less time. This sector holds an important place in Turkey's exports, and the agricultural machinery sector in Turkey is quite lively. The agricultural machinery manufacturing industry is a sub-sector of the manufacturing industry that produces investment goods. Its main objective is to obtain high yields per unit area and to improve living and working conditions in agricultural production. However, as a result of intensive agricultural activities, soil erosion, salinization, soil compaction and drought are emerging, especially in developing countries [1].

The purpose of this study is to determine how the sales amount of a company operating in the agricultural machinery sector changes depending on various variables.

2 Material and Method

The research material consists of 2011-2013 sales data from a company operating in the agricultural machinery industry in Adana, Turkey. In the study, the amount of sales was taken as the dependent variable, and its relations with price, temperature (seasonality), monthly rainfall, monthly inflation, promotional expenses, R&D (Research and Development) expenditures, competitive pricing and advertising were analyzed with CHAID analysis. Classification-based analyses were used to build a decision model of the sales amount and the variables affecting sales.

Data mining is an interdisciplinary field that uses statistics, database technology, machine learning, artificial intelligence and visualization [2]. A data mining process generally includes the following four steps: data acquisition, data preprocessing, data exploration and model building, and interpretation and evaluation. The goal of classification is to develop a model that maps a data item into one of several predefined classes. Once developed, the model is used to classify a new instance into one of the classes.

Decision trees are part of the induction class of data mining techniques. An empirical tree represents a segmentation of the data that is created by applying a series of simple rules. Each rule assigns an observation to a segment based on the value of one input. One rule is applied after another, resulting in a hierarchy of segments within segments. The hierarchy is called a tree, and each segment is called a node. The original segment contains the entire data set and is called the root node of the tree. A node with all its successors forms a branch of the node that created it. The final nodes are called leaves.
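The hierarchy of segments described above can be sketched in a few lines of Python. This is a minimal illustration, not the tree built in the study: the `Node` class, the field names `rd`, `temp` and `sales`, the thresholds and the four toy observations are all hypothetical, and using the mean of the target in a leaf as its prediction is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Optional


@dataclass
class Node:
    """One segment of the data; a node without children is a leaf."""
    rows: list                                     # observations in this segment
    rule: Optional[Callable[[dict], int]] = None   # maps an observation to a child index
    children: list = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def split(node: Node, rule: Callable[[dict], int], n_children: int) -> list:
    """Apply one rule, based on one input, to create segments within a segment."""
    node.rule = rule
    node.children = [Node(rows=[]) for _ in range(n_children)]
    for row in node.rows:
        node.children[rule(row)].rows.append(row)
    return node.children


def find_leaf(node: Node, row: dict) -> Node:
    """Apply one rule after another until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[node.rule(row)]
    return node


def predict(root: Node, row: dict, target: str = "sales") -> float:
    """Assumed leaf decision: the mean target value of the leaf's observations."""
    return mean(r[target] for r in find_leaf(root, row).rows)


# Hypothetical toy observations (not the paper's data).
rows = [
    {"rd": 1000, "temp": 10, "sales": 50},
    {"rd": 2000, "temp": 25, "sales": 70},
    {"rd": 5000, "temp": 12, "sales": 300},
    {"rd": 6000, "temp": 28, "sales": 500},
]

root = Node(rows)                                           # root node: entire data set
low, high = split(root, lambda r: int(r["rd"] > 3000), 2)   # first rule, on one input
cool, warm = split(high, lambda r: int(r["temp"] > 20), 2)  # second rule, inside a segment

print(predict(root, {"rd": 7000, "temp": 30}))
print(predict(root, {"rd": 1500, "temp": 5}))
```

Here `low`, `cool` and `warm` are the leaves, and `high` together with its two successors forms a branch of the root.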
For each leaf, a decision is made and applied to all observations in the leaf. The type of decision depends on the context. In predictive modeling, the decision is simply the predicted value.

Specific decision tree methods include the Chi-squared Automatic Interaction Detection (CHAID) algorithm. CHAID is a decision tree technique used to classify a data set. The following discussion provides a brief description of the CHAID algorithm for building decision trees. For CHAID, the inputs are either nominal or ordinal. IBM SPSS Modeler accepts interval inputs and automatically groups the values into ranges before growing the tree. For nodes with many observations, the algorithm uses a sample for the split search, for computing the worth of a split and for observing the limit on the minimum size of a branch. The samples in different nodes are taken independently. For binary splits on binary or interval targets, the optimal split is always found. For other situations, the data is first consolidated, and then either all possible splits are evaluated or else a heuristic search is used.

3 Application

The purpose of this study is to determine how the sales amount of a company operating in the agricultural machinery sector changes depending on various variables. The independent variables used were the price of the product, the increases that competing firms in the same industry applied to the sale prices of their machines, the monthly amount of promotion spending and the monthly amount spent on research and development activities. In addition, monthly precipitation amounts, monthly temperature values, monthly inflation rates in Turkey and the value of the Turkish lira in foreign markets for 2011-2013 were included in the model.

For pre-processing, data cleansing, merging and conversion were carried out in IBM SPSS Statistics. The CHAID decision tree algorithm was then applied to the data in IBM SPSS Modeler 11.1. The sales amount ranged from a minimum of 42 units to a maximum of 1462 units, and average monthly sales amounted to 358 units.

3.1 CHAID Analysis Results of Data Mining

The CHAID analysis showed that R&D expenditure is the most important variable affecting the amount of agricultural machinery sales. The other variables, in order of importance, are temperature, monthly rainfall, promotion and inflation. The results of the model, in which the sales amount was selected as the dependent variable, are shown below. There are 37 nodes in the analysis. All nodes of the decision tree are shown in Figure 3-1.

Fig. 3-1 Decision Tree All Nodes
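Ranking predictors in this way rests on tests of association between each predictor and the target. The sketch below shows the idea under stated assumptions: interval values are first grouped into ranges, and each binned predictor is then scored against a binned target with the Pearson chi-squared statistic. The helper names, bin edges and eight toy months are hypothetical and are not the paper's data. A full CHAID implementation also merges similar categories and compares adjusted p-values, and for a continuous target implementations typically use an F test instead; neither step is reproduced here.

```python
from collections import Counter


def bin_values(values, edges):
    """Group interval values into ordered ranges.

    `edges` holds the upper bounds of all ranges except the last one;
    the result is the index of the range each value falls into.
    """
    return [sum(v > e for e in edges) for v in values]


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the contingency table of xs against ys."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Hypothetical monthly observations (illustrative only).
rd = [1000, 1200, 1500, 4000, 4500, 5000, 5200, 900]
rain = [10, 80, 20, 70, 15, 90, 30, 60]
sales = [50, 60, 70, 400, 450, 500, 480, 40]

sales_class = bin_values(sales, [200])   # assumed split into low and high sales
rd_bins = bin_values(rd, [3000])         # assumed R&D ranges
rain_bins = bin_values(rain, [50])       # assumed rainfall ranges

scores = {
    "rd": chi_squared(rd_bins, sales_class),
    "rain": chi_squared(rain_bins, sales_class),
}
# Both tables are 2 x 2, so a larger statistic means a smaller p-value.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy data the R&D ranges separate low and high sales perfectly while the rainfall ranges carry no information, so `rd` is ranked first and would be chosen for the first split.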

Detailed results are given for nodes 13, 28 and 35, which give the smallest, average and largest estimated sales amounts. Node 13 of the decision tree (minimum sales amount) is shown in Figure 3-2.

Fig. 3-2 Decision Tree Node 13 and Estimated Sales Amount of Node 13

For monthly temperature 23.3 °C and > 27.3 °C, Research & Development 2,500 TL and > 2,750 TL, promotion 2,000 TL and Research & Development 2,750 TL, the estimated sales amount is calculated as 45 units. Node 28 of the decision tree (average sales amount) is shown in Figure 3-3.

Fig. 3-3 Decision Tree Node 28 and Estimated Sales Amount of Node 28

For monthly rainfall > 63 mm³ and Research & Development 27,500 TL and > 41,250 TL, the estimated sales amount is calculated as 462 units. Node 35 of the decision tree (maximum sales amount) is shown in Figure 3-4.

Fig. 3-4 Decision Tree Node 35 and Estimated Sales Amount of Node 35

For monthly temperature 13.6 °C and Research & Development > 63,250 TL, the estimated sales amount is calculated as 1462 units.

4 Conclusion and Evaluation

The study focused on classification using data mining methods and on the monthly sales amounts for 2011-2013, which were estimated using the classifier model. Data mining methods were used to try to predict the variables affecting the amount of sales in the agricultural machinery sector. The data used in this study were taken from CANSA Agricultural Machinery Co. Because of confidentiality requirements, the values were multiplied by a certain coefficient before modeling was conducted. The information was evaluated in various ways to determine the relationship between the independent variables and the dependent variable and their effect on it. The sales amount ranged from a minimum of 42 units to a maximum of 1462 units, and average monthly sales amounted to 358 units. The main purpose of developing the model was to predict how many sales can occur by looking at the independent variables. In the CHAID analysis, R&D spending, promotional expenses, temperature, monthly rainfall and inflation rate, in order of importance, were significant and included in the model. According to the results, the amount of agricultural machinery sales increases as R&D spending increases.

There are some obstacles to data mining applications in agricultural machinery. Primarily, companies operating in the agricultural machinery sector in Turkey are not fully institutionalized, and because of this, difficulties are experienced in providing data. In future analyses, the authors hope to continue studies that could lead to further research and, ultimately, to strengthened sustainable growth and improved living conditions in Turkey and other emerging and developing countries.

References

1. Turkish Association of Agricultural Machinery and Equipment Manufacturer Magazine, 2008.
2. Ordu B, A Predictive Model for the Manufacturing of Long-Rolled Products in Iron-Steel Industry with Classification Techniques of Data Mining, Master Thesis, Karabuk University Social Sciences Institute, Department of Business Administration, 2013.