Data Mining

Shahram Hassas
California State University, Northridge

General Terms: Data Mining
Additional Key Words and Phrases: Data Mining

Data mining is described as the method of comparing large volumes of data and looking for more information in them (in statistics, data are any facts, numbers, or text that can be processed by a computer). It is also defined as the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue and cut costs. Today, modern businesses consider data mining an important tool for transforming data into business intelligence, which gives them an informational advantage.

A primary reason for using data mining is to assist in the analysis of collections
of observations and in finding correlations among the fields and relationships between the facts in databases. It enables companies to determine relationships among internal factors, such as price, cost, product, staff skills, inventory, payroll, accounting, and corporate profits, and external factors, such as economic conditions, competition, and customer interest and satisfaction.

For example, by using data mining, a grocery chain discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. The analysis also showed that these shoppers typically did their weekly grocery shopping on Saturdays and bought only a few items on Thursdays. The grocery chain used this newly discovered information in various ways to increase revenue; for instance, it moved the beer closer to the diapers, and it could make sure that beer and diapers were sold at full price on Thursdays. This is an example of association mining.

Data mining is the result of developments in classical statistics, artificial intelligence, and machine learning. Classical statistics plays the main role in data mining through concepts such as regression analysis (a method for fitting a curve through a set of points using some goodness-of-fit criterion), standard distribution, standard deviation, variance, discriminant analysis, cluster analysis, and confidence intervals, all of which are used to study data and data relationships. Artificial intelligence applies human-thought-like processing to statistical problems. AI concepts have been adopted by some high-end commercial products, such as query optimization modules for Relational Database Management Systems (RDBMS). [?] Machine learning attempts to let software learn about the data it studies, so that future decisions are based on the qualities of the studied data. [?]

Data mining consists of five
major elements in order to work properly:

I. Extract, transform, and load transaction data onto the data warehouse system.
II. Store and manage the data in a multidimensional database system.
III. Provide data access to business analysts and information technology professionals.
IV. Analyze the data using application software.
V. Present the data in a readable format.

WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use the data to identify customer buying patterns at the store display level, to manage local store inventory, and to identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.

The goal of data mining applications is prediction. They allow users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. The process begins with exploration, or data preparation, which may involve cleaning and transforming the data. It continues with considering various models and choosing the best one based on its predictive performance, and it ends with applying that model to new data in order to generate predictions or estimates of the expected result. For example, the National Basketball Association (NBA) is exploring a data mining application that can combine and analyze the records of basketball games by using the Advanced Scout software. This program helps to find patterns derived from game statistics, images, and the movements of the players. [?]

Today, data mining applications are available on systems of all sizes, including client/server and PC platforms, and prices range from several thousand dollars for the smallest applications up to
$1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes.

Data mining has been used in areas of science and engineering such as bioinformatics, genetics, medicine, education, agriculture, electrical power engineering, and law enforcement. In human genetics, data mining is used to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer. This is very important for improving the diagnosis, prevention, and treatment of these diseases. The data mining technique used to perform this task is known as multifactor dimensionality reduction. http://www.niehs.nih.gov/research/resource/databases/gac/guid.cfm

Among the most recent areas of data mining research are the agricultural industry and electrical power engineering. In electrical power engineering, data mining techniques have been used for condition monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data mining techniques have also been applied to dissolved gas analysis on power transformers.

Data mining has also been used in education research to study the factors that lead students to engage in behaviors which reduce their learning, and to understand the factors influencing university students. Law enforcement also takes advantage of data mining techniques, which provide information that can help identify criminal suspects and apprehend them by crime type, habit, and other patterns of behavior.

In general, data mining
techniques in different areas speed up the data analysis process, giving their users more time to work on other projects. There is an article in the UCLA Anderson library describing how Blockbuster mines its video rental history database to recommend rentals to individual customers. American Express also suggests products to its cardholders based on analysis of their monthly expenditures.

Data mining also has disadvantages, and one of the main ones is privacy. For example, American Express sold its customers' credit card purchase records to another company, and according to the Washington Post, in 1998 CVS sold its patients' prescription purchase records to a different company. Fraud resulting from a lack of security is another disadvantage. Companies record and save customers' personal information online, and they may not have sufficient security systems in place to protect that information. Data mining can also lead to improper or unlawful customer service: since companies have access to customers' records, they may discriminate among customers based on purchase history. If you have spent a lot of money with a company or bought a lot of its products, your call will be answered sooner. So you should not assume that your call is really being answered in the order in which it was received.

Data mining techniques comprise classical techniques and next-generation techniques. Classical techniques include classes and clustering. Classes are used to locate stored data. Clustering is the process of grouping data items according to logical relationships or consumer preferences. Next-generation techniques include rules, networks, and trees. In a decision tree, each branch is a classification question, and the leaves of the tree are partitions of the dataset with their classification. Decision trees can be viewed as segmentations of the original dataset, where each segment is one of the leaves of the tree. Decision tree technology can be used for exploration of datasets or business problems.
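
As a rough illustration of the tree structure described above, the following Python sketch grows a small classification tree in which every internal node is a question of the form "is feature f at most t?" and every leaf is the class assigned to one partition of the dataset. The Gini impurity criterion, the function names, and the customer data are all assumptions made for this example; they are not taken from any particular data mining product or from the sources cited in this paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels; 0.0 means a pure group."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Find the (feature, threshold) question that lowers weighted impurity most.

    Returns None when no question improves on the unsplit group.
    """
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a question must separate the data into two groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, labels):
    """Grow a tree: dict nodes are questions, plain values are leaf classes."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    no = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "question": split,
        "yes": build_tree([r for r, _ in yes], [y for _, y in yes]),
        "no": build_tree([r for r, _ in no], [y for _, y in no]),
    }

def classify(tree, row):
    """Follow the questions from the root until a leaf class is reached."""
    while isinstance(tree, dict):
        feature, threshold = tree["question"]
        tree = tree["yes"] if row[feature] <= threshold else tree["no"]
    return tree

# Hypothetical customers: (monthly spend in dollars, store visits per month).
rows = [(20, 1), (35, 2), (50, 2), (120, 6), (150, 8), (200, 7)]
labels = ["occasional"] * 3 + ["frequent"] * 3

tree = build_tree(rows, labels)
print(tree)
print(classify(tree, (180, 5)))
```

With this toy data the tree needs only one question, whether monthly spend is at most 50, and its two leaves are the "occasional" and "frequent" segments. Reading the chosen feature and threshold back from the tree is the kind of inspection that makes decision trees useful for exploring a dataset.
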
Exploration with a decision tree is often done by looking at the predictors and values chosen for each split of the tree. Often these predictors provide usable insights or raise questions that need to be answered. Classification tree analysis is the term used when the predicted outcome is the class to which the data belongs. Regression tree analysis is the term used when the predicted outcome can be considered a real number. CART analysis is a term that refers to both of these procedures.

Sometimes data mining may impose patterns on data where none exist. This imposition of irrelevant correlations is called data dredging or data fishing. Data dredging is described as seeking more information from a data set than it actually contains. It results in relationships between variables being announced as significant when, in fact, the data require more study before such an association can be established. Large data sets invariably contain some unusual, exciting relationships that are peculiar to that data. Therefore, any conclusions reached this way are likely to be highly suspect.

In conclusion, data mining is a powerful asset to the organizations and corporations that mine their data. It has been used in different areas of science, engineering, research, education, sports, and law enforcement. Today, modern businesses consider data mining an important tool for transforming data into business intelligence, which gives them an informational advantage. Although data mining techniques have made a valuable contribution to modern society, they have disadvantages such as privacy issues, security risks, and misuse of information by some companies based on customer purchase history.

REFERENCES

Weisstein, E. Mathematica Wolfram.
Hand, D. Data Mining: Statistics and More? The American Statistician, Vol. 52.