# Data Mining

## Transcription

Seth Rhine, Math 382, Shapiro

Anyone can tell you that it takes hard work, talent, and hours upon hours of watching videos for a professional sports team to be successful. Finding the leaks in their opponent's strategy is the ultimate goal for the coaches and captains watching in-game footage, allowing them to devise plays and make key decisions in future games. In the National Basketball Association (NBA), the coaches have a good share of the work done for them already with the help of Advanced Scout, a program that helps find patterns derived from game statistics, images, and the movements of the players themselves. When a pattern emerges from the data provided, Advanced Scout lets the user know why the pattern is significant, leading the user toward valuable video clips and sparing them many hours in front of in-game footage (Palace, 1996).

Such a process is not exclusive to Advanced Scout, or even to the NBA. Similar processes are used every day by organizations of many kinds, and together they make up a fairly recently named field known as data mining. Data mining is defined as the process of seeking interesting or valuable information within large databases (Hand et al., 2000, p. 111). At first glance, this definition might seem more like a new name for statistics than a new field. However, data mining is performed on sets of data far larger than those that traditional statistical methods can accurately analyze. Some of data mining's methods have been used to analyze data sets whose numbers of data points run into the billions. Realistically, such sets would take too much time, money, and painstaking attention to detail for any human to be expected to look over (Hand, p. 113). We must therefore rely on machines to do most of the dirty work, if not all of it. The mere existence of such data sets is made possible by advances in modern technology, such as faster computers, larger hard drives, and improved database software.

Many of the techniques used by statisticians on smaller data sets of a few hundred samples simply do not hold up when used on larger sets, and must be improved and expanded upon to mine the data successfully. For instance, a company like Wal-Mart performs over 7 billion transactions annually. Effectively analyzing the buying patterns in a customer purchase database of this size requires much more than the human hand and statistical tactics. Consequently, data mining is quite complex, drawing on notions from statistics, pattern recognition, computer programming, algorithms, machine learning, and many other disciplines (Hand et al., 2000, p. ).

As for how an organization obtains and uses data, Wal-Mart is a prime example. The multi-billion dollar company uses its history of customer transactions as data from which to derive structures, and it bases its marketing strategy on those structures. Such a structure can be seen as either a model or a pattern, both of which are highly sought by data mining programs. A model is an overall summary of a set or subset of data, while a pattern is a smaller structure that refers to a number of objects that is relatively small compared to the sample size.

Fig. 1 (Hand et al., 2000)

Essentially, patterns are often defined relative to the overall model of the data set from which they are derived. Many data mining tools help find these structures, and a few of them are described in the next few paragraphs. Some of the most important tools for an analyst are clustering, regression, rule extraction, and data visualization.

Clustering is the act of partitioning a data set of many random items into smaller subsets whose members show some commonality (Weisstein, 2010). By looking at such clusters, data miners are able to extract statistical models from the data fields. Regression is defined as a method for fitting a curve through a set of points using some goodness-of-fit criterion (Weisstein, 2010). By examining predefined goodness-of-fit criteria, analysts can use regression to locate and describe patterns. Rule extraction is the method of using relationships between variables to establish some sort of rule, most likely for use in a marketing strategy. For instance, in a large set of point-of-sale data from a grocery store, it may be observed that customers who bought products A and B typically purchase product C as well. This information could help the grocery store develop a marketing strategy to further increase profits.

Data visualization is also a key element in the success of data mining. The samples of data being mined are so vast that scatter plots and histograms often fall short of representing any information of realistic value (see Figure 1). For that very reason, analysts concerned with data mining are constantly looking for better ways to represent data graphically, such as the one depicted in Figure 2 (Hand et al., 2000, p. 113).

No matter what tools analysts have at their fingertips, the patterns and models being mined will only be as good as the data from which they are derived. If a database contains biased or incomplete data, the results will often be inaccurate, and there is a large chance that the patterns found are simply due to chance. Since the source of the data is such a large entity, it is almost certain that there will be missing or corrupted data within the database being mined (Hand, 1998). This is one of the biggest reasons that data mining is looked down upon by some statisticians. Suppose that a tenth of one percent of the sample contains missing or corrupted data. In a small sample, the number of damaged items is almost negligible. In a large sample of one billion items, however, the same rate yields one million damaged items, which is hardly something the analyst can ignore. Some data corruption occurs before the data is cleaned up for mining, such as when the data is recorded in the first place. Often the people recording the data make mistakes or leave out certain information when filling out the appropriate forms, using applications or computer software, and so on (Hand, 1998).

Fig. 2 (Hand et al., 2000)

Another big problem with data mining is that the programs used to discern structures must use language that is well defined to the computer. A computer does not know what to look for in the data sets until programmers define exactly what it is looking for. As a consequence, programmers must define exactly what they mean by structure, pattern, usefulness, and so on. In market basket analysis, for example, the computer programs are told that it is interesting to find products with very high conditional probabilities. In effect, if the probability of buying product A, given that the shopper has already bought product B, is close to 1, the computer will flag it as a structure (Hand et al., 2000, pp. ).

Despite the setbacks and criticism that data mining has received over the years, it nonetheless continues to be a part of the global market. To companies like Wal-Mart, Exxon/Mobil, and other Fortune 500 mainstays, data mining is revered as a valuable marketing tool. In fact, over 40% of the Fortune 500 companies in 2002 said they were developing large data sets with the intent of mining them, programs to help their company find structures in consumer purchases, or both. Mobil Oil said that it intends to generate and store over 100 terabytes of data concerned with oil exploration. Large companies like these generate enough data to warrant storing it in a data warehouse (Hand et al., 2000, pp. ).

By warehousing their data, companies focus on streamlining data from their various departments. They do this by extracting data from the departments, then categorizing, trimming, and re-storing the data in its new form. For example, an analyst might look at point-of-sale purchases, where each item of data is recorded with multiple facets such as its price, its cost, the time it was purchased, and the store it was purchased from. While a lot of this data is useful, the analyst might only want to know how much money a given product is making for the company. To streamline the analyst's work, data warehousing would already have consolidated the items into various categories, making the data more consistent (Fayyad and Uthurusamy, 2002).

Warehousing data gives companies an exciting opportunity to find patterns and create models more readily, and with the storage capacity of computers today, it is a necessary step in the data mining process. But what happens when a company like Wal-Mart records 20 million sales transactions per day, or when Google handles 150 million searches? The information derived from this data is certain to be invaluable to companies this large, but by the time standard data warehousing and mining procedures are performed, the information can be relatively useless. Mining a day's worth of data in these cases can take up to a day's worth of time! A solution to this problem, and perhaps one of the biggest players in the future of data mining, is mining massive data streams (Domingos and Hulten, 2003).

Since these companies encounter such a high volume of traffic on any given day, it is important for data mining programmers to focus on new algorithms. Programs meant to analyze a stationary database would take days or even weeks to sift through data of this magnitude. Currently, programmers are trying to create algorithms for systems that are continuously on, processing records at the speed they arrive and incorporating them into the model being built even if the system never sees them again (Domingos and Hulten, 2003). By imposing various bounds and limits on what the program is actually searching for, programmers have built programs that can mine infinite data in finite time, keeping up with the data despite the massive amount arriving each minute.

Mining such data streams does not come without a cost, however. The data streams coming into these programs are so massive that they enable analysts to create more advanced models than previously thought possible. Ironically, the programs are created to look at each item of streaming data only once before moving on to the next, which results in mining only the simplest of models (Domingos and Hulten, 2003). Programs like these are also blamed for the backlash against data mining over the past decade. Information derived from data mining does not come without social implications.

As Danna and Gandy, Jr. point out, consumer profiles are created, sorted, and processed, resulting in consumers being graded, sorted, or excluded from opportunities that others enjoy. For instance, suppose mining techniques find that two types of customers exist at a bank: high income customers with a moderate risk of leaving, and low income customers with zero risk of leaving. The bank will then cater to the high income customers, offering special rates on loans or accounts, with the full intent of keeping them around. Since the low income customers have almost no risk of leaving the bank, the bank will continue to offer them the same small incentives that have kept them there in the first place, such as no ATM fees and free checking. The problem is that the high income customers receive the same benefits as the low income customers, but also receive special treatment to entice them to stay. Preferential treatment such as this leads to the exclusion that Danna and Gandy, Jr. describe. Critics like them call for regulation of consumer privacy and data mining techniques, a future battle that data mining might very well have to suit up for as its popularity increases.

It's no surprise that companies and organizations are interested in the behavior of the data they collect. Whether it be point-of-sale information, NASA photos, basketball statistics, or credit profiles, the data proves to be a valuable asset to the organization that chooses to store and mine it. As algorithms improve and computers become more and more powerful, further advancements in the field of data mining are only to be expected.
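As a closing illustration, two of the ideas discussed above, the market basket rule based on conditional probability and the one-pass constraint of stream mining, can be combined in a short Python sketch. It is only a minimal example under stated assumptions, not the method of any program cited in this paper: the class name `StreamingBasketCounter`, the 0.9 threshold, the `min_baskets` cutoff, and the sample baskets are all hypothetical. Each basket is looked at exactly once, and the program keeps only running counts from which it estimates the probability of buying product A given that product B was bought.

```python
from collections import Counter
from itertools import permutations


class StreamingBasketCounter:
    """Keeps running co-occurrence counts; each basket is seen exactly once."""

    def __init__(self):
        self.item_counts = Counter()  # number of baskets containing each item
        self.pair_counts = Counter()  # baskets containing both items, keyed (given, a)

    def update(self, basket):
        """Fold one basket into the counts, then forget the basket itself."""
        items = set(basket)  # ignore repeated items within one basket
        self.item_counts.update(items)
        self.pair_counts.update(permutations(items, 2))

    def conditional_probability(self, a, given):
        """Estimate P(a | given) from the baskets seen so far."""
        seen = self.item_counts[given]
        if seen == 0:
            return 0.0
        return self.pair_counts[(given, a)] / seen

    def flagged_rules(self, threshold=0.9, min_baskets=2):
        """Return (given, a, probability) for pairs whose estimate meets the threshold."""
        rules = []
        for (given, a), both in self.pair_counts.items():
            seen = self.item_counts[given]
            # min_baskets guards against rules backed by a single purchase
            if seen >= min_baskets and both / seen >= threshold:
                rules.append((given, a, both / seen))
        return sorted(rules)


# Hypothetical stream of point-of-sale baskets, processed one at a time.
counter = StreamingBasketCounter()
stream = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "bread", "milk"],
]
for basket in stream:
    counter.update(basket)

print(counter.conditional_probability("bread", given="butter"))
print(counter.flagged_rules(threshold=0.9))
```

In this toy stream, every basket containing butter also contains bread, so the pair is flagged as a structure, while the reverse rule (butter given bread) is not. The sketch also shows why one-pass mining tends toward simple models: the table of pair counts grows with the number of distinct product pairs, so a real stream miner would need to bound that memory, which this example does not attempt.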

### Data Mining. Shahram Hassas Math 382 Professor: Shapiro

Data Mining Shahram Hassas Math 382 Professor: Shapiro Agenda Introduction Major Elements Steps/ Processes Examples Tools used for data mining Advantages and Disadvantages What is Data Mining? Described

### Data Mining Solutions for the Business Environment

Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

### Research Note What is Big Data?

Research Note What is Big Data? By: Devin Luco Copyright 2012, ASA Institute for Risk & Innovation Keywords: Big Data, Database Management, Data Variety, Data Velocity, Data Volume, Structured Data, Unstructured

### Wal-Mart's Data Warehouse

Wal-Mart's Data Warehouse SCODAWA 2006 Patrick Öhlinger Vienna University of Technology June 19, 2006 Abstract Wal-Mart is an exceptional company. As professor Strassmann [Stra06] says, Wal-Mart really

### not think the same. So, the consumer, at the end, is the one that decides if a game is fun or not. Whether a game is a good game.

MR CHU: Thank you. I would like to start off by thanking the Central Policy Unit for the invitation. I was originally from Hong Kong, I left Hong Kong when I was 14 years old, it is good to come back with

### Big Data 101: Harvest Real Value & Avoid Hollow Hype

Big Data 101: Harvest Real Value & Avoid Hollow Hype 2 Executive Summary Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on

### Security Tools and Their Unexpected Uses

Security Tools and Their Unexpected Uses Maximizing your security resources can be one rewarding way to extend your resources and visibility into your business. Video surveillance isn't new. Neither is

### A Review of Data Mining Techniques

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

### DATA MINING AND WAREHOUSING CONCEPTS

CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

### Application of the Artificial Society Approach to Multiplayer Online Games: A Case Study on Effects of a Robot Rental Mechanism

Application of the Artificial Society Approach to Multiplayer Online Games: A Case Study on Effects of a Robot Rental Mechanism Ruck Thawonmas and Takeshi Yagome Intelligent Computer Entertainment Laboratory

### Data Intensive Scalable Computing. Harnessing the Power of Cloud Computing

Data Intensive Scalable Computing Harnessing the Power of Cloud Computing Randal E. Bryant February, 2009 Our world is awash in data. Millions of devices generate digital data, an estimated one zettabyte

### Blue: C= 77 M= 24 Y=19 K=0 Font: Avenir. Clockwork LCM Cloud. Technology Whitepaper

Technology Whitepaper Clockwork Solutions, LLC. 1 (800) 994-1336 A Teakwood Capital Company Copyright 2013 TABLE OF CONTENTS Clockwork Solutions Bringing Cloud Technology to the World Clockwork Cloud Computing

### Business Intelligence Solutions for Gaming and Hospitality

Business Intelligence Solutions for Gaming and Hospitality Prepared by: Mario Perkins Qualex Consulting Services, Inc. Suzanne Fiero SAS Objective Summary 2 Objective Summary The rise in popularity and

### Application of Business Intelligence in Transportation for a Transportation Service Provider

Application of Business Intelligence in Transportation for a Transportation Service Provider Mohamed Sheriff Business Analyst Satyam Computer Services Ltd Email: mohameda_sheriff@satyam.com, mail2sheriff@sify.com

### Lead Generation for Logistics Services: Who's Job Is It, Anyway?

Lead Generation for Logistics Services: Who's Job Is It, Anyway? Asking salespeople to fill, as well as close, the sales pipeline can lead to inefficiency, poor results and attrition. 1 During a phone

### Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Problem: HP's numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

### Healthcare Measurement Analysis Using Data mining Techniques

www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

### Data Analytics in Cloud Computing

Executive Summary Businesses have long used data analytics to help direct their strategy to maximize profits. Ideally data analytics helps eliminate much of the guesswork involved in trying to understand

### Fair Price. Math 5 Crew. Department of Mathematics Dartmouth College. Fair Price p.1/??

Fair Price p.1/?? Fair Price Math 5 Crew Department of Mathematics Dartmouth College Fair Price p.2/?? Historical Perspective We are about ready to explore probability from the point of view of a free

### Introduction to Data Mining

Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

### INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10

FINDINGS 1 INDEX 1 2 3 4 Introduction Page 3 Methodology Page 4 Findings Page 5 Conclusion Page 10 INTRODUCTION Our 2016 Data Scientist report is a follow up to last year's effort. Our aim was to survey

### A Beginner's Guide to Financial Freedom through the Stock-market. Includes The 6 Steps to Successful Investing

A Beginner's Guide to Financial Freedom through the Stock-market Includes The 6 Steps to Successful Investing By Marcus de Maria The experts at teaching beginners how to make money in stocks Web-site:

### Capturing Meaningful Competitive Intelligence from the Social Media Movement

Capturing Meaningful Competitive Intelligence from the Social Media Movement Social media has evolved from a creative marketing medium and networking resource to a goldmine for robust competitive intelligence

### Information Stewardship: Moving From Big Data to Big Value

Information Stewardship: Moving From Big Data to Big Value By John Burke Principal Research Analyst, Nemertes Research Executive Summary Big data stresses tools, networks, and storage infrastructures.

### NO LUCK NEEDED. How the Right Data Can Improve Casino Marketing Campaigns

GAMING/CASINO DATA MARKETING WHITE PAPER NO LUCK NEEDED. How the Right Data Can Improve Casino Marketing Campaigns V12 Group 141 West Front Street Suite 410 Red Bank, NJ 07701 1-866-842-1001 www.v12groupinc.com

### Creating an Effective Mystery Shopping Program Best Practices

Creating an Effective Mystery Shopping Program Best Practices BEST PRACTICE GUIDE Congratulations! If you are reading this paper, it s likely that you are seriously considering implementing a mystery shop

### Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

### Perspectives on Data Mining

Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London n.adams@imperial.ac.uk April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery

### 20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns John Aogon and Patrick J. Ogao Telecommunications operators in developing countries are faced with a problem of knowing

### INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

### Formal Methods for Preserving Privacy for Big Data Extraction Software

Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability

### TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

### ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open

### THE WHE TO PLAY. Teacher's Guide Getting Started. Shereen Khan & Fayad Ali Trinidad and Tobago

Teacher's Guide Getting Started Shereen Khan & Fayad Ali Trinidad and Tobago Purpose In this two-day lesson, students develop different strategies to play a game in order to win. In particular, they will

### Outline. What is Big data and where they come from? How we deal with Big data?

What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,

### A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA

A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS Gary Kader and Mike Perry Appalachian State University USA This paper will describe a content-pedagogy course designed to prepare elementary

### Banking On A Customer-Centric Approach To Data

Banking On A Customer-Centric Approach To Data Putting Content into Context to Enhance Customer Lifetime Value No matter which company they interact with, consumers today have far greater expectations

### The Power of Social Media in Marketing

The Power of Social Media in Marketing 1 Contents Executive Summary...3 What is Social Media Marketing?...3 Importance of Social Media Marketing...4 Promoting Through Social Media...5 Social Media Channels/

### Battleships Searching Algorithms

Activity 6 Battleships Searching Algorithms Summary Computers are often required to find information in large collections of data. They need to develop quick and efficient ways of doing this. This activity

### ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

### Data Mining and Statistics: What is the Connection?

This article appeared in The Data Administration Newsletter 30.0, October 2004 (www.tdan.com). Data Mining and Statistics: What is the Connection? Dr. Diego Kuonen Statoo Consulting, PSE-B, 1015 Lausanne

### International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

### MBA 8473 - Data Mining & Knowledge Discovery

MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 1 Learning Objectives 55. Explain what is data mining? 56. Explain two basic types of applications of data mining. 55.1. Compare and contrast various

### Using Data Mining to Detect Insurance Fraud

IBM SPSS Modeler Using Data Mining to Detect Insurance Fraud Improve accuracy and minimize loss Highlights: combines powerful analytical techniques with existing fraud detection and prevention efforts

### Berkeley CS191x: Quantum Mechanics and Quantum Computation Optional Class Project

Berkeley CS191x: Quantum Mechanics and Quantum Computation Optional Class Project This document describes the optional class project for the Fall 2013 offering of CS191x. The project will not be graded.

### Big Data Just Noise or Does it Matter?

Big Data Just Noise or Does it Matter? Opportunities for Continuous Auditing Presented by: Solon Angel Product Manager Servers The CaseWare Group. Founded in 1988. An industry leader in providing technology

### Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

Copyright 2015 Pearson Education, Inc. Technology in Action Alan Evans Kendall Martin Mary Anne Poatsy Eleventh Edition Copyright 2015 Pearson Education, Inc. Technology in Action Chapter 9 Behind the

### Data Mining for Knowledge Management in Technology Enhanced Learning

Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

### Overview of Pricing Research

Overview of Pricing Research by Keith Chrzan, Director of Marketing Sciences, Maritz Research 2011 Maritz All rights reserved Introduction Marketers take obvious risks when pricing new products or services,

### Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

### Using Data Mining to Detect Insurance Fraud

IBM SPSS Modeler Using Data Mining to Detect Insurance Fraud Improve accuracy and minimize loss Highlights: Combine powerful analytical techniques with existing fraud detection and prevention efforts Build

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### 10 Tips on How to Plan a Successful Internet Business. Robert Rustici

10 Tips on How to Plan a Successful Internet Business Robert Rustici 1. Define Your Business Type - Going Outside of the Box Will Cost You When planning to create an Internet Business there are three common

### Big Data Big Deal? Salford Systems www.salford-systems.com

Big Data Big Deal? Salford Systems www.salford-systems.com 2015 Copyright Salford Systems 2010-2015 Big Data Is The New In Thing Google trends as of September 24, 2015 Difficult to read trade press without

### Big Data. Fast Forward. Putting data to productive use

Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

### TEST 2 STUDY GUIDE. 1. Consider the data shown below.

2006 by The Arizona Board of Regents for The University of Arizona All rights reserved Business Mathematics I TEST 2 STUDY GUIDE 1 Consider the data shown below (a) Fill in the Frequency and Relative Frequency

### APPROACHABLE ANALYTICS MAKING SENSE OF DATA

APPROACHABLE ANALYTICS MAKING SENSE OF DATA AGENDA SAS DELIVERS PROVEN SOLUTIONS THAT DRIVE INNOVATION AND IMPROVE PERFORMANCE. About SAS SAS Business Analytics Framework Approachable Analytics SAS for

### Sunnie Chung. Cleveland State University

Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

### Data Mining in Telecommunication

Data Mining in Telecommunication Mohsin Nadaf & Vidya Kadam Department of IT, Trinity College of Engineering & Research, Pune, India E-mail : mohsinanadaf@gmail.com Abstract Telecommunication is one of

### Gold. Mining for Information

Mining for Information Gold Data mining offers the RIM professional an opportunity to contribute to knowledge discovery in databases in a substantial way Joseph M. Firestone, Ph.D. During the late 1980s,

### Data Mining: Overview. What is Data Mining?

Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

### not possible or was possible at a high cost for collecting the data.

Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

### Today's mobile ecosystem means shared responsibility

It seems just about everybody has a mobile phone now, including more than three-quarters of U.S. teens and a rapidly growing number of younger kids. For young people as well as adults, the technology has
