Data Mining and Statistics: What is the Connection?
This article appeared in The Data Administration Newsletter 30.0, October 2004.

Dr. Diego Kuonen
Statoo Consulting, PSE-B, 1015 Lausanne 15, Switzerland

The field of data mining, like statistics, concerns itself with "learning from data" or "turning data into information". In this article we will look at the connection between data mining and statistics, and ask ourselves whether data mining is "statistical déjà vu".

What is statistics and why is statistics needed?

Statistics is the science of learning from data. It includes everything from planning for the collection of data and subsequent data management to end-of-the-line activities such as drawing inferences from numerical facts called data and presenting the results. Statistics is concerned with one of the most basic of human needs: the need to find out more about the world and how it operates in the face of variation and uncertainty. Because of the increasing use of statistics, it has become very important to understand and practise statistical thinking. Or, in the words of H. G. Wells: "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."

But why is statistics needed? Knowledge is what we know. Information is the communication of knowledge. Data are known to be crude information and not knowledge by themselves. The sequence from data to knowledge is as follows: from data to information (data become information when they become relevant to the decision problem); from information to facts (information becomes facts when the data can support it); and finally, from facts to knowledge (facts become knowledge when they are used in the successful completion of the decision process). Figure 1 illustrates this statistical thinking process based on data in constructing statistical models for decision making under uncertainties. That is why we need statistics.
Statistics arose from the need to place knowledge on a systematic evidence base. This required a study of the laws of probability, the development of measures of data properties and relationships, and so on.

Figure 1. The statistical thinking process based on data in constructing statistical models for decision making under uncertainties.
What is data mining?

Data mining has been defined in almost as many ways as there are authors who have written about it. Because it sits at the interface between statistics, computer science, artificial intelligence, machine learning, database management and data visualization (to name some of the fields), the definition changes with the perspective of the user:

"Data mining is the process of exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules." (M. J. A. Berry and G. S. Linoff)

"Data mining is finding interesting structure (patterns, statistical models, relationships) in databases." (U. Fayyad, S. Chaudhuri and P. Bradley)

"Data mining is the application of statistics in the form of exploratory data analysis and predictive models to reveal patterns and trends in very large data sets." (Insightful Miner 3.0 User Guide)

We think of data mining as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns or models in data to make crucial business decisions. "Valid" means that the patterns hold in general, "novel" that we did not know the pattern beforehand, and "understandable" that we can interpret and comprehend the patterns. Hence, like statistics, data mining is not only modelling and prediction, nor a product that can be bought, but a whole problem-solving cycle that must be mastered through team effort.

Defining the right business problem is the trickiest part of successful data mining because it is exclusively a communication problem. The technical people analyzing data need to understand what the business really needs. Even the most advanced algorithms cannot figure out what is most important. Never forget that "garbage in yields garbage out". Data preprocessing (also called data cleaning or data preparation) is also a key part of data mining. Quality decisions and quality mining results come from quality data.
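As a rough illustration of what data preparation involves in practice, here is a minimal Python sketch. It assumes a toy list of customer records with hypothetical `country` and `income` fields and handles three typical kinds of "dirt": inconsistent codes, missing values and noisy values. The lookup table, the median imputation and the modified z-score rule (cut-off 3.5) are common statistical rules of thumb chosen for illustration, not steps prescribed by the article.

```python
from statistics import median

# Hypothetical lookup table: inconsistent spellings -> one canonical country code.
COUNTRY_CODES = {
    "ch": "CH", "switzerland": "CH", "schweiz": "CH",
    "de": "DE", "germany": "DE",
}


def clean(records, field="income"):
    """Return cleaned copies of `records` and the indices of suspected outliers.

    - inconsistent country codes are mapped to one canonical code;
    - missing values in `field` are filled with the median of the observed values;
    - values whose modified z-score exceeds 3.5 are flagged for review, not deleted.
    """
    observed = [r[field] for r in records if r[field] is not None]
    med = median(observed)
    mad = median(abs(x - med) for x in observed)  # median absolute deviation

    cleaned, outliers = [], []
    for i, record in enumerate(records):
        r = dict(record)  # work on a copy so the raw data stay untouched
        key = str(r["country"]).strip().lower()
        r["country"] = COUNTRY_CODES.get(key, "UNKNOWN")
        if r[field] is None:
            r[field] = med  # simple median imputation
        elif mad and 0.6745 * abs(r[field] - med) / mad > 3.5:
            outliers.append(i)  # noisy value: keep the record, mark it for checking
        cleaned.append(r)
    return cleaned, outliers


# Toy, made-up customer records.
raw = [
    {"id": 1, "country": "CH", "income": 52000},
    {"id": 2, "country": "Switzerland", "income": None},  # missing value
    {"id": 3, "country": " schweiz ", "income": 61000},   # inconsistent code
    {"id": 4, "country": "Germany", "income": 58000},
    {"id": 5, "country": "DE", "income": 4900000},        # probable entry error
    {"id": 6, "country": "ch", "income": 55000},
]

cleaned, outliers = clean(raw)
print(outliers)              # [4], i.e. the record with id 5
print(cleaned[1]["income"])  # 58000, the median of the observed incomes
print([r["country"] for r in cleaned])
```

The median and the median absolute deviation are used instead of the mean and standard deviation because a single extreme value, like the one in record 5, would otherwise distort both the fill-in value and the outlier threshold.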
In the real world, data are always dirty and not ready for data mining. For example, data need to be integrated from different sources; data contain missing values, i.e. are incomplete; data are noisy, i.e. contain outliers or errors; data contain inconsistent values, i.e. discrepancies in codes or names; and data are not at the right level of aggregation.

The main part of data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. It is the computer which is responsible for finding the patterns by identifying the underlying rules and features in the data. The choice of a particular combination of techniques to apply in a particular situation depends on both the nature of the data mining task to be accomplished and the nature of the available data. The idea is that it is possible to strike gold in unexpected places, as the data mining software extracts patterns that were either not previously discernible or so obvious that no-one had noticed them before. The analysis process starts with a set of data and uses a methodology to develop an optimal representation of the structure of the data, during which time knowledge is acquired. Once knowledge has been acquired, it can be extended to larger sets of data, on the assumption that the larger data set has a structure similar to that of the sample data. This is analogous to a mining operation where large amounts of low-grade material are sifted through in order to find something of value.

This sounds familiar, doesn't it? First, recall that we defined statistics as the science of learning from data. Second, remember that the main sequence from data to knowledge is: from data to information, and from information to knowledge. Let us briefly illustrate this sequence. Data are what we can capture and store (e.g. customer data, store data, demographical data, geographical data), and become information when they become relevant to our decision problem.
Information relates items of data (e.g. X lives in Z; S is Y years old; X and S moved; W has money in Z), and becomes knowledge when it is used in the successful completion of the decision process. Hence knowledge relates items of information (e.g. a quantity Q of product A is used in region Z; customers of class L use N% of C during period D). This sequence is indeed a fragment of the so-called business intelligence chain: from data to information, from information to knowledge, from knowledge to decision, and from decision to action (e.g. decisions: promote product A in region Z; mail ads to families of profile P; cross-sell service B to clients E). As we see, the main problem is to know how to get from data to knowledge, or, as J. Naisbitt said: "We are drowning in information but starved for knowledge." The remedy for this problem is data mining and/or statistics. With data mining, companies can analyze customers' past behaviours in order to make strategic decisions for the future. Keep in mind, however, that data mining techniques and tools are equally applicable in fields ranging from law enforcement to radio astronomy, medicine and industrial process control (to name a few).

Why data mining?

Data mining got its start in what is now known as customer relationship management (CRM). It is widely recognized that companies of all sizes need to learn to emulate what small, service-oriented businesses have always done well: creating one-to-one relationships with their customers. In every industry, forward-looking companies are trying to move towards the one-to-one ideal of understanding each customer individually and of using that understanding to make it easier for the customer to do business with them rather than with a competitor. These same companies are learning to look at the lifetime value of each customer, so they know which ones are worth investing money and effort to hold on to and which ones to let drop. As noted, a small business builds one-to-one relationships with its customers by noticing their needs, remembering their preferences, and learning from past interactions how to serve them better in the future. How can a large enterprise accomplish something similar when most customers may never interact personally with company employees?
What can replace the creative intuition of the sole proprietor who recognizes customers by name, face, and voice, and remembers their habits and preferences? In a word: nothing. But through the clever application of information technology, even the largest enterprise can come surprisingly close. In large commercial enterprises, the first step, noticing what the customer does, has already largely been automated. On-line transaction processing (OLTP) systems are everywhere, collecting data on seemingly everything. These days, we all go through life generating a constant stream of transaction records. The customer-focused enterprise regards every record of an interaction with a client or prospect as a learning opportunity.

But learning requires more than simply gathering data. In fact, many companies gather hundreds of gigabytes or terabytes of data from and about their customers without learning anything. Data are gathered because they are needed for some operational purpose, e.g. inventory control or billing, and once they have served that purpose, they languish on tape or get discarded. For learning to take place, data from many sources must first be gathered together and organized in a consistent and useful way. This is called data warehousing. Hence data warehousing allows the enterprise to remember what it has noticed about its customers: data warehousing provides the enterprise with a memory.

But memory is of little use without intelligence. That is where data mining comes in. Intelligence allows us to comb through our memories, noticing patterns, devising rules, coming up with new ideas to try, and making predictions about the future. The data must be analyzed, understood, and turned into actionable information. Data mining provides tools and techniques that add intelligence to the data warehouse: data mining provides the enterprise with intelligence.
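The "gather together and organize consistently" step behind a data warehouse can be sketched in a few lines of Python. Everything here is hypothetical: the two source extracts, their field names (`cust_id`, `CustomerNo`, `amount_chf`) and the use of a customer-keyed dictionary as a stand-in for the warehouse. A real warehouse would live in a database; the sketch only shows one core task, reconciling keys and formats that differ between operational systems.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical extracts from two operational (OLTP) systems. Each was built for
# its own purpose, so they use different key formats for the same customer.
billing = [
    {"cust_id": 17, "amount_chf": "120.50"},
    {"cust_id": 17, "amount_chf": "80.00"},
    {"cust_id": 42, "amount_chf": "15.25"},
]
web_orders = [
    {"CustomerNo": "00017", "product": "A"},
    {"CustomerNo": "00042", "product": "B"},
    {"CustomerNo": "00042", "product": "A"},
    {"CustomerNo": "00099", "product": "C"},
]


def integrate(billing, web_orders):
    """Merge both sources into one consistent record per customer."""
    customers = defaultdict(lambda: {"total_billed": Decimal("0"), "products": []})
    for row in billing:
        # Normalise the key to an int and the amount to an exact Decimal.
        customers[int(row["cust_id"])]["total_billed"] += Decimal(row["amount_chf"])
    for row in web_orders:
        # "00017" and 17 must end up as the same customer.
        customers[int(row["CustomerNo"])]["products"].append(row["product"])
    return dict(customers)


warehouse = integrate(billing, web_orders)
for cust, profile in sorted(warehouse.items()):
    print(cust, profile["total_billed"], profile["products"])
```

Customer 99 appears only in the order data, so the merged view shows a zero billed total for that customer; spotting such gaps between sources is itself a small piece of information that neither operational system provides on its own.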
Using several data mining tools and techniques that add intelligence to the data warehouse, an enterprise will be able to exploit the vast mountains of data generated by interactions with its customers and prospects in order to get to know them better. What customers are most likely to respond to a mailing? Are there groups (or segments) of customers with similar characteristics or behaviour? Are there interesting relationships between customer characteristics? Who is likely to remain a loyal customer and who is likely to jump ship? What is the next product or service this customer will want?
Answers to such questions lie buried in the enterprise's corporate data, but it takes powerful data mining tools to get at them, i.e. to dig for gold in customer information.

The main data mining tasks

Let us define the main tasks well suited for data mining, all of which involve extracting meaningful new information from the data. Knowledge discovery (learning from data) comes in two flavours: directed (supervised) and undirected (unsupervised) learning from data. The six main activities of data mining are:

- classification: examining the features of a newly presented object and assigning it to one of a predefined set of classes;
- estimation: given some input data, coming up with a value for some unknown continuous variable such as income, height, or credit-card balance;
- prediction: the same as classification and estimation, except that the records are classified according to some predicted future behaviour or estimated future value;
- affinity grouping or association rules: determining which things go together (also known as dependency modelling), e.g. which items go together in a shopping cart at the supermarket (market basket analysis);
- clustering: segmenting a population into a number of subgroups or clusters;
- description and visualization: exploratory or visual data mining.

The first three tasks (classification, estimation and prediction) are all examples of directed knowledge discovery (supervised learning). In supervised learning the goal is to use the available data to build a model that describes one particular variable of interest, such as income or response, in terms of the rest of the available data ("class prediction"). The next three tasks (affinity grouping or association rules, clustering, and description and visualization) are examples of undirected knowledge discovery (unsupervised learning). In unsupervised learning no variable is singled out as the target; the goal is to establish some relationship among all the variables ("class discovery").
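To make the directed/undirected distinction concrete, here is a minimal Python sketch on made-up data, assuming two numeric customer features (age and annual spend in kCHF). k-nearest neighbours stands in for a classification tool and k-means for a clustering tool; both are illustrative picks of ours, not techniques the article singles out. The only difference in the inputs is whether the target labels are supplied.

```python
import math
import random
from collections import Counter

# Toy, made-up customers: (age, annual spend in kCHF). Features are left
# unscaled for brevity; in practice they would usually be standardised first.
labelled = [
    ((25, 1.0), "no response"), ((30, 1.5), "no response"), ((28, 1.2), "no response"),
    ((55, 8.0), "responded"), ((60, 9.5), "responded"), ((52, 7.0), "responded"),
]


def knn_classify(train, new_point, k=3):
    """Directed (supervised): predict the target class of `new_point`.

    `train` holds (features, label) pairs; the k nearest neighbours vote.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, k, iterations=20, seed=0):
    """Undirected (unsupervised): group points into k clusters with no target.

    Plain Lloyd iterations: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points.
    """
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids, clusters


# Class prediction: the target variable ("response") is known for the training data.
print(knn_classify(labelled, (58, 8.5)))  # responded

# Class discovery: the same customers, but with the labels thrown away.
points = [features for features, _ in labelled]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

k-means only returns groups; deciding that one of them looks like "likely responders" still requires a human to interpret the segments, which is exactly where statistical thinking and domain knowledge come back in.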
Unsupervised learning attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes. This is similar to looking for needles in haystacks.

In fact, hardly any of the data mining algorithms were first invented with commercial applications in mind. Although most data mining techniques have existed, at least as academic algorithms, for years or decades, it is only in the last several years that commercial data mining has caught on in a big way. This is due to the convergence of a number of factors in the 1990s: the data are being produced; the data are being warehoused; the computing power is affordable; the competitive pressure is strong; and commercial data mining software products have become available. The commercial data miner employs a grab bag of techniques borrowed from statistics, computer science and artificial intelligence research. Moreover, no single data mining tool or technique is equally applicable to all the tasks. The choice of a particular combination of data mining techniques to apply in a particular situation depends on both the nature of the data mining task to be accomplished and the nature of the available data.

From a statistical perspective, many data mining tools could be described as flexible models and methods for exploratory data analysis. In other words, many data mining tools are nothing other than multivariate (statistical) data analysis methods. Or, in the words of I. H. Witten and E. Frank: "What's the difference between machine learning and statistics? Cynics, looking wryly at the explosion of commercial interest (and hype) in this area, equate data mining to statistics plus marketing."

Data mining myths versus realities

Do not let contradictory claims about data mining keep you from improving your business. A great deal of what is said about data mining is incomplete, exaggerated, or wrong. Data mining has taken the business world by storm, but as with many new technologies, there seems to be a direct relationship between its potential benefits and the quantity of (often) contradictory claims, or myths, about its capabilities and weaknesses. When you undertake a data mining project, avoid a cycle of unrealistic expectations followed by disappointment. Understand the facts instead and your data mining efforts will be successful. Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions. Data mining cannot be ignored: the data are there, the methods are numerous and the advantages that knowledge discovery brings to a business are tremendous.
Companies whose data mining efforts are guided by mythology will find themselves at a serious competitive disadvantage to those organizations taking a measured, rational approach based on facts. Finally, let me cite A. Onassis, "The secret of success is to know something that nobody else knows", and J. Bigus, "If you are not mining your data for all it is worth, you are guilty of underuse of one of your company's greatest assets."

Conclusion: challenges for data miners, statisticians and clients

The field of data mining, like statistics, concerns itself with "learning from data" or "turning data into information". So we asked ourselves whether data mining is "statistical déjà vu". As we have seen, answering yes to this question would be absurd. Rather, it is important to note that data mining can learn from statistics: to a large extent, statistics is fundamental to what data mining is really trying to achieve. There is the opportunity for an immensely rewarding synergy between data miners and statisticians. However, most data miners tend to be ignorant of statistics and the client's domain; statisticians tend to be ignorant of data mining and the client's domain; and clients tend to be ignorant of data mining and statistics. Unfortunately, they also tend to be inhibited by myopic points of view: computer scientists focus upon database manipulations and processing algorithms; statisticians focus upon identifying and handling uncertainties; and clients focus upon integrating knowledge into the knowledge domain. Moreover, most data miners and statisticians continue to sarcastically criticise each other. This is detrimental to both disciplines. Unfortunately, the anti-statistical attitude will keep data mining from reaching its actual potential: data mining can learn from statistics.
Data mining and statistics will inevitably grow toward each other in the near future, because data mining will not become knowledge discovery without statistical thinking, and statistics will not be able to succeed on massive and complex datasets without data mining approaches. Remember that knowledge discovery rests on the three balanced legs of computer science, statistics and client knowledge: it will not stand on one leg, on two legs, or even on three unbalanced legs. Hence, successful knowledge discovery needs substantial collaboration from all three. A maturity challenge is for data miners, statisticians and clients to recognize their dependence on each other and for all of them to widen their focus until true collaboration becomes reality. The critical challenge for us all is to view the challenges as opportunities for our joint success. All parties should widen their focus until true collaboration and the mining for gold become reality.
About the author

Diego Kuonen, PhD in Statistics, is founder and CEO of Statoo Consulting, Lausanne, Switzerland. Statoo Consulting is a Swiss consulting firm specialized in statistical consulting and training, data analysis, data mining and analytical CRM services. Dr. Diego Kuonen has several years of experience in statistical consulting, in computing and in data mining, and also in teaching and training. Currently, he is also vice president of the Swiss Statistical Society and president of its Section Statistics in Business and Industry. Have you already been Statooed? If not, please find further information on how to get Statooed at
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
Outline. What is Big data and where they come from? How we deal with Big data?
What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
Navigating Big Data business analytics
mwd a d v i s o r s Navigating Big Data business analytics Helena Schwenk A special report prepared for Actuate May 2013 This report is the third in a series and focuses principally on explaining what
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
Data mining and official statistics
Quinta Conferenza Nazionale di Statistica Data mining and official statistics Gilbert Saporta président de la Société française de statistique 5@ S Roma 15, 16, 17 novembre 2000 Palazzo dei Congressi Piazzale
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Successful Steps and Simple Ideas to Maximise your Direct Marketing Return On Investment
Successful Steps and Simple Ideas to Maximise your Direct Marketing Return On Investment By German Sacristan, X1 Head of Marketing and Customer Experience, UK and author of The Digital & Direct Marketing
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Louis Gudema: Founder and President of Revenue + Associates
The Interview Series - Presented by SmartFunnel Interviews of Sales + Marketing Industry Leaders Louis Gudema: Founder and President of Revenue + Associates PETER: Hello folks this is Peter Fillmore speaking.
Web Data Mining: A Case Study. Abstract. Introduction
Web Data Mining: A Case Study Samia Jones Galveston College, Galveston, TX 77550 Omprakash K. Gupta Prairie View A&M, Prairie View, TX 77446 [email protected] Abstract With an enormous amount of data stored
Data Mining. Anyone can tell you that it takes hard work, talent, and hours upon hours of
Seth Rhine Math 382 Shapiro Data Mining Anyone can tell you that it takes hard work, talent, and hours upon hours of watching videos for a professional sports team to be successful. Finding the leaks in
How To Learn To Use Big Data
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: [email protected]
96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad
DEFINITELY. GAME CHANGER? EVOLUTION? Big Data
Big Data EVOLUTION? GAME CHANGER? DEFINITELY. EMC s Bill Schmarzo and consultant Ben Woo weigh in on whether Big Data is revolutionary, evolutionary, or both. by Terry Brown EMC+ In a recent survey of
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction
Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Session 10 : E-business models, Big Data, Data Mining, Cloud Computing
INFORMATION STRATEGY Session 10 : E-business models, Big Data, Data Mining, Cloud Computing Tharaka Tennekoon B.Sc (Hons) Computing, MBA (PIM - USJ) POST GRADUATE DIPLOMA IN BUSINESS AND FINANCE 2014 Internet
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
SAP Predictive Analysis: Strategy, Value Proposition
September 10-13, 2012 Orlando, Florida SAP Predictive Analysis: Strategy, Value Proposition Thomas B Kuruvilla, Solution Management, SAP Business Intelligence Scott Leaver, Solution Management, SAP Business
Big Data Hope or Hype?
Big Data Hope or Hype? David J. Hand Imperial College, London and Winton Capital Management Big data science, September 2013 1 Google trends on big data Google search 1 Sept 2013: 1.6 billion hits on big
Healthcare, transportation,
Smart IT Argus456 Dreamstime.com From Data to Decisions: A Value Chain for Big Data H. Gilbert Miller and Peter Mork, Noblis Healthcare, transportation, finance, energy and resource conservation, environmental
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Driving business intelligence to new destinations
IBM SPSS Modeler and IBM Cognos Business Intelligence Driving business intelligence to new destinations Integrating IBM SPSS Modeler and IBM Cognos Business Intelligence Contents: 2 Mining for intelligence
Cleaned Data. Recommendations
Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto [email protected] @alexcpsec @MLSecProject
Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto [email protected] @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning
Chapter Managing Knowledge in the Digital Firm
Chapter Managing Knowledge in the Digital Firm Essay Questions: 1. What is knowledge management? Briefly outline the knowledge management chain. 2. Identify the three major types of knowledge management
Market Research. What is market research? 2. Why conduct market research?
What is market research? Market Research Successful businesses have extensive knowledge of their customers and their competitors. Market research is the process of gathering information which will make
An interdisciplinary model for analytics education
An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
