Lecture 6 - Data Mining Processes
1 Lecture 6 - Data Mining Processes. Dr. Songsri Tangsripairoj, Dr. Benjarath Pupacdi. Faculty of ICT, Mahidol University.
2 Cross-Industry Standard Process for Data Mining (CRISP-DM). Example Application: Telephone Bill Study.
3 CRISP-DM: Cross-Industry Standard Process for Data Mining. CRISP-DM is a data mining process model that describes the approaches expert data miners commonly use to tackle problems. It was one of the first comprehensive attempts at a standard process model for data mining, and it is independent of industry sector and technology.
4 CRISP-DM Phases: 1. Business (or problem) understanding; 2. Data understanding; 3. Data preparation (transform and create the data set for modeling); 4. Modeling; 5. Evaluation (check good models, evaluate to assure nothing is missing); 6. Deployment.
5 1. Business Understanding: Determine business objectives (solve a specific problem). Assess the current situation. Convert the above into a data mining problem, e.g., What types of customers are interested in each of our products? What are typical profiles of our customers? Develop a project plan.
6 2. Data Understanding: Initial data collection; data description; data exploration; data quality verification; data selection. Related data can come from many sources: internal (ERP or MIS, data warehouse), external (government data, commercial data), or created (research).
7 Set up a concise and clear description of the problem, e.g., identify spending behaviors of female shoppers who purchase seasonal clothes, or identify bankruptcy patterns of credit card holders. Identify the relevant data for the problem description, e.g., demographic, credit card transactional, and financial data. Selected variables for the relevant data should be independent of each other.
8 Demographic data, such as income, education, number of households, and age. Sociographic data, such as hobbies, club membership, and entertainment. Transactional data, such as sales records, credit card spending, and issued checks.
9 Nominal, Ordinal, Interval, Ratio
10 Nominal quantities have finite non-ordered values. Values are distinct symbols. Only equality tests can be performed (=, ≠). Examples: outlook: {sunny, overcast, rainy}; sex: {male, female}; eye color: {black, blue, green, brown, etc.}.
11 Ordinal quantities have finite ordered values. They impose an order on values (<, >), but no distance between values is defined. Examples: grades: A > B > C > D > F; credit ratings: excellent > fair > bad; temperature: hot > mild > cool; height: tall > medium > short.
12 Interval quantities are not only ordered but also measured in fixed and equal units. The differences between values are meaningful, i.e., a unit of measurement exists (+, -). Examples: temperatures in Celsius or Fahrenheit; calendar dates.
13 Ratio quantities are treated as real numbers. All mathematical operations are allowed; both differences and ratios are meaningful (*, /). Examples: age, length, time, counts, monetary quantities.
14 The type of an attribute depends on which of the following properties (operations) it possesses: distinctness (=, ≠), order (<, >), addition (+, -), multiplication (*, /). Nominal: distinctness. Ordinal: distinctness and order. Interval: distinctness, order, and addition. Ratio: all 4 properties. Nominal and ordinal attributes are categorical (qualitative); interval and ratio attributes are numeric (quantitative).
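The cumulative relationship between the four attribute types can be sketched in a few lines of Python. This is only an illustration: the names LEVELS, PROPERTIES, properties_of, and is_categorical are made up for this sketch and do not come from the lecture.

```python
# Each attribute type possesses all properties of the type before it, plus one more.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
PROPERTIES = ["distinctness", "order", "addition", "multiplication"]


def properties_of(level):
    """Return the properties (operations) an attribute of this type possesses."""
    return PROPERTIES[: LEVELS.index(level) + 1]


def is_categorical(level):
    """Nominal and ordinal are categorical; interval and ratio are numeric."""
    return level in ("nominal", "ordinal")


for level in LEVELS:
    kind = "categorical" if is_categorical(level) else "numeric"
    print(level, kind, properties_of(level))
```

For example, a grade (ordinal) supports equality and ordering tests, but averaging grades assumes an addition property that the scale does not have.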
15 Discrete data has only a finite or countably infinite set of values and is often represented as integer variables. Note: binary attributes (e.g., true/false, yes/no, 0/1) are a special case of discrete attributes. Examples: zip codes, counts, or the set of words in a collection of documents.
16 Continuous data has an infinite number of possible values, with real numbers as attribute values. Continuous attributes are typically represented as floating-point variables. In practice, real values can only be measured and represented using a finite number of digits. Examples: temperature, height, or weight.
17 Types of Data [Table with columns Features, PolyAnalyst, PASW Modeler; entries as extracted: Continuous, Numerical, Range; Integer, Integer, Range; Yes/No, Binary, Flag; Finite, Categorical, Set; Date/Time; String; Text; Range; Typeless]
18 3. Data Preparation: Clean selected data for better quality: fill in missing values, identify or remove outliers, resolve redundancy caused by data integration, and correct inconsistent data. Transform data: convert different measurements of data into a unified numerical scale by using simple mathematical formulations.
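Two of the cleaning steps above, filling in missing values and identifying outliers, might look like the following minimal sketch. It assumes that a valid range for the column is known in advance and that the median of the in-range values is an acceptable fill value; neither choice comes from the lecture, and clean_column is a hypothetical helper.

```python
from statistics import median


def clean_column(values, low, high):
    """Fill missing entries (None) and report values outside [low, high].

    Assumption: the fill value is the median of the in-range values, so that
    outliers do not distort it. Outliers are reported, not removed.
    """
    in_range = [v for v in values if v is not None and low <= v <= high]
    fill = median(in_range)
    outliers = [v for v in values if v is not None and not (low <= v <= high)]
    cleaned = [fill if v is None else v for v in values]
    return cleaned, outliers


# Hypothetical ages of card holders; None marks a missing value.
ages = [25, None, 40, 250, 31]
cleaned_ages, age_outliers = clean_column(ages, low=0, high=120)
print(cleaned_ages, age_outliers)
```

Whether a flagged value is then corrected, removed, or kept is a separate decision that depends on the business problem.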
19 [Table: sample customer records with columns Customer ID, Zip, Gender, Income, Age, Marital Status, Transaction Amount]
20 Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data; e.g., occupation= Noisy: containing errors or outliers; e.g., Salary= Inconsistent: containing discrepancies in codes or names; e.g., Age= 42 with Birthday= 03/07/1997; e.g., was rating 1, 2, 3, now rating A, B, C.
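A record-level check for the incomplete and inconsistent cases could be sketched as follows. The field names, the MM/DD/YYYY reading of the birthday, and the reference date are all assumptions made for illustration; check_record and age_on are hypothetical helpers.

```python
from datetime import date, datetime


def age_on(birthday, as_of):
    """Age in whole years on the reference date."""
    years = as_of.year - birthday.year
    if (as_of.month, as_of.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def check_record(record, as_of):
    """Return a list of data quality problems found in one record."""
    problems = []
    for field, value in record.items():
        if value is None or str(value).strip() == "":
            problems.append(f"incomplete: {field} is missing")
    age, birthday = record.get("age"), record.get("birthday")
    if age not in (None, "") and birthday not in (None, ""):
        # Assumption: birthdays are written as MM/DD/YYYY.
        born = datetime.strptime(birthday, "%m/%d/%Y").date()
        if age_on(born, as_of) != age:
            problems.append("inconsistent: age does not match birthday")
    return problems


# Hypothetical record echoing the slide's examples; the reference date is assumed.
record = {"occupation": "", "age": 42, "birthday": "03/07/1997"}
problems = check_record(record, as_of=date(2010, 1, 1))
print(problems)
```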
21 Outliers differ greatly from the majority of data: they are data that are clearly out of range of the selected data groups. Examples: The income of a customer included in the middle class is $250,000. The age of a credit card holder is recorded as
22 Incomplete data may come from: a data value that was not applicable when collected; different considerations between the time the data was collected and the time it is analyzed; human, hardware, or software problems. Noisy data (incorrect values) may come from: faulty data collection instruments; human or computer error at data entry; errors in data transmission. Inconsistent data may come from: different data sources; functional dependency violations (e.g., modifying some linked data).
23 Transform numerical to numerical scales: salary ranging from $20,000 to $100,000 to a number in [0.0, 1.0]; the metric system (e.g., meter, kilometer) to the English system (e.g., foot and mile). Recode categorical data to numerical scales: 1 = Yes and 0 = No; 1 for $0 to $20,000 and 2 for $20,001 to $40,000.
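The transformations on this slide are simple enough to write out directly. The sketch below uses min-max scaling for the salary example, which is one common way to map a range onto [0.0, 1.0]; the function names are hypothetical, and extending the income codes beyond the two brackets on the slide is an assumption.

```python
def min_max(value, low, high):
    """Map a value from [low, high] onto [0.0, 1.0]."""
    return (value - low) / (high - low)


def meters_to_feet(meters):
    """Metric to English units (1 foot = 0.3048 meter)."""
    return meters / 0.3048


def recode_yes_no(answer):
    """1 = Yes and 0 = No."""
    return 1 if answer == "Yes" else 0


def income_bracket(income, width=20_000):
    """1 for $0 to $20,000, 2 for $20,001 to $40,000.

    Assumptions: incomes are whole dollars, and the same width continues
    for higher brackets (3, 4, ...).
    """
    if income <= 0:
        return 1
    return (income - 1) // width + 1


print(min_max(60_000, 20_000, 100_000))   # the midpoint of the salary range
print(recode_yes_no("Yes"), income_bracket(20_001))
```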
24 4. Modeling. Data treatment: training set, test set, maybe others. Data mining techniques: association, classification, clustering, prediction, sequential patterns.
25 Association: derive a set of association rules showing relationships among attributes and data items, based on statistical significance. Example: market-basket analysis. Transactions (TID: items): 1: Bread, Coke, Milk; 2: Beer, Bread; 3: Beer, Coke, Diaper, Milk; 4: Beer, Bread, Diaper, Milk; 5: Coke, Diaper, Milk. Rules discovered can be {milk} --> {coke} and {diaper, milk} --> {beer}.
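To make the market-basket example concrete, the sketch below scores the two rules on the five transactions using support and confidence. These two measures are the usual way to rank association rules, but the slide does not name them, so treat their use here as an assumption; the function names are hypothetical.

```python
# The five transactions from the slide, one set of items per TID.
transactions = [
    {"bread", "coke", "milk"},
    {"beer", "bread"},
    {"beer", "coke", "diaper", "milk"},
    {"beer", "bread", "diaper", "milk"},
    {"coke", "diaper", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)


def support(antecedent, consequent):
    """Fraction of all transactions containing both sides of the rule."""
    return count(antecedent | consequent) / len(transactions)


def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


for lhs, rhs in [({"milk"}, {"coke"}), ({"diaper", "milk"}, {"beer"})]:
    print(lhs, "-->", rhs, support(lhs, rhs), confidence(lhs, rhs))
```

On this data, {milk} --> {coke} holds in 3 of the 4 transactions that contain milk, and {diaper, milk} --> {beer} holds in 2 of the 3 transactions that contain both diaper and milk.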
26 Classification: classify data items into one of several predefined classes. Example: to indicate whether a customer is likely to buy a computer, a decision tree splits on Age: if <=30, test Student (No --> No, Yes --> Yes); if between 30 and 40, Yes; if >40, test Credit rating (Excellent --> No, Fair --> Yes).
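A decision tree is just nested tests, so the example can be written as a small function. The sketch reads the slide's tree as follows, which is an assumption because the diagram did not survive intact: Age is the root; the <=30 branch tests Student; the >40 branch tests Credit rating; the remaining middle branch predicts Yes. The function name buys_computer is hypothetical.

```python
def buys_computer(age, student, credit_rating):
    """Classify a customer as "Yes" or "No" by walking the tree from the root."""
    if age <= 30:
        # Left branch: the Student test decides.
        return "Yes" if student else "No"
    if age > 40:
        # Right branch: the Credit rating test decides (Fair --> Yes, Excellent --> No).
        return "Yes" if credit_rating == "Fair" else "No"
    # Middle branch (assumed to cover ages above 30 and up to 40): always Yes.
    return "Yes"


print(buys_computer(25, student=True, credit_rating="Fair"))
print(buys_computer(50, student=False, credit_rating="Excellent"))
```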
27 Clustering: group data items into a number of clusters by using some similarity measure. Example: find subgroups of customers having similar purchase behaviors. [Figure: scatter plot on axes P1 and P2. Dimension = 2; classes = 3; patterns in class 1 = 20, class 2 = 28, class 3 = 25; total patterns = 73.]
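As one possible illustration of clustering, here is a minimal k-means sketch on two-dimensional points, with Euclidean distance as the similarity measure. The slide does not say which algorithm produced its figure, so k-means is a choice made for this example, and the six points are invented. Taking the first k points as initial centers is a naive simplification.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups; returns (labels, centers)."""
    centers = [points[i] for i in range(k)]  # naive initialization: first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Hypothetical customers described by two features (P1, P2).
points = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.2, 4.8), (9.1, 1.2)]
labels, centers = k_means(points, k=3)
print(labels)
```

Unlike classification, no class labels are given in advance; the groups emerge from the distances alone.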
28 Prediction: related to regression techniques. To discover the relationship between the dependent and independent variables, and the relationship between the independent variables. Examples: predict the amount of revenue that each item will generate during an upcoming sale, based on previous sales data; predict sales amounts of a new product based on advertising expenditure.
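The advertising example can be sketched with ordinary least squares on a single independent variable. The figures below are invented for illustration, and fit_line and predict are hypothetical names; a real study would likely use several independent variables and a statistics package.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value of the dependent variable for a new x."""
    return slope * x + intercept


# Hypothetical data: advertising expenditure (independent) and sales (dependent).
advertising = [1.0, 2.0, 3.0, 4.0]
sales = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(advertising, sales)
print(predict(5.0, slope, intercept))
```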
29 Sequential patterns: find similar patterns in data transactions over a business period. Example: in point-of-sale transaction sequences, Athletic Apparel Store: (Shoes) (Racket, Racketball) --> (Sports_Jacket); Computer Bookstore: (Modern Database Management) (Data Warehousing Fundamentals) --> (Introduction to Data Mining).
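One way to check how often such a pattern occurs is to count the customers whose purchase history contains the pattern's itemsets in order. The sketch below does this for the athletic apparel example; the four customer histories are invented, and contains and sequence_support are hypothetical helpers.

```python
def contains(sequence, pattern):
    """True if the pattern's itemsets appear, in order, within the sequence.

    Each pattern itemset must be a subset of some later transaction.
    """
    position = 0
    for itemset in sequence:
        if position < len(pattern) and pattern[position] <= itemset:
            position += 1
    return position == len(pattern)


def sequence_support(sequences, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


pattern = [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}]

# Hypothetical purchase histories, one list of transactions per customer.
histories = [
    [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}],
    [{"Shoes", "Socks"}, {"Racket", "Racketball", "Towel"}, {"Sports_Jacket"}],
    [{"Sports_Jacket"}, {"Shoes"}],
    [{"Shoes"}, {"Racket"}],
]
print(sequence_support(histories, pattern))
```

Order matters here: the third customer bought both shoes and a jacket, but in the wrong order, so that history does not count.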
30 5. Evaluation: Does the model meet business objectives? Are any important business objectives not addressed? Does the model make sense? Is the model actionable? It should be possible to make business decisions after this step. All important objectives should be achieved.
31 6. Deployment: Ongoing monitoring and maintenance. Evaluate performance against success criteria. Market reaction and competitor changes.
32 Example Application: Telephone industry. Problem: unpaid bills. Data mining used to develop models to predict nonpayment as early as possible.
33 Telephone Bill Study: Billing period sequence analyzed: use 2 months, receive bill, payment due month of billing, disconnect if unpaid in given period. Hypothesis: insolvent customers would change calling habits and phone usage during a critical period before and immediately after termination of the billing period.
34 1: Business Understanding: Predict which customers would be insolvent, in time for the firm to take preventive measures (and avert losing good customers). Hypothesis: insolvent customers would change calling habits and phone usage during a critical period before and immediately after termination of the billing period.
35 2: Data Understanding: Static customer information available in files. Bills, payments, usage. Used data warehouse to gather and organize data. Coded to protect customer privacy.
36 Creating Target Data Set: Customer files: customer information, disconnects, reconnections. Time-dependent data: bills, payments, usage. 100,000 customers over a 17-month period. Stratified sampling to assure all groups appropriately represented.
37 3: Data Preparation: Filtered out incomplete data. Deleted inexpensive calls. Reduced data volume about 50%. Low number of fraudulent cases. Cross-checked with phone disconnects. Lagged data made synchronization necessary.
38 Data Reduction & Projection: Information grouped by account. Customer data aggregated by 2-week periods. Discriminant analysis on 23 categories. Calculated average owed by category (significant). Identified extra charges (significant). Investigated payment by installments (not significant).
39 Choosing Data Mining Function: Classes: most possibly solvent (99.3%), most possibly insolvent (0.7%). Costs of error widely different. New data set created through stratified sampling: retained all insolvent, altered distribution to 90% solvent, used 2,066 cases total. Critical period identified: last 15 two-week periods before service interruption. Variables defined by counting measures in two-week periods. 46 variables as candidate discriminant factors.
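The rebalancing step described here (keep every insolvent case, then draw enough solvent cases to make up 90% of the new data set) could be sketched as below. The record layout, the random draw, and the function name rebalance are assumptions; the study's actual sampling procedure is not given in the slides.

```python
import random


def rebalance(cases, solvent_share=0.9, seed=0):
    """Keep all insolvent cases and sample solvent ones to reach the target share."""
    insolvent = [c for c in cases if c["insolvent"]]
    solvent = [c for c in cases if not c["insolvent"]]
    # For a 90% solvent share, 9 solvent cases are needed per insolvent case.
    wanted = round(len(insolvent) * solvent_share / (1 - solvent_share))
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sampled = rng.sample(solvent, min(wanted, len(solvent)))
    return insolvent + sampled


# Hypothetical population of 1,000 customers with 0.7% insolvent, as in the class split above.
population = [{"id": i, "insolvent": i < 7} for i in range(1000)]
balanced = rebalance(population)
print(len(balanced), sum(c["insolvent"] for c in balanced))
```

Rebalancing matters because a model trained on the raw 99.3% / 0.7% split could reach high accuracy by labeling everyone solvent, which is useless when the costs of the two errors differ widely.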
40 4: Modeling: Discriminant analysis (linear model; SPSS stepwise forward selection). Decision trees (rule-based classifier). Neural networks (nonlinear model).
41 Data Mining: The training set is about 2/3 of the data. The rest of the data (1/3) is the test set. Discriminant analysis used 17 variables. [Table of percent correct, values not preserved: discriminant analysis with equal costs; discriminant analysis with unequal costs; rule-based; neural network]
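A 2/3 to 1/3 split into training and test sets can be sketched as follows. Shuffling before the cut is an assumption (the slides do not say how the study assigned cases), and split_train_test is a hypothetical helper.

```python
import random


def split_train_test(cases, train_fraction=2 / 3, seed=0):
    """Shuffle the cases and cut them into a training set and a test set."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in case identifiers; 2,066 matches the number of cases used in the study.
case_ids = list(range(2066))
train, test = split_train_test(case_ids)
print(len(train), len(test))
```

The test set is held back while models are fitted, so that the percent correct reported for each model reflects cases it has not seen.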
42 5: Evaluation: 1st objective: maximize accuracy of predicting insolvent customers (decision tree classifier best). 2nd objective: minimize error rate for solvent customers (neural network model close to decision tree). Used all 3 on a case-by-case basis.
43 Coincidence Matrix, Combined Models [Table with rows Actual insolvent, Actual solvent, Totals and columns Model insolvent, Model solvent, Unclass, Totals]
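A coincidence matrix of this shape simply tallies each case by its actual class and the class the model assigned. The sketch below builds one from two label lists; the labels are invented and do not reproduce the study's figures, and coincidence_matrix is a hypothetical helper.

```python
from collections import Counter

ROWS = ["insolvent", "solvent"]                   # actual classes
COLUMNS = ["insolvent", "solvent", "unclassified"]  # model outputs


def coincidence_matrix(actual, predicted):
    """Count cases for every (actual, predicted) combination."""
    counts = Counter(zip(actual, predicted))
    return {row: {col: counts[(row, col)] for col in COLUMNS} for row in ROWS}


# Hypothetical labels for six customers.
actual = ["insolvent", "insolvent", "solvent", "solvent", "solvent", "solvent"]
predicted = ["insolvent", "unclassified", "solvent", "solvent", "insolvent", "unclassified"]
matrix = coincidence_matrix(actual, predicted)
for row in ROWS:
    print(row, matrix[row])
```

The off-diagonal cells are the two kinds of error, and the unclassified column shows how many cases the combined models declined to label.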
44 6: Implementation: Every customer examined using all 3 algorithms. If all 3 agreed, used that classification. If disagreement, categorized as unclassified. Correct on test data. Only 1 actually solvent customer would have been disconnected.
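The agreement rule used at implementation is easy to state in code. This is a sketch of the rule as described on the slide, with a hypothetical function name and invented model outputs.

```python
def combine(predictions):
    """Return the shared classification if all models agree, else "unclassified"."""
    first = predictions[0]
    if all(p == first for p in predictions):
        return first
    return "unclassified"


# Hypothetical outputs of the three models for two customers.
print(combine(["insolvent", "insolvent", "insolvent"]))
print(combine(["solvent", "insolvent", "solvent"]))
```

Requiring unanimity trades coverage for caution: fewer customers receive a label, but a customer is only flagged as insolvent when all three models say so.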
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationUSING LOGIT MODEL TO PREDICT CREDIT SCORE
USING LOGIT MODEL TO PREDICT CREDIT SCORE Taiwo Amoo, Associate Professor of Business Statistics and Operation Management, Brooklyn College, City University of New York, (718) 951-5219, Tamoo@brooklyn.cuny.edu
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationfrom Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationIntroduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI
Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, University of Indonesia Objectives
More informationTDWI Best Practice BI & DW Predictive Analytics & Data Mining
TDWI Best Practice BI & DW Predictive Analytics & Data Mining Course Length : 9am to 5pm, 2 consecutive days 2012 Dates : Sydney: July 30 & 31 Melbourne: August 2 & 3 Canberra: August 6 & 7 Venue & Cost
More informationRole of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
More informationData cleaning and Data preprocessing
Data cleaning and Data preprocessing Nguyen Hung Son This presentation was prepared on the basis of the following public materials: 1. Jiawei Han and Micheline Kamber, Data mining, concept and techniques
More informationData Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction
Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration
More informationRole of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
More informationIBM SPSS Statistics for Beginners for Windows
ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning
More informationDMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
More informationSurvey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses
Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses Salford Systems Data Mining 2006 March 27-31 2006 San Diego, CA By Dean Abbott Abbott Analytics
More informationCredit Risk Models. August 24 26, 2010
Credit Risk Models August 24 26, 2010 AGENDA 1 st Case Study : Credit Rating Model Borrowers and Factoring (Accounts Receivable Financing) pages 3 10 2 nd Case Study : Credit Scoring Model Automobile Leasing
More informationData Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
More informationMeasurement Information Model
mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides
More informationStatistics and Data Mining
Statistics and Data Mining A B M Shawkat Ali PowerPoint permissions Cengage Learning Australia hereby permits the usage and posting of our copyright controlled PowerPoint slide content for all courses
More informationDescriptive Statistics and Measurement Scales
Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample
More information