Principles of Data Mining
1 Principles of Data Mining. Instructor: Sargur N. Srihari, University at Buffalo, The State University of New York
2 Introduction: Topics. 1. Introduction to Data Mining 2. Nature of Data Sets 3. Types of Structure: Models and Patterns 4. Data Mining Tasks (What?) 5. Components of Data Mining Algorithms (How?) 6. Statistics vs Data Mining
3 Flood of Data (New York Times, January 11, 2010). Video and image data (unstructured); structured and unstructured (text) data.
4 Large Data Sets are Ubiquitous. 1. Due to advances in digital data acquisition and storage technology. Business: supermarket transactions, credit card usage records, telephone call details; government statistics; scientific: images of astronomical bodies, molecular databases, medical records. International organizations produce more information in a week than many people could read in a lifetime. 2. Automatic data production leads to a need for automatic data consumption. 3. Large databases mean vast amounts of information. 4. The difficulty lies in accessing it.
5 Data Mining as Discovery. Data mining is the science of extracting useful information from large data sets or databases. Also known as KDD: Knowledge Discovery and Data Mining, or Knowledge Discovery in Databases.
6 KDD is a multidisciplinary field, drawing on: Information Retrieval, Machine Learning, Pattern Recognition, Databases, Statistics, Visualization, Artificial Intelligence, Expert Systems.
7 Terminology for Data. Terms used across the KDD fields (Information Retrieval, Machine Learning, Pattern Recognition, Databases, Statistics, Visualization, Artificial Intelligence, Expert Systems): Structured Data, Unstructured Data, Training Set, Records, Samples, Table, Data Points, Instances.
8 Data Mining Definition. Analysis of (often large) observational data to find unsuspected relationships and to summarize the data in novel ways that are understandable and useful to the data owner. Unsuspected relationships: non-trivial, implicit, previously unknown (example of a trivial one: those who are pregnant are female). Relationships and summaries take the form of patterns and models: linear equations, rules, clusters, graphs, tree structures, recurrent patterns in time series. Usefulness: meaningful, leading to some advantage, usually economic. Analysis: process of discovery (extraction of knowledge), automatic or semi-automatic.
9 Observational Data. Observational data: the objective of the data mining exercise plays no role in the data collection strategy, e.g., data collected for transactions in a bank. Experimental data: collected in response to a questionnaire, using efficient strategies to answer specific questions. In this way data mining differs from much of statistics. For this reason, data mining is referred to as secondary data analysis.
10 KDD Process. Stages: selecting target data; preprocessing; transforming the data; data mining to extract patterns and relationships; interpreting and assessing the discovered structures. KDD is more complicated than initially thought: 80% preparing data, 20% mining data.
11 Seeking Relationships. Finding accurate, convenient and useful representations of data involves these steps: 1. Determining the nature and structure of the representation, e.g., linear regression. 2. Deciding how to quantify and compare two different representations, e.g., sum of squared errors. 3. Choosing an algorithmic process to optimize the score function, e.g., gradient descent optimization. 4. Efficient implementation using data management.
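The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the learning rate, the step count and the toy data are all assumptions chosen so that the example converges.

```python
def sse(a, b, xs, ys):
    """Score function: sum of squared errors of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_by_gradient_descent(xs, ys, rate=0.01, steps=5000):
    """Optimization: plain gradient descent on the SSE.

    The gradient is divided by n (an assumption made for a stable step
    size); this rescales the score but does not move its minimum.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        grad_a = -2.0 * sum(residuals) / n
        grad_b = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = fit_by_gradient_descent(xs, ys)
```

The representation is the pair (a, b), the score is `sse`, and the search is the update loop. Data management is trivial here because the data fits in two lists.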
12 Example of Regression Analysis. A model has four parts: 1. Representation 2. Score function 3. Process to optimize score 4. Implementation: data management, efficiency. Example: 1. Regression: y = a + bx, with predictor variable x (income) and response variable y (credit card spending). 2. Score: sum of squared errors.
13 Linear Regression Process: Extracting a Linear Model. Linear regression with one variable. Data of the form (x_i, y_i), i = 1, ..., n samples. Need to find a and b such that y = a + bx. [Figure: data representation as a table of x and y values and a scatter plot of Y against X] What is involved in calculating a and b so that the line fits the points the best?
14 Score: Sum of Squared Errors. SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, where \hat{y}_i = a + b x_i is the response value obtained from the model. We wish to minimize SSE.
15 Minimizing SSE for Regression. Differentiate SSE with respect to a and b, set the partial derivatives equal to zero, and rearrange terms. This gives two equations, which we solve for a and b, the regression coefficients.
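The equations on this slide are not reproduced in the transcript. The following is the standard derivation for the model y = a + bx, given as a reconstruction rather than a copy of the slide:

```latex
\mathrm{SSE}(a,b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

\frac{\partial\,\mathrm{SSE}}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i),
\qquad
\frac{\partial\,\mathrm{SSE}}{\partial b} = -2 \sum_{i=1}^{n} x_i \,(y_i - a - b x_i)

% Setting both partial derivatives to zero and rearranging:
n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,
\qquad
a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
```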
16 Regression Coefficients. To calculate a and b we need to find the means of the x and y values. Then we calculate b as a function of the x and y values and the means, and a as a function of the means and b.
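A short sketch of that calculation, using the standard closed-form least-squares estimates: b is the sum of products of deviations from the means divided by the sum of squared x deviations, and a is mean(y) minus b times mean(x). The function name and the toy data are assumptions for illustration.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b as a function of the x and y values and the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # a as a function of the means and b
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = regression_coefficients([1.0, 2.0, 3.0, 4.0, 5.0],
                               [3.0, 5.0, 7.0, 9.0, 11.0])
```

The sketch assumes the x values are not all equal; otherwise the denominator is zero and no unique line exists.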
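17 Application to Data. Linear regression for the example data set: mean y = 5, mean x = 6; a = 0.8, b = …; the optimal regression line is y = a + bx with these coefficients. [Figure: table of x and y values and scatter plot of y against x with the fitted line]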
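The slide states the means and the intercept. Assuming those three values are transcribed correctly, the slope follows from the fact that a least-squares line passes through the point of means, so a = mean(y) - b * mean(x):

```latex
b = \frac{\bar{y} - a}{\bar{x}} = \frac{5 - 0.8}{6} = 0.7,
\qquad \text{so the fitted line would be } y = 0.8 + 0.7\,x .
```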
18 Multiple Regression. p predictor variables x_1, ..., x_p and a response y, measured on n objects: (y(1), x_1(1), ..., x_p(1)) through (y(n), x_1(n), ..., x_p(n)). X is an n x (p+1) matrix, where a column of 1s is added to incorporate a_0 in the model. y is a column vector, a = (a_0, ..., a_p), and e is an n by 1 vector containing residuals, so y = Xa + e. Solution: a = (X^T X)^{-1} X^T y.
19 Implementation of Regression. Solution: a = (X^T X)^{-1} X^T y. Simple summaries of the data (sums, sums of squares and sums of products of X and Y) are sufficient to compute estimates of a and b. This implies that a single pass through the data will yield the estimates.
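The single-pass claim can be illustrated for the one-variable case. The sketch below keeps only five running totals while reading the data once; the function name and the sample records are assumptions for illustration.

```python
def single_pass_fit(records):
    """Fit y = a + b*x from an iterable of (x, y) pairs read exactly once.

    Only n, sum(x), sum(y), sum(x*x) and sum(x*y) are kept in memory,
    so the data can be streamed from disk or a database cursor.
    """
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for x, y in records:
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_xy += x * y
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# A generator stands in for a data source that cannot be re-read.
stream = ((float(x), 1.0 + 2.0 * x) for x in range(1, 6))
a, b = single_pass_fit(stream)
```

On very large or badly scaled data these raw sums can lose precision, which is a known trade-off of this formulation.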
20 2. Nature of Data Sets. Structured data: a set of measurements from an environment or process. Simple case: n objects with d measurements each, giving an n x d matrix. The d columns are called variables, features, attributes or fields.
21 Structured Data and Data Types. US Census Bureau data: Public Use Microdata Sample data sets (PUMS). Variables: ID, Age, Sex, Marital Status, Education, Income. Data types illustrated: quantitative continuous, categorical nominal, categorical ordinal. Example values: Sex (Male, Female); Marital Status (Married, Not Married); Education (High School grad, HS grad, Some College, Child). Data problems illustrated: missing data, noisy data, and values that may be a guess. PUMS data has identifying information removed. Available in 5% and 1% sample sizes. The 1% sample has 2.7 million records.
22 Unstructured Data. 1. Structured data: well-defined tables, attributes (columns), tuples (rows); e.g., UC Irvine data set. 2. Unstructured data: World Wide Web (documents and hyperlinks; HTML docs represent a tree structure with text and attributes embedded at nodes; XML pages use metadata descriptions); text documents (a document is viewed as a sequence of words and punctuation). Mining tasks: text categorization; clustering similar documents; finding documents that match a query; Automatic Essay Scoring (AES). Reuters collection is at
23 Representations of Text Documents. Boolean vector: a document is a vector where each element is a bit representing presence or absence of a word. A set of documents can be represented as a matrix whose entry (d, w) is 1 if word w occurs in document d and 0 otherwise (a sparse matrix). Vector space representation: each element has a value such as the number of occurrences or the frequency of a word. A set of documents is represented as a document-term matrix.
24 Vector Space Example. Document-term matrix with terms t1 = database, t2 = SQL, t3 = index, t4 = regression, t5 = likelihood, t6 = linear. Entry d_ij represents the number of times that term j appears in document i.
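Both representations can be built with a few lines of Python. This is a minimal sketch: the six terms come from the slide, while the two tiny documents, the function names and the whitespace tokenizer are assumptions for illustration.

```python
from collections import Counter


def count_vector(text, vocabulary):
    """Vector space representation: occurrences of each vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def boolean_vector(text, vocabulary):
    """Boolean representation: 1 if the term is present, else 0."""
    return [1 if c > 0 else 0 for c in count_vector(text, vocabulary)]


vocabulary = ["database", "sql", "index", "regression", "likelihood", "linear"]
# Hypothetical documents, not taken from the slide.
docs = [
    "database sql index database",
    "linear regression likelihood regression",
]
# Row i is document i, column j is term j.
document_term_matrix = [count_vector(d, vocabulary) for d in docs]
boolean_matrix = [boolean_vector(d, vocabulary) for d in docs]
```

A real collection would need proper tokenization and a sparse storage format, since most entries are zero.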
25 Mixed Data: Structured & Unstructured. Medical patient data: blood pressure at different times of day; image data (x-ray or MRI); specialist's comments (text); hierarchy of relationships between patients, doctors, hospitals. The n x d data matrix is an oversimplification of what occurs in practice.
26 Transaction Data. List of store purchases: date, customer ID, list of items and prices. Web transaction log: sequence of triples (user id, web page, time). Can be transformed into a binary-valued matrix with individuals as rows and web pages visited as columns.
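The transformation from a log of triples to a binary matrix can be sketched as follows. The log entries, the page names and the function name are hypothetical.

```python
def to_binary_matrix(log):
    """Turn (user id, web page, time) triples into a 0/1 user-by-page matrix."""
    users = sorted({user for user, _, _ in log})
    pages = sorted({page for _, page, _ in log})
    visited = {(user, page) for user, page, _ in log}
    matrix = [[1 if (user, page) in visited else 0 for page in pages]
              for user in users]
    return users, pages, matrix


# Hypothetical web transaction log.
log = [
    ("u1", "/home", 10),
    ("u1", "/cart", 12),
    ("u2", "/home", 15),
    ("u1", "/home", 20),  # repeat visits collapse to a single 1
]
users, pages, matrix = to_binary_matrix(log)
```

The matrix discards visit order and timestamps, which is the price of fitting the log into the n x d form.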
27 3. Types of Structures: Models and Patterns. Representations sought in data mining: global model, local pattern.
28 Models and Patterns. Global model: makes a statement about any point in d-space, e.g., assigns a point to a cluster, even when some values are missing. Simple model: Y = aX + c; the functional model is linear (linear in variables rather than parameters). Local patterns: make a statement about restricted regions of the space spanned by the variables. E.g. 1: if X > thresh1 then Prob(Y > thresh2) = p. E.g. 2: certain classes of transactions do not show peaks and troughs (a bank discovers dead people's open accounts).
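The first local pattern can be checked against data by estimating p as a relative frequency inside the restricted region. This is a sketch only: the function name, the thresholds and the data points are assumptions.

```python
def conditional_exceedance(pairs, thresh1, thresh2):
    """Estimate Prob(Y > thresh2) among the points with X > thresh1.

    Returns None when no point falls in the region, because the pattern
    says nothing about data outside it.
    """
    region = [y for x, y in pairs if x > thresh1]
    if not region:
        return None
    return sum(1 for y in region if y > thresh2) / len(region)


# Hypothetical (x, y) observations.
pairs = [(1, 5), (6, 12), (7, 3), (8, 15), (9, 20)]
p = conditional_exceedance(pairs, thresh1=5, thresh2=10)
```

Unlike the global model Y = aX + c, this estimate makes no claim about points with X at or below thresh1.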
29 4. Data Mining Tasks (What?). Data mining is not so much a single technique as the idea that there is more knowledge hidden in the data than shows itself on the surface. Any technique that helps to extract more out of data is useful. Five major task types: 1. Exploratory Data Analysis (visualization) 2. Descriptive Modeling (density estimation, clustering) 3. Predictive Modeling (classification and regression) 4. Discovering Patterns and Rules (association rules) 5. Retrieval by Content (retrieve items similar to a pattern of interest). Tasks 2 and 3 are model building.
30 Exploratory Data Analysis. Interactive and visual. Pie charts (angles represent size). Coxcomb charts (radii represent size). Intricate spatial displays of users of Google around the world.
31 Descriptive Modeling. Describe all the data or a process for generating the data. Probability distribution using density estimation. Clustering and segmentation: partitioning p-dimensional space into groups, so that similar people are put in the same group.
32 Predictive Modeling. Classification and regression, e.g., market value of a stock, disease, brittleness of a weld. Machine learning approaches. A unique variable is the objective in prediction, unlike in description.
33 Discovering Patterns and Rules. Detecting fraudulent behavior by determining data that differs significantly from the rest. Finding combinations of transactions that occur frequently in transactional databases, e.g., grocery items purchased together.
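For the grocery example, a simple way to find items purchased together is to count how many baskets contain each pair and keep the pairs that reach a minimum count. This brute-force pair count is a stand-in for a full association-rule algorithm; the baskets, the threshold and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair one canonical order per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
pairs = frequent_pairs(baskets, min_support=2)
```

Counting every pair scales quadratically with basket size, so practical methods prune candidates before counting.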
34 Retrieval by Content. User has a pattern of interest and wishes to find that pattern in the database. Text search: estimate the relative importance of web pages using a feature vector whose elements are derived from the query-URL pair. Image search: search a large database of images by using content descriptors such as color, texture, relative position.
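One common way to rank items against a pattern of interest is cosine similarity between feature vectors. The slide does not name a similarity measure, so cosine similarity is an assumption here, as are the item vectors and function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, items, k=1):
    """Return the names of the k items most similar to the query vector."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical term-count vectors for three documents.
items = {
    "d1": [2, 1, 1, 0, 0, 0],
    "d2": [0, 0, 0, 2, 1, 1],
    "d3": [1, 0, 0, 1, 0, 1],
}
query = [0, 0, 0, 1, 1, 0]
top = retrieve(query, items, k=2)
```

The same ranking loop applies to image search if the vectors hold color or texture descriptors instead of term counts.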
35 Components of Data Mining Algorithms (How?). Four basic components in each algorithm (illustrated in the regression example): 1. Model or pattern structure: determining the underlying structure or functional form we seek from data. 2. Score function: judging the quality of the fitted model. 3. Optimization and search method: searching over different model and pattern structures. 4. Data management strategy: handling data access efficiently.
36 Statistics vs Data Mining. Size of data set (large in data mining): eyeballing is not an option (terabytes of data); entire dataset rather than a sample. Many variables: curse of dimensionality. Make predictions. Small sample sizes can lead to spurious discovery: the Super Bowl winner's conference correlates with the stock market (up/down).
37 Searching Data Base vs Data Mining. Data base: when you know exactly what you are looking for. Query tool: SQL (Structured Query Language). Example table called Persons:
LastName    FirstName    Address         City
Hansen      Ola          Timoteivn 10    Sandnes
Svendson    Tove         Borgvn 23       Sandnes
Pettersen   Kari         Storgt 20       Stavanger
Query: SELECT LastName FROM Persons results in:
LastName
Hansen
Svendson
Pettersen
Data mining: when you only vaguely know what you are looking for.
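The query can be tried out with Python's built-in sqlite3 module. The table contents and the query come from the slide; the in-memory database and the column types are assumptions made to keep the example self-contained.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [
        ("Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        ("Svendson", "Tove", "Borgvn 23", "Sandnes"),
        ("Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# The query from the slide: we know exactly which column we want.
last_names = [row[0] for row in conn.execute("SELECT LastName FROM Persons")]
conn.close()
```

Without an ORDER BY clause SQL does not promise a row order, so code that depends on the order should state one.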
38 Reference Textbooks. 1. Hand, David, Heikki Mannila, and Padhraic Smyth, Principles of Data Mining, MIT Press. 2. Bishop, Christopher, Pattern Recognition and Machine Learning, Springer, 2006. Approach: fundamental principles, with emphasis on theory and algorithms. Many other textbooks emphasize business applications and case studies.
39 Many Other Textbooks. 1. Han and Kamber, Data Mining Concepts and Techniques, Morgan Kaufmann, 2000 (Data Base Perspective). 2. Witten, I. H., and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann (Machine Learning Perspective). 3. Adriaans, P., and D. Zantinge, Data Mining, Addison-Wesley, 1998 (Layman Perspective). 4. Groth, R., Data Mining: A Hands-on Approach for Business Professionals, Prentice-Hall PTR, 1997 (Business Perspective). 5. Kennedy, R., Y. Lee, et al., Solving Data Mining Problems through Pattern Recognition, Prentice-Hall PTR (Pattern Recognition Perspective). 6. Weiss, S., and N. Indurkhya, Predictive Data Mining: A Practical Guide, Morgan Kaufmann (Statistical Perspective).
40 More Data Mining Textbooks. 7. S. Chakrabarti, Mining the Web, Morgan Kaufmann, 2003 (Emphasis on webpages and hyperlinks). 8. T. Dasu and T. Johnson, Exploratory Data Mining and Data Cleaning, Wiley, 2003 (Focus on data quality). 9. K. Cios, W. Pedrycz and R. Swiniarski, Data Mining Methods for Knowledge Discovery, Kluwer, 1998 (Focus on mathematical issues, e.g., rough sets). 10. M. Kantardzic, Data Mining: Concepts, Models and Algorithms, IEEE-Wiley, 2003 (Focus on Machine Learning). 11. A. K. Pujari, Data Mining Techniques, Universities Press, 2001 (Data Base Perspective). 12. R. Groth, Data Mining: A Hands-on Approach for Business Professionals, Prentice Hall, 1998 (Business user perspective, including software CD).
41 Premier Data Mining Conference
Statistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
Principles of Dat Da a t Mining Pham Tho Hoan [email protected] [email protected]. n
Principles of Data Mining Pham Tho Hoan [email protected] References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction
Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Why do statisticians "hate" us?
Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept
Statistics 215b 11/20/03 D.R. Brillinger Data mining A field in search of a definition a vague concept D. Hand, H. Mannila and P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge. Some definitions/descriptions
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Dynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
Principles of Data Mining by David Hand, Heikki Mannila and Padhraic Smyth The MIT Press 2001 (546 pages)
Principles of Data Mining by David Hand, Heikki Mannila and Padhraic Smyth The MIT Press 2001 (546 pages) ISBN: 026208290x A comprehensive, highly technical look at the math and science behind extracting
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
Introduction to Data Mining. Lijun Zhang [email protected] http://cs.nju.edu.cn/zlj
Introduction to Data Mining Lijun Zhang [email protected] http://cs.nju.edu.cn/zlj Outline Overview Introduction The Data Mining Process The Basic Data Types The Major Building Blocks Scalability and Streaming
Syllabus. HMI 7437: Data Warehousing and Data/Text Mining for Healthcare
Syllabus HMI 7437: Data Warehousing and Data/Text Mining for Healthcare 1. Instructor Illhoi Yoo, Ph.D Office: 404 Clark Hall Email: [email protected] Office hours: TBA Classroom: TBA Class hours: TBA
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Perspectives on Data Mining
Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London [email protected] April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, [email protected] Abstract Most text data from diverse document databases are unsuitable for analytical
Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.
Title Introduction to Data Mining Dr Arulsivanathan Naidoo Statistics South Africa OECD Conference Cape Town 8-10 December 2010 1 Outline Introduction Statistics vs Knowledge Discovery Predictive Modeling
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Three Perspectives of Data Mining
Three Perspectives of Data Mining Zhi-Hua Zhou * National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China Abstract This paper reviews three recent books on data mining
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA
315 DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA C. K. Lowe-Ma, A. E. Chen, D. Scholl Physical & Environmental Sciences, Research and Advanced Engineering Ford Motor Company, Dearborn, Michigan, USA
Introduction to Data Mining
Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Systems Informáticos y Computación Universidad Politécnica de Valencia, Spain [email protected] Horsens, Denmark, 26th September 2005
CAS CS 565, Data Mining
CAS CS 565, Data Mining Course logistics Course webpage: http://www.cs.bu.edu/~evimaria/cs565-10.html Schedule: Mon Wed, 4-5:30 Instructor: Evimaria Terzi, [email protected] Office hours: Mon 2:30-4pm,
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
DBTech Pro Workshop. Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining. Georgios Evangelidis
DBTechNet DBTech Pro Workshop Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining Dimitris A. Dervos [email protected] http://aetos.it.teithe.gr/~dad Georgios Evangelidis
ISSN: 2348 9510. A Review: Image Retrieval Using Web Multimedia Mining
A Review: Image Retrieval Using Web Multimedia Satish Bansal*, K K Yadav** *, **Assistant Professor Prestige Institute Of Management, Gwalior (MP), India Abstract Multimedia object include audio, video,
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Subject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler
Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error
A Brief Tutorial on Database Queries, Data Mining, and OLAP
A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
Data Mining: Motivations and Concepts
POLYTECHNIC UNIVERSITY Department of Computer Science / Finance and Risk Engineering Data Mining: Motivations and Concepts K. Ming Leung Abstract: We discuss here the need, the goals, and the primary tasks
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
Data Mining and Soft Computing. Francisco Herrera
Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: [email protected] http://sci2s.ugr.es
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques Chapter 1 Introduction SURESH BABU M ASST PROF IT DEPT VJIT 1 Chapter 1. Introduction Motivation: Why data mining? What is data mining? Data Mining: On what kind of
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Use of Data Mining in the field of Library and Information Science : An Overview
512 Use of Data Mining in the field of Library and Information Science : An Overview Roopesh K Dwivedi R P Bajpai Abstract Data Mining refers to the extraction or Mining knowledge from large amount of
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
203.4770: Introduction to Machine Learning Dr. Rita Osadchy
203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:
Foundations of Artificial Intelligence. Introduction to Data Mining
Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
Data Warehousing and Data Mining
Data Warehousing and Data Mining Winter Semester 2010/2011 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper [email protected] DM Lecturer: Mouna Kacimi [email protected] http://www.inf.unibz.it/dis/teaching/dwdm/index.html
APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email [email protected]
Eighth International IBPSA Conference Eindhoven, Netherlands August -4, 2003 APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION Christoph Morbitzer, Paul Strachan 2 and
NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE
www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Data Mining: An Introduction
Data Mining: An Introduction Michael J. A. Berry and Gordon A. Linoff. Data Mining Techniques for Marketing, Sales and Customer Support, 2nd Edition, 2004 Data mining What promotions should be targeted
Data Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
Data Mining for Digital Forensics
Digital Forensics - CS489 Sep 15, 2006 Topical Paper Mayuri Shakamuri Data Mining for Digital Forensics Introduction "Data mining is the analysis of (often large) observational data sets to find unsuspected
An Introduction to Data Mining
An Introduction to Data Mining Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
Knowledge Discovery from Data Bases Proposal for a MAP-I UC
Knowledge Discovery from Data Bases Proposal for a MAP-I UC P. Brazdil 1, João Gama 1, P. Azevedo 2 1 Universidade do Porto; 2 Universidade do Minho; 1 Knowledge Discovery from Data Bases We are deluged
Predictive Analytics Certificate Program
Information Technologies Programs Predictive Analytics Certificate Program Accelerate Your Career Offered in partnership with: University of California, Irvine Extension's professional certificate and
Data Mining Techniques for Mortality at Advanced Age
Data Mining Techniques for Mortality at Advanced Age Lijia Guo, Ph.D., A.S.A. and Morgan C. Wang, Ph.D. University of Central Florida Abstract This paper addresses issues and techniques for advanced age
College of Health and Human Services. Fall 2013. Syllabus
College of Health and Human Services Fall 2013 Syllabus information placement Instructor description objectives HAP 780: Data Mining in Health Care Time: Mondays, 7.20pm-10pm (except for 3rd lecture
Data Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
How To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 Data Mining for Knowledge Management in Technology Enhanced Learning
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
Data Mining and Business Intelligence CIT-6-DMB. http://blackboard.lsbu.ac.uk. Faculty of Business 2011/2012. Level 6
Data Mining and Business Intelligence CIT-6-DMB http://blackboard.lsbu.ac.uk Faculty of Business 2011/2012 Level 6 Table of Contents 1. Module Details... 3 2. Short Description... 3 3. Aims of the Module...
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean, neither more nor less." Lewis Carroll [87, p. 214] In
Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining
Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa [email protected] skype, gtalk: avellido Tels.:
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing'oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fraser University MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fraser University MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
Data Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
Cleaned Data. Recommendations
Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets
Building Data Cubes and Mining Them. Jelena Jovanovic Email: [email protected]
Building Data Cubes and Mining Them Jelena Jovanovic Email: [email protected] KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
