What is Data Mining?




Data Mining (Knowledge Discovery in Databases)

Data mining: "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" (William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus)

Data mining finds valuable information hidden in large volumes of data.

The data mining process involves: databases, statistics, machine learning, high-performance computing, visualization, and mathematics.

Data mining: Basic steps

Data sampling
Handling missing values, outlier removal, labeling output, feature selection
Mining tasks: classification, clustering, association analysis, etc.

Mining tasks

Classification: predict a category such as YES or NO (e.g. patient diagnosis)
Regression: predict the actual output value
Clustering: group similar items
Association rules: identify associations among input attributes, e.g. {Milk} --> {Coke}, {Diaper, Milk} --> {Beer}
Anomaly detection: detect deviation from normal behavior (e.g. fraud detection, network intrusions)

Further example applications: weather forecasting, recommender systems (e.g. Netflix)
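The association rules on this slide, such as {Diaper, Milk} --> {Beer}, are usually judged by two numbers: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The slide itself does not define these measures, so the following is only a minimal sketch using their standard definitions. The transaction list and the function names `support` and `confidence` are invented for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical market-basket transactions, invented for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"Diaper", "Milk", "Beer"}),
    frozenset({"Diaper", "Milk", "Beer", "Bread"}),
    frozenset({"Milk", "Coke"}),
    frozenset({"Diaper", "Milk"}),
    frozenset({"Bread", "Coke"}),
]


def support(itemset: Iterable[str],
            transactions: List[FrozenSet[str]] = TRANSACTIONS) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]] = TRANSACTIONS) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    base = support(antecedent, transactions)
    # An antecedent that never occurs gives no evidence; report 0.0.
    return support(both, transactions) / base if base else 0.0


if __name__ == "__main__":
    print("support({Diaper, Milk})          =", support({"Diaper", "Milk"}))
    print("confidence({Diaper, Milk}->Beer) =", confidence({"Diaper", "Milk"}, {"Beer"}))
    print("confidence({Milk}->Coke)         =", confidence({"Milk"}, {"Coke"}))
```

On this toy data, {Diaper, Milk} appears in 3 of 5 baskets and Beer accompanies it in 2 of those 3, so the rule has confidence of about 0.67. A real rule miner such as Apriori would also prune candidate itemsets by a minimum support threshold, which this sketch does not do.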

Data Mining Software

Enterprise-level (US $10,000 and more): Fair Isaac, IBM, Insightful, KXEN, Oracle, SAS, and SPSS
Department-level (from $1,000 to $9,999): Angoss, CART/MARS/TreeNet/Random Forests, Equbits, GhostMiner, Gornik, Mineset, MATLAB, Megaputer, Microsoft SQL Server, Statsoft Statistica, ThinkAnalytics
Personal-level (from $1 to $999): Excel, See5, MATLAB
Free: C4.5, R, Weka, Xelopes

Data Mining: WEKA

Outline

Data preparation
Preprocessing and ARFF files
Filters, classifiers, and visualization
Attribute selection
Training and testing
Quality measurements
Interpretation of results

Data Mining Procedures

1. Prepare the data in the desired format.
2. Preprocess the data if necessary.
3. Select algorithms based on the application or domain expertise.
4. Evaluate the results and repeat the experiments if necessary.
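The outline lists "Quality measurements" and the procedure ends with evaluating the results, but the slides do not say which measures are meant. As a hedged sketch, the code below computes three common ones (accuracy, precision, and recall) for a yes/no classifier. The label lists and the function names `confusion_counts` and `quality` are hypothetical and are not taken from WEKA.

```python
from typing import Dict, List


def confusion_counts(actual: List[str], predicted: List[str],
                     positive: str = "yes") -> Dict[str, int]:
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def quality(actual: List[str], predicted: List[str],
            positive: str = "yes") -> Dict[str, float]:
    """Accuracy, precision, and recall; 0.0 when a denominator is empty."""
    c = confusion_counts(actual, predicted, positive)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical test-set labels and classifier outputs, for illustration only.
ACTUAL = ["yes", "yes", "no", "no", "yes", "no"]
PREDICTED = ["yes", "no", "no", "yes", "yes", "no"]

if __name__ == "__main__":
    print(confusion_counts(ACTUAL, PREDICTED))
    print(quality(ACTUAL, PREDICTED))
```

Measuring on held-out test data rather than on the training data is what the "Training and testing" outline item refers to; if the numbers are unsatisfactory, step 4 of the procedure sends you back to preprocessing or algorithm selection.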

Data Sets

ARFF file

    @relation weather
    @attribute No real
    @attribute outlook {sunny,overcast,rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE,FALSE}
    @attribute play {yes,no}
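To make the structure of the ARFF header concrete, here is a small sketch that reads the weather header shown on this slide into a relation name and an ordered mapping of attribute names to types. It is a simplified reader written for this example, not WEKA's own loader: the function name `parse_arff_header` is hypothetical, and quoted attribute names, date attributes, and the data section are deliberately not handled.

```python
from typing import Dict, List, Tuple, Union

# The header from the slide, reproduced as a string.
ARFF_HEADER = """\
@relation weather
@attribute No real
@attribute outlook {sunny,overcast,rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
"""

# A nominal attribute maps to its list of values, any other to its type name.
AttrType = Union[str, List[str]]


def parse_arff_header(text: str) -> Tuple[str, Dict[str, AttrType]]:
    """Return (relation name, {attribute name: type}) for a simple header."""
    relation = ""
    attributes: Dict[str, AttrType] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and % comments
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{") and spec.endswith("}"):
                # Nominal attribute: keep the allowed values in order.
                attributes[name] = [v.strip() for v in spec[1:-1].split(",")]
            else:
                attributes[name] = spec.lower()
        elif keyword == "@data":
            break  # the header ends where the data section begins
    return relation, attributes


if __name__ == "__main__":
    relation, attrs = parse_arff_header(ARFF_HEADER)
    print("relation:", relation)
    for name, kind in attrs.items():
        print(f"  {name}: {kind}")
```

In this header, the nominal attribute `play` with values yes and no is the natural class attribute for a classification task, while `temperature` and `humidity` are numeric inputs.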

[Slides: WEKA Explorer; WEKA; Parameter Distribution; Filters]


[Slides: Classifiers]


[Slides: Neural Networks; Visualization]

[Slides: Clustering]

UCI repository: http://archive.ics.uci.edu/ml/index.html