Data Foundations: Data Attributes and Features, Data Pre-processing, Data Storage, Data Analysis

Transcription:

Data Foundations
  Data Attributes and Features
  Data Pre-processing
  Data Storage
  Data Analysis

Data Attributes
  Describing data content and characteristics
  Representing data dimensions
  Set of all attributes: attribute vector or attribute array

Attribute Types
  Nominal Data
  Sortable Data

Numerical Attributes
  Discrete vs. Continuous

Statistical Features of Data
  Mean
  Median
  Standard Deviation

Similarity Matrix
  Distance for Categorical Data (p: total number of types; m: number of same types)
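The three statistical features named above can be sketched in a few lines. This is only an illustration: the slide's own formulas did not survive transcription, so the population form of the standard deviation (dividing by n) is an assumption, and the function name `describe` and the sample values are hypothetical.

```python
import math
import statistics


def describe(values):
    """Return (mean, median, standard deviation) of one numeric attribute."""
    n = len(values)
    mean = sum(values) / n
    median = statistics.median(values)
    # Assumption: population form (divide by n). The slide may have used
    # the sample form (divide by n - 1) instead.
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return mean, median, std


# Hypothetical attribute values, used only for illustration.
mean, median, std = describe([2, 4, 4, 4, 5, 5, 7, 9])
```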

Distance Measures (Numerical)
  Euclidean Distance: L2
  Manhattan Distance: L1
  Minkowski Distance: Lp
  Cosine Distance: $s(X, Y) = \frac{X \cdot Y}{\|X\| \, \|Y\|}$

Data Uncertainty
  Source of uncertainty
  Attribute error
  Missing attributes
  Data integration error
  Resolution conversion
  Application uncertainty
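The distance measures listed above (L1, L2, Lp, cosine) and the categorical distance from the similarity matrix slide can be sketched as follows. The function names are hypothetical. For the categorical case the slide's formula is missing, so the common simple-matching form d = (p - m) / p is assumed, with p read as the number of categorical attributes compared and m as the number that match.

```python
import math


def minkowski(x, y, p):
    """Lp distance between two equal-length numeric vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """L2 distance: Minkowski with p = 2."""
    return minkowski(x, y, 2)


def manhattan(x, y):
    """L1 distance: Minkowski with p = 1."""
    return minkowski(x, y, 1)


def cosine_similarity(x, y):
    """s(X, Y) = X . Y / (|X| |Y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def simple_matching_distance(a, b):
    """Assumed categorical distance: (p - m) / p."""
    p = len(a)                                   # attributes compared
    m = sum(1 for u, v in zip(a, b) if u == v)   # attributes that match
    return (p - m) / p
```

Euclidean and Manhattan are written as special cases of Minkowski to make the L2/L1/Lp relationship on the slide explicit.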

Data Preprocessing
  Application database
  Data Products
  ETL: Extract, Transform, & Load
  Data Warehouse
  Commercial Intelligence
  Analysis

Data Preprocessing
  1. Data Cleaning
  2. Data Integration
  Data Quality: Accuracy, Completeness, Consistency, Timeliness, Believability, Interpretability

Data Visualization Quality
  Data-Ink Ratio: data ink / total ink used in the graphic

Data Error Types
  Missing data
    Replace with constants
    Replace with average value of attributes
    Regression
    Manual filling
  Noisy data
    Regression
    Outlier analysis
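Two of the missing-data strategies listed above, replacing with a constant and replacing with the attribute's average, can be sketched as below. Representing a missing value as `None` and the function names are assumptions made for this illustration.

```python
def fill_with_constant(values, constant):
    """Replace each missing entry (None) with a fixed constant."""
    return [constant if v is None else v for v in values]


def fill_with_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical attribute column with one missing value.
column = [1.0, None, 3.0]
by_constant = fill_with_constant(column, 0.0)
by_mean = fill_with_mean(column)
```

Mean filling keeps the attribute's average unchanged but shrinks its variance, which is one reason regression-based filling is listed as an alternative.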

Visual Data Cleaning
  Using visualization techniques for data cleaning

Data Integration
  Combining data from multiple sources
  Structural conflicts
    Schema differences
  Data conflicts
    Repeated data
  Providing a uniform visualization
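A minimal sketch of the two conflict types named above: a schema difference is resolved by renaming fields to a shared schema, and repeated data is resolved by dropping duplicate records. The field names, records, and the `integrate` function are hypothetical and not taken from the slides.

```python
def integrate(source_a, source_b, rename_b):
    """Merge two lists of records into one schema and drop repeated records.

    rename_b maps field names used in source_b to the names used in source_a.
    """
    unified = list(source_a)
    for record in source_b:
        # Structural conflict: translate source_b's field names.
        unified.append({rename_b.get(k, k): v for k, v in record.items()})

    seen = set()
    result = []
    for record in unified:
        key = tuple(sorted(record.items()))
        # Data conflict: keep only the first copy of a repeated record.
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


form1 = [{"name": "Ada", "city": "London"}]
form2 = [{"full_name": "Ada", "town": "London"},
         {"full_name": "Alan", "town": "Wilmslow"}]
integrated = integrate(form1, form2, {"full_name": "name", "town": "city"})
```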

Data Integration Example
  Form 1, Form 2, Integrated Form

Data Storage
  File Systems
  Databases and DBMS
  Data Warehouse

CSV file (comma-separated values)

Structured Files
  XML files: Extensible Markup Language
    <note>
      <to>tove</to>
      <from>jani</from>
      <heading>reminder</heading>
      <body>don't forget me this weekend!</body>
    </note>
  XML extensions: IVOA VOTable (space programs), KML (Web maps), etc.
  Special formats: e.g. HDF (Hierarchical Data Format) for scientific data
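Both file formats above can be read with Python's standard library. The sketch below parses a small CSV text and the note document from the XML slide. The CSV content is hypothetical, and in-memory strings stand in for files so the example is self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV content: the first line is the header row.
csv_text = "name,age\nAda,36\nAlan,41\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The note document shown on the XML slide.
xml_text = (
    "<note><to>tove</to><from>jani</from>"
    "<heading>reminder</heading>"
    "<body>don't forget me this weekend!</body></note>"
)
note = ET.fromstring(xml_text)
# Map each child element's tag to its text content.
fields = {child.tag: child.text for child in note}
```

CSV values arrive as strings (`"36"`, not `36`), whereas XML carries its structure in nested tags, so type conversion in both cases is left to the reader of the file.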

Databases
  A database is an organized collection of data. The data is typically organized to model aspects of reality in a way that supports processes requiring information. For example, modelling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.

Relational Databases
  DBMS: Database Management System
  Data definition and query languages (SQL)
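The hotel-vacancy example above can be sketched with SQLite, which ships with Python. The table layout, column names, and rows are assumptions made for illustration. The `CREATE TABLE` statement is the data definition part and the `SELECT` is the query part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data definition: one row per room (hypothetical schema).
conn.execute("CREATE TABLE rooms (hotel TEXT, room_no INTEGER, occupied INTEGER)")
conn.executemany(
    "INSERT INTO rooms VALUES (?, ?, ?)",
    [("Alpha", 101, 1), ("Alpha", 102, 1), ("Beta", 201, 0), ("Beta", 202, 1)],
)

# Query: hotels that have at least one vacant room, with the vacancy count.
vacant = conn.execute(
    "SELECT hotel, COUNT(*) FROM rooms "
    "WHERE occupied = 0 GROUP BY hotel ORDER BY hotel"
).fetchall()
conn.close()
```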

Visualization of a database
  Challenge: Interactivity!
  Example: National Science Foundation Database

Data Warehouse
  A data warehouse is a system used for reporting and data analysis. It is a central repository of integrated data from one or more disparate sources.

Database vs Data Warehouse
                   Database                      Data Warehouse
  Purpose          Data operation                Information in the data
  Application      Business                      Analysis
  Users            Employee, DB Administrator    Analyst, manager, executive
  Functionalities  Daily operations              Decision making support
  Data             Current                       Historical, time-variant
  Access           Read, write, mean, etc.       Read
  Focus            Input, query                  Information output
  Size             1 GB ~ < 1 TB                 TBs

Data Analysis
  Statistical Analysis
  Exploratory Analysis
  Data Mining

Statistical Analysis
  Statistical description: properties, parameters, distribution, correlations.
  Statistical prediction: using probability methods and sampling theory to predict statistical properties (distribution, correlation, parameters, forecasting, etc.)

Data Exploration Using Visualization
  Raw data drawing
  Statistical drawing
  Multi-view
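As one concrete instance of the "correlations" item under statistical description, the sketch below computes the Pearson correlation coefficient between two attributes. Choosing Pearson's coefficient is an assumption, since the slides do not name a specific measure, and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations, and the root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)
```

A value near 1 indicates that the two attributes rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is no linear relationship.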

Data Trajectory
Data Comparison
Trend and Patterns
Relations
Line Chart
Line Chart: Sunspots
Bar Chart
Sorted Bar Chart
Bar Chart Labeling
Bar Chart: Scale
Bar Chart: Variant
Stacked Bar Chart
Stacked Chart
Pie Chart
Pie Charts of Van Gogh Paintings
Histogram
Contour Map
Scatter Plot Matrix
Reference Lines in Scatter Plot
Heatmap
Box Plot
Box Plot Variations
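The numbers behind a box plot can be sketched as follows. The slides show only the charts, so the conventions used here are assumptions: quartiles by the "inclusive" method of `statistics.quantiles`, and whiskers reaching to the most extreme points within 1.5 times the interquartile range. The function name and sample values are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers for one numeric attribute."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```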

Multi-View

Data Mining
  "Data mining, also popularly referred to as knowledge discovery from data (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the Web, other massive repositories or data streams."
  J. Han and M. Kamber, Data Mining: Concepts and Techniques
  Data, Mining, Model, Validation, Knowledge

Data Mining Tasks
  Description: Data -> Algorithm -> Features
  Prediction:
    1. Model Training: Data -> Trained Model
    2. New Data -> Trained Model -> Features

Data Mining Tasks
  Descriptive Tasks: Concept Description, Association Mining, Clustering, Outlier Analysis
  Predictive Tasks: Classification, Evolution Analysis
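The two prediction steps above, training a model on labelled data and then applying the trained model to new data, can be sketched with a nearest-centroid classifier. The slides do not name this classifier. It is used here only because it is short, and the function names and training samples are hypothetical.

```python
def train(samples):
    """Step 1: build a model (one mean vector per label) from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Step 2: assign new data to the label whose centroid is closest."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], features))
    return min(model, key=squared_distance)


training_data = [((1, 1), "low"), ((2, 1), "low"), ((8, 9), "high"), ((9, 8), "high")]
model = train(training_data)
label = predict(model, (1.5, 1.2))
```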

Data Mining Methods
  Statistical Method: Regression, Parameter Estimation
  Machine Learning: Decision tree, Neural Networks
  Statistical Learning: Probability model, Bayesian networks
  Algorithmic Method: K-means, Graph operations

Data Mining New Applications
  Text Mining summarizes, navigates, and clusters documents contained in a database.
  Web Mining integrates data and text mining within a Web site; it enhances the Web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer.
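K-means, listed among the algorithmic methods above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain version under stated assumptions: the caller supplies the initial centroids as tuples (no random initialization), distance is squared Euclidean, and the function name and data are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Return (centroids, clusters) after at most `iterations` update rounds."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```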

Data Visualization Pipeline
Looping Model

Visual Data Mining and Visual Analytics
  Users are involved in the data mining process through visualization and user interactions.
  Certain tasks are difficult to automate:
    Validation of clustering results
    Checking data focal points and noise
    Expert knowledge input, etc.

Visual Analytics Paradigm
  Knowledge Visualization Data Processing & Mining User Interaction Revision of Mining Engine & Decision Making

Visual Analytics
  By Daniel Keim

Visual Data Mining Examples: Visualizing data correlations

Visual Data Mining Example: Biomarker detection
  $Z = \frac{1}{N} \sum_{i=1}^{N} f(D_1, P_i)$
  $Z = \frac{1}{7} \sum_{i=1}^{7} f(D_1, P_i)$
  $Z = \frac{1}{7} \sum_{i=1}^{7} f(D_2, P_i)$
  $Z = \frac{1}{3} \sum_{i} f(D_1, P_i), \quad i = 1, 2, 7$
  $Z = \frac{1}{3} \sum_{i} f(P_i, D_j), \quad i = 1, 2, 7$
  $Z = \frac{1}{7} \sum_{i=1}^{7} f(P_i, D_j)$
  $Z = \frac{1}{5} \sum_{i} f(P_i, D_j), \quad i = 1, 2, 3, 5, 7$

Visual Data Mining Example: Facial feature detection for medical diagnosis

Visual Data Mining Example: Concept detection in text data
Visual Data Mining Example: Visualizing decision trees
