Data Mining Individual Assignment report


Björn Þór Jónsson

This report outlines the implementation of, and results gained from, the Data Mining methods of preprocessing, supervised learning, frequent pattern mining and clustering, using data from questionnaire results submitted by students in a 2014 Data Mining class.

The implementation is split into Java packages, one for each Data Mining method, and the package names accompany each section name below for easy reference. Comments may be sparse, but descriptive method and variable names should make up for that. That's a coding style I've come to appreciate: metadata in code comments can do more harm than good when it is not maintained as the code changes and becomes outdated. I hope the implementation proves to be readable. Plots of generated data are made with simple R scripts that can be found in the plots directory within the project root.

Preprocessing
code namespace: is.bthj.itu.datamining.preprocessing

The attributes chosen from the data to work with are: age, programming skill, years at university, preferred operating system, favorite programming languages, whether there should be more mountains in Denmark, whether one is fed up with the winter, and favorite color.

Cleaning the data consists of normalization, in the form of inferring consistent values from ones that are considered the same, and of clamping numerical values to a defined range. After that process, tuples that still have unknown values are removed.
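The cleaning steps of this section can be illustrated with a minimal sketch. It is not the report's actual code: the class CleaningSketch, its method names and the contents of the synonym table are hypothetical, and only the ranges (age 18 to 120, programming skill 1 to 10) are taken from the report. Note that an out-of-range age is rejected, while an out-of-range skill is clamped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class CleaningSketch {

    // Hypothetical synonym table; the report keeps its real lists in OSSynonyms.
    static final Map<String, List<String>> OS_SYNONYMS = Map.of(
            "Windows", List.of("windows", "win", "windows 7"),
            "OSX", List.of("osx", "os x", "mac os x", "mac"),
            "Linux", List.of("linux", "ubuntu"));

    // Parses an integer answer; anything unparsable becomes an unknown value.
    static Optional<Integer> parseInteger(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Age is only accepted when it lies in [18, 120]; otherwise it is unknown.
    static Optional<Integer> cleanAge(String raw) {
        return parseInteger(raw).filter(age -> age >= 18 && age <= 120);
    }

    // Programming skill is clamped into [1, 10] rather than rejected.
    static Optional<Integer> cleanSkill(String raw) {
        return parseInteger(raw).map(skill -> Math.max(1, Math.min(10, skill)));
    }

    // Maps a free-text answer to its canonical name via the synonym table.
    static Optional<String> cleanOs(String raw) {
        String key = raw == null ? "" : raw.trim().toLowerCase(Locale.ROOT);
        return OS_SYNONYMS.entrySet().stream()
                .filter(entry -> entry.getValue().contains(key))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    // A tuple is kept only if none of its cleaned fields is unknown.
    static boolean isComplete(Optional<?>... fields) {
        return Arrays.stream(fields).allMatch(Optional::isPresent);
    }
}
```

For example, CleaningSketch.cleanSkill("15") yields 10, while CleaningSketch.cleanAge("17") yields an empty value, so a tuple containing that age would be dropped by the isComplete check.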
Specifically, age values are only accepted if they are between 18 and 120, inclusive. Programming skill is clamped to the range 1 to 10. Years at university values are accepted as they are if they prove to be a known numerical value. Preferred operating system answers are set to consistent values inferred from a list of alternative spellings, as can be seen in OSSynonyms. Values from the list of favorite programming languages are similarly set to consistent ones inferred from lists of synonyms in the enumeration ProgrammingLanguages. The boolean attributes about mountains and winter in Denmark are set to either Yes or No by comparing with many different synonyms for those words, in the enumeration BooleanSynonyms. Favorite color is set to the closest match found in the list of color names in BasicColorNames.

Cleaning the data in this way and writing it to disk can be done by running the main method of CSVFileReader in the .preprocessing package; the results can be seen in the file

cleaned dataset.csv in the project's root. In the rest of the project, the cleaning method QuestionairePreProcessor.getCleanedQuestionaires is called directly in code instead of reading from this file, favoring ease over efficiency.

Supervised learning: classification
code namespace: is.bthj.itu.datamining.classification

For classification with supervised learning, the kNN method was chosen, with the target attribute "Do you think there should be more mountains in Denmark?". Different combinations of the other attributes, both numerical and nominal, were tried when computing the distance between tuples (by commenting out different parts of ClassificationKNN.distanceBetweenTwoTuples, which could indeed have been done in a more elegant way). The implementation can be tested by running the main method in the ClassificationKNN class.

Plots of classification accuracy for a few of the different combinations can be seen below. The Favorite color attribute alone proves to be best for classifying the tuples, with k = 11 giving 89% accuracy.

[Plots of classification accuracy, with the distance metric computed from: color attribute; age attribute; age, programming skill and operating system; all attributes; years at university]
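As an illustration of kNN over mixed numerical and nominal attributes, here is a minimal sketch. The report does not spell out how ClassificationKNN.distanceBetweenTwoTuples combines attributes, so the distance below is an assumption: numerical differences are scaled by the attribute's range, and nominal attributes contribute 0 when equal and 1 otherwise. The class KnnSketch, its Tuple type and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KnnSketch {

    // Hypothetical tuple with one numerical and one nominal attribute plus the target.
    static final class Tuple {
        final double age;
        final String favoriteColor;
        final boolean moreMountains; // target attribute

        Tuple(double age, String favoriteColor, boolean moreMountains) {
            this.age = age;
            this.favoriteColor = favoriteColor;
            this.moreMountains = moreMountains;
        }
    }

    // Width of the accepted age range (18 to 120), used to scale age differences.
    static final double AGE_RANGE = 120 - 18;

    // Assumed mixed distance: scaled numerical difference, 0/1 nominal mismatch.
    static double distance(Tuple a, Tuple b) {
        double ageDiff = Math.abs(a.age - b.age) / AGE_RANGE;
        double colorDiff = a.favoriteColor.equals(b.favoriteColor) ? 0.0 : 1.0;
        return Math.sqrt(ageDiff * ageDiff + colorDiff * colorDiff);
    }

    // Majority vote among the k training tuples nearest to the query.
    static boolean classify(List<Tuple> training, Tuple query, int k) {
        List<Tuple> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Tuple t) -> distance(t, query)));
        int neighbours = Math.min(k, sorted.size());
        int yesVotes = 0;
        for (int i = 0; i < neighbours; i++) {
            if (sorted.get(i).moreMountains) {
                yesVotes++;
            }
        }
        return yesVotes * 2 > neighbours;
    }
}
```

Dropping a term from distance corresponds to the experiment of trying different attribute combinations; an odd k avoids tied votes for a two-valued target.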

Frequent pattern / association mining
code namespace: is.bthj.itu.datamining.association

For finding frequent patterns with a given support, and association rules with a given minimum confidence, the Apriori algorithm was implemented and targeted at the Favorite programming languages attribute. The implementation can be tested by running the main method in the Apriori class.

To test and validate the implementation, data was used from Example 6.3 and Table 6.1 in the textbook, Data Mining: Concepts and Techniques, 3rd edition (see method Apriori.getTextBookTransactionalData). That proved to be a good idea, as comparing with the results in Example 6.3 uncovered errors in the implementation.

One error was in the frequent itemset search, where support for candidate sets was found by only comparing the first elements of the set with the first elements of each set in the data. In other words, the count depended on the compared elements occurring in the same order, instead of searching specifically for the existence of each element of the candidate set anywhere in each data record set (see method Apriori.countSupport).
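The corrected, order-independent support count described above can be sketched as follows. This is not the code of Apriori.countSupport itself; the class name SupportCountSketch and the representation of transactions as sets of strings are assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

class SupportCountSketch {

    // Counts the transactions that contain every item of the candidate set,
    // regardless of the order in which the items were listed.
    static int countSupport(Set<String> candidate, List<Set<String>> transactions) {
        int count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(candidate)) {
                count++;
            }
        }
        return count;
    }
}
```

Using sets and containsAll makes the position of an item within a record irrelevant, which is exactly the property the first-elements comparison lacked.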
Another uncovered error was in the generation of association rules, where the confidence calculation was flawed: confidence(A => B) was computed as support_count(B) / support_count(A) instead of support_count(A ∪ B) / support_count(A) (see method Apriori.printAssociationRules).

Output from the implementation, by running the main method in the Apriori class with support set to 2 and minimum confidence set to 70%, is the following:

***Frequent itemsets with minimum support: 2
[C, CSharp, Java]
[CPlusPlus, CSharp, Java]
[CSharp, FSharp, Java]
[CSharp, FSharp, Scala]
[CSharp, Java, JavaScript]
[CSharp, Java, PHP]
[CSharp, Java, Python]
[CSharp, JavaScript, Python]

***Association rules with minimum conficence = 70%
C,CSharp => Java, confidence = 2/2 = 100%
C,Java => CSharp, confidence = 2/2 = 100%
CPlusPlus,CSharp => Java, confidence = 2/2 = 100%
FSharp,Scala => CSharp, confidence = 2/2 = 100%
Java,JavaScript => CSharp, confidence = 3/4 = 75%
CSharp,PHP => Java, confidence = 7/8 = 88%
Java,PHP => CSharp, confidence = 7/8 = 88%
PHP => CSharp,Java, confidence = 7/10 = 70%

From this we can, for example, say that a preference for Java and JavaScript implies a preference for CSharp, with 75% confidence.
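The corrected confidence formula, support_count(A ∪ B) / support_count(A), can be sketched as below. The class ConfidenceSketch and its methods are hypothetical and are not the report's Apriori.printAssociationRules; transactions are again assumed to be sets of strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ConfidenceSketch {

    // Number of transactions containing every item of the given itemset.
    static int supportCount(Set<String> itemset, List<Set<String>> transactions) {
        int count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // confidence(A => B) = support_count(A ∪ B) / support_count(A)
    static double confidence(Set<String> a, Set<String> b, List<Set<String>> transactions) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        int supportA = supportCount(a, transactions);
        if (supportA == 0) {
            return 0.0; // the rule never applies, so report zero confidence
        }
        return (double) supportCount(union, transactions) / supportA;
    }
}
```

One quick sanity check on the formula: since every transaction containing A ∪ B also contains A, the result can never exceed 1, whereas support_count(B) / support_count(A) can, whenever B is more common than A.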

Clustering
code namespace: is.bthj.itu.datamining.clustering

To cluster the tuples into k partitions, the k-Means technique was implemented. Only one dimension of the data, age, was used to partition by, but more dimensions could easily be added by expanding the method KMeans.getTupleValue. The implementation can be tested by running the main method in the KMeans class.

To measure the quality of the clusters formed in this dimension for different values of k, the sum of square errors was computed for each partition count k. As the initial cluster centroids are chosen at random, an average of the errors from 10 computations was taken for each k:

Average of 10 sums of square errors for partition size k = 2:
Average of 10 sums of square errors for partition size k = 3:
Average of 10 sums of square errors for partition size k = 4:
Average of 10 sums of square errors for partition size k = 5:
Average of 10 sums of square errors for partition size k = 6:
Average of 10 sums of square errors for partition size k = 7:
Average of 10 sums of square errors for partition size k = 8:
Average of 10 sums of square errors for partition size k = 9:
Average of 10 sums of square errors for partition size k = 10:

[Plot: average sum of square errors for each k]

From this it can be seen that k = 6 gives a comparatively low local minimum of error with a reasonably low number of partitions, so k = 6 seems to be a good choice when clustering the tuples by the values of the age attribute.

Though clustering is unsupervised, and so has no predefined classes, it could be interesting to look at how well this clustering method performs as a classifier, for example by measuring how dominant a single nominal value, such as Favorite color, is within each cluster, as a measure of goodness. For now, I'll let the sum of square errors suffice as a measure.
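The evaluation procedure of this section, one-dimensional k-Means with randomly chosen initial centroids and the sum of square errors averaged over several runs, can be sketched as follows. This is not the report's KMeans class: the class name KMeans1DSketch, the iteration cap of 100 and the choice of initial centroids as random data points are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

class KMeans1DSketch {

    // Index of the centroid closest to the value (ties go to the lowest index).
    static int nearest(double value, double[] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(value - centroids[c]) < Math.abs(value - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs k-Means once and returns the sum of square errors of the result.
    static double sumOfSquareErrors(double[] values, int k, Random random) {
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) {
            centroids[c] = values[random.nextInt(values.length)]; // random initial centroids
        }
        int[] assignment = new int[values.length];
        Arrays.fill(assignment, -1); // forces at least one full assign/update pass
        boolean changed = true;
        for (int iteration = 0; changed && iteration < 100; iteration++) {
            changed = false;
            for (int i = 0; i < values.length; i++) {
                int best = nearest(values[i], centroids);
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            double[] sums = new double[k];
            int[] counts = new int[k];
            for (int i = 0; i < values.length; i++) {
                sums[assignment[i]] += values[i];
                counts[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    centroids[c] = sums[c] / counts[c]; // empty clusters keep their centroid
                }
            }
        }
        double sse = 0.0;
        for (int i = 0; i < values.length; i++) {
            double error = values[i] - centroids[assignment[i]];
            sse += error * error;
        }
        return sse;
    }

    // Averages the sum of square errors over several randomly initialized runs.
    static double averageSse(double[] values, int k, int runs, Random random) {
        double total = 0.0;
        for (int run = 0; run < runs; run++) {
            total += sumOfSquareErrors(values, k, random);
        }
        return total / runs;
    }
}
```

Calling averageSse(ages, k, 10, new Random()) for k from 2 to 10 would produce a series comparable in kind to the averages listed above; the error generally shrinks as k grows, which is why a low error at a small k is the interesting point to look for.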

Conclusion

It has been interesting to get acquainted with these Data Mining methods, and I can foresee using them in my future game development.

IT University of Copenhagen, spring 2014
Björn Þór Jónsson

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Classification Techniques (1)

Classification Techniques (1) 10 10 Overview Classification Techniques (1) Today Classification Problem Classification based on Regression Distance-based Classification (KNN) Net Lecture Decision Trees Classification using Rules Quality

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

Car Insurance. Jan Tomášek Štěpán Havránek Michal Pokorný

Car Insurance. Jan Tomášek Štěpán Havránek Michal Pokorný Car Insurance Jan Tomášek Štěpán Havránek Michal Pokorný Competition details Jan Tomášek Official text As a customer shops an insurance policy, he/she will receive a number of quotes with different coverage

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Classification and Prediction

Classification and Prediction Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

More information

Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points

Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points Journal of Computer Science 6 (3): 363-368, 2010 ISSN 1549-3636 2010 Science Publications Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

KINGS COLLEGE OF ENGINEERING

KINGS COLLEGE OF ENGINEERING KINGS COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ACADEMIC YEAR 2011-2012 / ODD SEMESTER SUBJECT CODE\NAME: CS1011-DATA WAREHOUSE AND DATA MINING YEAR / SEM: IV / VII UNIT I BASICS

More information

Lecture 10: Regression Trees

Lecture 10: Regression Trees Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

More information

CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015

CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015 CPSC 340: Machine Learning and Data Mining K-Means Clustering Fall 2015 Admin Assignment 1 solutions posted after class. Tutorials for Assignment 2 on Monday. Random Forests Random forests are one of the

More information

Clustering & Association

Clustering & Association Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects

More information

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Clustering UE 141 Spring 2013

Clustering UE 141 Spring 2013 Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information

Introduction to Statistical Machine Learning

Introduction to Statistical Machine Learning CHAPTER Introduction to Statistical Machine Learning We start with a gentle introduction to statistical machine learning. Readers familiar with machine learning may wish to skip directly to Section 2,

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 9 9/20/2011 Today 9/20 Where we are MapReduce/Hadoop Probabilistic IR Language models LM for ad hoc retrieval 1 Where we are... Basics of ad

More information

COURSE RECOMMENDER SYSTEM IN E-LEARNING

COURSE RECOMMENDER SYSTEM IN E-LEARNING International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand

More information

Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing

Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing Overview Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes clustering vs. classification supervised vs. unsupervised

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Data ining Practical achine Learning Tools and Techniques Slides for Chapter 2 of Data ining by I. H. Witten and E. rank Outline Terminology What s a concept Classification, association, clustering, numeric

More information

CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Analytics on Big Data

Analytics on Big Data Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

More information

Decision tree algorithm short Weka tutorial

Decision tree algorithm short Weka tutorial Decision tree algorithm short Weka tutorial Croce Danilo, Roberto Basili Machine leanring for Web Mining a.a. 2009-2010 Machine Learning: brief summary Example You need to write a program that: given a

More information

Data mining knowledge representation

Data mining knowledge representation Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

More information

Clustering Algorithms. Data Mining Clustering. Distance. Example. More Than One Mean. Mean Clustering

Clustering Algorithms. Data Mining Clustering. Distance. Example. More Than One Mean. Mean Clustering Clustering Algorithms Data Mining Clustering Kevin Swingler Organise data into a number of distinct groups (clusters) according to the similarity of their members and their differences from other clusters

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 Online Analytic Processing OLAP 2 OLAP OLAP: Online Analytic Processing OLAP queries are complex queries that Touch large amounts of data Discover

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

Data Mining Applications in Manufacturing

Data Mining Applications in Manufacturing Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

More information

Data Mining with Weka

Data Mining with Weka Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

More information

Data Mining and Clustering Techniques

Data Mining and Clustering Techniques DRTC Workshop on Semantic Web 8 th 10 th December, 2003 DRTC, Bangalore Paper: K Data Mining and Clustering Techniques I. K. Ravichandra Rao Professor and Head Documentation Research and Training Center

More information

2 When is a 2-Digit Number the Sum of the Squares of its Digits?

2 When is a 2-Digit Number the Sum of the Squares of its Digits? When Does a Number Equal the Sum of the Squares or Cubes of its Digits? An Exposition and a Call for a More elegant Proof 1 Introduction We will look at theorems of the following form: by William Gasarch

More information

C19 Machine Learning

C19 Machine Learning C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

K-Means Clustering Tutorial

K-Means Clustering Tutorial K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July

More information

More Data Mining with Weka

More Data Mining with Weka More Data Mining with Weka Class 3 Lesson 1 Decision trees and rules Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 3.1: Decision trees and rules

More information

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 Chapter 4 Data Mining A Short Introduction 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

Applying Data Mining of Fuzzy Association Rules to Network Intrusion Detection

Applying Data Mining of Fuzzy Association Rules to Network Intrusion Detection Applying Data Mining of Fuzzy Association Rules to Network Intrusion Detection Authors: Aly El-Semary, Janica Edmonds, Jesús González-Pino, and Mauricio Papa Center for Information Security Department

More information

Understanding the Indian Labour Market: A Data Centric Approach

Understanding the Indian Labour Market: A Data Centric Approach Understanding the Indian Labour Market: A Data Centric Approach Shabana K M, Tony Gracious, Hrishikesh Subramonian R&D Department Flytxt Trivandrum-695581, India shabana.meethian,tony.gracious,hrishikesh.subramonian@flytxt.com

More information

Unsupervised learning: Clustering

Unsupervised learning: Clustering Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Scikit-Learn GUI. NETSI Team: Abhilash Nair, Sean Dai, Graham Wright, Rohit Kale. Client: Dr. Olufisayo Omojokun

Scikit-Learn GUI. NETSI Team: Abhilash Nair, Sean Dai, Graham Wright, Rohit Kale. Client: Dr. Olufisayo Omojokun Scikit-Learn GUI NETSI Team: Abhilash Nair, Sean Dai, Graham Wright, Rohit Kale Client: Dr. Olufisayo Omojokun Presentation Overview Introduction to Machine Learning Importance of Machine Learning Feasibility

More information

Applied Data Mining. Ingo Lütkebohle, Julia Lüning 27.12.2004. 21. Chaos Communication Congress

Applied Data Mining. Ingo Lütkebohle, Julia Lüning 27.12.2004. 21. Chaos Communication Congress Applied Data Mining Ingo Lütkebohle, Julia Lüning 21. Chaos Communication Congress 27.12.2004 Outline 1 motivation process of mining data 2 visualisation 3 statistics clustering 4 algorithm tool example

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Project Report. 1. Application Scenario

Project Report. 1. Application Scenario Project Report In this report, we briefly introduce the application scenario of association rule mining, give details of apriori algorithm implementation and comment on the mined rules. Also some instructions

More information

Applying Data Mining to Demand Forecasting and Product Allocations

Applying Data Mining to Demand Forecasting and Product Allocations The Pennsylvania State University The Graduate School Capital College Applying Data Mining to Demand Forecasting and Product Allocations A Master s Paper in Computer Science By Bhavin Parikh @2003 Bhavin

More information

Lecture 20: Clustering

Lecture 20: Clustering Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

More information
