Data Mining Individual Assignment report


Björn Þór Jónsson

This report outlines the implementation of, and results from, four Data Mining methods: preprocessing, supervised learning, frequent pattern mining and clustering. The data comes from questionnaire answers submitted by students in a 2014 Data Mining class.

The implementation is split into Java packages, one for each Data Mining method, and the package name accompanies each section name below for easy reference. Comments may be sparse, but descriptive method and variable names should make up for that. That's a coding style I've come to appreciate, since metadata in code comments can do more harm than good when it is not maintained as the code changes and so becomes outdated. I hope the implementation proves to be readable. Plots of generated data are made with simple R scripts that can be found in the plots directory within the project root.

Preprocessing
code namespace: is.bthj.itu.datamining.preprocessing

The attributes chosen from the data to work with are: age, programming skill, years at university, preferred operating system, favorite programming languages, whether there should be more mountains in Denmark, whether one is fed up with the winter, and favorite color.

Cleaning the data consists of normalization: inferring one consistent value from answers that are considered the same, and clamping numerical values to a defined range. After that process, tuples that still have unknown values are removed.
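The cleaning steps described above could be sketched roughly as follows. This is not the code from the project: the class and method names (CleaningSketch, cleanAge, clampSkill, normalizeYesNo) are hypothetical, the numeric ranges are the ones stated in the specifics that follow, and the synonym lists are a small made-up sample rather than the contents of the project's enumerations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of the cleaning rules; not the project's actual code.
class CleaningSketch {

    // Made-up sample synonyms; the project's BooleanSynonyms enumeration is not reproduced here.
    private static final List<String> YES_SYNONYMS = Arrays.asList("yes", "y", "true", "ja");
    private static final List<String> NO_SYNONYMS = Arrays.asList("no", "n", "false", "nej");

    // Age is accepted only within 18..120 inclusive; anything else counts as unknown.
    static Optional<Integer> cleanAge(String raw) {
        try {
            int age = Integer.parseInt(raw.trim());
            return (age >= 18 && age <= 120) ? Optional.of(age) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Programming skill is clamped into the range 1..10.
    static int clampSkill(int skill) {
        return Math.max(1, Math.min(10, skill));
    }

    // Free-text answers are mapped to "Yes" or "No"; unrecognized answers count as unknown.
    static Optional<String> normalizeYesNo(String raw) {
        String key = raw.trim().toLowerCase(Locale.ROOT);
        if (YES_SYNONYMS.contains(key)) return Optional.of("Yes");
        if (NO_SYNONYMS.contains(key)) return Optional.of("No");
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(cleanAge("17"));         // Optional.empty
        System.out.println(clampSkill(15));         // 10
        System.out.println(normalizeYesNo(" JA ")); // Optional[Yes]
    }
}
```

A tuple for which any of these methods returns an empty Optional would then be a candidate for removal in the final cleaning step.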
Specifically, age values are only accepted if they are between 18 and 120, inclusive. Programming skill is clamped to the range 1 to 10. Years at university values are accepted as they are if they prove to be a known numerical value. Preferred operating system answers are set to consistent values inferred from a list of alternative spellings, as can be seen in OSSynonyms. Values from the list of favorite programming languages are similarly set to consistent ones inferred from lists of synonyms in the enumeration ProgrammingLanguages. The boolean attributes about mountains and winter in Denmark are set to either Yes or No by comparing with many different synonyms for those words, in the enumeration BooleanSynonyms. Favorite color is set to the closest match found in the list of color names in BasicColorNames.

Cleaning the data in this way and writing it to disk can be done by running the main method of CSVFileReader in the .preprocessing package; the results can be seen in the file cleaned dataset.csv in the project's root. In the rest of the project, the cleaning method QuestionairePreProcessor.getCleanedQuestionaires is called directly in code instead of reading from this file, favoring ease over efficiency.

Supervised learning: classification
is.bthj.itu.datamining.classification

For classification with supervised learning, the kNN method was chosen, with the target attribute "Do you think there should be more mountains in Denmark?". Different combinations of the other attributes, both numerical and nominal, were tried to compute the distance between tuples (by commenting out different parts of ClassificationKNN.distanceBetweenTwoTuples, which could indeed have been done in a more elegant way). The implementation can be tested by running the main method in the ClassificationKNN class.

Plots of classification accuracy for a few of the different combinations can be seen here below. The Favorite color attribute alone proves to be best for classifying the tuples, with k = 11 giving 89% accuracy.

[Plots: classification accuracy with the distance metric computed from: color attribute; age attribute; age, programming skill and operating system; all attributes; years at university]
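To illustrate how a kNN classifier can combine numerical and nominal attributes in one distance, here is a minimal sketch. It is not the project's ClassificationKNN: the class KnnSketch, its Tuple type and the distance definition (scaled absolute difference for age, 0/1 mismatch for color) are assumptions made for the example, and the training data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical kNN sketch; names and the distance definition are assumptions.
class KnnSketch {

    static final class Tuple {
        final double age;    // numerical attribute
        final String color;  // nominal attribute
        final String label;  // target attribute, "Yes" or "No"

        Tuple(double age, String color, String label) {
            this.age = age;
            this.color = color;
            this.label = label;
        }
    }

    // Numerical part: absolute difference scaled by the accepted age span (18..120).
    // Nominal part: 0 if the values match, 1 otherwise.
    static double distance(Tuple a, Tuple b) {
        double ageDiff = Math.abs(a.age - b.age) / (120.0 - 18.0);
        double colorDiff = a.color.equals(b.color) ? 0.0 : 1.0;
        return Math.sqrt(ageDiff * ageDiff + colorDiff * colorDiff);
    }

    // Predicts the label by majority vote among the k nearest training tuples.
    // With two labels, an odd k avoids tied votes.
    static String classify(List<Tuple> training, Tuple query, int k) {
        List<Tuple> sorted = new ArrayList<Tuple>(training);
        sorted.sort(Comparator.comparingDouble(t -> distance(t, query)));
        Map<String, Integer> votes = new HashMap<String, Integer>();
        for (Tuple t : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(t.label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : votes.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up training data for illustration only.
        List<Tuple> training = Arrays.asList(
            new Tuple(20, "blue", "Yes"),
            new Tuple(21, "blue", "Yes"),
            new Tuple(30, "blue", "No"),
            new Tuple(40, "red", "No"),
            new Tuple(45, "red", "No"));
        System.out.println(classify(training, new Tuple(22, "blue", "?"), 3)); // Yes
    }
}
```

Selecting which attributes contribute to the distance could then be done by passing a set of attribute names to the distance method, rather than by commenting code in and out.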
Frequent pattern / association mining
is.bthj.itu.datamining.association

For finding frequent patterns with a given minimum support, and association rules with a given minimum confidence, the Apriori algorithm was implemented and targeted at the Favorite programming languages attribute. The implementation can be tested by running the main method in the Apriori class.

To test and validate the implementation, data was used from Example 6.3 and Table 6.1 in the textbook, Data Mining: Concepts and Techniques, 3rd edition (see method Apriori.getTextBookTransactionalData). That proved to be a good idea, as comparing with the results in Example 6.3 uncovered errors in the implementation.

One error was in the frequent itemset search, where support for a candidate set was found by comparing only the first elements of the set with the first elements of each set in the data. In other words, the count depended on the compared elements occurring in the same order, instead of searching specifically for the existence of each element of the candidate set anywhere in each data record set (see method Apriori.countSupport).
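The order-independent support count described above can be sketched as follows. The class name SupportCountSketch and the sample transactions are made up for illustration, and the method is only a guess at the shape of the corrected Apriori.countSupport, not its actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of order-independent support counting.
class SupportCountSketch {

    // Counts the transactions that contain every item of the candidate set,
    // regardless of the position of the items within the transaction.
    static int countSupport(Set<String> candidate, List<Set<String>> transactions) {
        int count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(candidate)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up transactions for illustration only.
        List<Set<String>> transactions = Arrays.asList(
            new HashSet<String>(Arrays.asList("Java", "C", "CSharp")),
            new HashSet<String>(Arrays.asList("CSharp", "Java")),
            new HashSet<String>(Arrays.asList("Python", "C")));
        Set<String> candidate = new HashSet<String>(Arrays.asList("C", "Java"));
        System.out.println(countSupport(candidate, transactions)); // 1
    }
}
```

Here the first transaction lists Java before C, so a position-by-position comparison with the candidate [C, Java] would miss it, while the containsAll check finds it.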
Another uncovered error was in the generation of association rules, where the confidence calculation was flawed: confidence(A => B) was computed as support_count(B) / support_count(A) instead of support_count(A U B) / support_count(A) (see method Apriori.printAssociationRules).

Output from the implementation, by running the main method in the Apriori class with minimum support set to 2 and minimum confidence set to 70%, is the following:

    ***Frequent itemsets with minimum support: 2
    [C, CSharp, Java]
    [CPlusPlus, CSharp, Java]
    [CSharp, FSharp, Java]
    [CSharp, FSharp, Scala]
    [CSharp, Java, JavaScript]
    [CSharp, Java, PHP]
    [CSharp, Java, Python]
    [CSharp, JavaScript, Python]
    ***Association rules with minimum conficence = 70%
    C,CSharp => Java, confidence = 2/2 = 100%
    C,Java => CSharp, confidence = 2/2 = 100%
    CPlusPlus,CSharp => Java, confidence = 2/2 = 100%
    FSharp,Scala => CSharp, confidence = 2/2 = 100%
    Java,JavaScript => CSharp, confidence = 3/4 = 75%
    CSharp,PHP => Java, confidence = 7/8 = 88%
    Java,PHP => CSharp, confidence = 7/8 = 88%
    PHP => CSharp,Java, confidence = 7/10 = 70%

From this we can say, for example, that a preference for Java and JavaScript implies a preference for CSharp, with 75% confidence.
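The corrected confidence formula, support_count(A U B) / support_count(A), can be sketched as below. The class ConfidenceSketch, its method names and the transactions are made up for the example and are not taken from the project; returning 0 when the antecedent never occurs is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the confidence calculation for a rule A => B.
class ConfidenceSketch {

    // Number of transactions containing every item of the itemset.
    static int supportCount(Set<String> itemset, List<Set<String>> transactions) {
        int count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // confidence(A => B) = support_count(A U B) / support_count(A)
    static double confidence(Set<String> a, Set<String> b, List<Set<String>> transactions) {
        Set<String> union = new HashSet<String>(a);
        union.addAll(b);
        int supportA = supportCount(a, transactions);
        if (supportA == 0) {
            return 0.0; // assumption: a rule whose antecedent never occurs gets confidence 0
        }
        return (double) supportCount(union, transactions) / supportA;
    }

    public static void main(String[] args) {
        // Made-up transactions for illustration only.
        List<Set<String>> transactions = Arrays.asList(
            new HashSet<String>(Arrays.asList("Java", "CSharp", "PHP")),
            new HashSet<String>(Arrays.asList("Java", "PHP")),
            new HashSet<String>(Arrays.asList("PHP", "CSharp", "Java")),
            new HashSet<String>(Arrays.asList("CSharp", "Java")));
        Set<String> a = new HashSet<String>(Arrays.asList("PHP"));
        Set<String> b = new HashSet<String>(Arrays.asList("CSharp", "Java"));
        // support_count(A U B) = 2 and support_count(A) = 3, so this prints 2/3.
        System.out.println(confidence(a, b, transactions));
    }
}
```

With this data the flawed formula support_count(B) / support_count(A) would give 3/3 = 100%, because B also occurs in a transaction without A, which shows why the union is needed in the numerator.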
Clustering
is.bthj.itu.datamining.clustering

To cluster the tuples into k partitions, the k-Means technique was implemented. Only one dimension of the data, age, was used to partition by, but more dimensions could easily be added by expanding the method KMeans.getTupleValue. The implementation can be tested by running the main method in the KMeans class.

To measure the quality of the clusters formed in this dimension for different values of k, the sum of squared errors was computed for each partition count k. As initial cluster centroids are chosen at random, an average of the errors from 10 computations was taken for each k.

[Listing: average of 10 sums of square errors for each partition size k = 2, 3, ..., 10; values not preserved]

From this it can be seen that k = 6 gives a comparatively low local minimum of error with a reasonably low number of partitions, so k = 6 seems to be a good choice when clustering the tuples by the values of the age attribute.

Though clustering is unsupervised, and so has no predefined classes, it could be interesting to look at how well this clustering method performs as a classifier, for example by measuring how dominant a single nominal value, like Favorite color, is within each cluster, as a measure of goodness. For now, I'll let the sum of squared errors suffice as a measure.
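The evaluation procedure described above (one-dimensional k-Means, sum of squared errors, averaged over several runs with random initial centroids) could be sketched like this. The class KMeansSseSketch and its methods are hypothetical, the iteration cap of 100 and the choice of initial centroids among distinct data points are assumptions, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of 1-D k-means with an averaged sum of squared errors (SSE).
class KMeansSseSketch {

    // Index of the centroid closest to the value (the first one wins on ties).
    static int nearestCentroid(double value, double[] centroids) {
        int nearest = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(value - centroids[c]) < Math.abs(value - centroids[nearest])) {
                nearest = c;
            }
        }
        return nearest;
    }

    // Runs k-means once and returns the SSE of the resulting clustering.
    static double runOnce(double[] values, int k, Random random) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of values");
        }
        // Pick k distinct data points at random as the initial centroids.
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < values.length; i++) indices.add(i);
        Collections.shuffle(indices, random);
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) centroids[c] = values[indices.get(c)];

        int[] assignment = new int[values.length];
        for (int iteration = 0; iteration < 100; iteration++) {
            // Assignment step: attach each value to its nearest centroid.
            boolean changed = (iteration == 0);
            for (int i = 0; i < values.length; i++) {
                int nearest = nearestCentroid(values[i], centroids);
                if (nearest != assignment[i]) changed = true;
                assignment[i] = nearest;
            }
            if (!changed) break;
            // Update step: move each centroid to the mean of its cluster.
            double[] sums = new double[k];
            int[] counts = new int[k];
            for (int i = 0; i < values.length; i++) {
                sums[assignment[i]] += values[i];
                counts[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) centroids[c] = sums[c] / counts[c];
            }
        }
        double sse = 0.0;
        for (int i = 0; i < values.length; i++) {
            double diff = values[i] - centroids[assignment[i]];
            sse += diff * diff;
        }
        return sse;
    }

    // Averages the SSE over several runs, since the initial centroids are random.
    static double averageSse(double[] values, int k, int runs, Random random) {
        double total = 0.0;
        for (int run = 0; run < runs; run++) total += runOnce(values, k, random);
        return total / runs;
    }

    public static void main(String[] args) {
        double[] ages = {19, 21, 22, 24, 25, 29, 31, 38, 42}; // made-up values
        Random random = new Random();
        for (int k = 2; k <= 5; k++) {
            System.out.println("k = " + k + ": " + averageSse(ages, k, 10, random));
        }
    }
}
```

Plotting the averaged error against k, as in the report, is one common way to look for a value of k beyond which adding partitions no longer reduces the error much.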
Conclusion

It has been interesting to get acquainted with these Data Mining methods and I can foresee using them in my future game development.

IT University of Copenhagen, spring 2014
Björn Þór Jónsson
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More information19922010 by Pearson Education, Inc. All Rights Reserved.
Key benefit of objectoriented programming is that the software is more understandable better organized and easier to maintain, modify and debug Significant because perhaps as much as 80 percent of software
More informationCluster Analysis: Basic Concepts and Methods
10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationData Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
More informationEFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York Imberman@postbox.csi.cuny.edu Abstract
More informationCity University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015
City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Fundamentals of Data Science Course Code:
More informationClustering and Data Mining in R
Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches
More information