1 Practical Introduction to Machine Learning and Optimization Alessio Signorini
2 Everyday's Optimizations Although you may not know, everybody uses daily some sort of optimization technique: Timing your walk to catch a bus Picking the best road to get somewhere Groceries to buy for the week (and where) Organizing flights or vacation Buying something (especially online) Nowadays it is a fundamental tool for almost all corporations (e.g., insurance, groceries, banks,...)
3 Evolution: Successful Optimization Evolution is probably the most successful but least famous optimization system. The strongest species survive while the weakest die. Same goes for reproduction among the same specie. The world is tuning itself. Why do you think some of us are afraid of heights, speed or animals?
4 What is Optimization? Choosing the best element among a set of available alternatives. Sometimes it is sufficient to choose an element that is good enough. Seeking to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set. First technique (Steepest Descent) invented by Gauss. Linear Programming invented in 1940s.
5 Various Flavors of Optimization I will mostly talk about heuristic techniques (return approximate results), but optimization has many subfields, as for example: Linear/Integer Programming Quadratic/Nonlinear Programming Stochastic/Robust Programming Constraint Satisfaction Heuristics Algorithms Called programming due to US Military programs
6 Machine Learning Automatically learn to recognize complex patterns and make intelligent decisions based on data. Today machine learning has lots of uses: Search Engines Speech and Handwriting Recognition Credit Cards Fraud Detection Computer Vision and Face Recognition Medical Diagnosis
7 Problems Types In a search engine, machine learning tasks can be generally divided in three main groups: Classification or Clustering Divide queries or pages in known groups or groups learned from the data. Examples: adult, news, sports,... Regression Learn to approximate an existing function. Examples: pulse of a page, stock prices,... Ranking Not interested in function value but to relative importance of items. Examples: pages or images ranking,...
8 Algorithms Taxonomy Algorithms for machine learning can be broadly subdivided between: Supervised Learning (e.g., classification) Unsupervised Learning (e.g., clustering) Reinforcement Learning (e.g., driving) Other approaches exists (e.g., semi-supervised learning, transduction learning, ) but the ones above are the most practical ones.
9 Whatever You Do, Get Lots of Data Whatever is the machine learning task, you need three fundamental things: Lots of clean input/example data Good selection of meaningful features A clear goal function (or good approximation) If you have those, there is hope for you. Now you just have to select the appropriate learning method and parameters.
10 Classification Divide objects among a set of known classes. You basically want to assign labels. Simple examples are: Categorize News Articles: sports, politics, Identify Adult or Spam pages Identify the Language: EN, IT, EL,... Features can be: words for text, genes for DNA, time/place/amount for credit cards transactions,...
11 Classification: naïve Bayes Commonly used everywhere, especially in spam filtering. For text classification it is technically a bad choice because it assumes words independence. During training it calculates a statistical model for words and categories. At classification time it uses those statistics to estimate the probability of each category.
12 Classification: DBACL Available under GPL at To train a category given some text use dbacl -l sport.bin sport.txt To classify unknown text use dbacl -U -c sport.bin -c politic.bin article.txt OUTPUT: sport.bin 100% To get negative logarithm of probabilities use dbacl -n -c sport.bin -c politic.bin article.txt OUTPUT: sport.bin politic.bin
13 Classification: Hierarchy When the categories are more than 5 or 6 do not attempt to classify against all of them. Instead, create a hierarchy. For example, first classify among sports and politics, if sports is chosen, then classify among basketball, soccer or golf. Pay attention: a logical hierarchy is not always the best for the classifier. For example, Nascar should go with Autos/Trucks and not sports.
14 Classification: Other Approaches There are many other approaches: Latent Semantic Indexing Neural Networks, Decision Trees Support Vector Machines And many other tools/libraries: Mallet LibSVM Classifier4J To implement, remember: log(x*y) = log(x) + log(y)
15 Clustering The objective of clustering is similar to classification but the labels are not know and need to be learned from the data. For example, you may want to cluster together all the news around the same topic, or similar results after a search. It is very useful in medicine/biology to find nonobvious groups or patterns among items, but also for sites like Pandora or Amazon.
16 Clustering: K-Means Probably the simplest and most famous clustering method. Works reasonably well and is usually fast. Requires to know at priori the number of clusters (i.e., not good for news or results). Define distance measure among items. Euclidean distance sqrt(sum[(pi-qi)^2]) is often a simple option. Not guaranteed to converge to best solution.
17 Clustering: Lloyd's Algorithm Each cluster has a centroid, which is usually the average of its elements. At startup: Partition randomly the objects in N clusters. At each iteration: Recompute centroid for each cluster. Assign each item to closest cluster. Stop after M iterations or when no changes.
19 Clustering: Lloyd's Algorithm Since it is very sensitive to startup assignments, it is sometimes useful to restart multiple times. When cluster numbers is not known but in a certain range, you can execute the algorithm for different N values and pick best solution. Software Available: Apache Mahout Mathlab kmeans
20 Clustering: Min-Hashing Simple and fast algorithm: 1) Create hash (e.g., MD5) of each word 2) Signature = smallest N hashes Example: Similar to what OneRiot has done with its own... 23ce4c f ecb 3ea e5e0 0f ce4c ea e5e0 7562ecb... The signature can be used directly as ID of the cluster. Or consider results as similar if there is a good overlap among signatures.
21 Decision Trees Decision trees are predictive models that map observations to conclusions on its target output. PRODUCT CEO G B BOARD G B B G OK TOBIAS Y N FAIL COMPETITOR Y N OK FAIL FAIL OK
22 Decision Trees After enough examples, it is possible to calculate the frequency of hitting each leaf. CEO G B PRODUCT BOARD G B B G OK 30% TOBIAS Y N FAIL 10% COMPETITOR Y N OK FAIL 30% 10% FAIL OK 10% 10%
23 Decision Trees From the frequencies, it is possible to extrapolate early results in nodes and make decisions early. CEO G B PRODUCT BOARD G B B G OK TOBIAS FAIL COMPETITOR 30% Y N 10% Y N CEO G B OK=60% OK=10% PRODUCT BOARD G B B G OK=30% OK=10% OK TOBIAS FAIL COMPETITOR 30% Y N 10% Y N OK FAIL 30% 10% FAIL OK 10% 10% OK FAIL 30% 10% FAIL OK 10% 10%
24 Decision Trees: Information Gain Most of the algorithms are based on Information Gain, a concept related to the Entropy of Information Theory. At each step, for each variable V left, compute Vi = ( -Pi * log(pi) ) + ( -Ni * log(ni) ) where Pi is the fraction of items labeled positive for variable Vi (e.g., CEO = Good) and Ni is the fraction labeled negative (e.g., CEO = Bad).
25 Decision Trees: C4.5 Available at To train create names and data file GOLF.names Play, Don't Play. outlook: temperature: Humidity: Windy: Then launch c4.5 -t 4 -f GOLF sunny, overcast, rain. continuous. continuous. true, false. GOLF.data sunny, 85, 85, false, Don't Play sunny, 80, 90, true, Don't Play overcast, 83, 78, false, Play rain, 70, 96, false, Play rain, 65, 70, true, Don't Play overcast, 64, 65, true, Play
26 Decision Trees: C4.5 Output Cycle Tree -----Cases Errors size window other window rate other rate total rate % % % % % % % % % % 0 0.0% 0 0.0% outlook = overcast: Play outlook = sunny: humidity <= 80 : Play humidity > 80 : Don't Play outlook = rain: windy = true: Don't Play windy = false: Play Trial Before Pruning After Pruning Size Errors Size Errors Estimate 0 8 0( 0.0%) 8 0( 0.0%) (38.5%) << 1 8 0( 0.0%) 8 0( 0.0%) (38.5%)
27 Support Vector Machine SVM can be used for classification, regression and ranking optimization. It is flexible and usually fast. Attempts to construct a set of hyperplanes which have the largest distance from the closest datapoint of each class. The explanation for regression it is even more complicated. I will skip it here but there are plenty of papers available on the web.
28 SVM: svm-light Available at To train an SVM model create data file RANKING 3 qid:1 1:0.53 2:0.12 3: qid:1 1:0.13 2:0.1 3: qid:1 1:0.27 2:0.5 3: qid:2 1: : qid:2 1:0.87 2:0.12 3:0.45 REGRESSION 1.4 1:0.53 2:0.12 3: :0.13 2:0.1 3: :0.27 2:0.5 3: : : :0.87 2:0.12 3:0.45 Then launch svm_learn pulse.data pulse.model
29 SVM: Other Tools Available There are hundreds of libraries for SVM: LibSVM Algorithm::SVM PyML TinySVM There are executables built on top of most of them and they usually accept the same input format of svm-light. May need a script to extract features importance.
30 Genetic Algorithms Genetic Algorithms are flexible, simple to implement and can be quickly adapted to lots of optimization tasks. They are based on evolutionary biology: strongest species survive, weakest die, offsprings are similar to parents but may have random differences. This kind of algorithms is used wherever there are lots of variables and values and approximated solutions are acceptable (e.g., protein folding).
31 GA: The Basic Algorithm Startup: Create N random solutions Algorithm: 1) compute fitness of solutions 2) breed M new solutions 3) kill M weakest solutions Repeat the algorithm for K iterations or until there is no improvement.
32 GA: The Basic Algorithm During breeding (step 2) remember to follow biology rules: Better individuals are more likely to find a (good) mate Offsprings carry genes from each parent There is always the possibility of some random genetic mutations
33 GA: Relevance Example Each solution is a set of weights for the various attributes (e.g., title weights, content weight,...). The fitness of each solution is given by the delta with editorial judgments (e.g., DCG). During breeding, you may take title and content weight from parent A, description from parent B, When a mutation occurs, the weight of that variable is picked at random.
34 One of the Problems I Work On We have a set of users and for each we know the movies they like among a given set. Extrapolate the set of features (e.g., Julia Roberts, Thriller, Funny,...) that each movie has so that it is liked by all the users which like it. Not interested in what the features are or represent: we are fine with just a bunch of IDs. This could save/improve the life of lots of people.
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science email@example.com What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research firstname.lastname@example.org Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang email@example.com http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present
Flat Clustering K-Means Algorithm 1. Purpose. Clustering algorithms group a set of documents into subsets or clusters. The cluster algorithms goal is to create clusters that are coherent internally, but
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov firstname.lastname@example.org Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
10 10 Overview Classification Techniques (1) Today Classification Problem Classification based on Regression Distance-based Classification (KNN) Net Lecture Decision Trees Classification using Rules Quality
Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
Machine Learning: Overview Why Learning? Learning is a core of property of being intelligent. Hence Machine learning is a core subarea of Artificial Intelligence. There is a need for programs to behave
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Contents Dedication List of Figures List of Tables Foreword Preface Acknowledgments v xiii xvii xix xxi xxv Part I Concepts and Techniques 1. INTRODUCTION 3 1 The Quest for Knowledge 3 2 Problem Description
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
Mammoth Scale Machine Learning! Speaker: Robin Anil, Apache Mahout PMC Member! OSCON"10! Portland, OR! July 2010! Quick Show of Hands!# Are you fascinated about ML?!# Have you used ML?!# Do you have Gigabytes
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (email@example.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, firstname.lastname@example.org Department of Electrical Engineering, Stanford University Abstract Given two persons
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Some Core Learning Representations Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten and E. Frank Decision trees Learning Rules Association rules
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,
MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu email@example.com
Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University
Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland firstname.lastname@example.org Abstract Neural Networks (NN) are important
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:
Introduction to Artificial Intelligence G51IAI An Introduction to Data Mining Learning Objectives Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees
Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means
Knowledge-based systems and the need for learning The implementation of a knowledge-based system can be quite difficult. Furthermore, the process of reasoning with that knowledge can be quite slow. This
Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision
Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto email@example.com @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning
Additional Topics: Big Data Lecture #2 Algorithms for Big Data Joseph Bonneau firstname.lastname@example.org April 30, 2012 Today's topic: algorithms Do we need new algorithms? Quantity is a quality of its own Joseph
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished
Understanding Network Traffic An Introduction to Machine Learning in Networking Pedro CASAS FTW - Communication Networks Group Vienna, Austria 3rd TMA PhD School Department of Telecommunications AGH University
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University email@example.com CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business: