8. Machine Learning
Applied Artificial Intelligence
Prof. Dr. Bernhard Humm
Faculty of Computer Science
Hochschule Darmstadt University of Applied Sciences

Retrospective: Natural Language Processing
- Name and explain different areas of NLP
- What are the 7 levels of language understanding?
- What are tokenizing, sentence splitting, POS tagging, and parsing?
- What do language resources offer to NLP? Give examples
- What do NLP frameworks offer? Give examples
- What do NLP services offer? Give examples

Agenda
- Overview
- ML Applications
- ML Tasks
- ML Approaches
- ML Tools
- Services / Product Map

What is Machine Learning (ML)?
Generating a model based on inputs and using it for making decisions or predictions (rather than programming instructions explicitly)

Applications of ML: Spam filtering
Task: Classify new e-mails as spam or not spam
[Figure: diagram with the labels New e-mails, Spam filter, Automatically classified; ML input: Manually classified, Corrections]

Stock market analysis
Task: Make recommendations on buying and selling stocks
[Figure: diagram with the labels Current stock values, Prediction, Recommendation, Decision; ML input: History of stock values. Image source: Wikimedia]

Detecting credit card fraud
Task: Detect fraud in credit card payments
[Figure: diagram with the labels CC payments, Fraud detection, Automatically classified; ML input: Manually classified, Corrections]

Recommender systems
Task: Recommend suitable products to customers
[Figure: diagram with the labels Order, Recommender system, Recommendation of related products; ML input: Purchasing behaviour of other customers or customer groups]

Categories of ML tasks
Machine Learning Task
- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
  - Clustering
  - Feature selection / extraction
  - Topic modeling
- Reinforcement Learning
P.S. Other categorizations / groupings are possible

Categories of ML tasks
Supervised learning
- Given: Example inputs and desired outputs
- Goal: Learn a general rule that maps inputs to outputs
Unsupervised learning
- Given: Data inputs (e.g., documents)
- Goal: Find structure in the inputs
Reinforcement learning
- Setting: An agent interacts with a dynamic environment in which it must achieve a goal
- Goal: Improve the agent's behaviour

Supervised learning subcategories
Classification
- Given: Training inputs (records) which are divided into two or more classes
- Goal: Produce a model to classify new inputs
- Examples: spam filter, fraud detection, ...
Regression
- Given: Training data (records) with continuous (not discrete) output values
- Goal: Produce a model to predict output values for new inputs
- Example: stock value prediction

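The regression case can be illustrated with a minimal sketch: fitting a straight line to training records by ordinary least squares and using it to predict a new output value. The function names (`fit_line`, `predict`) and the tiny price series are hypothetical and chosen only for illustration; real stock prediction would need far richer models and data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: day index -> closing price (made up)
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

model = fit_line(days, prices)   # training: learn the model from known cases
forecast = predict(model, 6)     # prediction: apply it to a new input
print(forecast)
```

The same two-phase pattern (learn a model from labelled records, then apply it to new records) also underlies classification; only the output type changes from continuous to discrete.
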
Unsupervised learning subcategories
Clustering
- Given: Set of input records
- Goal: Identify clusters (groups of similar records)
- Example: Customer grouping
Feature selection / extraction
- Given: Set of input records with attributes (features)
- Goal: Find a subset of the original attributes that is equally well suited for classification / clustering tasks
Topic modeling
- Given: Set of text documents
- Goal: Find abstract topics that occur in several documents and classify documents accordingly

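As an illustration of clustering, here is a minimal k-means sketch in plain Python. k-means is one common clustering algorithm and is not named on the slide; the customer records (age, orders per year) are made up, and the initialisation (first k points) is a deliberate simplification.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignment)."""
    centroids = list(points[:k])  # naive deterministic init: first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Hypothetical customer records: (age, orders per year)
customers = [(20, 2), (22, 3), (21, 1), (60, 20), (62, 22), (58, 21)]
centroids, assignment = kmeans(customers, k=2)
print(assignment)
print(centroids)
```

No class labels are given to the algorithm; the two customer groups emerge from the similarity of the records alone.
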
Decision Tree Learning
- Used for supervised learning (classification, regression)
- Training input: Training data (records) with output values (discrete or continuous)
- Learning result: Decision tree that allows classifying / predicting output values of new data records
- Example (figure): Decision tree for classifying passengers on the Titanic as survived / died
Image source: Wikipedia

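To make the learning step concrete, the following sketch learns a one-level decision tree (a "decision stump") by picking the attribute whose split gives the lowest weighted Gini impurity. Gini impurity is one common split criterion and is an assumption here, as are all names and the six passenger records, which are invented and not the actual Titanic data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(records, attribute, target):
    """Weighted impurity after splitting the records on one attribute."""
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    n = len(records)
    return sum(len(g) / n * gini(g) for g in groups.values())


def learn_stump(records, attributes, target):
    """Choose the best attribute; each branch predicts its majority class."""
    best = min(attributes, key=lambda a: split_impurity(records, a, target))
    branches = {}
    for r in records:
        branches.setdefault(r[best], []).append(r[target])
    leaves = {v: Counter(ls).most_common(1)[0][0] for v, ls in branches.items()}
    return best, leaves


def classify(stump, record):
    attribute, leaves = stump
    return leaves[record[attribute]]


# Hypothetical training records (not the real Titanic data set)
passengers = [
    {"sex": "female", "cls": "1st", "outcome": "survived"},
    {"sex": "female", "cls": "2nd", "outcome": "survived"},
    {"sex": "female", "cls": "3rd", "outcome": "survived"},
    {"sex": "male", "cls": "1st", "outcome": "died"},
    {"sex": "male", "cls": "2nd", "outcome": "died"},
    {"sex": "male", "cls": "3rd", "outcome": "died"},
]

stump = learn_stump(passengers, ["sex", "cls"], "outcome")
print(stump[0])
print(classify(stump, {"sex": "female", "cls": "3rd"}))
```

A full decision tree learner would apply the same attribute choice recursively to each branch until the records in a branch are (nearly) pure.
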
Artificial Neural Networks (ANN)
Inspired by brain / nervous system:
- Neurons connected via dendrites
- Reduce resistance if fired repeatedly
Artificial Neuron:
- Weighted inputs
- Function, e.g., weighted sum
- Filter, e.g., threshold output
Artificial Neural Network (ANN):
- Input layer, output layer, and possibly intermediate layers of neurons
- Training phase: weights are adjusted via known cases
- Recognition phase: output is produced for new cases
Source: Ivan Galkin, U. MASS Lowell (http://ulcar.uml.edu/~iag/cs/intro-to-ann.html)
Prof. Dr. Bernhard Humm, Darmstadt University of Applied Sciences. www.fbi.h-da.de/~b.humm. 18.11.2014

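The artificial neuron and the two phases can be sketched with a single threshold neuron trained by the classic perceptron rule. The perceptron rule is one simple way to adjust weights and is an assumption here; the slide does not name a training algorithm. The function names and the choice of the logical AND as the task are illustrative.

```python
def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs, filtered by a threshold: outputs 1 or 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


def train(cases, n_inputs, rate=1.0, epochs=20):
    """Training phase: adjust weights and threshold using known cases."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in cases:
            error = target - neuron(inputs, weights, threshold)
            # Strengthen or weaken each connection in proportion to its input
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            threshold -= rate * error
    return weights, threshold


# Known cases for the logical AND of two inputs
and_cases = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(and_cases, n_inputs=2)

# Recognition phase: produce outputs for given inputs
outputs = [neuron(inputs, weights, threshold) for inputs, _ in and_cases]
print(outputs)
```

A single neuron can only learn linearly separable functions such as AND; intermediate layers are what allow a network to go beyond that.
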
Bayesian Networks
Directed acyclic graph (DAG) with:
- Nodes: random variables + probability function
- Edges: conditional dependencies
Example:
- Probability of rain
- Sprinkler is turned on if it hasn't rained for a while
- Grass is wet if it is raining or the sprinkler is turned on
Bayes network inference allows answering questions like:
- What is the probability that it is raining, given the grass is wet?
- What is the impact of turning the sprinkler on?
Source: http://en.wikipedia.org/wiki/bayesian_network

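The first question can be answered by brute-force enumeration over all combinations of the three variables. The sketch below assumes the probability tables commonly used with this example in the cited Wikipedia article; they do not appear on the slide, so treat the numbers as illustrative. All names are hypothetical.

```python
from itertools import product

# Assumed probability tables (not given on the slide)
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # key: (sprinkler, rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factored along the edges of the DAG."""
    p_rain = P_RAIN if rain else 1 - P_RAIN
    p_on = P_SPRINKLER_GIVEN_RAIN[rain]
    p_sprinkler = p_on if sprinkler else 1 - p_on
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    p_wet = p_w if wet else 1 - p_w
    return p_rain * p_sprinkler * p_wet


def probability(query, evidence):
    """P(query | evidence); both are dicts over 'rain', 'sprinkler', 'wet'."""
    def total(constraints):
        mass = 0.0
        for rain, sprinkler, wet in product([True, False], repeat=3):
            world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
            if all(world[name] == value for name, value in constraints.items()):
                mass += joint(rain, sprinkler, wet)
        return mass
    return total({**query, **evidence}) / total(evidence)


p_rain_given_wet = probability({"rain": True}, {"wet": True})
print(round(p_rain_given_wet, 4))
```

Enumeration is exponential in the number of variables, so it only works for very small networks. The second question (the impact of turning the sprinkler on) concerns an intervention rather than an observation and is not covered by this sketch.
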
Inductive Logic Programming
Given:
- Set of logic facts (background knowledge), e.g., male(tom), female(eve), parent(tom, eve)
- Positive and / or negative examples, e.g., daughter(eve, tom)
Learning goal:
- General rules that are consistent with the examples and the background knowledge, e.g., parent(p1, p2) and female(p2) → daughter(p2, p1)
[Figure: family tree with the legend male, female, parent and the persons George, Helen, Mary, Tom, Nancy, Eve]

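What "consistent with the examples and the background knowledge" means can be sketched by testing candidate rules against facts and examples. This is only the checking step, not a rule-learning algorithm. The facts about Tom and Eve come from the slide; the facts and examples involving Helen and Mary are invented for illustration and are not read off the slide's figure.

```python
# Background knowledge: female(eve) and parent(tom, eve) are from the slide;
# the entries for helen and mary are hypothetical additions.
female = {"eve", "helen", "mary"}
parent = {("tom", "eve"), ("helen", "mary"), ("helen", "tom")}

# Examples as (daughter, parent) pairs; only ("eve", "tom") is from the slide
positive = {("eve", "tom"), ("mary", "helen")}
negative = {("tom", "helen")}


def rule_daughter():
    """parent(p1, p2) and female(p2) -> daughter(p2, p1)"""
    return {(p2, p1) for (p1, p2) in parent if p2 in female}


def rule_too_general():
    """parent(p1, p2) -> daughter(p2, p1), i.e. without the female condition"""
    return {(p2, p1) for (p1, p2) in parent}


def consistent(derived):
    """A rule is consistent if it covers all positive and no negative examples."""
    return positive <= derived and not (negative & derived)


print(consistent(rule_daughter()))
print(consistent(rule_too_general()))
```

An ILP system searches the space of such candidate rules automatically, typically moving between more general and more specific rules until a consistent one is found.
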
WEKA
http://www.cs.waikato.ac.nz/ml/weka/

Tasks supported by WEKA
Numerous approaches for supervised and unsupervised learning
- Preprocess: Choose and modify the data being acted on
- Classify: Train and test learning schemes that classify or perform regression
- Cluster: Learn clusters for the data
- Associate: Learn association rules for the data
- Select attributes: Select the most relevant attributes in the data
- Visualize: View an interactive 2D plot of the data

WEKA Datasets
Collection of examples (instances)
Each instance consists of attributes
Attribute types:
- Nominal (enumeration)
- Numeric (real or integer number)
- String
Example:
    @relation golfweathermichigan_1988/02/10_14days
    @attribute outlook {sunny, overcast, rainy}
    @attribute windy {TRUE, FALSE}
    @attribute temperature real
    @attribute humidity real
    @attribute play {yes, no}
    @data
    sunny,FALSE,85,85,no
    sunny,TRUE,80,90,no
    overcast,FALSE,83,86,yes
    rainy,FALSE,70,96,yes
    rainy,FALSE,68,80,yes

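To show how the header and data sections of such a file fit together, here is a minimal reader for this dataset format in plain Python. It is a sketch that handles only the constructs in the example (nominal, numeric and string attributes, comma-separated rows); quoted values, missing values (`?`), dates and sparse data are not handled. `parse_arff` is a hypothetical helper, not part of WEKA.

```python
def parse_arff(text):
    """Return (relation, attributes, rows) for a simple ARFF document."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                row[name] = float(value) if kind == "numeric" else value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: keep the list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("real", "numeric", "integer"):
                kind = "numeric"
            else:
                kind = "string"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation golfweathermichigan_1988/02/10_14days
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute temperature real
@attribute humidity real
@attribute play {yes, no}
@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,80,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(rows[0])
```

In this dataset the last attribute, `play`, would typically serve as the class to be predicted from the other four.
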
WEKA GUI

ML Services Map
- ML services: Web services for experimenting with different ML approaches and configuring solutions
- ML development environments / frameworks: IDEs and frameworks for experimenting with different ML approaches and configuring solutions
- ML libraries: Algorithms for classification, regression, clustering, feature selection / extraction, topic modeling, etc. using different approaches, e.g., decision tree learning, artificial neural networks, Bayesian networks, inductive logic programming, support vector machines, hidden Markov models, etc.

ML Product Map
- ML services: bigml, wise.io, procog, ersatz, ...
- ML development environments / frameworks: WEKA, Orange, Shogun, scikit-learn, ...
- ML libraries: Eblearn, OpenNN, aisolver, CURRENNT, ...

ML product map (table)
Product                                 | ML library | ML development environment / framework | ML service
Java Neural Network Framework Neuroph   | x          | x                                      |
Fast Artificial Neural Network Library  | x          |                                        |
eblearn                                 | x          |                                        |
Jaden                                   | x          | x                                      |
OpenNN - Open Neural Networks Library   | x          |                                        |
aisolver                                | x          |                                        |
CURRENNT                                | x          |                                        |
WEKA                                    | x          | x                                      |
Orange                                  | x          | x                                      |
Shogun                                  | x          | x                                      |
scikit-learn                            | x          | x                                      |
bigml                                   |            |                                        | x
wise.io                                 |            |                                        | x
procog                                  |            |                                        | x
ersatz                                  |            |                                        | x