Machine Learning and Data Mining - Perceptron Neural Networks Nuno Cavalheiro Marques (nmm@di.fct.unl.pt) Spring Semester 2010/2011 MSc in Computer Science

Multi Layer Perceptron Neurons and the Perceptron Model: logic with neurons (McCulloch and Pitts, 1943). x_0 is usually set to 1 for all samples; in that case w_0 is called the neuron bias.
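
As an illustration of "logic with neurons", here is a minimal sketch (not from the slides) of a threshold unit with x_0 fixed to 1, so that w_0 plays the role of the bias; the weight values are just one example that realizes a logical AND.

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold unit: output 1 iff the weighted sum is positive; x[0] is fixed to 1, so w[0] is the bias."""
    return 1 if np.dot(w, x) > 0 else 0

# With bias -1.5 and unit weights, the neuron computes the AND of its two inputs.
print(perceptron_output(np.array([-1.5, 1.0, 1.0]), np.array([1, 1, 1])))  # -> 1
print(perceptron_output(np.array([-1.5, 1.0, 1.0]), np.array([1, 1, 0])))  # -> 0
```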

Multi Layer Perceptron Most Relevant Work in Neural Networks McCulloch and Pitts (1943) introduce neural networks as computing devices: A Logical Calculus of the Ideas Immanent in Nervous Activity. Hebb (1949): the basic model of network self-organization. Rosenblatt (1958): learning with a teacher and the Perceptron. Minsky and Papert (1969) show that a perceptron cannot learn a simple XOR. The multilayer (feedforward) perceptron (or MLP) and backpropagation of the error: Rumelhart, Hinton, and Williams (1986).

Learning Perceptron Weights Backpropagation Basic Algorithm Backpropagation learning in networks of perceptrons. BP (and variants) are the best way to train MLPs. Generalized Delta Rule: Δw = −η ∇E[w]

Why Several Neural Network models? Models, Paradigms and Methods The exact description of a biological neuron is still unknown: in neurophysiology, each cell is a complex dynamical system controlled by electric fields and chemical neurotransmitters. (Mathematical) Model: a representation of an existing system that presents (some) knowledge in usable form, as a finite set of variables and interactions. (E.g. scientific) Method: a way to solve a given problem, e.g. bodies of techniques for investigating phenomena and for acquiring, correcting and integrating knowledge; collection of data through observation and experimentation, and the formulation and testing of hypotheses.

Why Several Neural Network models? Models, Paradigms and Methods for ANN Paradigm: a model or case example of a general theory; here, paradigms for some biological neuronal behaviour. Available models are incorrect or incomplete regarding the biological neuron. What is the goal? Understand the biological brain? Get better machine learning methods? Solve specific problems? Question for each new model: what is its added value? (Can we solve the same problem with a previous model?) E.g. Self-Organizing Maps (SOM): an unsupervised model that can be used for clustering and visualizing data.

Additional Models ANN models: JASTAP, Recurrent, SOM, ART, PNN, RBF... Relevant problems for new models: How to train/learn? How to understand and represent knowledge? What is this model's contribution?

Additional Models Spiking models implement more neuro-realistic models, taking time into account: the input is a time series, and some works use spike input trains.

Additional Models Issues on Any Neural Network How can NNs learn in more complex models? Genetic algorithms can learn weights and architecture; generalised BP exists for spiking neurons, but the model is harder to understand. What are the advantages of more complex models? Even in simple MLPs learning is only sub-optimal (e.g. Vapnik). Simulate or understand biology? Berger: bio-realistic differential population models. Are spiking models good models of biological neurons? In SOM: hundreds of simple units, similar to biological neural maps. Hans Moravec: why use silicon to duplicate biology? Hölldobler: an MLP kernel can implement any logical function.

Backpropagation Basic Algorithm Backpropagation learning in networks of perceptrons. BP (and variants) are the best way to train MLPs. Generalized Delta Rule: Δw = −η ∇E[w]

Gradient Descent: concepts (from [TM97]) I Consider a simpler linear unit, where o = w_0 + w_1 x_1 + ... + w_n x_n. Let's learn the w_i that minimize the squared error of this linear unit, E[w] ≡ (1/2) Σ_{d∈D} (t_d − o_d)², where D is the set of training examples.

Gradient Descent: concepts (from [TM97]) II Gradient: ∇E[w] ≡ [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]. Training rule: Δw = −η ∇E[w], i.e., Δw_i = −η ∂E/∂w_i.

Gradient Descent: concepts (from [TM97]) III ∂E/∂w_i = ∂/∂w_i (1/2) Σ_d (t_d − o_d)² = (1/2) Σ_d ∂/∂w_i (t_d − o_d)² = (1/2) Σ_d 2 (t_d − o_d) ∂/∂w_i (t_d − o_d) = Σ_d (t_d − o_d) ∂/∂w_i (t_d − w · x_d), so ∂E/∂w_i = Σ_d (t_d − o_d)(−x_{i,d}).

Gradient Descent Algorithm for Perceptron (from [TM97]) I Gradient-Descent(training_examples, η): each training example is a pair of the form ⟨x, t⟩, where x is the vector of input values and t is the target output value; η is the learning rate (e.g., 0.05). Note: [TM97] is assuming the linear unit. Initialize each w_i to some small random value. Until the termination condition is met, do: initialize each Δw_i to zero; for each ⟨x, t⟩ in training_examples, do: input the instance x to the unit and compute the output o.

Gradient Descent Algorithm for Perceptron (from [TM97]) II For each linear unit weight w_i, do: Δw_i ← Δw_i + η(t − o) x_i. Then, for each linear unit weight w_i, do: w_i ← w_i + Δw_i.
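
A minimal sketch of this batch gradient-descent loop for the linear unit, assuming NumPy; the function name and parameters (eta, epochs) are illustrative and not part of [TM97].

```python
import numpy as np

def gradient_descent(X, t, eta=0.05, epochs=100):
    """Batch gradient descent for a linear unit o = w_0 + w_1 x_1 + ... + w_n x_n."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend x_0 = 1 so w[0] is the bias
    w = np.random.uniform(-0.05, 0.05, X.shape[1])  # small random initial weights
    for _ in range(epochs):
        delta_w = np.zeros_like(w)                  # initialize each Δw_i to zero
        for x_d, t_d in zip(X, t):
            o_d = np.dot(w, x_d)                    # linear unit output
            delta_w += eta * (t_d - o_d) * x_d      # Δw_i ← Δw_i + η (t − o) x_i
        w += delta_w                                # w_i ← w_i + Δw_i
    return w

# usage: fit o ≈ 2x on a tiny synthetic set
print(gradient_descent(np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0])))
```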

Sigmoid versus step function The step function (output 1 iff x > 0) is non-differentiable. Sigmoid functions are similar but differentiable. The function σ(x) = 1 / (1 + e^{−x}) has the nice property ∂σ(x)/∂x = σ(x)(1 − σ(x)).
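
A short numerical check of that derivative property, given as a sketch (NumPy assumed):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # dσ/dx = σ(x)(1 − σ(x))

# compare the analytic derivative with a central finite difference at x = 0.3
x, h = 0.3, 1e-6
print(sigmoid_prime(x), (sigmoid(x + h) - sigmoid(x - h)) / (2 * h))
```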

Error Gradient for a Sigmoid Unit (from [TM97]) I ∂E/∂w_i = ∂/∂w_i (1/2) Σ_{d∈D} (t_d − o_d)² = (1/2) Σ_d ∂/∂w_i (t_d − o_d)² = (1/2) Σ_d 2 (t_d − o_d) ∂/∂w_i (t_d − o_d) = −Σ_d (t_d − o_d) (∂o_d/∂w_i) = −Σ_d (t_d − o_d) (∂o_d/∂net_d)(∂net_d/∂w_i).

Error Gradient for a Sigmoid Unit (from [TM97]) II But we know: ∂o_d/∂net_d = ∂σ(net_d)/∂net_d = o_d (1 − o_d) and ∂net_d/∂w_i = ∂(w · x_d)/∂w_i = x_{i,d}. So: ∂E/∂w_i = −Σ_{d∈D} (t_d − o_d) o_d (1 − o_d) x_{i,d}.

MLP Backpropagation Algorithm (from [TM97]) I Initialize all weights to small random numbers. Until satisfied, do: for each training example, do: 1. Input the training example to the network and compute the network outputs. 2. For each output unit k: δ_k ← o_k (1 − o_k)(t_k − o_k). 3. For each hidden unit h: δ_h ← o_h (1 − o_h) Σ_{k∈outputs} w_{h,k} δ_k.

MLP Backpropagation Algorithm (from [TM97]) II 4. Update each network weight w_{i,j}: w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = η δ_j x_{i,j}.
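
A compact sketch of this stochastic backpropagation loop for one hidden layer of sigmoid units, assuming NumPy; names (W_ih, W_ho) and hyperparameters are illustrative, and bias weights are omitted for brevity (in practice a constant-1 input is usually appended to x and to the hidden layer).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(X, T, n_hidden=3, eta=0.1, epochs=1000):
    """Stochastic backpropagation for a 1-hidden-layer MLP of sigmoid units (no biases)."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    W_ih = rng.uniform(-0.05, 0.05, (n_in, n_hidden))    # input -> hidden weights
    W_ho = rng.uniform(-0.05, 0.05, (n_hidden, n_out))   # hidden -> output weights
    for _ in range(epochs):
        for x, t in zip(X, T):
            h = sigmoid(x @ W_ih)                         # hidden activations
            o = sigmoid(h @ W_ho)                         # network outputs
            delta_o = o * (1 - o) * (t - o)               # δ_k = o_k(1−o_k)(t_k−o_k)
            delta_h = h * (1 - h) * (W_ho @ delta_o)      # δ_h = o_h(1−o_h) Σ_k w_{h,k} δ_k
            W_ho += eta * np.outer(h, delta_o)            # Δw_{i,j} = η δ_j x_{i,j}
            W_ih += eta * np.outer(x, delta_h)
    return W_ih, W_ho
```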

Backpropagation properties More on Backpropagation (from [TM97]) Gradient descent over the entire network weight vector. Easily generalized to directed graphs (Elman). Will find a local, not necessarily global, error minimum; in practice it often works well (can run multiple times). Often includes a weight momentum term α: Δw_{i,j}(n) = η δ_j x_{i,j} + α Δw_{i,j}(n − 1). Minimizes error over the training examples; will it generalize well to subsequent examples? Training can take thousands of iterations (slow!), but using the network after training is very fast.
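
As a small hedged sketch of the momentum-augmented update above, with an illustrative helper (momentum_update, eta, alpha are not names from [TM97]):

```python
import numpy as np

def momentum_update(grad, prev_delta, eta=0.1, alpha=0.9):
    """Δw(n) = η·(δ_j x_{i,j}) + α·Δw(n−1): the previous update is carried over, scaled by α."""
    return eta * grad + alpha * prev_delta

prev_delta = np.zeros(3)                 # no previous update at the first step
grad = np.array([0.2, -0.1, 0.05])       # the δ_j x_{i,j} terms for one weight vector
for _ in range(3):
    prev_delta = momentum_update(grad, prev_delta)
    print(prev_delta)                    # the update grows as momentum accumulates
```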

Backpropagation properties MLP Decision Regions Single perceptron: a linear discriminator unit. One hidden layer: any logic function.

Backpropagation properties Overfitting in Neural Networks Too many units in the hidden layers can prevent good generalization. Black box: non-localized knowledge representation.

Backpropagation properties Local Minima with the Backpropagation Algorithm Local minima: the usual problem with greedy methods. A wrong starting point can be problematic, and the usual near-zero random initialization can have problems. We should have a good starting point (based on our knowledge), and we should avoid traps in the learning space (based on our knowledge).

Visualization of the Neural Network Using Visualization for Learning Analysis Discovering what the neural network learned: sensitivity analysis, rule generation, standard graphics, neural network graphics (Hinton diagram for weights), clustering and segmentation visualization, and using domain knowledge to view the output.

Visualization of the Neural Network Tom Mitchell's Binary Encoding (in Mitchell, 1997)

Visualization of the Neural Network Tom Mitchell's Iterations for Binary Encoding (in Mitchell, 1997)

Visualization of the Neural Network POS Tagger Network Graphic (figure: network diagram with input, hidden and output layers over part-of-speech tags such as DET, N, ADJ, V, PREP, CONJ, ADV, PRON).

Visualization of the Neural Network POS Tagger weights (figure: weight diagram relating input units, hidden units and output units).

NN as a Logic Function Including Knowledge in Neural Networks Both logic and neural networks are founding areas of AI. Logic is applied in computational logic and logic programming, where we can reason over structured objects. Neural networks are particularly good for modeling sub-symbolic information: they can learn, adapt to new environments and degrade gracefully, and are used in industrial processes and controllers (namely after training, with specialized hardware and very fast response times). The core method intends to integrate both approaches.

NN as a Logic Function Issues on Neuro-Symbolic Integration I BP doesn't work so well with strict logic rules ([BHM08]). In machine learning we need ways to guide learning. Propositional fixation (only atomic propositions); care should be taken while encoding symbols; FOL and logical arguments can be taken into account. Neural networks can implement a universal Turing machine, but we need recursion (difficult to understand). In general, there are no clear semantics for the integrated information. Only by understanding the starting model and the learning that is taking place can we advance beyond black-box approaches to machine learning.

NN as a Logic Function Issues on Neuro-Symbolic Integration II Neural network usage in data mining requires good knowledge and explanation of the data. Can the basic ideas be extended to other models and knowledge representations?

Symbolic or Subsymbolic How can neural networks take advantage of expressed knowledge? Knowledge extraction is achieved by machine learning in neural networks (backpropagation): after proper training, networks extract meaningful knowledge from data, implicitly (classification and clustering models) or explicitly (the neuro-symbolic approach yields new relevant connections in the learned model, i.e. we can understand the neural network). Domain knowledge (rules) can be used to guide learning: feed-forward neural networks have many parameters, and rules bias learning towards a nearby local minimum. Recursion (and neuro-computation) can be introduced with Elman neural networks.

Neuro-Symbolic Networks: the Core Method Core method basic algorithm [HK,94] Implements the T_P operator (next logical step); if appropriate, recursion will lead to stable models. Let m be the number of propositions in P. The input and output layer each have size m; each unit has threshold 0.5 and represents one proposition. For each clause A ← L_1 ∧ L_2 ∧ ... ∧ L_k with k ≥ 0 in P: 1. Add a binary unit c to the hidden layer. 2. For each literal L_j, 1 ≤ j ≤ k, connect the input unit for L_j to c with weight ω if L_j is a positive literal, or −ω if L_j is a negative literal. 3. Set the threshold T_c of unit c to (l − 0.5)·ω, where l is the number of positive literals.
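
A rough sketch of this clause-to-network translation, assuming propositions are indexed 0..m−1 and each clause is given as (head, positive_body, negative_body); the names, the NumPy encoding, and the hidden-to-output connection (which the slide leaves implicit, here weight ω into the head unit with output threshold 0.5·ω) are assumptions, not the original [HK,94] code.

```python
import numpy as np

def core_method_network(m, clauses, omega=1.0):
    """Translate a propositional program P into weights: one binary hidden unit per clause."""
    n_clauses = len(clauses)
    W_in = np.zeros((m, n_clauses))       # input -> hidden weights
    theta_h = np.zeros(n_clauses)         # hidden unit thresholds
    W_out = np.zeros((n_clauses, m))      # hidden -> output weights
    for c, (head, pos, neg) in enumerate(clauses):
        for j in pos:
            W_in[j, c] = omega            # +ω for positive body literals
        for j in neg:
            W_in[j, c] = -omega           # −ω for negative body literals
        theta_h[c] = (len(pos) - 0.5) * omega   # unit c fires iff all body literals hold
        W_out[c, head] = omega            # clause unit supports its head proposition
    return W_in, theta_h, W_out

def tp_step(state, W_in, theta_h, W_out, omega=1.0):
    """One application of the T_P operator with binary threshold units."""
    hidden = ((state @ W_in) > theta_h).astype(float)
    return ((hidden @ W_out) > 0.5 * omega).astype(float)

# usage: program {a., b <- a, c <- b & not d} over propositions a,b,c,d (indices 0..3)
W_in, theta_h, W_out = core_method_network(4, [(0, [], []), (1, [0], []), (2, [1], [3])])
state = np.zeros(4)
for _ in range(3):
    state = tp_step(state, W_in, theta_h, W_out)
print(state)   # converges to the stable model {a, b, c}
```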

Neuro-Symbolic Networks: the Core Method Discussion of ω while encoding rules For ω = 0 the network stops improving at some point. For ω > 0 the network outperforms the ω = 0 network. For ω = 5 the network did not learn very well. Avoiding local minima: with errors on only a few samples, backpropagation can get stuck in a local minimum.

Neuro-Symbolic Networks: the Core Method The MLP's Sigmoid Function [M09] Understanding Backpropagation Learning in the Core Method Networks

Neuro-Symbolic Networks: the Core Method Backpropagation and NeSy encoding I Parameter ω gives stability to rules while training. Recall that the gradient trains sigmoid units with: ∂E/∂w_i = −Σ_d (t_d − o_d) (∂o_d/∂net_d) x_{i,d}. (1) Bigger values of ω make the sigmoid approximate a step function and make ∂o_d/∂net_d go near zero: good if we are certain about the rules, but not advisable for rules needing revision. There is a learning zone that depends on ω and net. Free units (with near-zero weights) are the first to learn.
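
A tiny sketch of why large ω "freezes" a rule: with rule weights ±ω the net input scales with ω, and the factor ∂o/∂net = o(1 − o) shrinks towards zero. The values and names below are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# a unit whose body is satisfied with margin 1.0, for increasing rule strength ω
for omega in (1.0, 5.0, 10.0):
    net = omega * 1.0
    o = sigmoid(net)
    print(omega, o * (1 - o))   # ∂o/∂net: near zero for large ω, so backprop barely changes the rule
```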

Probabilistic/Numeric Encodings Basic Core Method for Probabilistic/Numeric Encodings How to handle other inputs (e.g. the probability of some observation)? Problem with the direct use of probabilities in the core method: 0.6 > 0.5, but 0.6 × 0.6 = 0.36 < 0.5. We need additional constraints, e.g.: X > T_X. But can the network learn the best value of T_X? Use a sigmoid neuron having X as input and bias −ω·T_X: X > T_X will be considered true iff 1 / (1 + e^{−ω(X − T_X)}) > 1 − α, i.e. iff 1 / (1 + e^{ω(X − T_X)}) < α.
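
A minimal sketch of that numeric-literal unit, with illustrative values for ω and T_X (both of which the network could in principle learn):

```python
import numpy as np

def numeric_literal(X, T_X, omega=10.0):
    """Sigmoid unit encoding the condition X > T_X; larger ω sharpens the decision."""
    return 1.0 / (1.0 + np.exp(-omega * (X - T_X)))

print(numeric_literal(0.6, 0.5))   # well above 0.5: "X > T_X" is leaning true
print(numeric_literal(0.4, 0.5))   # well below 0.5: "X > T_X" is leaning false
```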

Probabilistic/Numeric Encodings Illustrative Example: Learning AND 1HL: Original Core Method; 2HL: Extended Core Method. Figure: Initial core method neural networks, before training and initial weight disturbance.

Probabilistic/Numeric Encodings Differences in Backprop for Learning AND Figure: Training error for learning an AND with four distinct encodings.

Elman Neural Network [E1990] The core method cannot encode variable-length patterns. Novel methods for encoding temporal rules extend the power of neuro-symbolic integration. Elman neural networks allow rules with preconditions at non-deterministic times (e.g. important in NLP [FP01]). (Analogical) recursive connections allow super-Turing power (e.g. Siegelmann, 1994/1999). Learning space: bias learning towards a nearby local minimum and allow complex (but controlled) feedback loops.

The Classic Elman Neural Network Elman Neural Networks (figures: a neuronal unit and the Elman neural network architecture).

The Classic Elman Neural Network Learning in Elman Neural Networks Non-parametric statistical models learning from examples. The error backpropagation algorithm iteratively adjusts the weights. A semi-recursive neural network trained using Backpropagation Through Time. The goal is to find patterns in a sequence of values.
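
A minimal sketch of an Elman-style forward pass, assuming NumPy: the context layer copies the previous hidden state at each step; training (Backpropagation Through Time) is omitted, and all weight matrices are supplied by the caller for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elman_forward(sequence, W_xh, W_ch, W_ho):
    """Run an Elman network over a sequence; context units hold the previous hidden state."""
    context = np.zeros(W_ch.shape[0])                    # context layer starts at zero
    outputs = []
    for x in sequence:
        hidden = sigmoid(x @ W_xh + context @ W_ch)      # hidden layer sees input and context
        outputs.append(sigmoid(hidden @ W_ho))
        context = hidden                                 # copy hidden state for the next step
    return outputs

# usage on a toy 2-step sequence with random weights (2 inputs, 3 hidden, 1 output)
rng = np.random.default_rng(0)
seq = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(elman_forward(seq, rng.normal(size=(2, 3)), rng.normal(size=(3, 3)), rng.normal(size=(3, 1))))
```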

Main References Mitchell, T.M.: Machine Learning. McGraw-Hill (March 1997). Steffen Hölldobler and Yvonne Kalinke: Towards a massively parallel computational model for logic programming. In Proceedings of the ECAI'94 Workshop on Combining Symbolic and Connectionist Processing, pp. 68-77. ECAI (1994). Jeffrey L. Elman: Finding Structure in Time. Cognitive Science, vol. 14, 1990.

Other References I David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams: Learning representations by back-propagating errors. Nature 323, pp. 533-536 (October 1986). Sebastian Bader, Steffen Hölldobler and Nuno Marques: Guiding Backprop by Inserting Rules. In Proceedings of the ECAI 2008 Workshop on Neuro-Symbolic Integration. ECCAI (2008). Nuno Marques: An Extension of the Core Method for Continuous Values. In New Trends in Artificial Intelligence. APPIA (2009).