Automated Methods for Fuzzy Systems




Automated Methods for Fuzzy Systems: Gradient Method
Adriano Joaquim de Oliveira Cruz
PPGI-UFRJ, September 2012

Summary

1 Introduction
2 Training Standard Fuzzy System
3 Output Membership Function Centers Update Law
4 Input Membership Function Centers Update Law
5 Input Membership Function Spreads Update Law
6 Example

Section 1: Introduction

A precise model is a contradiction.

Bibliography

Kevin M. Passino and Stephen Yurkovich. Fuzzy Control, Chapter 5. Addison Wesley Longman, Inc., USA, 1998.
Timothy J. Ross. Fuzzy Logic with Engineering Applications. John Wiley and Sons, Inc., USA, 2010.
J.-S. R. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, NJ, USA, 1997.

Constructing fuzzy systems

How to construct a fuzzy system from numeric data?
Using data obtained experimentally from a system, it is possible to identify the model.
Find a model that fits the data by using the interpolation capabilities of fuzzy systems.

Introduction

We need to construct a fuzzy system f(x|θ) that approximates the function g represented by the training data G.
There is no guarantee that the method will succeed.
It provides a method to tune all the parameters of a fuzzy system.

Section 2: Training Standard Fuzzy System

The System

Gaussian input membership functions with centers c_j^i and spreads σ_j^i.
Output membership function centers b_i.
Product for premise and implication.
Center-average defuzzification.

The system is described by

f(x|\theta) = \frac{\sum_{i=1}^{R} b_i \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right]}
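As an illustration, the system above fits in a few lines of NumPy. This is a minimal sketch, not code from the slides; the function name fuzzy_eval and the array layout (rules along rows, inputs along columns) are our own choices.

    import numpy as np

    def fuzzy_eval(x, b, c, sigma):
        # x: (n,) input; b: (R,) output centers; c, sigma: (R, n) input centers/spreads
        mu = np.prod(np.exp(-0.5 * ((x - c) / sigma) ** 2), axis=1)  # premise value of each rule
        return np.dot(b, mu) / np.sum(mu)  # center-average defuzzification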

Error

Suppose that you have the m-th training data pair (x^m, y^m) ∈ G.
The GM's goal is to minimize the error between the predicted output value f(x^m|θ) and the actual output value y^m.
The equation for the error surface is

e_m = \frac{1}{2}\left[f(x^m|\theta) - y^m\right]^2

We seek to minimize e_m by choosing the parameters θ, which are b_i, c_j^i and σ_j^i, for i = 1, 2, ..., R and j = 1, 2, ..., n (R rules, n input variables).
θ(k) will be used to denote the values of these parameters at time step k.

Section 3: Output Membership Function Centers Update Law

b_i Update Law

How to adjust the b_i to minimize e_m?
We will use

b_i(k+1) = b_i(k) - \lambda_1 \left.\frac{\partial e_m}{\partial b_i}\right|_k

where i = 1, 2, ..., R.
This is the gradient descent approach.

Gradient Descent

The update method moves b_i along the negative gradient of the error surface.
The parameter λ_1 > 0 characterizes the step size.
If λ_1 is chosen too small, then b_i is adjusted very slowly.
If λ_1 is chosen too big, then the update may step over the minimum value of e_m.
Some algorithms try to adaptively choose the step size: take larger steps while the error is big, but smaller steps as it decreases (see the sketch below).
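The slides do not prescribe a particular schedule. One simple heuristic in this spirit, sometimes called the "bold driver" rule, is sketched below as an assumption on our part, not the method of the slides: grow the step after the error drops, shrink it sharply after the error grows.

    def adapt_step(lam, prev_error, curr_error, grow=1.05, shrink=0.5):
        # grow the step while e_m keeps decreasing; back off once it increases
        return lam * grow if curr_error < prev_error else lam * shrink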

b_i Update Formula I

Error: e_m = \frac{1}{2}[f(x^m|\theta) - y^m]^2

Chain rule:

\frac{\partial e_m}{\partial b_i} = (f(x^m|\theta) - y^m)\,\frac{\partial f(x^m|\theta)}{\partial b_i}

Since f(x|θ) is the ratio of sums given earlier, we get

\frac{\partial e_m}{\partial b_i} = (f(x^m|\theta) - y^m)\,\frac{\prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}{\sum_{i=1}^{R} \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i}{\sigma_j^i}\right)^2\right]}

b_i Update Formula II

Let μ_i(x^m, k) denote the premise value of rule i at time step k:

\mu_i(x^m, k) = \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i(k)}{\sigma_j^i(k)}\right)^2\right]

Let ε_m(k) = f(x^m|θ(k)) - y^m.
Then

b_i(k+1) = b_i(k) - \lambda_1\,\epsilon_m(k)\,\frac{\mu_i(x^m,k)}{\sum_{i=1}^{R} \mu_i(x^m,k)}
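In code, this update is one vectorized line. A sketch consistent with the formula above and with the fuzzy_eval layout; eps stands for ε_m(k) and mu for the vector of premise values μ_i(x^m, k):

    import numpy as np

    def update_b(b, mu, eps, lam1=1.0):
        # b_i(k+1) = b_i(k) - lam1 * eps_m(k) * mu_i / sum_i mu_i
        return b - lam1 * eps * mu / np.sum(mu)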

Section 4: Input Membership Function Centers Update Law

c_j^i Update Law

We will use

c_j^i(k+1) = c_j^i(k) - \lambda_2 \left.\frac{\partial e_m}{\partial c_j^i}\right|_k

where λ_2 > 0, i = 1, 2, ..., R and j = 1, 2, ..., n.

c_j^i Update Formula I

Error: e_m = \frac{1}{2}[f(x^m|\theta) - y^m]^2

Chain rule:

\frac{\partial e_m}{\partial c_j^i} = \epsilon_m(k)\,\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m,k)}\,\frac{\partial \mu_i(x^m,k)}{\partial c_j^i}

Now, by the quotient rule,

\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m,k)} = \frac{\left(\sum_{i=1}^{R}\mu_i(x^m,k)\right) b_i(k) - \left(\sum_{i=1}^{R} b_i(k)\,\mu_i(x^m,k)\right)(1)}{\left(\sum_{i=1}^{R}\mu_i(x^m,k)\right)^2}

so that

\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m,k)} = \frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R}\mu_i(x^m,k)}

c_j^i Update Formula II

Also we have

\frac{\partial \mu_i(x^m,k)}{\partial c_j^i} = \mu_i(x^m,k)\,\frac{x_j^m - c_j^i(k)}{(\sigma_j^i(k))^2}

The update formula for c_j^i is

c_j^i(k+1) = c_j^i(k) - \lambda_2\,\epsilon_m(k)\,\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R}\mu_i(x^m,k)}\,\mu_i(x^m,k)\,\frac{x_j^m - c_j^i(k)}{(\sigma_j^i(k))^2}
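A vectorized sketch of this update, under the same conventions as the earlier snippets (f_x is f(x^m|θ(k)); rules along rows):

    import numpy as np

    def update_c(c, sigma, b, mu, x, f_x, eps, lam2=1.0):
        # per-rule factor eps * (b_i - f) * mu_i / sum(mu), shared with the sigma update
        g = eps * (b - f_x) * mu / np.sum(mu)               # shape (R,)
        return c - lam2 * g[:, None] * (x - c) / sigma ** 2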

Section 5: Input Membership Function Spreads Update Law

σ_j^i Update Law

We will use

\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3 \left.\frac{\partial e_m}{\partial \sigma_j^i}\right|_k

where λ_3 > 0, i = 1, 2, ..., R and j = 1, 2, ..., n.

σ_j^i Update Formula I

Error: e_m = \frac{1}{2}[f(x^m|\theta) - y^m]^2

Chain rule:

\frac{\partial e_m}{\partial \sigma_j^i} = \epsilon_m(k)\,\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m,k)}\,\frac{\partial \mu_i(x^m,k)}{\partial \sigma_j^i}

We already calculated

\frac{\partial f(x^m|\theta(k))}{\partial \mu_i(x^m,k)} = \frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R}\mu_i(x^m,k)}

σ_j^i Update Formula II

Also we have

\frac{\partial \mu_i(x^m,k)}{\partial \sigma_j^i} = \mu_i(x^m,k)\,\frac{(x_j^m - c_j^i(k))^2}{(\sigma_j^i(k))^3}

The update formula for σ_j^i is

\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3\,\epsilon_m(k)\,\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R}\mu_i(x^m,k)}\,\mu_i(x^m,k)\,\frac{(x_j^m - c_j^i(k))^2}{(\sigma_j^i(k))^3}
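The corresponding sketch, sharing the per-rule factor with update_c above:

    import numpy as np

    def update_sigma(sigma, c, b, mu, x, f_x, eps, lam3=1.0):
        g = eps * (b - f_x) * mu / np.sum(mu)                        # shape (R,)
        return sigma - lam3 * g[:, None] * (x - c) ** 2 / sigma ** 3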

Section 6: Example

Training Data Set

We will use the training data set in the table below to illustrate the algorithm.

         x_1   x_2   y
    x^1   0     2    1
    x^2   2     4    5
    x^3   3     6    6

Table: Z = [([x_1, x_2], y)]
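The same data as Python literals (the names X and Y are our own):

    X = [(0.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs x^1, x^2, x^3
    Y = [1.0, 5.0, 6.0]                       # targets y^1, y^2, y^3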

Choosing the step size

The algorithm requires that a step size λ be specified for each of the three parameter groups.
Selecting a large λ converges faster but risks overstepping the minimum.
Selecting a small step means converging very slowly.
In this example the same value will be chosen for all three, so λ_1 = λ_2 = λ_3 = 1.

Choosing initial values

Initial values for the rules must be designated.
For the first rule, we choose x_1^1, x_2^1, y^1 as the input and output membership function centers.
For the second rule, we choose x_1^2, x_2^2, y^2 as the input and output membership function centers.
Select spreads equal to 1.
These values correspond to the zero time step (k = 0).

Choosing initial values

Rule 1: [c_1^1(0), c_2^1(0)] = [0, 2], [σ_1^1(0), σ_2^1(0)] = [1, 1], b_1(0) = 1
Rule 2: [c_1^2(0), c_2^2(0)] = [2, 4], [σ_1^2(0), σ_2^2(0)] = [1, 1], b_2(0) = 5
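In the array layout used by the earlier snippets, this initialization reads (a sketch; the variable names are ours):

    import numpy as np

    c     = np.array([[0.0, 2.0],    # rule 1 centers c_1^1(0), c_2^1(0)
                      [2.0, 4.0]])   # rule 2 centers c_1^2(0), c_2^2(0)
    sigma = np.ones((2, 2))          # all spreads sigma_j^i(0) = 1
    b     = np.array([1.0, 5.0])     # output centers b_1(0), b_2(0)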

Plotting initial values

Figure: Initial Gaussian input membership functions, μ(x_1) with centers c_1^1 = 0 and c_1^2 = 2, and μ(x_2) with centers c_2^1 = 2 and c_2^2 = 4.

Calculating predicted outputs

Calculate the membership values of the implication of each rule using

\mu_i(x^m, k=0) = \prod_{j=1}^{n} \exp\left[-\frac{1}{2}\left(\frac{x_j^m - c_j^i(k=0)}{\sigma_j^i(k=0)}\right)^2\right]

Calculate the outputs using center-average defuzzification:

f(x^m|\theta(k=0)) = \frac{\sum_{i=1}^{R} b_i(0)\,\mu_i(x^m, k=0)}{\sum_{i=1}^{R} \mu_i(x^m, k=0)}

Membership degrees, rule 1

μ_1(x^1, 0) = exp[-(1/2)((0-0)/1)^2] · exp[-(1/2)((2-2)/1)^2] = 1
μ_1(x^2, 0) = exp[-(1/2)((2-0)/1)^2] · exp[-(1/2)((4-2)/1)^2] = 0.0183156
μ_1(x^3, 0) = exp[-(1/2)((3-0)/1)^2] · exp[-(1/2)((6-2)/1)^2] = 3.72665 × 10^-6

Membership degrees, rule 2

μ_2(x^1, 0) = exp[-(1/2)((0-2)/1)^2] · exp[-(1/2)((2-4)/1)^2] = 0.0183156
μ_2(x^2, 0) = exp[-(1/2)((2-2)/1)^2] · exp[-(1/2)((4-4)/1)^2] = 1.0
μ_2(x^3, 0) = exp[-(1/2)((3-2)/1)^2] · exp[-(1/2)((6-4)/1)^2] = 0.082085

Defuzzification

f(x^1|θ(0)) = [b_1(0) μ_1(x^1,0) + b_2(0) μ_2(x^1,0)] / [μ_1(x^1,0) + μ_2(x^1,0)]
            = (1·1 + 5·0.0183156) / (1 + 0.0183156)
            = 1.0719447

f(x^2|θ(0)) = [b_1(0) μ_1(x^2,0) + b_2(0) μ_2(x^2,0)] / [μ_1(x^2,0) + μ_2(x^2,0)]
            = (1·0.0183156 + 5·1) / (0.0183156 + 1)
            = 4.92805

Defuzzification

f(x^3|θ(0)) = [b_1(0) μ_1(x^3,0) + b_2(0) μ_2(x^3,0)] / [μ_1(x^3,0) + μ_2(x^3,0)]
            = (1·3.72665×10^-6 + 5·0.082085) / (3.72665×10^-6 + 0.082085)
            = 4.999818
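These numbers are easy to reproduce. A self-contained sketch, reusing the layout of the earlier snippets:

    import numpy as np

    c, sigma, b = np.array([[0., 2.], [2., 4.]]), np.ones((2, 2)), np.array([1., 5.])
    for x in (np.array([0., 2.]), np.array([2., 4.]), np.array([3., 6.])):
        mu = np.prod(np.exp(-0.5 * ((x - c) / sigma) ** 2), axis=1)
        print(mu, np.dot(b, mu) / np.sum(mu))  # premise values and f(x^m|theta(0))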

Calculating errors

e_m = \frac{1}{2}\left[f(x^m|\theta(k=0)) - y^m\right]^2

e_1 = (1/2)[1.0719447 - 1]^2 = 2.58802 × 10^-3
e_2 = (1/2)[4.9280550 - 5]^2 = 2.58802 × 10^-3
e_3 = (1/2)[4.9998180 - 6]^2 = 0.500182

The first two data points are mapped better than the third.
The result can be improved by cycling through the training data.
The GM will update the rule-base parameters b_i, c_j^i and σ_j^i using the first time step.
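The errors follow directly from the predictions computed above (a sketch; f_vals holds the three outputs):

    f_vals = [1.0719447, 4.9280550, 4.9998180]
    Y = [1.0, 5.0, 6.0]
    print([0.5 * (f - y) ** 2 for f, y in zip(f_vals, Y)])
    # [0.002588..., 0.002588..., 0.500182...], matching the slide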

Updating

ε_m(k=0) = f(x^m|θ(k=0)) - y^m
ε_1(0) = 1.0719447 - 1 = 0.0719447

Updating b_i

b_i(k+1) = b_i(k) - \lambda_1\,\epsilon_m(k)\,\frac{\mu_i(x^m,k)}{\sum_{i=1}^{R}\mu_i(x^m,k)}

b_1(1) = b_1(0) - λ_1 ε_1(0) μ_1(x^1,0) / [μ_1(x^1,0) + μ_2(x^1,0)]
       = 1 - 1 · 0.0719447 · (1/(1 + 0.0183156)) = 0.929349

b_2(1) = b_2(0) - λ_1 ε_1(0) μ_2(x^1,0) / [μ_1(x^1,0) + μ_2(x^1,0)]
       = 5 - 1 · 0.0719447 · (0.0183156/(1 + 0.0183156)) = 4.998706

Updating c_j^1

c_j^i(k+1) = c_j^i(k) - \lambda_2\,\epsilon_m(k)\,\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R}\mu_i(x^m,k)}\,\mu_i(x^m,k)\,\frac{x_j^m - c_j^i(k)}{(\sigma_j^i(k))^2}

c_1^1(1) = c_1^1(0) - 1 · ε_1(0) · [(b_1(0) - f(x^1|θ(0)))/(μ_1(x^1,0) + μ_2(x^1,0))] · μ_1(x^1,0) · (x_1^1 - c_1^1(0))/(σ_1^1(0))^2 = 0
c_2^1(1) = c_2^1(0) - 1 · ε_1(0) · [(b_1(0) - f(x^1|θ(0)))/(μ_1(x^1,0) + μ_2(x^1,0))] · μ_1(x^1,0) · (x_2^1 - c_2^1(0))/(σ_2^1(0))^2 = 2

Both centers of rule 1 are unchanged because x_1^1 - c_1^1(0) = 0 and x_2^1 - c_2^1(0) = 0.

Updating c_j^2

c_1^2(1) = c_1^2(0) - 1 · ε_1(0) · [(b_2(0) - f(x^1|θ(0)))/(μ_1(x^1,0) + μ_2(x^1,0))] · μ_2(x^1,0) · (x_1^1 - c_1^2(0))/(σ_1^2(0))^2 = 2.010166
c_2^2(1) = c_2^2(0) - 1 · ε_1(0) · [(b_2(0) - f(x^1|θ(0)))/(μ_1(x^1,0) + μ_2(x^1,0))] · μ_2(x^1,0) · (x_2^1 - c_2^2(0))/(σ_2^2(0))^2 = 4.010166

Updating σ_j^i

\sigma_j^i(k+1) = \sigma_j^i(k) - \lambda_3\,\epsilon_m(k)\,\frac{b_i(k) - f(x^m|\theta(k))}{\sum_{i=1}^{R}\mu_i(x^m,k)}\,\mu_i(x^m,k)\,\frac{(x_j^m - c_j^i(k))^2}{(\sigma_j^i(k))^3}

σ_1^1(1) = 1 and σ_2^1(1) = 1 (unchanged, since x_j^1 - c_j^1(0) = 0)
σ_1^2(1) = 0.979668 and σ_2^2(1) = 0.979668
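Putting everything together, one full gradient step on the first training pair reproduces all of the updates above. A self-contained sketch under the same assumptions as the earlier snippets (variable names are ours):

    import numpy as np

    lam = 1.0                                  # lambda_1 = lambda_2 = lambda_3 = 1
    c     = np.array([[0., 2.], [2., 4.]])
    sigma = np.ones((2, 2))
    b     = np.array([1., 5.])
    x, y  = np.array([0., 2.]), 1.0            # first training pair (x^1, y^1)

    mu  = np.prod(np.exp(-0.5 * ((x - c) / sigma) ** 2), axis=1)
    f_x = np.dot(b, mu) / np.sum(mu)
    eps = f_x - y                              # epsilon_1(0)
    g   = eps * (b - f_x) * mu / np.sum(mu)    # factor shared by the c and sigma updates

    b_new     = b - lam * eps * mu / np.sum(mu)
    c_new     = c - lam * g[:, None] * (x - c) / sigma ** 2
    sigma_new = sigma - lam * g[:, None] * (x - c) ** 2 / sigma ** 3
    print(b_new, c_new, sigma_new, sep="\n")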

The End