Identification algorithms for hybrid systems




Identification algorithms for hybrid systems Giancarlo Ferrari-Trecate

Modeling paradigms. White box: the model is built from first principles (chemistry, thermodynamics, mechanics, ...). Drawbacks: the parameter values of the components must be known; simplifying assumptions are needed; not feasible if first principles are not available (e.g. in economics, biology, ...). Black box: an identification algorithm turns experimental data collected from the system into a mathematical model. There is a huge literature on identification of linear and smooth nonlinear systems.

Hybrid identification. What about identification of hybrid models? Is it really a problem? First guess: each mode of operation is a linear/nonlinear system, so resort to known identification methods for each mode. Not always feasible!

Motivating example: identification of an electronic component placement process. Fast component mounter (courtesy of Assembleon): 12 mounting heads working in parallel; maximum throughput: 96,000 components per hour.

Experimental setup: placement of the electronic component on the Printed Circuit Board. Components: mounting head, moving impacting surface, ground connection.

Schematic representation. Two basic modes of operation: free mode and impact mode. The mounting head M is driven by the motor force F (input) against an elastic impact surface; the head position p is the output. Problem: the position of the impact surface is not measured, and the switch between modes is not measured.

Experimental data: head position and force, split into identification data and validation data. Which data were generated in the free mode and which in the impact mode? How can the switching rule be reconstructed?

Hybrid identification. The data can be naturally labeled according to finitely many modes of operation, each with unknown quantitative dynamics (in the example: 3 modes, each with linear behavior). Goal: extract, at the same time, the switching mechanism and the mode dynamics.

Hybrid identification: applications. Engineering: mechanical systems with contact phenomena, current transformers, traction control. Computer vision: motion segmentation (Wang & Adelson, 1993), (Vidal et al., 2006-2007). Signal processing: signal segmentation (Heredia & Gonzalo, 1996). Biology and medicine: sleep apnea detection from ECG, pulse detection from hormone concentration, ... Tomorrow: identification of genetic regulatory networks (HYGEIA PhD School on Hybrid Systems Biology).

Outline of the lectures 1. Preliminaries: polytopes, PWA maps, PWARX models 2. Identification of PWARX models 3. The key difficulty: classification of the data points 4. Three identification algorithms: Clustering-based procedure Algebraic procedure Bounded-error procedure 5. Back to the motivating example: identification results

Preliminaries: polytopes, PWA maps, PWARX models

Preliminaries: polytopes. Let a ∈ R^n, a ≠ 0, and b ∈ R. Hyperplane: {x ∈ R^n : a'x = b}. Half-space: {x ∈ R^n : a'x ≤ b}. Polyhedron: intersection of finitely many half-spaces, P = {x : Ax ≤ B}. Polyhedra are convex and closed sets. Polytope: bounded polyhedron. Not Necessarily Closed (NNC) polyhedron: a convex set whose closure is a polyhedron (some of the defining inequalities may be strict). Polyhedral partition of the polyhedron X: a finite collection {X_i}, i = 1, ..., s, of NNC polyhedra such that ∪_{i=1}^s X_i = X and X_i ∩ X_j = ∅ for i ≠ j.
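The half-space description above translates directly into code: a point belongs to the polyhedron {x : Ax ≤ B} exactly when every inequality holds. A minimal sketch (the unit square as a hypothetical example of a polytope):

```python
def in_polyhedron(x, A, B):
    """Check membership in {x : Ax <= B}, row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= b
               for row, b in zip(A, B))

# unit square 0 <= x1 <= 1, 0 <= x2 <= 1 as four half-spaces
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
B = [1, 0, 1, 0]
inside = in_polyhedron([0.5, 0.5], A, B)    # True
outside = in_polyhedron([1.5, 0.5], A, B)   # False
```

Since the square is bounded, this particular polyhedron is a polytope.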

Preliminaries: PWA maps. Ingredients: a domain X ⊂ R^n (a polytope); the number of modes s; regions: a polyhedral partition {X_i}, i = 1, ..., s, of X; Parameter Vectors (PVs) θ_i ∈ R^{n+1}; modes: f(x) = θ_i' [x' 1]' if x ∈ X_i. Switching function: the map assigning to each x the index i of its region.

PWARX models. AutoRegressive eXogenous (ARX) model of orders n_a, n_b: y(k) = θ' x(k) + e(k), with vector of regressors x(k) = [y(k−1) ... y(k−n_a) u(k−1) ... u(k−n_b)]' (MISO ARX model). PieceWise ARX (PWARX) model of orders n_a, n_b: y(k) = f(x(k)) + e(k), where f is a PWA map (MISO PWARX model). In the PWA map, the interaction between the logic and continuous components is modeled through the discontinuities and the shape of the regions.
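As an illustration of the definition, here is a minimal simulation sketch of a hypothetical two-mode SISO PWARX model (orders n_a = n_b = 1, noise-free), where the regions split the regressor space by the sign of y(k−1); the modes, regions, and parameter values are all invented for the example:

```python
def pwarx_step(y_prev, u_prev):
    """One step of a two-mode PWARX model with regressors [y(k-1), u(k-1)]."""
    x = [y_prev, u_prev]                # regressor vector x(k)
    if x[0] >= 0:                       # region X_1: y(k-1) >= 0
        theta = [0.5, 1.0, 0.2]         # PV of mode 1 (affine term last)
    else:                               # region X_2: y(k-1) < 0
        theta = [-0.3, 0.8, -0.1]       # PV of mode 2
    return theta[0] * x[0] + theta[1] * x[1] + theta[2]

# simulate a short trajectory driven by a constant input u = 1
y, traj = 0.0, []
for k in range(5):
    y = pwarx_step(y, 1.0)
    traj.append(y)
```

The if/else plays the role of the switching function: it picks the active region, hence the active PV.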

Identification of PWARX models

Identification of PWARX models. Dataset (noisy samples of a PWARX model): D = {(x(k), y(k)), k = 1, ..., N}. Identification: reconstruct the PWA map f from D. Standing assumptions: 1) known model orders n_a, n_b; 2) known regressor set X (from physical constraints). Quantities to estimate: the number of modes s, the PVs θ_i, and the regions X_i.

The key difficulty: data classification

Sub-problem: classification with a known switching sequence. If the mode generating each sample were known, the dataset could be split into s mode datasets D_i = {(x(k), y(k)) : sample k generated by mode i}. Then pattern recognition algorithms applied to the classified regressors yield estimates of the regions, and least squares on each D_i yields estimates of the PVs. Classification problem: estimation of the switching sequence. All algorithms for the identification of PWARX models solve, implicitly or explicitly, the classification problem!
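Once the samples of a single mode are isolated, the least-squares step is elementary. A pure-Python sketch for a scalar regressor and an affine model y = a·x + b, on hypothetical noise-free data:

```python
def fit_affine(xs, ys):
    """Least-squares fit of y = a*x + b via the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# mode dataset generated by y = 2*x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]
a, b = fit_affine(xs, ys)   # recovers the PV (a, b) = (2.0, 1.0)
```

With noisy data the same formulas return the usual least-squares estimate of the PV.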

An introduction to pattern recognition

Pattern recognition: the two-class problem. Data: a finite set of labeled points, each belonging to class 1 or class 2. Problem: find a hyperplane that separates the two classes. If such a hyperplane exists, the classes are linearly separable. The separating hyperplane is not unique; the optimal (in a statistical sense) separating hyperplane is unique and can be computed by solving a quadratic program. Sub-optimal hyperplanes can be computed through linear programs.
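The optimal hyperplane of the slide is computed by a quadratic program; as a lighter self-contained illustration of linear separability, the classic perceptron rule (a different, sub-optimal technique) also finds some separating hyperplane whenever one exists. A sketch on toy 2-D data with labels in {−1, +1}:

```python
def perceptron(points, labels, epochs=100):
    """Return (w, b) with label * (w.x + b) > 0 for all points, if separable."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in zip(points, labels):
            if t * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified point
                w[0] += t * x1                          # nudge the hyperplane
                w[1] += t * x2
                b += t
                errors += 1
        if errors == 0:          # every point is on its correct side
            break
    return w, b

pts = [(2, 2), (3, 1), (-1, -2), (-2, -1)]
lbl = [1, 1, -1, -1]
w, b = perceptron(pts, lbl)
ok = all(t * (w[0] * p[0] + w[1] * p[1] + b) > 0 for p, t in zip(pts, lbl))
```

Unlike the QP solution, the hyperplane found this way depends on the order of the data, which is exactly the non-uniqueness the slide points out.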

Pattern recognition: the two-class problem, the inseparable case. Problem: find a hyperplane that minimizes an error measure related to the misclassified data points. Again: the optimal separating hyperplane is unique and can be computed by solving a quadratic program (Support Vector Classification) (Vapnik, 1998); sub-optimal hyperplanes can be computed through linear programs (Robust Linear Programming) (Bennett & Mangasarian, 1993).

Estimation of the regions. The region X_i is delimited by the hyperplanes separating the data of mode i from the data of the other modes. Two strategies: 1) Pairwise separation: many QPs/LPs of small size (the number of variables scales linearly with the number of data points involved in each separation problem). Problem: possible holes in the set of regressors! 2) Multi-class separation: separate all classes simultaneously. Problem: existing algorithms amount to QPs/LPs of large size (the number of variables scales linearly with the total number of data points).
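In the pairwise strategy, the estimated region of class i is the set of points lying on the correct side of every hyperplane separating class i from another class. A membership sketch with hypothetical separators, each given as a pair (w, b):

```python
def in_region(x, hyperplanes):
    """x is in the region iff w.x + b >= 0 for every pairwise separator."""
    return all(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0
               for w, b in hyperplanes)

# region of class i in 2-D, described by two pairwise separators
H_i = [([1.0, 0.0], 0.0),    # separates i from class j: x1 >= 0
       ([0.0, 1.0], -1.0)]   # separates i from class k: x2 >= 1
inside = in_region([0.5, 2.0], H_i)    # True
outside = in_region([0.5, 0.5], H_i)   # False
```

Because each separator is fitted independently, the intersections need not cover the whole regressor set, which is the "holes" problem mentioned above.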

Three identification algorithms Clustering-based procedure Algebraic procedure Bounded-error procedure

The clustering-based procedure (Ferrari-Trecate et al., 2003)

Clustering-based procedure: introduction. Standing assumptions: 1) known number of modes s; 2) (just for the sake of simplicity). Key idea: PWA maps are locally linear; if the local models around two data points are similar, it is likely that the two points belong to the same mode of operation. Steps of the algorithm: 1) associate to each data point a local affine model; 2) aggregate local models with similar features into clusters; 3) classify in the same way the data points corresponding to local models in the same cluster.

Clustering-based procedure - Step 1: extract local models. 1. For each data point (x(k), y(k)), build a Local Dataset (LD) C_k collecting (x(k), y(k)) and its c − 1 nearest neighboring points (c > n). 2. Fit a local linear model to each C_k through least squares, obtaining a Local Parameter Vector (LPV) θ_k and its associated variance V_k. LDs containing points from a single mode are pure and their LPVs concentrate around the true PVs in LPV space; LDs straddling a mode switch are mixed and produce outlier LPVs.

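Step 1 can be sketched in a few lines: for each sample, take the c − 1 nearest neighbours in regressor space and least-squares-fit a local line. On hypothetical 1-D PWA data (y = x for x < 0, y = 2x for x ≥ 0), pure LDs recover the true slopes, while LDs straddling the switch give mixed LPVs:

```python
def local_lpvs(xs, ys, c=3):
    """For each sample, fit y = a*x + b on its c nearest points; return (a, b)."""
    lpvs = []
    for k, xk in enumerate(xs):
        # indices of the c points closest to x(k), including k itself
        idx = sorted(range(len(xs)), key=lambda j: abs(xs[j] - xk))[:c]
        lx, ly = [xs[j] for j in idx], [ys[j] for j in idx]
        # least squares on the local dataset C_k via the normal equations
        sx, sy = sum(lx), sum(ly)
        sxx = sum(x * x for x in lx)
        sxy = sum(x * y for x, y in zip(lx, ly))
        det = c * sxx - sx * sx
        lpvs.append(((c * sxy - sx * sy) / det,
                     (sxx * sy - sx * sxy) / det))
    return lpvs

# PWA map: y = x on x < 0, y = 2x on x >= 0 (switch at 0)
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [x if x < 0 else 2 * x for x in xs]
lpvs = local_lpvs(xs, ys)   # outer LPVs are (1, 0) and (2, 0); middle ones mixed
```

The variance V_k of each fit, omitted here for brevity, is what lets the full algorithm discount the mixed LPVs.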

Clustering-based procedure - Step 2: clustering of the LPVs. Find the clusters D_1, ..., D_s and their centers that minimize a suitable cost in the LPV space.
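The clustering step can be illustrated with plain K-means on scalar LPVs. Note this is an unweighted sketch: the actual procedure also exploits the variances V_k to discount mixed LPVs. The slope values below are hypothetical, with pure LPVs near 1 and 2 and one mixed outlier in between:

```python
def kmeans1d(vals, centers, iters=20):
    """Plain 1-D K-means: alternate nearest-center assignment and center update."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in vals:
            i = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[i].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

slopes = [0.98, 1.02, 1.0, 1.5, 1.97, 2.03, 2.0]   # LPV slopes; 1.5 is mixed
centers, clusters = kmeans1d(slopes, [0.0, 3.0])
```

Here the mixed LPV 1.5 is absorbed into the first cluster and biases its center, which is precisely why the weighted criterion of the full algorithm is preferable.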