Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition



Transcription:

1 Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition
Émilie CAILLAULT, LASL, ULCO/University of Littoral (Calais)
Christian VIARD-GAUDIN, IRCCyN, University of Nantes

2 Aim
Problem: training a neuro-Markovian system for cursive handwritten word recognition.
Difficulty: bootstrapping the system so that it can be trained from scratch.
Proposed solution:
- an implicit segmentation procedure
- the definition of a segmentation control function
- the definition of a segmentation quality measure

3 How to train a neuro-Markovian system?
Character-level training (teaching signal: the character):
- various hybrid classifier-HMM systems: HMM-HMM, NN-HMM, HMM-NN, SVM-HMM, k-NN-HMM
Word-level training (teaching signal: the word):
- NN-HMM classifier-HMM systems (used for speech and off-line handwriting)

4 Global system overview: the MS-TDNN (MS-TDNN-HMM)
[Figure: the TDNN scans the temporal sequence of feature vectors (x, y, dx, dy, cx, cy, pen up/down) of the word "quand" through overlapping observation windows Obs_1 ... Obs_n, using the same weight matrices and the same convolutional window inside every observation window. With 1 state per character the model has 66 states; with 3 states per character, 198 states = 198 output neurons. The resulting sequence of observation vectors is processed by the HMM.]
- The input word is scanned by a TDNN (temporal convolution), which produces the sequence of observation probabilities.
- No explicit segmentation is carried out: an overlapping window is simply shifted at regular steps.
- Advantages of the TDNN architecture: weight sharing reduces computation time and memory requirements, and the sliding observation window allows a further reduction of computation with an adapted TDNN topology.
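The architecture appears only as a diagram on the slide. As a rough illustration, the following minimal sketch (PyTorch; the seven input features follow the figure, while the layer widths and kernel sizes are assumptions, not the authors' topology) shows how a stack of shared-weight temporal convolutions turns a feature sequence into one vector of character-observation probabilities per window:

import torch
import torch.nn as nn

class TDNN(nn.Module):
    """Minimal TDNN sketch: weight-shared temporal convolutions."""
    def __init__(self, n_features=7, n_classes=66):
        super().__init__()
        self.net = nn.Sequential(
            # Each Conv1d reuses the same weights at every time position:
            # the weight sharing credited on the slide with reducing
            # computation time and memory requirements.
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.Conv1d(64, n_classes, kernel_size=1),
        )

    def forward(self, x):  # x: (batch, n_features, T)
        # A softmax over classes yields one observation-probability
        # vector per window: the sequence consumed by the HMM.
        return torch.softmax(self.net(x), dim=1)

feats = torch.randn(1, 7, 120)   # 120 overlapping observation windows
obs_probs = TDNN()(feats)        # shape (1, 66, 120): P(char | window)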

5 MS-TDNN-HMM: dynamic programming and likelihood computation
[Figure: word lexicon (un, deux, trois, quand, ...); each word HMM is the concatenation of letter HMMs (1-state or 3-state letter models), e.g. q_s -> u -> n -> q_f for "un" and q_s -> q -> u -> a -> n -> d -> q_f for "quand"; dynamic programming (DP) then produces the likelihood scores.]
- For each entry in the lexicon, a pseudo word HMM is constructed dynamically by concatenating letter HMMs (character models a, b, c, ..., z), with constant transition probabilities a_ij and observation probabilities b_j(O_t).
- The likelihood of each word in the lexicon is computed by multiplying the observation probabilities along the best path through the graph; dynamic programming (the Viterbi algorithm) finds the best alignment of the path with the observation sequence O_1, O_2, O_3, ...
- The word model with the highest likelihood is the top-1 recognition candidate.
Example of a best Viterbi path for "quand": q q q q u a a n n d d d.
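To make the alignment step concrete, here is a hedged sketch (NumPy; assuming 1-state letter models with constant transition probabilities, as on the slide; obs_probs and char_ids are illustrative names) of the Viterbi computation of one word likelihood:

import numpy as np

def word_log_likelihood(obs_probs, char_ids):
    """obs_probs: (T, n_classes) TDNN outputs; char_ids: the word's
    letter classes in order, e.g. the ids of q, u, a, n, d."""
    T, S = len(obs_probs), len(char_ids)
    log_obs = np.log(obs_probs[:, char_ids] + 1e-12)  # (T, S)
    dp = np.full((T, S), -np.inf)
    dp[0, 0] = log_obs[0, 0]            # every path starts in the first letter
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]         # repeat the current letter
            advance = dp[t - 1, s - 1] if s > 0 else -np.inf
            dp[t, s] = max(stay, advance) + log_obs[t, s]
    return dp[-1, -1]                   # the best path ends in the last letter

The top-1 candidate is then the lexicon entry with the highest score, e.g. best = max(lexicon, key=lambda w: word_log_likelihood(P, encode(w))) for some encoding of letters to class indices.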

6 MS-TDNN-HMM: training
[Figure: the original data are normalised; a feature-extraction step produces the sequence of feature vectors (delta, x/y coordinates, x/y directions) fed to the TDNN; the TDNN outputs the sequence of observation-probability vectors P(class | X) for Obs_1 ... Obs_T; DP turns character likelihoods into word likelihoods; recognition ranks the lexicon words by their word-likelihood score; the gradient of a global cost function is backpropagated through the whole chain.]
- Word-level training with the classical backpropagation algorithm.
- Cost function mixing MLE, MMI and character-level discrimination [ICDAR 2005].
Example: "one" recognized as "over". True HMM (+): ooonneee; best HMM (-): oovveeer; best character path (-): aouvneel.
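The exact cost function is defined in the cited ICDAR 2005 paper; the sketch below only illustrates, under assumed names and an assumed mixing weight alpha, how an MLE term and an MMI-style discriminative term over the lexicon likelihoods could be combined:

import numpy as np

def word_level_cost(log_lik_true, log_lik_all, alpha=0.5):
    """log_lik_true: Viterbi log-likelihood of the labelled (true) word;
    log_lik_all: log-likelihoods of every lexicon entry."""
    mle = -log_lik_true                                        # maximum likelihood
    mmi = -(log_lik_true - np.logaddexp.reduce(log_lik_all))   # discrimination
    return alpha * mle + (1.0 - alpha) * mmi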

7 Implicit segmentation control: illustration of the segmentation problem
After a random initialisation of the TDNN, all outputs are equally probable.
Example: the nine possible segmentation paths for the English word "of" with 10 observations (TDNN windows):
offfffffff, ooffffffff, ooofffffff, ooooffffff, ooooofffff, ooooooffff, ooooooofff, ooooooooff, ooooooooof
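For illustration, the nine paths above can be enumerated mechanically; this small sketch (standard library only) lists every segmentation of a word over T observations in which each letter covers at least one window:

from itertools import combinations

def segmentation_paths(word, T):
    # Choose len(word) - 1 boundaries among the T - 1 gaps between windows.
    for cuts in combinations(range(1, T), len(word) - 1):
        bounds = (0,) + cuts + (T,)
        yield "".join(ch * (b - a) for ch, a, b in zip(word, bounds, bounds[1:]))

print(list(segmentation_paths("of", 10)))
# ['offfffffff', 'ooffffffff', ..., 'ooooooooof']  -- the 9 paths above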

8 Implicit segmentation control: how to obtain the desired likelihood distribution?
Each of the nine segmentation paths for the word "of" (offfffffff, ooffffffff, ..., ooooooooof) should receive a prior probability P(Γ).
Existing solutions (which require training with a character database):
- a duration model for the HMM sub-unit [Penacee]
- manual segmentation constraints [Remus, NPen++]
Proposed solution:
- no modification of the HMM structure (all transitions = 1)
- no training with a character database
- integration of a duration model into the emission probabilities

9 Implicit segmentation control: weighting function f_T
The TDNN outputs are weighted as:
Output(t, l) = min(Output_NN(t, l) * f_T(H), 1)
[Plot: weighting (in %, 0-100) versus observation position (1-9) within the observation range of one letter, shown for H = 100 and H > 1000.]
- B = first observation position for the concerned letter
- E = last observation position for the concerned letter
- H = height of the function f_T
- S = E - B = letter size
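The slide gives f_T only as a plot, so its exact shape is not recoverable; the sketch below assumes a smooth bump of height H over the letter's expected range [B, E] and applies the clipping of the slide's formula:

import numpy as np

def f_T(t, B, E, H=100.0):
    """Assumed raised-cosine bump of height H (in %) over [B, E]."""
    if t < B or t > E:
        return 0.0
    return (H / 100.0) * 0.5 * (1.0 - np.cos(2.0 * np.pi * (t - B) / max(E - B, 1)))

def weight_outputs(outputs_nn, B, E, H=100.0):
    """Output(t, l) = min(Output_NN(t, l) * f_T(H), 1), as on the slide."""
    w = np.array([f_T(t, B, E, H) for t in range(len(outputs_nn))])
    return np.minimum(outputs_nn * w, 1.0)

With a very large H (the H > 1000 curve of the plot), the min(., 1) clipping saturates over a wider part of [B, E]; slide 15 proposes adapting H to control how strict this constraint is.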

10 Implicit segmentation control: weighting function applied to the word "of"
[Plot: weighting coefficient (0-100) versus observation position (1-10) for the letters "o" and "f", with H = 100.]

11 Implicit segmentation control: how to evaluate the effectiveness of this weighting function? The ASR
ASR = (1/N) * Σ_{e=1..N} Π_{i=1..NLs(e)} nb_obs(i) / (NLc(i) * T / NL(e))
- ASR: Averaged homogeneous Segmentation Rate
- e: index of the considered sample
- T: number of observations scanned by the TDNN
- N: (fixed or limited) number of training samples
- NL(e): number of letters in the word label
- NLs(e): number of sequences of different letters
- i: index of a sequence of letters, from 1 to NLs(e)
- NLc(i): number of identical consecutive letters in sequence i
- nb_obs(i): number of observations of letter i in the optimal Viterbi path
Case "of" (T = 10 observations):
nb_obs(o):  1     2     3     4     5     6     7     8     9
nb_obs(f):  9     8     7     6     5     4     3     2     1
ASR:        0.36  0.64  0.84  0.96  1.00  0.96  0.84  0.64  0.36
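Reconstructed from the formula and checked against the table above, here is a hedged sketch of the per-sample ASR; the grouping of repeated consecutive letters (NLc) follows the slide's definitions:

from itertools import groupby

def asr_sample(viterbi_path, word, T):
    """viterbi_path: the letter assigned to each observation, e.g. 'offfffffff'."""
    NL = len(word)                        # NL(e): letters in the word label
    path_runs = [(ch, len(list(g))) for ch, g in groupby(viterbi_path)]
    word_runs = [(ch, len(list(g))) for ch, g in groupby(word)]
    score = 1.0
    for (_, nb_obs), (_, NLc) in zip(path_runs, word_runs):
        score *= nb_obs / (NLc * T / NL)  # each factor is 1.0 for a balanced path
    return score

print(asr_sample("offfffffff", "of", 10))  # 0.36, first column of the table
print(asr_sample("ooooofffff", "of", 10))  # 1.00, the balanced split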

12 Experiments
[Figure: the training pipeline of slide 6, with the weighting function f_T inserted between the TDNN outputs and the word-likelihood computation: the sequence of observation-probability vectors Obs_1 ... Obs_T is constrained by the duration model before being aligned with the HMM of the true word (the label of the training sample).]
Training in 2 stages:
1. training with the segmentation-constraint function f_T for 20 epochs;
2. training with no segmentation constraint.

13 Results on the online IRONOFF word database
IRONOFF word database, English/French, lexicon of 197 words; 2/3 for training (20,898 words), 1/3 for test (10,448 words).

System                               Recognition rate (%)   Rel. improvement (%)   ASR     Rel. improvement (%)
1S-TDNN (1 state)                    87.09                  -                      0.423   -
1S-TDNN + segmentation constraint    91.31                  32.6                   0.498   17.7
3S-TDNN (3 states)                   90.51                  -                      0.612   -
3S-TDNN + segmentation constraint    94.81                  28.1                   0.684   11.8

14 Conclusions
- A simple and effective process to bootstrap a hybrid system without any labelled character database.
- A large increase in recognition rate thanks to an initialisation stage with a duration model (the weighting function f_T) applied directly to the TDNN outputs.
- A better quality of segmentation, measured by the ASR (Averaged Segmentation Rate), yields better recognition rates.

15 Future work: a controlled segmentation stage
- Measure the ASR at each training iteration.
- Adapt the parameter H of the weighting function to reduce the segmentation constraints according to the ASR value: H = f(ASR).
- Initialise with H = 100 (a strong segmentation constraint: only balanced segmentation paths are possible), then relax the constraint on H once the ASR has reached a reasonable value.
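The mapping H = f(ASR) is left as future work on the slide, so the schedule below is purely illustrative; the ASR threshold, the relaxed H value, and the direction of the relaxation are all assumptions:

def adapt_H(asr, H_init=100.0, H_relaxed=1000.0, asr_target=0.7):
    """Keep the strong constraint (assumed H = 100) until the ASR reaches
    a reasonable value, then switch to a relaxed setting (assumption)."""
    return H_init if asr < asr_target else H_relaxed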

16 Bonus

17 Bonus: the recognition engine, the TDNN
[Figure: MLP versus TDNN. The MLP has a global vision of its input; the TDNN goes from a local vision to a global vision and uses weight sharing.]