A Continuous Restricted Boltzmann Machine with a Hardware-Amenable Learning Algorithm



A Continuous Restricted Boltzmann Machine with a Hardware-Amenable Learning Algorithm

Hsin Chen and Alan Murray

Dept. of Electronics and Electrical Engineering, University of Edinburgh, Mayfield Rd., Edinburgh, EH9 3JL, UK, {Hsin.Chen,A.F.Murray}@ee.ed.ac.uk

Abstract. This paper proposes a continuous stochastic generative model that offers an improved ability to model analogue data, with a simple and reliable learning algorithm. The architecture forms a continuous restricted Boltzmann Machine, with a novel learning algorithm. The capabilities of the model are demonstrated with both artificial and real data.

1 Introduction

Probabilistic generative models offer a flexible route to improved data modelling, wherein the stochasticity represents the natural variability of real data. Our primary interest is in processing and modelling analogue data close to a sensor interface. It is therefore important that such models are amenable to analogue or mixed-mode VLSI implementation. The Product of Experts (PoE) has been shown to be a flexible architecture and Minimising Contrastive Divergence (MCD) can underpin a simple learning rule [1]. The Restricted Boltzmann Machine (RBM) [2] with an MCD rule has been shown to be amenable to further simplification and use in real applications [3].

The RBM has one hidden and one visible layer with only inter-layer connections. Let s_i and s_j represent the states of the stochastic units i, j, and w_ij the interconnect weights. The MCD rule for the RBM replaces the computationally expensive relaxation search of the Boltzmann Machine with:

    \Delta w_{ij} = \eta \left( \langle s_i s_j \rangle_0 - \langle \hat{s}_i \hat{s}_j \rangle_1 \right)    (1)

ŝ_i and ŝ_j correspond to one-step Gibbs-sampled reconstruction states, and ⟨·⟩ denotes the expectation value over the training data. By approximating the probabilities of visible units as analogue-valued states, the RBM can model analogue data [1][3]. However, the binary nature of the hidden units causes the RBM to tend to reconstruct symmetric analogue data only, as will be shown in Sect. 3. The rate-coded RBM (RBMrate) [4] removes this limitation by sampling each stochastic unit m times. The RBMrate unit thus has discrete-valued states, while retaining the simple learning algorithm of (1). RBMrate offers an improved ability to model analogue images [4], but the repetitive sampling will cause more spiking noise in the power supplies of a VLSI implementation, placing the circuits in danger of synchronisation [5].

J.R. Dorronsoro (Ed.): ICANN 2002, LNCS 2415, pp. 358-363, 2002. (c) Springer-Verlag Berlin Heidelberg 2002
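The one-step MCD update of (1) is simple enough to sketch directly. The following is a minimal, illustrative NumPy sketch of rule (1) for a small binary RBM; the unit counts, learning rate, batch size and helper names are assumptions for illustration (biases are omitted for brevity), not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, eta=0.1):
    """One minimising-contrastive-divergence step, as in (1):
    dW = eta * (<s_i s_j>_0 - <s_i s_j>_1), estimated over a batch."""
    # Clamped phase: hidden probabilities and sampled states from the data v0.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One-step Gibbs reconstruction of the visible and hidden layers.
    v1_prob = sigmoid(h0 @ W.T)
    v1 = (rng.random(v1_prob.shape) < v1_prob).astype(float)
    h1_prob = sigmoid(v1 @ W)
    # Difference of data and reconstruction correlations, batch-averaged.
    n = v0.shape[0]
    dW = eta * (v0.T @ h0_prob - v1.T @ h1_prob) / n
    return W + dW

W = rng.normal(scale=0.1, size=(6, 3))              # 6 visible, 3 hidden units
v0 = (rng.random((20, 6)) < 0.5).astype(float)      # stand-in binary batch
W = cd1_update(W, v0)
```

Note that no relaxation to equilibrium is needed: one Gibbs step supplies the reconstruction statistics, which is what makes the rule hardware-amenable.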

2 The Continuous Restricted Boltzmann Machine

2.1 A Continuous Stochastic Unit

Adding a zero-mean Gaussian with variance σ² to the input of a sampled sigmoidal unit produces a continuous stochastic unit as follows:

    s_j = \varphi_j \Big( \sum_i w_{ij} s_i + \sigma N_j(0,1) \Big)    (2)

with

    \varphi_j(x_j) = \theta_L + (\theta_H - \theta_L) \frac{1}{1 + \exp(-a_j x_j)}    (3)

where N_j(0,1) represents a unit Gaussian, and φ_j(x) is a sigmoid function with lower and upper asymptotes at θ_L and θ_H, respectively. Parameter a_j controls the steepness of the sigmoid function, and thus the nature of the unit's stochastic behaviour. A small value of a_j renders the input noise negligible and leads to a near-deterministic unit, while a large value of a_j leads to a binary stochastic unit. If the value of a_j renders the sigmoid linear over the range of the added noise, the probability of s_j remains Gaussian with mean Σ_i w_ij s_i and variance σ². Replacing the binary stochastic unit in the RBM by this continuous form of stochastic unit leads to a continuous RBM (CRBM).

2.2 CRBM and Diffusion Network

The model and learning algorithms of the Diffusion Network (DN) [6][7] arise from its continuous stochastic behaviour, as described by a stochastic differential equation. A DN consists of n fully-connected units and an n × n real-valued matrix W defining the connection weights. Let x_j(t) be the state of neuron j in a DN. The dynamical diffusion process is described by the Langevin equation:

    dx_j(t) = \kappa_j \Big( \sum_i w_{ij} \varphi(x_i(t)) - \rho_j x_j(t) \Big) dt + \sigma \, dB_j(t)    (4)

where 1/κ_j > 0 and 1/ρ_j > 0 represent the input capacitance and resistance of neuron j, and dB_j(t) is the Brownian-motion differential [7]. The increment B_j(t + dt) − B_j(t) is thus a zero-mean Gaussian random variable with variance dt. The discrete-time diffusion process for a finite time increment Δt is:

    x_j(t + \Delta t) = x_j(t) + \kappa_j \sum_i w_{ij} \varphi(x_i(t)) \, \Delta t - \kappa_j \rho_j x_j(t) \, \Delta t + \sigma z_j(t) \sqrt{\Delta t}    (5)

where z_j(t) is a Gaussian random variable with zero mean and unit variance. If κ_j ρ_j Δt = 1, the terms in x_j(t) cancel and, writing σ√Δt = σ', this becomes:

    x_j(t + \Delta t) = \kappa_j \sum_i w_{ij} \varphi(x_i(t)) \, \Delta t + \sigma' z_j(t)    (6)
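To make the behaviour of the continuous stochastic unit of (2)-(3) concrete before relating it to (6): the following minimal sketch, with illustrative asymptotes θ_L = -1, θ_H = 1, noise σ = 0.2, and an input chosen so that Σ_i w_ij s_i = 0 (all of these values are assumptions for illustration), shows how a_j moves the unit between the near-deterministic and near-binary regimes described above:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x, a, theta_L=-1.0, theta_H=1.0):
    """Sigmoid with asymptotes theta_L, theta_H and steepness a, as in (3)."""
    return theta_L + (theta_H - theta_L) / (1.0 + np.exp(-a * x))

def sample_unit(w_j, s, a_j, sigma=0.2):
    """Continuous stochastic unit of (2): noisy total input through phi."""
    total = w_j @ s + sigma * rng.standard_normal()
    return phi(total, a_j)

w_j = np.array([0.5, -0.3, 0.8])
s = np.array([0.6, 1.0, 0.0])          # chosen so the mean input is zero

# Small a_j: the sigmoid is shallow, the input noise is suppressed.
near_det = [sample_unit(w_j, s, a_j=0.5) for _ in range(1000)]
# Large a_j: the unit saturates near theta_L or theta_H, i.e. near-binary.
near_bin = [sample_unit(w_j, s, a_j=50.0) for _ in range(1000)]
print(np.std(near_det), np.std(near_bin))   # the second spread is much larger
```

The same single noise source thus yields a whole family of unit behaviours, which is why learning a_j (Sect. 2.3) acts as a form of annealing.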

If w_ij = w_ji and κ_j is constant over the network, the RHS of (6) is equivalent to the total input of a CRBM as given by (2). As s_j = φ_j(x_j), the CRBM is simply a symmetric restricted DN (RDN), and the learning algorithm of the DN is thus a useful candidate for the CRBM.

2.3 MCD Learning Algorithms for the CRBM

The learning rule for a parameter λ_ij of the DN is [6]:

    \Delta \lambda_{ij} = \langle S_{\lambda_{ij}} \rangle_0 - \langle S_{\lambda_{ij}} \rangle    (7)

where ⟨·⟩_0 refers to the expectation value over the training data with visible states clamped, and ⟨·⟩ to that in free-running equilibrium. S_λij is the system covariate [6], the negative derivative of the DN's energy function w.r.t. the parameter λ_ij. The restricted DN can be shown to be a PoE [8] and we choose to simplify (7) by once again minimising contrastive divergence [1]:

    \Delta \hat{\lambda}_{ij} = \langle S_{\lambda_{ij}} \rangle_0 - \langle \hat{S}_{\lambda_{ij}} \rangle_1    (8)

where ⟨·⟩_1 indicates the expectation values over one-step sampled data. Let φ(s) represent φ_j(s) with a_j = 1. The energy function of the CRBM can be shown to be similar to that of the continuous Hopfield model [9][6]:

    U = -\frac{1}{2} \sum_{i,j} w_{ij} s_i s_j + \sum_j \frac{\rho_j}{a_j} \int_0^{s_j} \varphi^{-1}(s) \, ds    (9)

(8) and (9) then lead to the MCD learning rules for the CRBM's parameters:

    \Delta \hat{w}_{ij} = \eta_w \left( \langle s_i s_j \rangle_0 - \langle \hat{s}_i \hat{s}_j \rangle_1 \right)    (10)

    \Delta \hat{a}_j = \frac{\eta_a \rho_j}{a_j^2} \left\langle \int_{\hat{s}_j}^{s_j} \varphi^{-1}(s) \, ds \right\rangle    (11)

where ŝ_j denotes the one-step sampled state of unit j, and ⟨·⟩ in (11) refers to the expectation value over the training data. To simplify the hardware design, we approximate the integral term in (11) as:

    \int_{\hat{s}_j}^{s_j} \varphi^{-1}(s) \, ds \approx (s_j + \hat{s}_j)(s_j - \hat{s}_j)    (12)

The training rules for w_ij and a_j thus require only addition and multiplication of local unit states.

3 Demonstration: Artificial Data

Two-dimensional data were generated to probe and to compare the performance of the RBM and the CRBM on analogue data (Fig. 1(a)). The data include two clusters

of 200 data points. Figure 1(b) shows points reconstructed from 400 random input data after 20 steps of Gibbs sampling by an RBM with 6 hidden units, after 4000 training epochs. The RBM's tendency to generate data in symmetric patterns is clear. Figure 1(c) shows the same result for a CRBM with four hidden units, η_w = 1.5, η_a = 1 and σ = 0.2 for all units. The evolution of the gain factor a_j of one visible unit is shown in Fig. 1(d) and displays a form of autonomous annealing, driven by (11), indicating that the approximation in (12) leads to sensible training behaviour in this stylised, but non-trivial, example.

Fig. 1. (a) Artificially generated analogue training data. (b) Reconstruction by the trained RBM. (c) Reconstruction by the trained CRBM. (d) Learning trace of a_j.

4 Demonstration: Real Heart-Beat (ECG) Data

To highlight the improved modelling richness of the CRBM and to give these results credence, a CRBM with four hidden units was trained to model the ECG data used in [3] and [10]. The ECG trace was divided into one training dataset of 500 heartbeats and one test dataset of 1700 heartbeats, each of 65 samples. The 500 training data contain six Ventricular Ectopic Beats (VEBs), while the 1700 test data contain 27 VEBs. The CRBM was trained for 4000 epochs with η_w = 1.5, η_a = 1, σ = 0.2 for visible units and σ = 0.5 for hidden units. Figure 2 shows the reconstruction by the trained CRBM, from initial visible states of 2(a) an observed normal QRS complex and 2(b) an observed typical VEB, after 20 subsequent steps of unclamped Gibbs sampling. The CRBM models both forms of heartbeat successfully, although VEBs represent only 1% of the

training data. Following [3], Fig. 2(c) shows the receptive fields of the hidden bias unit and the four hidden units. The bias unit codes an average normal QRS complex, and H3 adds to the P- and T-waves. H1 and H2 drive a small horizontal shift and a magnitude variation of the QRS complex. Finally, H4 encodes the significant dip found in a VEB.

Fig. 2. Reconstruction by the trained CRBM with input of (a) a normal QRS and (b) a typical VEB. (c) The receptive fields of the hidden bias unit and the four hidden units.

The most principled detector of VEBs in test data is the log-likelihood under the trained CRBM. However, the log-likelihood requires complicated hardware. Figure 2(c) suggests that the activities of the hidden units may be usable as the basis of a simple novelty detector. For example, the activities of H4 corresponding to the 1700 test data are shown in Fig. 3, with the noise source in equation (2) removed. The peaks indicate the VEBs clearly. VL in Fig. 3 indicates the minimum H4 activity for a VEB and QH marks the datum with the maximum H4 activity for a normal heartbeat.

Fig. 3. The activities of H4 corresponding to the 1700 test data.

Therefore,

even a simple linear classifier with its threshold set between the two dashed lines will detect the VEBs with an accuracy of 100%. The margin for the threshold is more than 0.5, equivalent to 25% of the total value range. A single hidden-unit activity in a CRBM is, therefore, potentially a reliable novelty detector, and it is expected that layering a supervised classifier on the CRBM, to fuse the hidden-unit activities, will lead to improved results.

5 Conclusion

The CRBM can model analogue data successfully with a simplified MCD rule. Experiments with real ECG data further show that the activities of the CRBM's hidden units may function as a simple but reliable novelty detector. Component circuits of the RBM with the MCD rule have been successfully implemented [5][11]. Therefore, the CRBM is a potential continuous stochastic model for VLSI implementation and embedded intelligent systems.

References

1. Hinton, G.E.: Training Products of Experts by minimising contrastive divergence. Technical Report GCNU TR 2000-004, Gatsby Computational Neuroscience Unit (2000)
2. Smolensky, P.: Information processing in dynamical systems: Foundations of harmony theory. Parallel Distributed Processing, Vol. 1 (1986) 195-281
3. Murray, A.F.: Novelty detection using products of simple experts - a potential architecture for embedded systems. Neural Networks 14(9) (2001) 1257-1264
4. Teh, Y.W., Hinton, G.E.: Rate-coded Restricted Boltzmann Machines for face recognition. Advances in Neural Information Processing Systems, Vol. 13 (2001)
5. Woodburn, R.J., Astaras, A.A., Dalzell, R.W., Murray, A.F., McNeill, D.K.: Computing with uncertainty in probabilistic neural networks on silicon. Proc. 2nd Int. ICSC Symp. on Neural Computation (1999) 470-476
6. Movellan, J.R.: A learning theorem for networks at detailed stochastic equilibrium. Neural Computation 10(5) (1998) 1157-1178
7. Movellan, J.R., Mineiro, P., Williams, R.J.: A Monte-Carlo EM approach for partially observable diffusion processes: Theory and applications to neural networks. Neural Computation (in press)
8. Marks, T.K., Movellan, J.R.: Diffusion networks, products of experts, and factor analysis. UCSD MPLab Technical Report 2001.02 (2001)
9. Hopfield, J.J.: Neurons with graded response have collective computational properties like those of two-state neurons. Proc. National Academy of Sciences of the USA 81(10) (1984) 3088-3092
10. Tarassenko, L., Clifford, G.: Detection of ectopic beats in the electrocardiogram using an auto-associative neural network. Neural Processing Letters 14(1) (2001) 15-25
11. Fleury, P., Woodburn, R.J., Murray, A.F.: Matching analogue hardware with applications using the Products of Experts algorithm. Proc. IEEE European Symp. on Artificial Neural Networks (2001) 63-67
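As a closing illustration of the simplified MCD rule of Sect. 2.3, the following NumPy sketch applies one CRBM training step using rule (10) together with rule (11) under the approximation (12). The batch averaging, the folding of ρ_j into η_a, and all sizes and constants are illustrative assumptions, not a reproduction of the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(2)

def phi(x, a, theta_L=-1.0, theta_H=1.0):
    """Sigmoid of (3) with per-unit gain a and asymptotes theta_L, theta_H."""
    return theta_L + (theta_H - theta_L) / (1.0 + np.exp(-a * x))

def crbm_step(W, a_v, a_h, v0, sigma=0.2, eta_w=1.5, eta_a=1.0):
    """One MCD step for a CRBM: rule (10) for W, rule (11)+(12) for the gains.
    W: visible-hidden weights; a_v, a_h: gain factors per visible/hidden unit."""
    n = v0.shape[0]
    noise = lambda shape: sigma * rng.standard_normal(shape)
    # Clamped hidden states, then a one-step Gibbs reconstruction, as in (2).
    h0 = phi(v0 @ W + noise((n, W.shape[1])), a_h)
    v1 = phi(h0 @ W.T + noise(v0.shape), a_v)
    h1 = phi(v1 @ W + noise(h0.shape), a_h)
    # (10): correlation difference between data and reconstruction.
    W = W + eta_w * (v0.T @ h0 - v1.T @ h1) / n
    # (11) with (12): the integral is approximated by (s + s_hat)(s - s_hat),
    # i.e. s^2 - s_hat^2, batch-averaged; rho_j is folded into eta_a here.
    a_v = a_v + (eta_a / a_v**2) * np.mean(v0**2 - v1**2, axis=0)
    a_h = a_h + (eta_a / a_h**2) * np.mean(h0**2 - h1**2, axis=0)
    return W, a_v, a_h

W = rng.normal(scale=0.1, size=(2, 4))      # 2 visible, 4 hidden units
a_v, a_h = np.ones(2), np.ones(4)
v0 = rng.normal(size=(200, 2))              # stand-in analogue training batch
W, a_v, a_h = crbm_step(W, a_v, a_h, v0)
```

Every quantity in the update is a local unit state or a product of two such states, which is the property that makes the rule amenable to VLSI implementation.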