ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble




ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

Sai Zhang, szhang12@illinois.edu
Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA

Abstract: In this paper, we propose classifier ensembles (CE) as a method to enhance the robustness of machine learning (ML) kernels in the presence of hardware errors. Different ensemble methods (Bagging and Adaboost) are explored with the decision tree (C4.5) and the artificial neural network (ANN) as base classifiers. Simulation results show that the ANN is inherently tolerant to hardware errors up to a 10% hardware error rate. With a simple majority voting scheme, CE effectively reduces the classification error rate for almost all tested data sets, with a maximum test error reduction of 48%. For tree ensembles, Adaboost with decision stumps as weak learners gives the best results, while for the ANN, bagging and boosting outperform each other depending on the data set.

I. INTRODUCTION

This project is motivated by recent work on machine learning (ML) on silicon [1]. There is growing interest in the VLSI and circuits community in bringing ML kernels onto silicon efficiently, largely due to the performance and power limitations of software-only approaches. At the same time, in the deep sub-micron era, devices exhibit statistical behavior which degrades system reliability. To reduce the energy consumption of circuits, sub/near-threshold computing has been proposed, offering roughly 10X power reduction at the expense of large delay variation, as shown in Fig. 1.

Fig. 1. With reduced supply voltage, energy efficiency is improved but delay variation is larger.

To enhance the reliability and robustness of ML kernels on silicon, Verma et al. [1] propose to use ML algorithms to learn the error behavior and make correct predictions/classifications in the presence of hardware errors. Kim et al. [2] show that some ML algorithms are inherently tolerant to hardware errors. These works motivate us to look into methods for effectively enhancing the robustness of ML kernels in the presence of hardware errors.

Classifier ensembles (CE, also referred to as multiple classifier systems) have been employed to enhance the performance of single-classifier systems [3], [4]. As summarized in [5]-[8], popular ensemble methods include Bagging, Adaboost, Bayesian voting, random forests, rotation forests, error-correcting output coding, etc. In terms of decision combining, [9], [10] summarize typical classifier fusion methods. In this paper, we therefore explore the idea of using CE to enhance the robustness of ML kernels. Bagging and Adaboost are explored with the C4.5 decision tree and the ANN as base classifiers. Simulation results show that the ANN is inherently tolerant to hardware errors up to a 10% error rate, and that with a simple majority voting scheme CE effectively reduces the classification error rate for almost all tested data sets, with a maximum test error reduction of 48%.

The rest of the report is organized as follows: Section 2 gives background on the CE methods used in the project, Section 3 presents the error model and simulation setup, Section 4 presents simulation results, and Section 5 concludes.

II. BACKGROUND

A. Ensemble Methods

We give a brief overview of methods for constructing classifier ensembles.

1) Average Bayes Classifier: In the average Bayes setting, every classifier output is interpreted as a posterior probability of class membership, conditioned on the input and the hypothesis:

h(x) = P(f(x) = y | x, h)    (1)

Here we assume the hypotheses follow a distribution over the hypothesis space H.
If the hypothesis space is small, it is possible to enumerate all possible classifiers h(x). The final output is the average of these hypotheses, weighted by the posterior probability of each hypothesis, as shown in the following equation:

P(f(x) = y | S, x) = \sum_{h \in H} h(x) P(h | S)    (2)

Using Bayes' rule, the posterior probability P(h | S) can be written as

P(h | S) \propto P(S | h) P(h)    (3)

The practical difficulty in implementing this method is that when the hypothesis space is large, it is difficult to enumerate all h(x); moreover, the prior of each hypothesis is not always known. Fig. 2 illustrates the algorithm. The true mapping which the algorithm tries to learn is denoted f(x). To construct the ensemble, every hypothesis h(x) in the space H is learned, and the final mapping is the weighted average in (2).

Fig. 2. Average Bayes Classifier
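To make (1)-(3) concrete, the following is a minimal sketch (the toy hypothesis space and training data are invented for illustration and are not from the report) of Bayesian-averaged classification over a small, enumerable hypothesis space, with posteriors computed via (3) under a uniform prior.

```python
import numpy as np

# Toy hypothesis space: three threshold classifiers on a 1-D feature,
# each returning P(y = 1 | x, h) as a hard 0/1 value for simplicity (Eq. 1).
hypotheses = [lambda x, t=t: (np.asarray(x) > t).astype(float) for t in (0.0, 0.5, 1.0)]

# Tiny labeled training set S (invented for illustration).
X_train = np.array([-0.2, 0.3, 0.7, 1.2])
y_train = np.array([0, 0, 1, 1])

def likelihood(h, X, y, eps=1e-3):
    """P(S | h): product over samples of the probability h assigns to the true label."""
    p1 = np.clip(h(X), eps, 1.0 - eps)          # P(y = 1 | x, h), clipped away from 0/1
    return float(np.prod(np.where(y == 1, p1, 1.0 - p1)))

prior = np.full(len(hypotheses), 1.0 / len(hypotheses))    # uniform P(h)
posterior = np.array([likelihood(h, X_train, y_train) for h in hypotheses]) * prior
posterior /= posterior.sum()                               # Eq. (3), normalized

def bayes_average(x):
    """Eq. (2): ensemble output is the posterior-weighted average of hypothesis outputs."""
    return float(sum(w * h(x) for w, h in zip(posterior, hypotheses)))

print(bayes_average(0.6))   # approximate P(y = 1 | S, x = 0.6)
```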

2) Bagging: Bootstrap aggregating is a popular method for constructing ensembles. The basic idea is to first generate many training sets from the original training samples by random sampling with replacement, and then train one classifier on each of these training sets, as shown in Fig. 3. Because sampling is done with replacement, each bootstrap training set contains on average about 63.2% of the distinct examples in the original training set. The final decision is made by taking the majority vote of the individual weak classifiers. Bagging has been shown to improve the performance of unstable classifiers such as neural networks and decision trees.

Fig. 3. Bagging
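As a concrete illustration, the sketch below builds a bagged ensemble with bootstrap resampling and simple majority voting. It is not the report's simulator code: scikit-learn's CART trees stand in for the C4.5 base classifiers, and labels are assumed to be non-negative integers so that np.bincount can tally the votes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART stand-in for the report's C4.5 trees

def train_bagging(X, y, n_estimators=25, rng=None):
    """Train a bagged ensemble: each tree is fit on a bootstrap resample of (X, y)."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)                     # sample n indices with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def predict_majority(ensemble, X):
    """Combine hard decisions by simple (unweighted) majority voting."""
    votes = np.stack([clf.predict(X) for clf in ensemble])   # shape: (n_trees, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```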

3) Adaptive Boosting: Adaboost is another popular method for ensemble generation; the basic idea is depicted in Fig. 4. The training runs for T iterations, and each iteration produces a new weak classifier. After each iteration, the training examples are re-weighted so that misclassified examples receive higher weight; the next iteration therefore focuses more on the examples misclassified in the previous one. Fig. 5 gives the algorithm flow of Adaboost.

Fig. 4. Adaboost

Fig. 5. Adaboost algorithm

The basic idea of Adaboost is very similar to that of Bagging, with two notable differences: 1) in generating the different training samples, Bagging uses random sampling with replacement, while Adaboost uses the accuracy of the previous classifiers to guide the selection of subsequent training samples; and 2) Bagging uses simple majority voting to generate the decision, while Adaboost uses a weighted average. It can be shown that the selection of the training sample distribution and the weights in Adaboost minimizes the exponential loss function

L(f(x), y) = \sum_i \exp(-y_i f(x_i))    (4)

Proof: We seek a classifier of the form

H(x) = \sum_{m=1}^{M} \alpha_m f_m(x)

that minimizes the loss in (4). At step m we have

H_m(x) = H_{m-1}(x) + \alpha_m f_m(x)

The minimization problem can be formulated as

[\alpha_m, f_m] = \arg\min_{\alpha, f} \sum_{i=1}^{N} \exp(-y_i (H_{m-1}(x_i) + \alpha f(x_i)))
              = \arg\min_{\alpha, f} \Big\{ e^{\alpha} \sum_i \exp(-y_i H_{m-1}(x_i)) 1[y_i \neq f(x_i)] + e^{-\alpha} \sum_i \exp(-y_i H_{m-1}(x_i)) 1[y_i = f(x_i)] \Big\}
              = \arg\min_{\alpha, f} \Big\{ e^{\alpha} \sum_i \exp(-y_i H_{m-1}(x_i)) 1[y_i \neq f(x_i)] + e^{-\alpha} \Big( \sum_i \exp(-y_i H_{m-1}(x_i)) - \sum_i \exp(-y_i H_{m-1}(x_i)) 1[y_i \neq f(x_i)] \Big) \Big\}

Dividing by the constant normalizer \sum_j \exp(-y_j H_{m-1}(x_j)) (which does not affect the arg min), this is equivalent to

[\alpha_m, f_m] = \arg\min_{\alpha, f} \Big\{ e^{\alpha} \sum_i w_{m-1}(i) 1[y_i \neq f(x_i)] + e^{-\alpha} \Big( 1 - \sum_i w_{m-1}(i) 1[y_i \neq f(x_i)] \Big) \Big\}

where w_{m-1}(i) = \exp(-y_i H_{m-1}(x_i)) / \sum_j \exp(-y_j H_{m-1}(x_j)). We can further simplify the optimization into

[\alpha_m, f_m] = \arg\min_{\alpha, f} \Big\{ e^{-\alpha} + (e^{\alpha} - e^{-\alpha}) \sum_i w_{m-1}(i) 1[y_i \neq f(x_i)] \Big\}

Since only the weighted error \sum_i w_{m-1}(i) 1[y_i \neq f(x_i)] depends on f, and its coefficient e^{\alpha} - e^{-\alpha} is positive for \alpha > 0, the optimal weak learner minimizes the weighted error; setting the derivative with respect to \alpha to zero, \varepsilon_m e^{\alpha} - (1 - \varepsilon_m) e^{-\alpha} = 0, then gives \alpha_m. We thus obtain

f_m = \arg\min_f \sum_i w_{m-1}(i) 1[y_i \neq f(x_i)]

\alpha_m = \frac{1}{2} \log\Big( \frac{1 - \varepsilon_m}{\varepsilon_m} \Big)

where \varepsilon_m = \sum_{i=1}^{N} w_{m-1}(i) 1[y_i \neq f_m(x_i)]. Therefore, in Adaboost the training sample distribution and the classifier weights are selected such that the loss function in (4) is minimized.
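The following is a minimal sketch of this binary Adaboost loop with depth-1 scikit-learn trees as the decision stumps. It illustrates the standard algorithm, not the report's simulator code, and assumes labels in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, T=50):
    """Binary Adaboost with decision stumps; y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # w_0(i): uniform initial distribution
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # f_m: minimize the weighted error
        pred = stump.predict(X)
        eps = np.sum(w[pred != y])           # epsilon_m: weighted training error
        if eps >= 0.5 or eps == 0:           # stop if no better than chance (or perfect)
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * y * pred)       # up-weight misclassified examples
        w /= w.sum()                         # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted-vote combination: sign of the alpha-weighted sum of stump outputs."""
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)
```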

4) Injecting Randomness: The final method to construct an ensemble is by injecting randomness into the classifier. For good ensemble performance, the weak classifiers used to construct the ensemble need to be unstable; good unstable classifiers are trees and neural networks. In this method, the ensemble is constructed simply by changing the initial weights of the neural networks, or by randomly choosing the features used to split the trees.

B. Decision Combination

The second important aspect of CE is how to combine the output of each classifier to obtain the final decision. Many decision combination methods have been proposed, as shown in Fig. 6. If the output of each classifier is not a hard decision but a continuous measure, averaging methods can be used to obtain the ensemble output; popular averaging methods include the mean, median, weighted mean, etc. If the output of each classifier is a discrete label, voting methods can be used. The simplest voting method is majority voting, which takes as the final output the class that most of the classifiers agree on. Weighted voting can also be used to weight the decision of each classifier by its accuracy, as in the case of Adaboost.

Fig. 6. Decision combination methods

III. METHODOLOGY

A. Hardware error simulation

Error injection is performed at the simulation level since we do not have realistic hardware realizations of the classifiers. The error injection presents difficulty due to at least two requirements: 1) the mapping from the input space to the output error space needs to have sufficient randomness; otherwise it would be trivial to remove the error by adding a bias to the network output; and 2) for the same input pattern, the same error pattern must always be generated. To satisfy both requirements, we use an input-pattern-dependent random number generator for error injection, as shown in Fig. 7. When presented with a feature vector, the generator uses each dimension as a seed to generate a Gaussian random number; the random numbers from the different dimensions are then added and scaled to produce an output err ~ N(0, 1). This ensures that different feature vectors generate different random errors, while the same feature vector always results in the same error value. The err value is then used to inject hardware errors into the trees and ANNs. For decision trees, we generate two thresholds Threshold1 and Threshold2 for each node; when err falls into the interval [Threshold1, Threshold2], the decision of the node is flipped. For ANNs, similar thresholds are generated for each weight, and if err falls into [Threshold1, Threshold2], the weight matrix is perturbed to simulate hardware defects. We control the error rate through the thresholds, i.e., a higher error rate is achieved by increasing Threshold2 - Threshold1.

Fig. 7. Error injection methodology
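Below is a minimal sketch of this input-pattern-dependent error injection. It is a rough reconstruction of the scheme described above, not the report's actual simulator; the function names, threshold window, and perturbation scale are illustrative. Each feature dimension seeds a Gaussian draw, the draws are combined into err ~ N(0, 1), and a node decision is flipped or a weight matrix perturbed when err falls inside the error window.

```python
import numpy as np

def injected_error(feature_vec):
    """Deterministic, input-dependent err ~ N(0, 1): each dimension seeds one Gaussian draw."""
    draws = []
    for v in feature_vec:
        # Hash each dimension's value into a seed so the same input always yields the same error.
        seed = np.uint32(abs(hash(round(float(v), 6))) % (2**32))
        draws.append(np.random.default_rng(seed).standard_normal())
    # The sum of d i.i.d. N(0, 1) draws scaled by 1/sqrt(d) is again N(0, 1).
    return float(np.sum(draws) / np.sqrt(len(draws)))

def maybe_flip_decision(decision, feature_vec, t1=-0.05, t2=0.05):
    """Flip a tree node's binary decision when err falls in the window [t1, t2].
    Widening the window raises the effective hardware error rate."""
    err = injected_error(feature_vec)
    return (not decision) if t1 <= err <= t2 else decision

def maybe_perturb_weights(W, feature_vec, t1=-0.05, t2=0.05, scale=0.1):
    """Perturb an ANN weight matrix when err falls in the error window (scale is illustrative)."""
    err = injected_error(feature_vec)
    if t1 <= err <= t2:
        rng = np.random.default_rng(np.uint32(abs(hash(round(err, 6))) % (2**32)))
        return W + scale * rng.standard_normal(W.shape)
    return W
```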
B. Experiment Setup

Table I shows the experimental setup. In this project, we explore Bagging and Adaboost with the ANN, the C4.5 decision tree, and the decision stump (a decision tree of depth 1) as base classifiers. When training the ANNs, random initial weights are used to further introduce diversity into the ensemble. For Adaboost, we use weighted voting as shown in Fig. 5, and for Bagging, a simple majority voting scheme is employed. We use 4 data sets from the UCI machine learning repository. For each data set, 10-fold cross-validation is used to evaluate performance.

TABLE I
EXPERIMENTAL SETUP

Base Classifier   Ensemble Methods   Combination
ANN               Bagging            Majority voting
C4.5              Boosting           Weighted voting
Dec. stump        Initial weights
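A sketch of the overall evaluation loop is shown below: 10-fold cross-validation with the error-free and error-injected versions of an ensemble scored on the same folds. It assumes helpers like the earlier train_bagging/predict_majority sketches plus a hypothetical predict_with_errors_fn wrapper that applies the error-injection model at inference time; none of this is the report's actual code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def evaluate(X, y, train_fn, predict_fn, predict_with_errors_fn, n_folds=10, seed=0):
    """Return (error-free, error-injected) test error rates averaged over the folds."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    clean_err, noisy_err = [], []
    for train_idx, test_idx in skf.split(X, y):
        ensemble = train_fn(X[train_idx], y[train_idx])
        pred_clean = predict_fn(ensemble, X[test_idx])
        pred_noisy = predict_with_errors_fn(ensemble, X[test_idx])  # hypothetical wrapper
        clean_err.append(np.mean(pred_clean != y[test_idx]))
        noisy_err.append(np.mean(pred_noisy != y[test_idx]))
    return np.mean(clean_err), np.mean(noisy_err)
```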

IV. RESULTS

Fig. 8 shows the test error rate vs. ensemble size for all evaluated ensemble methods. For Bagging with decision trees, the test error rate decreases as the ensemble size increases for both the hardware-error-free (red) and hardware-error-injected (blue) cases. However, hardware errors degrade the performance of the ensemble, and this degradation cannot be effectively compensated for by increasing the ensemble size. For Adaboost with decision trees, Fig. 8 indicates that the ensemble suffers from over-fitting: as the ensemble size grows, the error rate first decreases and then starts to increase, at ensemble sizes of 30 and 20 for the error-free and erroneous cases, respectively. Adaboost with decision stumps gives the best performance in the family of tree ensembles; the error-injected ensemble achieves almost the same error reduction as the error-free ensemble. For the ANN with Bagging, both the error-injected and error-free ensembles achieve an effective reduction of the test error rate with increasing ensemble size. For the ANN with Adaboost, however, over-fitting starts to occur at ensemble sizes of 40 and 30 for the error-free and erroneous ensembles, respectively.

Fig. 8. Results: test error rate vs. ensemble size

Fig. 9 shows the results for the 4 data sets from the UCI machine learning repository at different injected hardware error rates. Several observations can be made from the figure:

Fig. 9. Results: test error rate for the different data sets at error rates of 3%, 10% and 20% (tree ensembles: C4.5 error free, C4.5 error injected, C4.5 bagging, C4.5 boosting, decision stump boosting; NN ensembles: NN error free, NN error injected, NN bagging, NN boosting).

1) In terms of inherent error tolerance, the ANN is better than decision trees. This can be seen from Fig. 9 by noting that at a 3% error rate, the erroneous C4.5 already shows severely degraded performance on data sets 1 and 2, whereas the ANN maintains similar performance regardless of error injection up to a 10% error rate. The difference can be partially explained by noting that each node of a decision tree makes a hard decision, so errors have a high chance of propagating to the output, while each layer of an ANN does not make a hard decision but passes its output through a sigmoid, which suppresses perturbations and might help reduce the effect of node errors.

2) For tree ensembles, Adaboost with decision stumps always gives the best performance, consistent with Fig. 8. Boosted decision trees tend to suffer from over-fitting at low error rates; this problem is mitigated at high error rates because the ensemble size tends to be reduced.

3) For ANN ensembles, Adaboost and Bagging can outperform each other depending on the data set, and there is no consistent behavior across the different error rates. The over-fitting problem of the ANN is less severe than that of the trees, and, similarly, the over-fitting is mitigated at high error rates due to the reduction of the ensemble size.
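To illustrate observation 1 numerically, the toy computation below (made-up numbers for illustration only, not an experiment from the report) compares how a weight perturbation propagates through a hard threshold versus a sigmoid: the thresholded output can flip outright, while the sigmoid output moves only slightly when the neuron operates in its saturated region.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values for illustration: input, nominal weight, injected perturbation, decision threshold.
x, w, delta, threshold = 1.0, 4.0, -1.5, 3.0

# Hard decision (tree-node style): the perturbation flips the comparison outcome.
hard_clean = w * x > threshold            # True
hard_noisy = (w + delta) * x > threshold  # False -> the error propagates as a full decision flip

# Soft decision (ANN style): in the sigmoid's saturated region the same perturbation
# only nudges the activation (~0.98 -> ~0.92), so downstream layers see a small change.
soft_clean = sigmoid(w * x)
soft_noisy = sigmoid((w + delta) * x)

print(hard_clean, hard_noisy, round(soft_clean, 3), round(soft_noisy, 3))
```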
V. CONCLUSION AND FUTURE WORK

In this paper, we show that CE is an effective method to enhance the robustness of ML hardware. Bagging and Adaboost are explored with the decision tree and the ANN as base classifiers. Simulation results show that the ANN is inherently tolerant to hardware errors up to a 10% hardware error rate. With a simple majority voting scheme, CE effectively reduces the classification error rate for almost all tested data sets, with a maximum test error reduction of 48%. For tree ensembles, Adaboost with decision stumps as weak learners gives the best results, while for the ANN, bagging and boosting outperform each other depending on the data set. In the future, the framework can be extended to include more ensemble methods, to handle multi-class problems, and to mitigate the over-fitting problem through regularization. It would also be helpful to implement part of the algorithm in hardware to evaluate the performance of CE on ML kernels.

ACKNOWLEDGMENT

The author would like to thank Prof. Hasegawa-Johnson and Sujeeth Subramanya Bharadwaj for their help and guidance.

REFERENCES

[1] N. Verma, K. H. Lee, K. J. Jang, and A. Shoeb, "Enabling system-level platform resilience through embedded data-driven inference capabilities in electronic devices," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 5285-5288.
[2] J. Choi, E. P. Kim, R. A. Rutenbar, and N. R. Shanbhag, "Error resilient MRF message passing architecture for stereo matching," in Signal Processing Systems (SiPS), 2013 IEEE Workshop on, 2013, pp. 348-353.
[3] L.-Y. Yang, Z. Qin, and R. Huang, "Design of a multiple classifier system," in Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on, vol. 5, 2004, pp. 3272-3276.
[4] G. Giacinto, F. Roli, and G. Fumera, "Design of effective multiple classifier systems by clustering of classifiers," in Pattern Recognition, 2000. Proceedings. 15th International Conference on, vol. 2, 2000, pp. 160-163.
[5] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, LNCS 1857. Springer, 2000, pp. 1-15.
[6] J. Rodriguez, L. Kuncheva, and C. Alonso, "Rotation forest: A new classifier ensemble method," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 10, pp. 1619-1630, 2006.
[7] D. Opitz and R. Maclin, "Popular ensemble methods: An empirical study," Journal of Artificial Intelligence Research, vol. 11, pp. 169-198, 1999.
[8] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

[9] L. Xu, A. Krzyzak, and C. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," Systems, Man and Cybernetics, IEEE Transactions on, vol. 22, no. 3, pp. 418-435, 1992.
[10] T. K. Ho, J. Hull, and S. Srihari, "Decision combination in multiple classifier systems," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 1, pp. 66-75, 1994.