Large Scale Extreme Learning Machine using MapReduce



Li Dong, Pan Zhisong, Deng Zhantao, Zhang Yanyan
Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007, Jiangsu Province, P.R. China
donggeat006@yahoo.com.cn, hotpzs@hotmail.com, dengzhantao@sina.cn, zhyany@gmail.com

Abstract

Extreme learning machine (ELM) is a new method in neural networks. Contrasted with conventional gradient-based algorithms such as BP, it can remarkably shorten the training time, since the whole learning process is done in a single pass. However, the algorithm cannot deal with large scale datasets because of memory limitations. Here we implement a large scale ELM based on MapReduce, the new parallel programming model. The foundation of the algorithm is parallel matrix multiplication, which has been discussed elsewhere, but we give the whole computation and I/O cost in detail. Experiments on large scale datasets show the scalability of this method.

Keywords: ELM, MapReduce, Large scale

1. Introduction

We are facing an ever increasing flood of information, and people have an emerging demand to deal with big data. Recently, large scale machine learning has received much attention, since most existing techniques were designed for the not-so-big data of past decades. Many traditional algorithms fail on large scale data simply because they cannot load all the data into memory at once, while they are designed on the hypothesis that all data can be read. There are roughly two classes of approaches which can work: streaming the data to an online learning algorithm, and parallelizing a batch-learning algorithm [1]. This paper focuses on the second approach.

MapReduce is a parallel programming model published by Google in 2004 [2], which lets programs run on large clusters built from common computers as well as on multi-core systems. This technology simplifies processing big data by supplying high-level interfaces and hiding system-related details. Using MapReduce to parallelize machine learning algorithms begins with Chu et al. [3]. They concluded that algorithms which fit the Statistical Query model can be written in summation form, which can be easily parallelized. Qing He et al. [4] give several popular parallel classification algorithms based on MapReduce. Cui et al. [5] used MapReduce to parallelize an algorithm for finding communities in a mobile social network. There exist a few implementations of MapReduce, but Apache Hadoop [6] has become the de-facto standard version [7] and is widely used in industry [8]. The main experiments in this paper are run on this platform.

2. Extreme learning machine

Extreme learning machine (ELM) was proposed by Guang-Bin Huang in 2004 [9], and the full details and experimental results were published in 2006 [10]. It shows that ELM has great advantages in training speed compared with the traditional back-propagation (BP) algorithm, and ELM has better generalization performance in most cases. More recent research [11] shows that ELM and SVM have the same optimization objective function, while the former has milder constraints. Here we give a brief review of ELM.

Based on the rigorously proved theory that the input weights and the hidden layer biases of Single-hidden Layer Feedforward Networks (SLFNs) can be chosen randomly, Huang proposed ELM, which shows how the output weights (linking the hidden layer to the output layer) can be analytically determined.

For N samples (x_j, t_j), where x_j = [x_{j1}, x_{j2}, ..., x_{jn}]^T ∈ R^n and t_j = [t_{j1}, t_{j2}, ..., t_{jm}]^T ∈ R^m, an SLFN with Ñ hidden nodes and activation function g(x) can be expressed as

$$\sum_{i=1}^{\tilde N} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N \qquad (1)$$

where w_i = [w_{i1}, w_{i2}, ..., w_{in}]^T is the weight vector connecting the input layer and the i-th hidden node, β_i = [β_{i1}, β_{i2}, ..., β_{im}]^T is the weight vector connecting the i-th hidden node and the output layer, and b_i is the bias of the i-th hidden node. g(x) can be the sigmoid or RBF function, or even one of many non-differentiable activation functions. o_j is the network output for sample j.

The training error is Σ_j ||o_j − t_j||. When the samples are fitted exactly, formula (1) can be written as

$$H\beta = T \qquad (2)$$

where

$$H(w_1,\ldots,w_{\tilde N}, b_1,\ldots,b_{\tilde N}, x_1,\ldots,x_N) =
\begin{bmatrix}
g(w_1\cdot x_1+b_1) & \cdots & g(w_{\tilde N}\cdot x_1+b_{\tilde N})\\
\vdots & \ddots & \vdots\\
g(w_1\cdot x_N+b_1) & \cdots & g(w_{\tilde N}\cdot x_N+b_{\tilde N})
\end{bmatrix}_{N\times \tilde N},\qquad
\beta = \begin{bmatrix}\beta_1^T\\ \vdots\\ \beta_{\tilde N}^T\end{bmatrix}_{\tilde N\times m},\qquad
T = \begin{bmatrix}t_1^T\\ \vdots\\ t_N^T\end{bmatrix}_{N\times m}.$$

H is called the hidden layer output matrix; its i-th column is the output of the i-th hidden node over all samples. Since the weights w_i and biases b_i are generated randomly, the only remaining task is to find a proper β. In most cases the linear Equation (2) has no exact solution. According to the error minimization principle

$$\min_{\beta}\ \|H(w_1,\ldots,w_{\tilde N}, b_1,\ldots,b_{\tilde N}, x_1,\ldots,x_N)\,\beta - T\|,$$

the smallest norm least-squares solution of the above linear system is

$$\hat\beta = H^{\dagger} T \qquad (3)$$

where H† is the Moore-Penrose generalized inverse of matrix H, which can be acquired by singular value decomposition (SVD), and can also be computed by the following formulas. If the matrix HᵀH is nonsingular,

$$H^{\dagger} = (H^T H)^{-1} H^T \qquad (4)$$

or, if the matrix HHᵀ is nonsingular,

$$H^{\dagger} = H^T (H H^T)^{-1}. \qquad (5)$$
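To make formulas (1)-(5) concrete, the following is a minimal single-machine NumPy sketch of ELM training; it is our own illustration, not the paper's distributed implementation, and the sigmoid activation, hidden-node count and toy data are assumptions made only for the example.

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Minimal single-machine ELM: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights w_i
    b = rng.standard_normal(n_hidden)                 # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden layer output matrix H, sigmoid g
    beta = np.linalg.pinv(H) @ T                      # formula (3): smallest-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# toy regression: learn sinc(x) on a small sample
X = np.linspace(-10, 10, 500).reshape(-1, 1)
T = np.where(X == 0, 1.0, np.sin(X) / np.where(X == 0, 1.0, X))
W, b, beta = elm_train(X, T)
print("training RMSE:", np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

The single pseudoinverse call is exactly what stops scaling once H no longer fits in memory, which motivates the MapReduce formulation below.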

But if the two matrices HᵀH and HHᵀ are both singular, then according to ridge regression theory, a positive value λ added to the diagonal of the original matrix helps to obtain a stable solution. So, the solution obtained from formulas (4) and (5) can be written as

$$\beta = (\lambda I + H^T H)^{-1} H^T T \qquad (6)$$

or

$$\beta = H^T (\lambda I + H H^T)^{-1} T. \qquad (7)$$

Huang et al. [9] point out that the β calculated above is consistent with the solution of the following optimization problem:

$$\min_{\beta}\ \|H\beta - T\|. \qquad (8)$$

3. Parallel Extreme learning machine

3.1 MapReduce

When the dataset exceeds the memory limitation, the hidden layer output matrix H cannot be loaded at once, and neither the Moore-Penrose generalized inverse of H nor the output weights β can be calculated by the above formulas. Liang et al. [12] developed an online sequential learning algorithm called OS-ELM, which lets the training data arrive one by one or chunk by chunk. This method partly loosens the memory bottleneck as long as all data can be stored on one computer. When the data scales up to distributed storage, such a method may have problems accessing data across machines, and using only a single processing unit becomes inefficient. Since all data is stored on multiple machines, distributed computing is a natural choice, and moving the processing to the data, namely locality, is the highlight of MapReduce [13].

Users implementing MapReduce programs should specify two operations: Map and Reduce. Map takes key/value pairs as input and generates intermediate results in the same form; Reduce takes all values which share the same key and processes them in a further step. The data flow and types can be expressed as follows:

Map: (k1, v1) → list(k2, v2)
Reduce: (k2, list(v2)) → list(v2)

It should be noticed that the data type of Map's output must be the same as Reduce's input. All the details of communication and synchronization are hidden by the system, as is the fault-tolerance mechanism. Many Maps and Reduces can run simultaneously over the machines.
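As a concrete picture of this data flow, here is a tiny in-memory simulation of the Map / shuffle / Reduce cycle; it is a toy sketch for illustration only, not Hadoop code, and the word-count example and all names in it are ours.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy in-memory MapReduce: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):      # Map: (k1, v1) -> list(k2, v2)
            groups[k2].append(v2)
    output = []
    for k2, values in groups.items():          # Reduce: (k2, list(v2)) -> list(v2)
        output.extend((k2, v) for v in reduce_fn(k2, values))
    return output

# example: word count over (line_id, text) records
records = [(0, "map reduce map"), (1, "reduce")]
result = run_mapreduce(
    records,
    map_fn=lambda _, line: [(w, 1) for w in line.split()],
    reduce_fn=lambda word, counts: [sum(counts)],
)
print(sorted(result))   # [('map', 2), ('reduce', 2)]
```

Hadoop executes the same cycle, but with the map and reduce calls spread over many machines and the grouping step performed by the distributed shuffle.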

3.2 Matrix Multiplication

When the dataset becomes large, for example 10^7 samples, it is difficult to calculate the Moore-Penrose generalized inverse of matrix H through (6) or (7). However, the foundation of both formulas is matrix multiplication. Sun et al. [14] summarized three schemes of matrix multiplication on MapReduce in the context of solving MF (matrix factorization). Part of them will be used here.

For matrices A ∈ R^{m×n} and B ∈ R^{n×k}, the basic scheme for the multiplication AB is to divide A into rows and B into columns; each element of the result matrix is then the inner product of two vectors:

$$AB = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{bmatrix}
\begin{bmatrix} b_1 & b_2 & \cdots & b_k \end{bmatrix}
= \begin{bmatrix} a_1 b_1 & \cdots & a_1 b_k\\ \vdots & \ddots & \vdots\\ a_m b_1 & \cdots & a_m b_k \end{bmatrix} \qquad (9)$$

MapReduce can calculate the rows of AB in parallel without any Reduce operation. We call this scheme algorithm-1.

Map:
  key is the row id of A, value is the corresponding row of A
  newvalue = value * B
  write (key, newvalue) to HDFS

Figure 1. Algorithm-1 Map

Algorithm-1 works well if m ≫ n and matrix B can be shared across the machines. The large matrix A and the result are both stored in rows. Or, reversely, one can calculate the columns of B in parallel and share A across the machines.
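A possible Hadoop Streaming style sketch of the algorithm-1 Map in Figure 1: each map call receives one row of A keyed by its row id and multiplies it by the broadcast matrix B; no reduce step is needed. The file name of the shipped B and the tab-separated input format are assumptions made for the sake of the example.

```python
#!/usr/bin/env python
# Algorithm-1 mapper sketch (Hadoop Streaming style): emits one row of A*B per input row of A.
import sys
import numpy as np

B = np.loadtxt("B_broadcast.txt")   # the small matrix B, shipped to every worker (assumed file name)

for line in sys.stdin:
    row_id, row_txt = line.rstrip("\n").split("\t", 1)
    a_row = np.array(row_txt.split(), dtype=float)               # one row of the large matrix A
    new_row = a_row @ B                                           # newvalue = value * B (Figure 1)
    print(row_id + "\t" + " ".join("%g" % x for x in new_row))    # write (key, newvalue)
```

Because every output row depends only on its own input row, the job can run with zero reducers, which is exactly why algorithm-1 is cheap when B fits in memory.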

Algorithm-1 fails if the two matrices are both large and neither can be held in memory; for this case there is a different division. Matrix A is divided into columns and B into rows; the result is a sum of matrices, each of which is the outer product of two vectors. We call this scheme algorithm-2.

$$AB = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}
= \sum_{i=1}^{n} a_i \otimes b_i \qquad (10)$$

where ⊗ denotes the outer product:

$$a_i \otimes b_i = \begin{bmatrix} a_{1i} b_{i1} & \cdots & a_{1i} b_{ik}\\ \vdots & \ddots & \vdots\\ a_{mi} b_{i1} & \cdots & a_{mi} b_{ik} \end{bmatrix} \qquad (11)$$

Algorithm-2 works well when n ≫ m and n ≫ k. Matrix A is stored in columns and B in rows. The result can be stored in either rows or columns. The outer products and the summation each need a MapReduce job; the details are given in Figures 2-5.

Map:
  key is the column id of A or the row id of B, value is the corresponding column of A or row of B
  newvalue = column(A) or row(B)
  pass (key, newvalue) to the phase-I Reduce

Figure 2. Algorithm-2 phase-I Map

Reduce:
  input is the phase-I Map's output
  newvalue = a_i ⊗ b_i
  write (key, newvalue) to HDFS

Figure 3. Algorithm-2 phase-I Reduce

Map:
  input is the phase-I Reduce's output
  pass (key, value) to the phase-II Reduce

Figure 4. Algorithm-2 phase-II Map

Reduce:
  input is the phase-II Map's output
  newvalue = sum(list[values])
  write (key, newvalue) to HDFS

Figure 5. Algorithm-2 phase-II Reduce

3.3 Parallel ELM

It is critical to choose different matrix multiplication schemes according to the particular demands of the algorithm. In ELM, the hidden layer output matrix H commonly has far more rows than columns (N ≫ Ñ): the former is the number of samples and the latter is the number of hidden nodes, which is controllable. A MapReduce job reads files line by line through its Maps, so many Maps access the file of H in rows. The multiplication HᵀH in (6) fits algorithm-2 perfectly, while HHᵀ in (7) is hard to calculate. What is more, HHᵀ is of size N × N, for which it is difficult to compute the inverse. So it is reasonable to store the large matrix H in rows.

Storing matrix H in rows is equivalent to storing Hᵀ in columns. In this case, algorithm-2 can be reduced to a single job: each Map calculates the outer product of one row of H with itself, and the Reduce sums the partial results.

Denote C₁ = (λI + HᵀH)⁻¹, which is Ñ × Ñ. As the number of hidden nodes is controllable, the inverse operation is easy to carry out in memory through existing tools such as LAPACK.

Next, H₂ = C₁Hᵀ. Here Hᵀ is stored in columns and C₁ can be shared in memory, so this multiplication is well handled by algorithm-1: the product of C₁ with each column of Hᵀ can be calculated simultaneously, and the result H₂ is stored in columns.

Finally, β = H₂T, where T ∈ R^{N×c} holds the original target values and c is the output dimension: for regression it equals 1 and for classification it equals the number of class labels. T is stored in rows, so this multiplication fits algorithm-2 perfectly.

In the prediction phase, we need to calculate the prediction matrix Y = Hβ. Since β can be shared in memory, this multiplication can be done using algorithm-1.

In the testing phase, the target matrix T and the prediction Y are both stored in rows; here we need one more MapReduce job to compare the predicted value with the actual value of each sample, and then calculate the final RMSE for regression or the success rate for classification.
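Putting the pieces of this subsection together, here is a small in-memory sketch of the training pipeline: per-row outer products summed into HᵀH (the single-job variant of algorithm-2), the local Ñ × Ñ inverse, the algorithm-1 style product C₁Hᵀ, and the final β = H₂T. It is a local simulation under assumed sizes and an assumed λ, not the authors' Hadoop implementation.

```python
import numpy as np

def map_outer(h_row):
    # Map: one row h of H contributes the partial matrix outer(h, h) to H^T H
    return np.outer(h_row, h_row)

def hth_one_job(rows_of_H):
    # Reduce: sum all partial outer products -> H^T H (Ñ x Ñ, small enough for memory)
    return sum(map_outer(h) for h in rows_of_H)

rng = np.random.default_rng(1)
H = rng.standard_normal((1000, 30))   # stand-in for the distributed rows of H, Ñ = 30 hidden nodes
T = rng.standard_normal((1000, 1))    # target matrix T stored in rows
lam = 1e-3                            # positive value added to the diagonal; the value is an assumption

HtH = hth_one_job(H)                            # the single outer-product job
assert np.allclose(HtH, H.T @ H)
C1 = np.linalg.inv(lam * np.eye(30) + HtH)      # small Ñ x Ñ inverse, done locally (e.g. via LAPACK)
H2 = C1 @ H.T                                   # algorithm-1: C1 broadcast, one column of H2 per row of H
beta = H2 @ T                                   # algorithm-2: final output weights (Ñ x c)
```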

3.4 Cost analysis

Recent research has begun to consider the disk I/O cost and the network cost when dealing with large scale data using MapReduce [15], instead of evaluating only the traditional computation cost. Yu et al. [16] point out that the training time actually contains two parts: the time to process the data in memory and the time to access the data from disk. In ELM, the former part is mainly determined by the number of multiplications. Assuming we have enough memory and use formula (6) as the training rule, the theoretical cost of ELM is a fixed number of multiplications, determined by N, Ñ, the input dimension n and the output dimension c, plus the cost of loading the N samples. If there are k mappers and reducers in the MapReduce running system, the computation cost drops to 1/k of that number of multiplications.

However, evaluating the network cost and the I/O cost is a bit more complicated. Not all of the mappers' output is shuffled to reducers through the network; actually, the MapReduce framework minimizes the network cost by assigning reduce tasks to the machines which already store the required data [13]. To simplify the estimate, we assume that the ratio of actually shuffled data is a constant r. In computing HᵀH, the shuffled data is r times the partial results emitted by the Maps; in computing H₂ = C₁Hᵀ there are no reduces, so nothing is shuffled; in computing β, data is shuffled in both phases. The total network cost is

$$\mathrm{Network}\big[(\text{total shuffled data}) \cdot r / k\big] \qquad (12)$$

where Network(·) denotes the network transfer cost, which is mainly affected by the transfer speed.

All the intermediate results are stored on disk; in most cases the reading is done by the mappers and the writing by the reducers. The I/O cost of the MapReduce jobs in the training phase of parallel ELM is

$$\mathrm{Read}(\text{input and intermediate data}) / k + \mathrm{Write}(\text{intermediate and output data}) / k \qquad (13)$$

where Read(·) and Write(·) denote the reading and writing time respectively.
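As a rough illustration of this kind of accounting (a back-of-envelope model of our own, not the paper's formulas (12) and (13)), one can count the scalar multiplications of the main training jobs and divide by the number of parallel workers:

```python
def training_multiplications(N, n_hidden, n, c, k):
    """Rough multiplication counts for the training jobs; an illustrative estimate only.
    N: samples, n_hidden: hidden nodes, n: input dim, c: output dim, k: parallel map/reduce slots."""
    cost_H    = N * n_hidden * n        # building the hidden layer output matrix H
    cost_HtH  = N * n_hidden ** 2       # single job: sum of per-row outer products
    cost_C1Ht = N * n_hidden ** 2       # algorithm-1: C1 (n_hidden x n_hidden) times each row of H
    cost_beta = N * n_hidden * c        # algorithm-2: H2 times T
    total = cost_H + cost_HtH + cost_C1Ht + cost_beta
    return total, total / k             # serial count and ideal per-worker count

total, per_worker = training_multiplications(N=10**7, n_hidden=30, n=1, c=1, k=40)
print(f"{total:.2e} multiplications in total, {per_worker:.2e} per worker")
```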

Table 1 shows the above costs for the jobs involved in training. The major cost comes from computing, but the network and disk I/O are also time-consuming.

Table 1. Training Cost for Parallel ELM (N = 10^7 samples; shuffled, read and written data in MB, time in seconds)

Job              Shuffle   Read      Write   Time
HᵀH              0.99      3940.8            69
C₁Hᵀ             0         390       4374    47
H₂T (phase-I)    976       6760      0774    6
H₂T (phase-II)   0.86      673       0.85    90

Other costs, such as solving the inverse matrix and loading small variables, are negligible when the number of hidden layer nodes is small.

4. Experiments

4.1 Experiments setup

All experiments are conducted on a cluster of 8 common servers; each has a quad-core 2.8 GHz CPU, 8 GB of memory and a 1 TB disk, connected by one gigabit Ethernet. The operating system is Linux Server, installed with Hadoop 0.20.2 and JDK 1.6. Below we show the capability of this parallel algorithm to handle large scale data from two aspects: regression and classification.

4.2 Regression

The artificial dataset sinc has been widely used in regression problems. All data can be generated by the following:

$$y(x) = \begin{cases} \sin(x)/x, & x \neq 0 \\ 1, & x = 0 \end{cases} \qquad (14)$$

A training set and a test set can be generated at any scale, where x is randomly distributed in the interval (-10, 10); noise in the interval (-1, 1) can be added to the training data while the test data remains original.
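A small sketch of how such a sinc training set could be generated from formula (14); the exact noise range, the output file name and the sample counts below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def make_sinc(n_samples, noise=0.0, seed=0):
    """Generate (x, y) pairs from formula (14); optional uniform noise is added to the targets."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-10.0, 10.0, n_samples)
    y = np.where(x == 0.0, 1.0, np.sin(x) / np.where(x == 0.0, 1.0, x))
    if noise > 0.0:
        y = y + rng.uniform(-noise, noise, n_samples)
    return x, y

# training data carries noise, test data stays clean (as described in the paper)
x_train, y_train = make_sinc(10**6, noise=1.0, seed=1)   # noise range here is an assumption
x_test, y_test = make_sinc(10**5, noise=0.0, seed=2)
np.savetxt("sinc_train.txt", np.column_stack([x_train, y_train]))   # output file name assumed
```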

The criteria for evaluating the parallel algorithm include the RMSE and the training time. Figure 6 shows the training time, broken into the four parts listed in Table 1, as the data scales up from 10^7 to 10^8 samples; the dataset size varies from 335 MB to 3.3 GB. The total training time increases sub-linearly thanks to the proper matrix multiplication schemes chosen by the algorithm.

Figure 6. Training time for regression (training time in seconds versus the number of samples ×10^7; curves for HᵀH, C₁Hᵀ, H₂T and the total)

Figure 7 shows the speedup achieved by adding computation nodes. Each node runs 5 reducers and 7 mappers in our system. The highest speedup in this experiment is 5.7, obtained with 8 nodes.

Figure 7. Speedup for regression (speedup versus the number of nodes, each with 5 reducers; curves for the measured speedup and the ideal linear speedup)

4.3 Classification

ELM is essentially a supervised learning technology, which requires labeled training data. But it is hard to find a large scale dataset in which every sample is labeled; in fact, automatically labeling unlabeled samples is an important motivation of unsupervised learning. The above parallel algorithm mainly executes the matrix multiplications in parallel rather than improving the original algorithm in theory, so here we only demonstrate the capability to handle large scale data: all training data is acquired by simply duplicating the original data to a specified scale.

Figure 8 shows how the training time increases as the dataset scales up. The original dataset is Image Segmentation, with 19 features and 7 classes, which has been widely used in classification problems. Similar to the regression case, there exists a sub-linear relation between the data scale and the training time.

Figure 8. Training time for classification (training time in seconds versus dataset size in GB; curves for parallel ELM and the ideal linear growth)
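For reference, the duplication step mentioned above could look like the following minimal sketch; the file names and the duplication factor are assumptions.

```python
def duplicate_dataset(src_path, dst_path, copies):
    """Scale a labeled dataset by simple duplication, as done for the classification experiment."""
    with open(src_path) as src:
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        for _ in range(copies):
            dst.writelines(lines)

# e.g. blow the original Image Segmentation file up by a large factor (paths and factor assumed)
duplicate_dataset("segmentation.data", "segmentation_big.data", copies=1000)
```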

5. Conclusion

In this paper, we implement a parallel extreme learning machine based on MapReduce. The foundation of this implementation is a proper formula for calculating the parameters of ELM, together with the corresponding parallel matrix multiplication schemes. Experiments show the sub-linear relation between training time and dataset scale. As the two main methods for dealing with large scale problems, parallel algorithms and online learning algorithms may have something in common. It is an interesting topic to compare these two methods in the case of ELM, and to discuss the best application area of each in future work.

6. References

[1] John Langford, Lihong Li, Tong Zhang, "Sparse Online Learning via Truncated Gradient", Journal of Machine Learning Research, vol. 10, pp. 777-801, 2009.
[2] J. Dean, S. Ghemawat, "MapReduce: Simplified data processing on large clusters", In Proceedings of Operating Systems Design and Implementation (OSDI), pp. 137-149, 2004.
[3] Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin et al., "Map-Reduce for machine learning on multicore", In Proceedings of Advances in Neural Information Processing Systems, pp. 281-288, 2006.
[4] Q. He, F. Z. Zhuang, J. C. Li et al., "Parallel implementation of classification algorithms based on MapReduce", In Proceedings of Rough Sets and Knowledge Technology, pp. 655-662, 2010.
[5] Wen Cui, Guoyong Wang, Ke Xu, "Parallel Community Mining in Social Network using MapReduce", IJACT: International Journal of Advancements in Computing Technology, vol. 4, no. 5, pp. 445-453, 2012.
[6] Apache Hadoop, http://hadoop.apache.org/
[7] A. Verma, X. Llora, D. E. Goldberg et al., "Scaling genetic algorithms using MapReduce", In Proceedings of the International Conference on Intelligent Systems Design and Applications, pp. 13-18, 2009.
[8] Lei Lei, "Towards a High Performance Virtual Hadoop Cluster", JCIT: Journal of Convergence Information Technology, vol. 7, no. 6, pp. 292-303, 2012.
[9] Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks", In Proceedings of the International Joint Conference on Neural Networks, pp. 985-990, 2004.
[10] Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, "Extreme learning machine: Theory and applications", Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.
[11] G.-B. Huang, H. Zhou, X. Ding et al., "Extreme Learning Machine for Regression and Multiclass Classification", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 42, no. 2, pp. 513-529, 2012.
[12] N.-Y. Liang, G.-B. Huang, P. Saratchandran et al., "A fast and accurate online sequential learning algorithm for feedforward networks", IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.
[13] J. Lin, C. Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool Publishers, USA, 2010.
[14] Zhengguo Sun, Tao Li, Naphtali Rishe, "Large-Scale Matrix Factorization using MapReduce", In Proceedings of the IEEE International Conference on Data Mining Workshops, pp. 1242-1248, 2010.
[15] Robson Leonardo Ferreira Cordeiro, Caetano Traina Jr., Agma Juci Machado Traina et al., "Clustering Very Large Multi-dimensional Datasets with MapReduce", In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 690-698, 2011.
[16] Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang et al., "Large Linear Classification When Data Cannot Fit In Memory", In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 833-842, 2010.