Applying Multiple Neural Networks on Large Scale Data




2011 International Conference on Information and Electronics Engineering
IPCSIT vol. 6 (2011) © (2011) IACSIT Press, Singapore

Applying Multiple Neural Networks on Large Scale Data

Kritsanatt Boonkiatpong and Sukree Sinthupinyo
Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
E-mail: g5akl@cp.eng.chula.ac.th, sukrees@chula.ac.th

Abstract. Analysis of large data sets is highly important in data mining. A large amount of data normally requires a specific learning method, and some standard methods, for example the Artificial Neural Network, need a very long learning time. This paper presents a new approach which can work efficiently with neural networks on large data sets. The data is divided into separate segments, each of which is learned by a network with the same structure; then all weights from the set of networks are integrated. The results of the experiments show that our method can preserve accuracy while the training time is dramatically reduced.

Keywords: Neural Network, Large Scale Dataset, Incremental Learning

1. Introduction

Studies of neural networks span several aspects, whether structure modelling, network design, or performance improvement, so that networks learn quickly and achieve more accurate results [1]. There are many neural-network techniques and applications in various fields, for example data mining, image recognition, weather forecasting, traffic, the stock market, etc. Research on neural networks also aims at improvements in different directions, including faster network processing, higher efficiency, or fewer errors; these remain a focus of research attention. This research focuses on the analysis of large data by applying multiple neural networks to learn several sub datasets.

Shibata and Ikeda showed that the number of neurons and the number of hidden layers in a network can affect performance [2], because a small number of layers can be processed faster than a large one. Their work focuses on the structure level of the neural network. Generally, adding hidden layers can increase the accuracy of learning, but it affects the learning time much more than a small network does. In addition, a large data set is difficult to learn in one pass; it requires both resources and time. Thus, this paper presents the idea of separating a large data set into multiple subsets, each of which is trained using a neural network with the same structure. We then improve the overall accuracy using a technique that replaces the weights at each node with the appropriate weights from the network with the smallest error, and the weights of the best networks over the small data sets are used to create a new network. The proposed technique helps the neural network learn larger data sets: the accuracy is comparable to a network trained on the whole data set, while the training time is dramatically decreased.

2. Backpropagation Neural Network

In this research a well-known learning technique, i.e. back-propagation, is used to train the neural networks. Backpropagation is a supervised learning technique in which neurons are connected to each other by weighted links. A signal transmitted to the next neuron is weighted by the link connecting one node to another, as shown in Fig. 1.

[Fig. 1: Backpropagation neural network structure — input layer, hidden layer, output layer.]

The weights and thresholds determine the transfer function of each neuron. For a neuron in the hidden layer, the output is given by (1):

$$Y_j^P = f\Big(\sum_{i=1}^{n} W_{ij}\, x_i^P - \theta_j\Big) \qquad (1)$$

where P is a learning example in the dataset, n is the number of inputs of neuron j in the hidden layer, x_i is input i transmitted to the neuron, Y_j is the output, W_ij is the weight of each connection, and θ_j is a threshold. The sigmoid function f is shown in (2):

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$

The results of this calculation are fed to the output layer using the function below:

$$Y_k^P = f\Big(\sum_{j=1}^{m} W_{jk}\, Y_j^P - \theta_k\Big) \qquad (3)$$

where P is a learning example in the dataset, m denotes the number of neurons feeding the output layer, k indexes the output layer, Y_j is the input from the hidden layer, Y_k is the output, W_jk is the weight of each connection, and θ_k is the threshold of this output node.

3. Methodology

3.1 Training Separation

Our experiment aims to improve the original training algorithm so that it can train a large dataset as equally separated sub datasets. The idea behind the proposed method is that a smaller sub data set consumes less training time than a big one. However, the error from each sub data set is unstable and all the weight sets are separate. Hence, we collect the weights of each node in the lowest-error network and use them to replace the trained weights of the other networks with the same structure. Assume that we divide the original training set (N_0) into n sub data sets, namely N_1 to N_n. Each equally separated set contains 1/n of the instances, and each subset is trained by a BP network with the same structure and a single hidden layer. The concept of our training method is shown in Fig. 2.

[Fig. 2: Training method which divides the original dataset (N_0) into n sub data sets, each of which is used as the training set for one of n BP networks (input, hidden, and output layers).]
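The paper ran its experiments with WEKA and MBP rather than custom code. Purely as an illustration of Section 3.1, the following minimal NumPy sketch implements Eqs. (1)-(3) as a single-hidden-layer BP network and trains one identically structured copy per equally sized segment. All names, the learning rate, and the epoch count are our assumptions, not the paper's.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                       # Eq. (2)

class BPNetwork:
    """Single-hidden-layer backpropagation network per Eqs. (1)-(3)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))    # input -> hidden weights
        self.b1 = np.zeros(n_hidden)                      # hidden thresholds
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))   # hidden -> output weights
        self.b2 = np.zeros(n_out)                         # output thresholds

    def forward(self, X):
        H = sigmoid(X @ self.W1 - self.b1)                # Eq. (1)
        Y = sigmoid(H @ self.W2 - self.b2)                # Eq. (3)
        return H, Y

    def train(self, X, T, lr=0.5, epochs=500):
        """Plain batch backpropagation of the squared error (T is one-hot)."""
        for _ in range(epochs):
            H, Y = self.forward(X)
            d_out = (Y - T) * Y * (1 - Y)                 # output-layer delta
            d_hid = (d_out @ self.W2.T) * H * (1 - H)     # hidden-layer delta
            self.W2 -= lr * H.T @ d_out / len(X)
            self.b2 += lr * d_out.mean(axis=0)            # activation uses (z - theta)
            self.W1 -= lr * X.T @ d_hid / len(X)
            self.b1 += lr * d_hid.mean(axis=0)

    def rms_error(self, X, T):
        _, Y = self.forward(X)
        return np.sqrt(np.mean((Y - T) ** 2))

def train_on_segments(X, T, n_segments, n_hidden):
    """Split (X, T) into n equal segments and train one identically
    structured BP network per segment (Section 3.1)."""
    nets = []
    for Xs, Ts in zip(np.array_split(X, n_segments), np.array_split(T, n_segments)):
        net = BPNetwork(X.shape[1], n_hidden, T.shape[1])
        net.train(Xs, Ts)
        nets.append(net)
    return nets

For the Letter Recognition setup described below, X would hold the 20,000 examples and n_segments would be 10; the network with the lowest RMS error on its own segment then serves as N_best.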

When all networks are completely trained, we evaluate the results of each network. The weights of the best trained network (N_best) are used as the starting weight set in the weight integration process. N_best is applied to all the other datasets to clarify whether N_best gives the better result or the network trained on the set itself beats N_best. During this process the weight replacement described in Section 3.2 is employed. After all network comparisons and weight integrations are completed, we obtain a neural network with the same structure as the original, but with weights chosen to best suit all the sub data sets.

3.2 Network Integration Method

We change the weights in N_best using two methods, i.e. Weight Combination and Node Creation. The first method is used when two hidden nodes from different networks are close to each other. The new weights of the node in N_best are set to the average of the weights of both hidden nodes, as shown in Equation (4):

$$w_{new} = \frac{w_a + w_b}{2} \qquad (4)$$

where w_a and w_b denote the weight vectors of the two hidden nodes which are close to each other. In the latter case, when a node from N_i cannot be combined with any node of N_best, that node is directly inserted into N_best.

4. Experimental Results

4.1 Data Preparation

In this experiment we used two data sets from the UCI repository [9]: Iris and Letter Recognition. Letter Recognition contains 20,000 records, which take a very long learning time with the ordinary training method. We used the machine learning tool WEKA [10] and the MBP simulator [11] in all of our experiments.

4.2 Network Convergence

[Fig. 3: Error convergence of the Iris (left) and Letter Recognition (right) datasets.]

Fig. 3 shows the error on each dataset. In the left graph, Iris-0 indicates the main dataset and Iris-1 to Iris-10 denote the ten sub datasets. The right graph shows the error on the Letter Recognition dataset: Letter-1 to Letter-10 are the errors of sub datasets 1 to 10 respectively, and Letter-00 is the original training set. We can see from both graphs that the error of the sub datasets is much lower than the error of the original training set: the trained weights converge to a point that gives a lower error rate on the training set.

4.3 Experimental Results

The accuracy of each network on the sub datasets of both data sets is shown in Table 1. N_9 and N_6 were the best networks for Iris and Letter Recognition, respectively. After we obtained the best network (N_best), we integrated the weights from all networks; the results are shown in Table 2.
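Before turning to the tables, here is a minimal sketch of the Section 3.2 integration step, continuing the BPNetwork sketch above. The distance threshold tol and all names are assumptions; the paper does not state how "closeness" of hidden nodes is measured, so Euclidean distance over the incoming weight vector is used here as one plausible choice.

import numpy as np

def integrate(best, other, tol=0.5):
    """Merge the hidden layer of `other` into `best` (Section 3.2).

    Weight Combination: if a hidden node of `other` is close (distance
    below the assumed threshold `tol`) to a node of `best`, the weights
    are averaged as in Eq. (4).
    Node Creation: otherwise the node is inserted into `best` unchanged.
    """
    for j in range(other.W1.shape[1]):
        # Incoming weights plus threshold form the node's weight vector.
        w_b = np.append(other.W1[:, j], other.b1[j])
        dists = [np.linalg.norm(np.append(best.W1[:, i], best.b1[i]) - w_b)
                 for i in range(best.W1.shape[1])]
        i = int(np.argmin(dists))
        if dists[i] < tol:
            # Weight Combination: w_new = (w_a + w_b) / 2   (Eq. 4)
            best.W1[:, i] = (best.W1[:, i] + other.W1[:, j]) / 2
            best.b1[i] = (best.b1[i] + other.b1[j]) / 2
        else:
            # Node Creation: insert the unmatched hidden node into best.
            best.W1 = np.column_stack([best.W1, other.W1[:, j]])
            best.b1 = np.append(best.b1, other.b1[j])
            best.W2 = np.vstack([best.W2, other.W2[j, :]])
    return best

Calling integrate(best, net) for every other trained network in turn yields the final weight set, whose RMS error corresponds to the "after integration" rows of Table 2.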

Table 1: RMS error of each network and of the main network (N_all) on its own dataset.

Data Set     Iris            Letter Recognition
N_all        0.00874657      0.07047465
N_1          0.00858700      0.057487767
N_2          0.00097         0.060338648
N_3          0.00456998      0.058505
N_4          0.009377        0.057673889
N_5          0.0038304094    0.0569459839
N_6          0.0038304094    0.054749649
N_7          0.004999995     0.0554568
N_8          0.0009637       0.05873707
N_9          0.0007008063    0.05969564
N_10         0.0039080       0.0556769854

Table 2: RMS error of N_best before and after integration.

Dataset               Weight Set                     RMS Error
Iris                  N_best                         0.00874657
                      N_best (after integration)     0.003989667
Letter Recognition    N_best                         0.07047465
                      N_best (after integration)     0.06705443

5. Conclusion

This paper has proposed a new method which applies the Backpropagation Neural Network to large datasets. We can see from our experiments that the weights of the small datasets converged faster than those of the original dataset. However, the weights trained on a sub dataset did not achieve a better result when tested on the original dataset; hence our weight integration approach was introduced. After the integration, we found that the accuracy of the weight set obtained by the proposed method is better than that of the best weight set among the sub datasets.

6. References

[1] Y. Zhao, J. Gao, and Yang, "A survey of neural network ensembles," IEEE Trans. Pattern Analysis and Machine Intelligence, 2005.
[2] K. Shibata and Y. Ikeda, "Effect of number of hidden neurons on learning in large-scale layered neural networks," ICROS-SICE International Joint Conference, 2009, pp. 5008-5013.
[3] K. Kaikhah and S. Doddameti, "Discovering trends in large datasets using neural networks," Applied Intelligence, Springer Science + Business Media, Netherlands, vol. 24, 2006, pp. 51-60.
[4] B. Liang and J. Austin, "A neural network for mining large volumes of time series data," IEEE Transactions on Neural Networks, 2005, pp. 688-693.
[5] L. Fu, H. Hsu, and J. Principe, "Incremental backpropagation learning networks," IEEE Transactions on Neural Networks, vol. 7, no. 3, 1996, pp. 757-761.
[6] D. Xia, F. Wu, X. Zhang, and Y. Zhuang, "Local and global approaches of affinity propagation clustering for large scale data," Journal of Zhejiang University SCIENCE, 2008, pp. 1373-1381.
[7] J. Heaton, Introduction to Neural Networks for C#, Second Edition, Heaton Research, Inc., Chesterfield, St. Louis, United States, 2008, pp. 37-64.
[8] N. Lopes and B. Ribeiro, "Hybrid learning in a multi neural network architecture," IEEE International Joint Conference on Neural Networks, CISUC - Centro de Informática e Sistemas, Department of Informatics Engineering, University of Coimbra, Portugal, 2001, pp. 2788-2793.
[9] UCI Data Sets, The UCI Machine Learning Repository, Center for Machine Learning and Intelligent Systems, University of California, Irvine, United States, 2007, http://archive.ics.uci.edu/ml/datasets.html

[10] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, Volume 11, Issue 1, 2009.
[11] Multiple Back-Propagation, a back-propagation simulation tool, Noel de Jesus Mendonça Lopes, Instituto Politécnico da Guarda, Portugal, 2009, http://dit.ipg.pt/mbp