A Hybrid Malicious Code Detection Method based on Deep Learning
Yuancheng Li, Rong Ma and Runhai Jiao
School of Control and Computer Engineering, North China Electric Power University, Beijing, China
[email protected], [email protected], [email protected]

Abstract

In this paper, we propose a hybrid malicious code detection scheme based on AutoEncoder and DBN (Deep Belief Networks). Firstly, we use the AutoEncoder deep learning method to reduce the dimensionality of the data. Through a nonlinear mapping, it converts complicated high-dimensional data into low-dimensional codes, thereby reducing the dimensionality of the data and extracting its main features. We then use the DBN learning method to detect malicious code. A DBN is composed of multiple layers of Restricted Boltzmann Machines (RBM, Restricted Boltzmann Machine) and a layer of BP neural network. After unsupervised training of every RBM layer, we take the output vector of the last RBM layer as the input vector of the BP neural network, conduct supervised training of the BP neural network, and finally obtain the optimal hybrid model by fine-tuning the entire network. After testing samples are fed into the hybrid model, the experimental results show that the detection accuracy of the proposed hybrid detection method is higher than that of a single DBN. The proposed method reduces the time complexity and has better detection performance.

Keywords: Malicious code detection, AutoEncoder, DBN, RBM, deep learning

1. Introduction

Malicious code is software that intentionally damages or destroys the functions of a system by adding, changing, or deleting code without authorization. In recent years, the malicious code with the most far-reaching influence mainly includes viruses, worms, Trojan horses, etc. According to the statistical results in [1], in 2010 Symantec recorded more than 3,000,000,000 malicious code attacks and monitored more than 280,000,000 independent malicious code variants.
Compared to 2009, web-based attacks grew by 93%. As the amount of malicious code increases, the harm and loss it causes are growing as well. As an important technology of network security, intrusion detection discovers and recognizes intrusion behaviors or attempts in a system by collecting and analyzing key data in the network and computer system. Efficient and accurate identification of malicious code can improve the efficiency of intrusion detection; therefore, malicious code analysis and detection is a key problem in intrusion detection technology.

ISSN: IJSIA Copyright c 2015 SERSC

According to the position at which detection takes place, malicious code detection is currently divided into two approaches, host-based and network-based [2]. Network-based detection methods include the honeypot-based approach [3-4] and the approach based on deep packet inspection [5]. Host-based detection methods include the checksum-based approach [6], the signature-based approach [7-9], and the heuristic data mining approach [10]. The data mining approach adopts many machine learning methods, which detect unknown malicious code effectively by learning the characteristics of malicious code and normal code. [11] reviewed a variety of feature-extraction methods and machine learning
methods in a variety of malicious code detection applications, including naive Bayes, decision trees, artificial neural networks, Support Vector Machines, etc. [12] proposed static system call sequences based on N-gram together with two automatic feature-selection methods, and adopted the K-nearest neighbor algorithm, SVM, and decision tree as classifiers. [13] presented a semantics-based method for extracting and detecting malicious code behavior features, which captures the behavior of malicious code and has great anti-jamming capabilities. Although the above methods have achieved certain results in malicious code detection, some problems remain: the feature extraction is not appropriate, the detection rate and detection accuracy are not high, and the complexity of the algorithms is high.

This paper selects the KDDCUP'99 data set as experimental data and proposes a hybrid malicious code detection model based on deep learning. After the AutoEncoder reduces the dimensionality of the data, a DBN is used as the classifier. For malicious code behavior, applying deep learning in several stages achieves better results than a shallow learning model. As a result, the method improves the malicious code detection rate and detection accuracy, and reduces the time complexity of the hybrid model.

2. Hybrid Malicious Code Detection Model based on Deep Learning

Network data usually contains both normal data and malicious data. Malicious code detection differentiates the normal data from the malicious code data, so it is essentially a binary classification problem. To obtain a malicious code detection model with good performance, two things need to be done: firstly, find the essential characteristics of malicious code data; secondly, construct a classifier model that accurately differentiates the malicious data from the normal data.
In this paper, we make use of the advantages of deep learning by integrating two deep learning methods, AutoEncoder and DBN. This hybrid model extracts the essence of malicious code data, reduces the complexity of the model, and improves the detection accuracy of malicious code.

2.1 AutoEncoder Dimensionality Reduction

AutoEncoder [14] is a deep learning method for learning efficient codes, proposed by G. E. Hinton in 2006. By learning a compressed coding of a given data set, it achieves data dimensionality reduction. The AutoEncoder structure is divided into an encoder part and a decoder part, and includes an input layer, hidden layers, and an output layer. The layer where the encoder and decoder meet, named the code layer, is the core of the AutoEncoder: it reflects the essential characteristics of a high-dimensional data set with nested structure and determines the intrinsic dimension of that data set. When the number of neurons in the hidden layer is smaller than the number of neurons in the input and output layers, the hidden layer yields a compressed vector of the input, which is the dimensionality-reduced data. Training an AutoEncoder consists of three steps: pretraining, unrolling, and fine-tuning [14], as shown in Figure 1.
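To make the encoder, code layer, and decoder roles concrete, the following is a minimal NumPy sketch rather than the authors' implementation. The layer sizes, the tied decoder weights, the random initialization, and the function names `init_autoencoder`, `encode`, and `decode` are all illustrative assumptions; no training is performed here, only the forward pass that compresses an input into a shorter code and maps it back.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_autoencoder(layer_sizes, seed=0):
    """Random encoder weights; the decoder reuses them transposed (tied weights, an assumption)."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    enc_biases = [np.zeros(n) for n in layer_sizes[1:]]
    # Decoder layers mirror the encoder, ending at the input size.
    dec_biases = [np.zeros(n) for n in layer_sizes[-2::-1]]
    return weights, enc_biases, dec_biases

def encode(x, weights, enc_biases):
    """Map inputs to the code layer (the low-dimensional representation)."""
    for w, b in zip(weights, enc_biases):
        x = sigmoid(x @ w + b)
    return x

def decode(code, weights, dec_biases):
    """Map codes back to the input space to obtain the reconstructed data."""
    x = code
    for w, b in zip(reversed(weights), dec_biases):
        x = sigmoid(x @ w.T + b)
    return x

# Illustrative sizes only: 41 input features compressed to a 10-dimensional code.
weights, enc_b, dec_b = init_autoencoder([41, 300, 150, 75, 10])
samples = np.random.default_rng(1).random((5, 41))
codes = encode(samples, weights, enc_b)
reconstruction = decode(codes, weights, dec_b)
print(codes.shape, reconstruction.shape)
```

The code layer is simply the narrowest layer; choosing its width is the same as choosing the output dimension after dimensionality reduction.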
Figure 1. AutoEncoder Structure

In the pretraining process, we set the output of each RBM's hidden layer neurons as the input of the next RBM. An RBM consists of visible units and hidden units. We use the vectors V and H to represent the states of the visible units and the hidden units respectively. The structure is shown in Figure 2.

Figure 2. The Network Structure of RBM (visible layer units v, hidden layer units h, connection weights $w_{ij}$)

Where $v_i$ denotes the state of the i-th visible unit and $h_j$ denotes the state of the j-th hidden unit. The visible and hidden units satisfy the energy formula (1):

$$E(v, h) = -\sum_{i \in V} b_i v_i - \sum_{j \in H} b_j h_j - \sum_{i,j} v_i h_j w_{ij} \qquad (1)$$

When the weights are adjusted during training, we first update the states of the hidden layer neurons and then update the states of the visible layer, which gives the weight adjustment. The weight updating rule is shown in formula (2):

$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}, \qquad \Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle^{+} - \langle v_i h_j \rangle^{-} \right) \qquad (2)$$

Where $\Delta w_{ij}$ denotes the weight adjustment, $w_{ij}(t)$ denotes the connection weight between neuron i and neuron j at step t, $\varepsilon$ denotes the learning rate, $\langle v_i h_j \rangle^{+}$ denotes the average forward correlation (the product of the outputs of the visible and hidden neurons), and $\langle v_i h_j \rangle^{-}$ denotes the average reverse correlation.

After pretraining is completed, the output units of each RBM are combined with the input units of the next one to form an independent layer. The unrolling process connects these independent RBMs into a multi-layered AutoEncoder, as shown in Figure 3.
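The energy formula (1) and the update rule (2) can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' Matlab code: it assumes binary units with sigmoid conditional probabilities, a single reconstruction step (often called CD-1), and averaging the correlations over a batch. The names `rbm_energy` and `cd1_update`, the bias update rules, and the default learning rate are assumptions introduced for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_energy(v, h, w, b_vis, b_hid):
    """Energy of one joint configuration (v, h), following formula (1)."""
    return -(b_vis @ v) - (b_hid @ h) - (v @ w @ h)

def cd1_update(v0, w, b_vis, b_hid, lr=0.1, rng=None):
    """One weight update in the spirit of formula (2) on a batch v0 of shape (n_samples, n_visible)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Forward pass: hidden probabilities given the data, then a binary sample.
    p_h0 = sigmoid(v0 @ w + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Reverse pass: reconstruct the visible layer, then recompute the hidden probabilities.
    p_v1 = sigmoid(h0 @ w.T + b_vis)
    p_h1 = sigmoid(p_v1 @ w + b_hid)
    n = v0.shape[0]
    forward_corr = v0.T @ p_h0 / n    # average forward correlation <v_i h_j>+
    reverse_corr = p_v1.T @ p_h1 / n  # average reverse correlation <v_i h_j>-
    w_new = w + lr * (forward_corr - reverse_corr)
    # Bias updates are a common companion rule (an assumption, not stated in the text).
    b_vis_new = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid_new = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return w_new, b_vis_new, b_hid_new

w = np.array([[0.5], [0.2]])
batch = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w, b_vis, b_hid = cd1_update(batch, w, np.array([0.1, 0.3]), np.array([0.2]))
print(w.shape, b_vis.shape, b_hid.shape)
```

Using probabilities rather than sampled states for the reconstruction is a common variance-reduction choice; sampling both passes would also be consistent with the text.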
Figure 3. The Unrolling Process of AutoEncoder (panels: Pretraining, Unrolling; labels: Initial data, Encoder, Code layer, Decoder, Reconstructed data)

Fine-tuning makes further adjustments to the initial weights obtained in pretraining in order to reach the optimal weights. We mainly use the multiclass cross-entropy error function [15] for evaluation. The multiclass cross-entropy error function measures the difference between the target probability distribution and the actual probability distribution: the smaller its value, the more similar the two distributions are, and the better the result. AutoEncoder uses the BP algorithm to adjust the weights so as to reduce the multiclass cross-entropy error function shown in formula (3):

$$H = -\sum_i \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \qquad (3)$$

Where $y_i$ denotes the feature values of the data sample and $\hat{y}_i$ denotes the feature values of the data sample after reconstruction.

AutoEncoder adjusts the weights in the fine-tuning process. The weight adjustment rule for the output layer is shown in formula (4):

$$\Delta w_{ij} = -\eta \frac{\partial H}{\partial w_{ij}} = \eta \, (t_i - y_i) \, O_j \qquad (4)$$

The weight adjustment rule for the hidden layers is shown in formula (5):

$$\Delta w_{ij} = -\eta \frac{\partial H}{\partial w_{ij}} = -\eta \frac{\partial H}{\partial net_i} \frac{\partial net_i}{\partial w_{ij}} = -\eta \frac{\partial H}{\partial net_i} O_j, \qquad \frac{\partial H}{\partial net_i} = \frac{\partial H}{\partial O_i} \frac{\partial O_i}{\partial net_i} = \frac{\partial H}{\partial O_i} \, O_i (1 - O_i) \qquad (5)$$

Where $\eta$ denotes the adjustment step and $O_j$ denotes the output of the upper-layer neurons.

2.2 DBN Deep Learning Structure

A DBN is a deep learning machine which consists of an unsupervised multi-layer RBM network and a supervised BP network. The units of each layer capture highly relevant implicit correlations from the hidden units of the preceding layer. Each pair of adjacent layers of the DBN can be decomposed into a single RBM, as shown in Figure 4. In Figure 4, panel (1) shows the deep belief network, and panel (2) indicates that the output of each lower RBM is used as the input data for training the next RBM, so that a set of RBMs is obtained by greedy learning.
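As a small worked check of the fine-tuning error, the sketch below evaluates the cross-entropy of formula (3) and the output-layer step of formula (4). It assumes sigmoid output units, for which the derivative of the error with respect to the unit's net input simplifies to the reconstructed value minus the target. The function names, the clipping constant `eps`, and the learning rate are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy between targets y and reconstructions y_hat, as in formula (3)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def output_delta(y, y_hat):
    """dH/dnet for sigmoid output units: the reconstruction minus the target."""
    return y_hat - y

def output_weight_step(target, y_hat, upstream, lr=0.1):
    """Output-layer adjustment in the spirit of formula (4): lr * (t_i - y_i) * O_j."""
    return lr * np.outer(target - y_hat, upstream)

y = np.array([1.0, 0.0])
y_hat = sigmoid(np.array([0.3, -0.2]))
print(cross_entropy(y, y_hat))
print(output_weight_step(y, y_hat, upstream=np.array([0.5, 0.1, 0.9])).shape)
```

The error is zero only when every reconstructed value equals its target, which is why a falling cross-entropy during fine-tuning indicates a better reconstruction.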
Figure 4. The RBM Structure and the Corresponding DBN Network

The DBN training process is divided into two steps. In the first step, each RBM layer is trained separately in an unsupervised way. In the second step, a BP neural network is placed as the last layer of the DBN: we set the output vector of the last RBM as the input vector of the BP neural network, then carry out supervised training of the entity relation classifier. The paper [15] states that, in a typical DBN with l hidden layers, the relationship between the visible layer v and the hidden layers h can be expressed as formula (6):

$$P(v, h^1, \ldots, h^l) = \left( \prod_{k=0}^{l-2} P(h^k \mid h^{k+1}) \right) P(h^{l-1}, h^l), \qquad h^0 = v \qquad (6)$$

As Figure 2 shows, in an RBM the visible layer and the hidden layer are mutually connected. The connection matrix and the biases between the layers are obtained by the unsupervised greedy algorithm. In the training process, the visible units $v_i$ are first mapped to the hidden layer units $h_j$; then $v_i$ is reconstructed from $h_j$. This process is repeated, updating the values of the connection matrix and the biases until the reconstruction error is acceptable. The difference in correlation between the hidden layer units and the visible layer units forms the basis of each weight update. The mapping probabilities of the hidden layer units and visible layer units are shown in formulas (7) and (8):

$$p(h_j = 1 \mid v; \theta) = \sigma \left( \sum_{i=1}^{I} w_{ij} v_i + a_j \right) \qquad (7)$$

$$p(v_i = 1 \mid h; \theta) = \sigma \left( \sum_{j=1}^{J} w_{ij} h_j + b_i \right) \qquad (8)$$

Where $w_{ij}$ denotes the connection weights between the visible layer units and the hidden layer units, $b_i$ and $a_j$ denote the respective biases, and the sigmoid function $\sigma$ is the activation function. By using the gradient of the log likelihood $\log p(v; \theta)$, we derive the weight update rule shown in formula (9):

$$\Delta w_{ij} = E_{data}(v_i h_j) - E_{model}(v_i h_j) \qquad (9)$$

Where $E_{data}(v_i h_j)$ denotes the expectation observed on the training data and $E_{model}(v_i h_j)$ denotes the expectation defined by the model. Because $E_{model}(v_i h_j)$ is difficult to calculate, we replace it with Gibbs sampling using the contrastive divergence algorithm, which approximates the gradient.
By combining, from the bottom up, RBMs that have each been trained layer by layer, an initial DBN can be constructed.
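The bottom-up construction can be sketched as a greedy loop: train one RBM, pass its hidden activations upward, and train the next RBM on them. The sketch below is a minimal illustration under stated assumptions (one-step contrastive divergence, full-batch updates, sigmoid units); `train_rbm`, `pretrain_dbn`, the epoch count, and the learning rate are hypothetical names and values, and the supervised BP stage is not included.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
    """Train a single RBM with one-step contrastive divergence on the whole batch."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    w = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    a = np.zeros(n_hidden)   # hidden biases
    b = np.zeros(n_visible)  # visible biases
    for _ in range(epochs):
        p_h0 = sigmoid(data @ w + a)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ w.T + b)
        p_h1 = sigmoid(p_v1 @ w + a)
        w += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        a += lr * (p_h0 - p_h1).mean(axis=0)
        b += lr * (data - p_v1).mean(axis=0)
    return w, a, b

def pretrain_dbn(data, hidden_sizes, **rbm_kwargs):
    """Greedy layer-wise pretraining: each RBM is trained on the output of the one below."""
    layers, layer_input = [], data
    for n_hidden in hidden_sizes:
        w, a, b = train_rbm(layer_input, n_hidden, **rbm_kwargs)
        layers.append((w, a, b))
        layer_input = sigmoid(layer_input @ w + a)  # becomes the next RBM's visible data
    return layers, layer_input

data = np.random.default_rng(2).random((20, 8))
layers, top_features = pretrain_dbn(data, hidden_sizes=[6, 3], epochs=3)
print(len(layers), top_features.shape)
```

The returned top-level features play the role of the last RBM's output vector, which would then be handed to the supervised classifier.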
Then the whole DBN is fine-tuned from the back to the front by a supervised learning method similar to that of the traditional BP neural network. Finally, we obtain the trained DBN model.

2.3 Hybrid Malicious Code Detection based on DBN and AutoEncoder

Deep learning uses the nonlinear mapping of a deep, multilayer structure, which has the benefit that complex functions can be expressed with fewer parameters. Compared with shallow learning, it can approximate complex functions and has a strong ability to learn the essential characteristics of a data set from a few samples. Based on these considerations, this paper proposes a hybrid malicious code detection model based on deep learning. The dimensionality of the data is first reduced using the AutoEncoder's ability to map between spaces of different dimensionality, which abstracts the main characteristics. On this basis, a DBN is used as the classifier and carries out several stages of deep learning. This improves the detection accuracy and reduces the time complexity of the hybrid model. Figure 5 depicts the process of the hybrid pre-trained detection algorithm.

[Flowchart nodes: Begin; input training sample dataset; data preprocessing; AutoEncoder dimensionality reduction; output training sample dataset after dimensionality reduction; set RBM layer i = 1; train the RBM network according to the learning rules; reserve weights and biases; if i <= max layer (YES: set RBM layer i = i + 1; NO: BP neural network classification, supervised learning); network parameters; classification; normal code; malicious code; stop]

Figure 5.
A DBN Malicious Code Detection Method based on AutoEncoder Dimensionality Reduction

The hybrid detection algorithm is described as follows:
(1) Initialization: input the training samples, then digitize and normalize the input data;
(2) Dimensionality reduction: use the AutoEncoder to realize the feature mapping;
(3) Input the dimensionality-reduced feature vectors and the network parameters to initialize the DBN classifier;
(4) Set the RBM layer i = 1;
(5) Train the RBM network layer by layer according to the learning rules, then save the results, including the weights and biases;
(6) If i <= max layer, set i = i + 1; when i > max layer, do the supervised learning for the BP network;
(7) Input the test samples into the trained classifier to separate malicious code from normal code.

3. Experimental Results and Analysis

3.1 Analysis and Pretreatment of Experimental Data

In this paper, the KDDCUP'99 dataset [16] was used for detecting malicious code data. The data include five categories: Probe, U2R (User to Root), R2L (Remote to Local), DoS (Denial-of-Service), as well as Normal data. This paper adopted the 10% sample of KDDCUP'99 as the dataset, containing a total of 494,021 training records and 311,029 testing
data. In the KDDCUP'99 dataset, each record contains 41 properties. There are two types of data: numerical and character. Numerical data can be used directly as numbers; character data is converted into numeric values by the standard keyword-encoding method. To eliminate the effects caused by differences in magnitude, and to reduce excessive reliance on individual features during classification, we need to normalize the data. Firstly, each feature is standardized according to formula (10):

$$x'_i = \frac{x_i - AVERAGE}{STAND} \qquad (10)$$

$$AVERAGE = \frac{1}{n} (x_1 + x_2 + \cdots + x_n) \qquad (11)$$

$$STAND = \frac{1}{n} \left( |x_1 - AVERAGE| + \cdots + |x_n - AVERAGE| \right) \qquad (12)$$

If $AVERAGE = 0$, then $x'_i = 0$; if $STAND = 0$, then $x'_i = 0$.

Secondly, the standardized features need to be normalized, as shown in formula (13):

$$x''_i = \frac{x'_i - \min}{\max - \min} \qquad (13)$$

Where x denotes the value of the original training sample, and max (or min) denotes the maximum (or minimum) value of the sample data for the same feature.

3.2 Evaluation Index of Experimental Results

This paper uses the following indexes to evaluate the experimental results: TPR (True Positive Rate), FPR (False Positive Rate), Accuracy, and CPU time consumption. They are defined as follows: TPR = the number of normal code samples classified correctly / the actual number of normal code samples; FPR = the number of malicious code samples predicted to be normal code / the actual number of malicious code samples.

3.3 Comparison of Experimental Results

Experimental test environment: a platform with an Intel Core Duo CPU 2.0GHz and 2.00G RAM, running Matlab v7.. This paper uses 2000 samples extracted proportionally from the 10% sample, which contain the 4 attack types recorded in the test data, and carries out 4 types of experiments. The experiment uses an AutoEncoder that consists of five layers. The numbers of neurons in the first four layers are 41, 300, 150, and 75, respectively. The number of neurons in the last layer is variable; it determines the dimension of the data after dimensionality reduction.
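The two-stage preprocessing of Section 3.1 can be sketched column by column as follows. This is a hedged reading of the formulas, not the authors' code: it assumes that STAND is the mean absolute deviation, that the zero rules apply per feature column, and that a constant column maps to 0 in the min-max step. The function names `standardize` and `min_max` are introduced only for this example.

```python
import numpy as np

def standardize(x):
    """Per-column standardization: subtract the mean, divide by the mean absolute deviation."""
    x = np.asarray(x, dtype=float)
    avg = x.mean(axis=0)
    stand = np.abs(x - avg).mean(axis=0)
    # Columns with AVERAGE = 0 or STAND = 0 are set to 0, as the text specifies.
    degenerate = (avg == 0) | (stand == 0)
    safe = np.where(degenerate, 1.0, stand)
    out = (x - avg) / safe
    out[:, degenerate] = 0.0
    return out

def min_max(x):
    """Per-column min-max scaling to [0, 1]; constant columns map to 0 (an assumption)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi == lo, 1.0, hi - lo)
    return (x - lo) / span

features = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
print(min_max(standardize(features)))
```

Applying the min and max learned on the training set to the test set, rather than recomputing them, would keep both sets on the same scale; the text does not say which choice was made.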
After preprocessing the training and testing data, we use the AutoEncoder for data dimensionality reduction. By changing the number of iterations of pretraining and fine-tuning, we obtain different models: AutoEncoder + DBN 5-5 (5 pretraining iterations, 5 fine-tuning iterations); AutoEncoder + DBN 10-10 (10 pretraining iterations, 10 fine-tuning iterations); and AutoEncoder + DBN 10-5 (10 pretraining iterations, 5 fine-tuning iterations). The detection results for malicious code are shown in Table 1.

Table 1. The Results of the Different Detection Methods

Model              TPR       FPR      Accuracy   CPU time (s)
DBN                95.34%    9.02%    9.4%       .26
AutoEncoder+DBN    %         5.79%    89.75%
AutoEncoder+DBN    %         9.7%     88.95%     .47
AutoEncoder+DBN    %         .58%     92.0%      .243

International Journal of Security and Its Applications

The experimental results show that, as the number of iterations increases, the detection accuracy of the proposed method becomes superior to that of the single DBN used in the first experiment. Using the AutoEncoder for data dimension reduction is therefore effective: it improves the detection accuracy because the AutoEncoder captures the essential characteristics of the data efficiently. Meanwhile, the accuracy of detection (TP) is reduced. Overall, in terms of prediction accuracy, the method described in this paper is superior to the single DBN method. It can adapt to complex environments and detect malicious code effectively; moreover, it consumes less time.

Figures 6 and 7 show the error rate during pretraining and fine-tuning. After two iterations, the error rate remains stably at a low level.

Figure 6. Pretraining Reconstruction Error (axes: Iteration vs. Mean Squared Error)
Figure 7. Fine-tuning Reconstruction Error (axes: Iteration vs. Mean Squared Error; series: Training, Testing)
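The TPR and FPR columns of Table 1 follow the definitions of Section 3.2, in which normal code is the positive class. A minimal sketch of those indexes is given below; the label encoding (0 for normal, anything else for malicious), the function name `detection_metrics`, and the use of the usual "fraction of correct predictions" for Accuracy are assumptions, since the text does not define Accuracy explicitly. The sketch also assumes that both classes are present in the labels.

```python
import numpy as np

def detection_metrics(y_true, y_pred, normal=0):
    """Return (TPR, FPR, accuracy) with normal code treated as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_normal = y_true == normal
    is_malicious = ~is_normal
    pred_normal = y_pred == normal
    # TPR: normal samples classified as normal / all normal samples.
    tpr = np.sum(pred_normal & is_normal) / np.sum(is_normal)
    # FPR: malicious samples predicted as normal / all malicious samples.
    fpr = np.sum(pred_normal & is_malicious) / np.sum(is_malicious)
    accuracy = np.mean(y_true == y_pred)
    return float(tpr), float(fpr), float(accuracy)

tpr, fpr, acc = detection_metrics([0, 0, 0, 1, 1], [0, 0, 1, 0, 1])
print(tpr, fpr, acc)
```

Under these definitions a low FPR means that few malicious samples slip through as normal, which is the costly error for a detector.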
Figure 8. Effect of Dimensions on the Correct Detecting Accuracy (axes: dimension vs. detecting accuracy; series: [5 5] iterations, [10 10] iterations, [10 5] iterations)
Figure 9. Effect of Dimensions on the Time Consumption (axes: dimension vs. training time in seconds; series: [5 5] iterations, [10 10] iterations, [10 5] iterations)

There are many parameters in the AutoEncoder, such as the network structure, the output dimension of the data after dimensionality reduction, and the number of iterations for pretraining and fine-tuning. The output dimension of the data after dimensionality reduction is one of the major parameters among them. This paper explores the impact of these parameters on the methods mentioned above. Figures 8 and 9 show the effect on the detection accuracy and on the time consumption of the method, respectively. In Figure 8, the detection accuracy increases with the number of iterations. In Figure 9, the CPU time consumption varies as the number of iterations increases, but the dimension and the training time consumption have no direct correlation, because the AutoEncoder can restore the data with little information loss and error.

Figure 10. The Relations between the Correct Detecting Accuracy and Iterations (axes: iteration vs. detecting accuracy; series: pretraining iterations, fine-tuning iterations)
Figure 11. The Relations between the Time Consumption and Iterations (axes: iteration vs. training time in seconds; series: pretraining iterations, fine-tuning iterations)

Figure 10 and Figure 11 show the effect of the number of iterations on the detection accuracy and on the time consumption. Figure 10 shows that when the pretraining iterations increased to 10, the detection accuracy reached its highest point. Figure 11 shows that when the pretraining iterations increased to 10, most of the time consumption remained at a low level. The fine-tuning process adjusts the weights using back-propagation; for low-dimensional data, the network over-learns. The number of fine-tuning iterations does not directly affect the assessed value. The AutoEncoder reduces the data dimensions and extracts the main features of complex multidimensional data through a nonlinear mapping, which makes the experiment more effective when the DBN is applied for classification. In short, for the detection of malicious code, the hybrid method presented in this paper is on the whole superior to the single DBN method of the first experiment.

4. Conclusion

For the problem of detecting malicious code, we propose a hybrid detection method based on deep learning, which combines the advantages of AutoEncoder and DBN. Firstly, the method uses the AutoEncoder for data dimensionality reduction to extract the main features of the data. Then the method uses a DBN to detect malicious code. Finally, the method was verified on the KDDCUP'99 dataset. Experimental results show that, compared with the detection method using a single DBN, the proposed method improves detection accuracy while reducing the time complexity of the model. However, in practical applications the method proposed in this paper needs further improvement, according to the actual situation, in order to improve its performance.
Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities (No. 204MS29).

References

[1] Symantec Corporation, Symantec Internet security threat report trends for 2010 [EB/OL] (2011-04) [ ], (2011), Internet Security Threat report 2010.pdf.
[2] N. Idika and A. P. Mathur, Survey of malware detection techniques, Technical Report, Department of Computer Science, Purdue University, (2007).
[3] C. L. Tsai, C. C. Tseng and C. C. Han, Editors, Intrusive behavior analysis based on honey pot tracking and ant algorithm analysis, 43rd Annual 2009 International Carnahan Conference, (2009) October 5-8, Zurich, Switzerland.
[4] W. Wang and I. Murynets, J. Security and Communication Networks, vol. 6, no., (2013).
[5] P. C. Lin, Y. D. Lin, Y. C. Lai and T. H. Lee, J. Computer Practices, vol. 4, no. 4, (2008).
[6] Y. Sawaya, A. Kubota and Y. Miyake, Editors, Detection of attackers in services using anomalous host behavior based on traffic flow statistics, IEEE/IPSJ 11th International Symposium, (2011) July 18-21, Munich, Bavaria, Germany.
[7] M. Milenkovic, A. Milenkovic and E. Jovanov, J. ACM SIGARCH Computer Architecture News, vol. 33, no., (2005).
[8] M. Christodorescu, S. Jha, S. A. Seshia, D. Song and R. E. Bryant, Editors, Proceedings of the 2005 IEEE Symposium on Security and Privacy, (2005) May 8-11; Oakland, California.
[9] M. Christodorescu and S. Jha, Static analysis of executables to detect malicious patterns, Wisconsin: University of Wisconsin, (2006).
[10] D. G. Kong, X. B. Tan, H. S. Xi, T. Gong and J. M. Shuai, J. Journal of Software, vol. 22, no. 3, (2011).
[11] A. Shabtai, R. Moskovitch, Y. Elovici and C. Glezer, J. Information Security Technical Report, vol. 4, no., (2009).
[12] Y. X. Ding, X. B. Yuan, D. Zhou, L. Dong and Z. C. An, J. Computers & Security, vol. 30, no. 6, (2011).
[13] R. Wang, D. G. Feng, Y. Yang and P. R. Su, J. Journal of Software, vol. 23, no. 2, (2012).
[14] G. E. Hinton and R. R. Salakhutdinov, J. Science, vol. 313, no. 5786, (2006).
[15] G. E. Hinton, Distributed representations, Tech. Report, University of Toronto, (1984).
[16] KDDCUP99, Available on, (2007),

Authors

Yuancheng Li received the Ph.D. degree from the University of Science and Technology of China, Hefei, China. From 2004 to 2005, he was a postdoctoral research fellow in the Digital Media Lab, Beihang University, Beijing, China. Since 2005, he has been with the North China Electric Power University, where he is a professor and the Dean of the Institute of Smart Grid and Information Security. From 2009 to 2010, he was a postdoctoral research fellow in the Cyber Security Lab, Pennsylvania State University, Pennsylvania, USA. His current research interests include Smart Grid operation and control, and information security in Smart Grid.
The Research on Demand Forecasting of Supply Chain Based on ICCELMAN
505 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015 Guest Editors: Peiyu Ren, Yancang Li, Huiping Song Copyright 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216 The
Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network
Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network Anthony Lai (aslai), MK Li (lilemon), Foon Wang Pong (ppong) Abstract Algorithmic trading, high frequency trading (HFT)
Network Traffic Prediction Based on the Wavelet Analysis and Hopfield Neural Network
Network Traffic Prediction Based on the Wavelet Analysis and Hopfield Neural Network Sun Guang Abstract Build a mathematical model is the key problem of network traffic prediction. Traditional single network
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
An Imbalanced Spam Mail Filtering Method
, pp. 119-126 http://dx.doi.org/10.14257/ijmue.2015.10.3.12 An Imbalanced Spam Mail Filtering Method Zhiqiang Ma, Rui Yan, Donghong Yuan and Limin Liu (College of Information Engineering, Inner Mongolia
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
KEITH LEHNERT AND ERIC FRIEDRICH
MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They
Supporting Online Material for
www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence
Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
Chapter 1 Hybrid Intelligent Intrusion Detection Scheme
Chapter 1 Hybrid Intelligent Intrusion Detection Scheme Mostafa A. Salama, Heba F. Eid, Rabie A. Ramadan, Ashraf Darwish, and Aboul Ella Hassanien Abstract This paper introduces a hybrid scheme that combines
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
LCs for Binary Classification
Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product between the two vectors representing the document and the category, respectively. Therefore it
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
LARGE-SCALE MALWARE CLASSIFICATION USING RANDOM PROJECTIONS AND NEURAL NETWORKS
LARGE-SCALE MALWARE CLASSIFICATION USING RANDOM PROJECTIONS AND NEURAL NETWORKS George E. Dahl University of Toronto Department of Computer Science Toronto, ON, Canada Jack W. Stokes, Li Deng, Dong Yu
A Survey on Intrusion Detection System with Data Mining Techniques
A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
A Neural Network Based System for Intrusion Detection and Classification of Attacks
A Neural Network Based System for Intrusion Detection and Classification of Attacks Mehdi MORADI and Mohammad ZULKERNINE Abstract-- With the rapid expansion of computer networks during the past decade,
Introduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti Singh Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey
Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation
Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Neural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
E-mail Spam Filtering Using Genetic Algorithm: A Deeper Analysis
ISSN:0975-9646 Mandeep Chodhary et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6 (5), 2015, 4266-4270 E-mail Spam Filtering Using Genetic Algorithm: A Deeper
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM
Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian
A survey on Data Mining based Intrusion Detection Systems
International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network
, pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and
Feature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Proactive Drive Failure Prediction for Large Scale Storage Systems
Proactive Drive Failure Prediction for Large Scale Storage Systems Bingpeng Zhu, Gang Wang, Xiaoguang Liu 2, Dianming Hu 3, Sheng Lin, Jingwei Ma Nankai-Baidu Joint Lab, College of Information Technical
Application of Neural Network in User Authentication for Smart Home System
Application of Neural Network in User Authentication for Smart Home System A. Joseph, D.B.L. Bong, D.A.A. Mat Abstract Security has been an important issue and concern in the smart home systems. Smart
Social Media Mining. Data Mining Essentials
Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM
A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM MS. DIMPI K PATEL Department of Computer Science and Engineering, Hasmukh Goswami college of Engineering, Ahmedabad, Gujarat ABSTRACT The Internet
Botnet Detection Based on Degree Distributions of Node Using Data Mining Scheme
Botnet Detection Based on Degree Distributions of Node Using Data Mining Scheme Chunyong Yin 1,2, Yang Lei 1, Jin Wang 1 1 School of Computer & Software, Nanjing University of Information Science & Technology,
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique Aida Parbaleh 1, Dr. Heirsh Soltanpanah 2* 1 Department of Computer Engineering, Islamic Azad University, Sanandaj
LEUKEMIA CLASSIFICATION USING DEEP BELIEF NETWORK
Proceedings of the IASTED International Conference Artificial Intelligence and Applications (AIA 2013) February 11-13, 2013 Innsbruck, Austria LEUKEMIA CLASSIFICATION USING DEEP BELIEF NETWORK Wannipa
Credit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www.irphouse.com Credit Card Fraud
CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION
CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION MATIJA STEVANOVIC PhD Student JENS MYRUP PEDERSEN Associate Professor Department of Electronic Systems Aalborg University,
UPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
E-mail Spam Classification With Artificial Neural Network and Negative Selection Algorithm
E-mail Spam Classification With Artificial Neural Network and Negative Selection Algorithm Ismaila Idris Dept of Cyber Security Science, Federal University of Technology, Minna, Nigeria. [email protected]
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
This version: December 12, 2013 Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks Lawrence Takeuchi * Yu-Ying (Albert) Lee [email protected] [email protected] Abstract We
NEURAL NETWORKS A Comprehensive Foundation
NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments
A Dynamic Flooding Attack Detection System Based on Different Classification Techniques and Using SNMP MIB Data
International Journal of Computer Networks and Communications Security VOL. 2, NO. 9, SEPTEMBER 2014, 279 284 Available online at: www.ijcncs.org ISSN 2308-9830 A Dynamic Flooding Attack Detection
Neural network software tool development: exploring programming language options
INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira [email protected] Supervisor: Professor Joaquim Marques de Sá June 2006
Network Intrusion Detection using Semi Supervised Support Vector Machine
Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT
Analysis of Data Mining Classification with Decision Tree Technique
Global Journal of Computer Science and Technology Software & Data Engineering Volume 13 Issue 13 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
Network Anomaly Detection: A Machine Learning Perspective
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar Kalita CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor
Analecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
Data Mining using Artificial Neural Network Rules
Data Mining using Artificial Neural Network Rules Pushkar Shinde MCOERC, Nasik Abstract - Diabetes patients are increasing in number so it is necessary to predict, treat and diagnose the disease. Data
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Application of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware
Application of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware Cumhur Doruk Bozagac Bilkent University, Computer Science and Engineering Department, 06532 Ankara, Turkey
Learning to Process Natural Language in Big Data Environment
CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah's Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering
IV International Congress on Ultra Modern Telecommunications and Control Systems 22 Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering Antti Juvonen, Tuomo
An Anomaly-Based Method for DDoS Attacks Detection using RBF Neural Networks
2011 International Conference on Network and Electronics Engineering IPCSIT vol.11 (2011) (2011) IACSIT Press, Singapore An Anomaly-Based Method for DDoS Attacks Detection using RBF Neural Networks Reyhaneh
The Applications of Deep Learning on Traffic Identification
The Applications of Deep Learning on Traffic Identification Zhanyi Wang [email protected] Abstract Generally speaking, most systems of network traffic identification are based on features. The features
RESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE
RESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE WANG Jizhou, LI Chengming Institute of GIS, Chinese Academy of Surveying and Mapping No.16, Road Beitaiping, District Haidian, Beijing, P.R.China,
Performance Evaluation On Human Resource Management Of China's Commercial Banks Based On Improved BP Neural Networks
Performance Evaluation On Human Resource Management Of China's *1 Honglei Zhang, 2 Wenshan Yuan, 1 Hua Jiang 1 School of Economics and Management, Hebei University of Engineering, Handan 056038, P. R.
Intrusion Detection using Artificial Neural Networks with Best Set of Features
728 The International Arab Journal of Information Technology, Vol. 12, No. 6A, 2015 Intrusion Detection using Artificial Neural Networks with Best Set of Features Kaliappan Jayakumar 1, Thiagarajan Revathi
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARACTERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARACTERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laboratory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
Research on the Performance Optimization of Hadoop in Big Data Environment
Vol.8, No.5 (2015), pp.93-304 http://dx.doi.org/10.1457/idta.015.8.5.6 Research on the Performance Optimization of Hadoop in Big Data Environment Jia Min-Zheng Department of Information Engineering, Beijing
International Journal of Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163 Volume 1 Issue 11 (November 2014)
Denial-of-Service Attack Detection Mangesh D. Salunke * Prof. Ruhi Kabra G.H.Raisoni CEM, SPPU, Ahmednagar HOD, G.H.Raisoni CEM, SPPU,Ahmednagar Abstract: A DoS (Denial of Service) attack as name indicates
A Stock Pattern Recognition Algorithm Based on Neural Networks
A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo [email protected] Xun Liang [email protected] Xiang Li [email protected] Abstract pattern respectively. Recent
CRITERIUM FOR FUNCTION DEFININING OF FINAL TIME SHARING OF THE BASIC CLARK'S FLOW PRECEDENCE DIAGRAMMING (PDM) STRUCTURE
1st Logistics International Conference Belgrade, Serbia 28-30 November 2013 CRITERIUM FOR FUNCTION DEFININING OF FINAL TIME SHARING OF THE BASIC CLARK'S FLOW PRECEDENCE DIAGRAMMING (PDM) STRUCTURE Branko Davidović
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Adaptive Anomaly Detection for Network Security
International Journal of Computer and Internet Security. ISSN 0974-2247 Volume 5, Number 1 (2013), pp. 1-9 International Research Publication House http://www.irphouse.com Adaptive Anomaly Detection for
A Survey on Outlier Detection Techniques for Credit Card Fraud Detection
IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727 Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud
Lecture 8 February 4
ICS273A: Machine Learning Winter 2008 Lecture 8 February 4 Scribe: Carlos Agell (Student) Lecturer: Deva Ramanan 8.1 Neural Nets 8.1.1 Logistic Regression Recall the logistic function: g(x) = 1 1 + e θt
Research Article Distributed Data Mining Based on Deep Neural Network for Wireless Sensor Network
Distributed Sensor Networks Volume 2015, Article ID 157453, 7 pages http://dx.doi.org/10.1155/2015/157453 Research Article Distributed Data Mining Based on Deep Neural Network for Wireless Sensor Network
Choosing the Optimal Object-Oriented Implementation using Analytic Hierarchy Process
Choosing the Optimal Object-Oriented Implementation using Analytic Hierarchy Process Naunong Sunanta Chonlameth Arpnikanondt King Mongkut's University of Technology Thonburi, [email protected] King
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
How To Prevent Network Attacks
Ali A. Ghorbani Wei Lu Mahbod Tavallaee Network Intrusion Detection and Prevention Concepts and Techniques Springer Contents 1 Network Attacks 1 1.1 Attack Taxonomies 2 1.2 Probes 4 1.2.1 IPSweep and
Novelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
Programming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
Face Recognition For Remote Database Backup System
Face Recognition For Remote Database Backup System Aniza Mohamed Din, Faudziah Ahmad, Mohamad Farhan Mohamad Mohsin, Ku Ruhana Ku-Mahamud, Mustafa Mufawak Theab 2 Graduate Department of Computer Science,UUM
Design call center management system of e-commerce based on BP neural network and multifractal
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):951-956 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Design call center management system of e-commerce
A simple application of Artificial Neural Network to cloud classification
A simple application of Artificial Neural Network to cloud classification Tianle Yuan For AOSC 630 (by Prof. Kalnay) Introduction to Pattern Recognition (PR) Example1: visual separation between the character
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Application of Data Mining Techniques in Intrusion Detection
Application of Data Mining Techniques in Intrusion Detection LI Min An Yang Institute of Technology [email protected] Abstract: The article introduced the importance of intrusion detection, as well as
An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation
An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation Shanofer. S Master of Engineering, Department of Computer Science and Engineering, Veerammal Engineering College,
An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus
An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus Tadashi Ogino* Okinawa National College of Technology, Okinawa, Japan. * Corresponding author. Email: [email protected]
Fault Analysis in Software with the Data Interaction of Classes
, pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental
