Using Dalvik opcodes for Malware Detection on Android
José Gaviria de la Puerta, Borja Sanz, Igor Santos and Pablo García Bringas
DeustoTech Computing, University of Deusto
Abstract. Over the last few years, computers and smartphones have become essential tools for communicating with each other. The number of applications in the Google store has grown rapidly, and malware developers have introduced malicious applications into that market. The Android system uses the Dalvik virtual machine. Through reverse engineering, we can obtain the different opcodes used by each application. In this paper we therefore present an approach to detecting malware on Android that relies on reverse engineering and focuses on the operational codes (opcodes) used by these applications. After obtaining these opcodes, machine learning techniques are used to classify the apps.
Keywords: Android, Malware, Opcodes, Detection, Machine Learning.
1 Introduction
In the last few years, mobile terminals, also known as smartphones, have become very popular devices. Today many smartphones have more computing power and memory than many computers that are only a few years old. Like any personal device, the smartphone runs an operating system, which is usually user-friendly so that users have no trouble operating it. Over time, different operating systems for these devices have emerged. In an interview, Andy Rubin, former head of Android, stated that "there should be nothing that users can access on their desktop that they cannot access on their cell phone." [1]. This statement illustrates the progress that these small computers are making. Their hardware, which includes a wide range of sensors such as camera, accelerometer and GPS, provides a wealth of information and data, which places a number of additional requirements on mobile operating systems.
People use mobile devices for a wide range of purposes as if they were desktop computers: web browsing, social networking, online banking, and more. Smartphones also offer features that are unique to mobile phones, for instance SMS messaging, constantly updated location data, and ubiquitous access. As a result of their popularity and functionality, smartphones are a growing target of malicious activity.
Today, one of the most common ways to perform malicious actions is by using malicious code, or malware. The term malware is a blend of the English words MALicious and softWARE, and simply means malicious software. Typically, such software poses as a legitimate application in order to run its malicious actions without the user's knowledge. One of the primary objectives of malware is to make as much profit as possible, whether financial or through theft of information, for as long as possible. To achieve this goal, these malicious applications try to stay hidden in the system so that the user does not notice any anomalous behaviour on the device. It is well known that malware has grown in recent years and has become one of the biggest current threats. According to the antivirus company Kaspersky Lab, about 145,000 new malware samples for mobile devices appeared in 2013, tripling the number of samples detected in the previous year.¹ Recently, malware has increased sharply, especially on the Android platform.
Section 2 gives a short state of the art covering related research on Android systems carried out by the scientific community. In Section 3 we present the scope of this experiment and how we obtained the information needed to perform it. Section 4 summarizes the different classifiers used for this experiment. Section 5 reports the results and the parameters considered in the experimental validation. Finally, Section 6 presents the conclusions drawn from the experiment and some possible lines of future work.
2 Related Work
The Android operating system is designed to run each application on its own virtual machine, Dalvik. This type of implementation makes the system more robust and limits the damage caused by bad programming [2]. In addition, Android provides a permission system that prevents third-party applications from accessing resources that should not be accessible to them.
These permissions are assigned to the application at installation time, but the user is the one who makes the final decision: he or she chooses whether or not to install the application, depending on the permissions it requests. Developers of malicious applications can request more permissions than they really need, for example to obtain private user information [3]. Along this line, research conducted by Sanz et al. [4] showed that an app can be classified as malicious or benign just by looking at the permissions it requires. Other authors have also used this feature in their research [5, 6]. Nevertheless, Android is a system that is constantly changing, and it now works with groups of permissions. With this grouping, these approaches lose effectiveness.
¹ Mobile-malware-evolution-3-infection-attempts-per-user-in-2013? ClickID=c4azsxkfiallqkfvsvzavqvkz4ixn4q7fnqn
A classical approach on mobile devices is to analyse the behaviour of different hardware components, such as the CPU or the battery, searching for anomalies [7, 8]. Although these approaches can help discover strange behaviour in the system, they are complex to implement, and the frequent updates that a program may receive within a month can change the behaviour of these components without the application being malicious. Meanwhile, Schmidt et al. [9] developed a framework that used system calls as a feature for classifying benign and malicious applications on Android. However, this information is very expensive to obtain and, depending on the number of calls, the performance of the approach may be very low. Shabtai et al. [10] studied the decompiled files of Android applications. They classified malware using extracted features, such as the classes used by the application, together with machine learning techniques. This approach needs a very large number of features extracted from the applications to achieve good classification. One technique often used to create Android malware is to take a legitimate application and insert malicious code into it. Along this line, Zhou et al. [11] studied third-party app stores. A major limitation of this approach is that the legitimate application is needed in order to see whether malicious code has been inserted into it.
3 Experimentation scope
In this context, following the guidelines used for the detection of malicious code on desktops by Santos et al. [12], we propose a study that consists of: 1) capturing information on the opcodes used by an application, and 2) classifying the application as malicious or benign using machine learning techniques.
After obtaining all applications, the main objective of the research is to detect, through supervised learning techniques and with the fewest possible false positives and false negatives, malicious apps created for the Android platform. In the next subsections, we define the methodology used to model the applications, as well as the characteristics used in this proof of concept. This methodology relies on Weka², an application created by the University of Waikato, to run the different classification algorithms.
² Weka: Data Mining Software is a collection of machine learning algorithms for automated data mining tasks.
3.1 Samples collection
To obtain benign code samples, we used Selenium to automate a web browser. With this automation we used the web application APIfy to download Android applications from different categories. The malicious code samples were taken from the dataset provided by the Android Genome Project [13].
3.2 Information Gathering
In this phase we built a platform in the C# programming language. It was implemented as a plugin that obtains the different opcodes an APK uses to execute its actions. While studying these opcodes, we found one in particular, RSUB_INT, that appears only in benign applications, specifically in 639 of them.
4 Machine learning and supervised classification
In this paper we present an experimental model that makes use of the modelling techniques described above. The objective is to obtain, for each application, the opcodes that classifiers can use to classify it. Machine learning and supervised classification problems study a phenomenon represented by a vector $X \in \mathbb{R}^d$ that can be classified in $K$ ways according to a label $Y$. To this end, we have a training set $D_n = \{(X_i, Y_i)\}_{i=1}^{n}$, where $X_i$ represents an event corresponding to the phenomenon $X$ and $Y_i$ is the label that places it in the category the classifier takes as correct. In the present case, $X_i$ is an application defined by the set of opcodes that represent it, and $Y_i$ is the category assigned to that application. All classifiers were trained using the vector space model, in which each application is represented as a vector of opcode frequencies. This model was originally used to represent natural language documents formally as vectors in a multi-dimensional linear space.
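The vector space representation of an application can be illustrated with a minimal Python sketch. It is not the authors' C# platform: the function names, the toy vocabulary and the three toy opcode sequences are assumptions made for illustration. Each application becomes a vector of relative opcode frequencies, and two such vectors are compared by the cosine of the angle between them, the usual similarity measure in this model.

```python
import math
from collections import Counter

def opcode_vector(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one application."""
    counts = Counter(opcodes)
    total = sum(counts.values()) or 1  # avoid division by zero for empty apps
    return [counts[op] / total for op in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical vocabulary and opcode sequences, for illustration only.
vocabulary = ["const-string", "invoke-virtual", "move-result", "if-eqz"]
app_a = ["const-string", "invoke-virtual", "move-result", "if-eqz"]
app_b = app_a * 2                      # same opcode mix, twice as long
app_c = ["const-string"] * 3           # only one kind of opcode

vec_a = opcode_vector(app_a, vocabulary)
vec_b = opcode_vector(app_b, vocabulary)
vec_c = opcode_vector(app_c, vocabulary)

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Because the vectors hold relative frequencies, two applications with the same opcode mix but different sizes (app_a and app_b here) point in the same direction and are treated as identical by the cosine measure.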
In its basic form, this model measures the similarity between two vectors by the cosine of the angle between them.
4.1 Classification algorithms
In this research, we chose to compare the performance of different classification algorithms, given the occasionally notable differences in effectiveness observed in similar experiments conducted in other areas [14]. The algorithms used for the tests in Section 5 are the following: Random Forest, J48, Bayes Theorem-based algorithms, K-Nearest Neighbor (KNN), Sequential Minimal Optimization (SMO) and Simple Logistic.
Random Forest. Random Forest is an ensemble classifier developed by Leo Breiman [15]. It is formed by a set of decision trees built so that the introduction of a stochastic component, either in the construction of the trees or in the training dataset, improves the precision of the classifier.
J48. J48 is an open source implementation for Weka of the C4.5 algorithm [16]. C4.5 builds decision trees from a set of training data using the concept of information entropy [17]. The training data consist of a group $S = s_1, s_2, \ldots, s_n$ of already classified samples $s_i = x_1, x_2, \ldots, x_m$, in which $x_1, x_2, \ldots, x_m$ represent the attributes or characteristics of each sample. At each node of the decision tree, the algorithm chooses the attribute that most efficiently splits the dataset into subsets enriched in a given class, using the entropy difference (the normalized information gain) as the selection criterion.
Bayes Theorem-based algorithms. Bayes' Theorem, the basis of Bayesian inference, is a statistical method that determines, from a number of observations, the probability of a certain hypothesis being true. For the classification needs described here, this is the most important capability of Bayesian networks: in our case, the probability of an app being malicious or benign. The theorem can adjust the probabilities as soon as new observations are made. Bayesian networks thus form a probabilistic model that represents a collection of random variables and their conditional dependencies by means of a directed graph. We trained our models with three different search algorithms: K2, Hill Climbing and TAN. In this group we also considered Naïve Bayes.
The idea behind Naïve Bayes is that if the number of independent variables is too large, it does not make sense to build full probability tables [18]. The reduced model, with its simplifying assumptions, is what gives the algorithm the name Naïve.
K-Nearest Neighbor (KNN). The KNN algorithm is one of the simplest classification algorithms available in machine learning. It makes decisions based on the classes of the k neighbours closest to the analysed sample in the experimental n-dimensional space ($k \in \mathbb{N}$). Given the simplicity of the algorithm, we explored several values (k = 1, 3, 5) to determine whether a larger neighbourhood offers any additional advantage.
Sequential Minimal Optimization (SMO). SMO, invented by John Platt [19], is an iterative algorithm for solving the optimization problems that arise when training Support Vector Machines (SVM). Basically, SMO divides the problem into a series of smaller subproblems, which are then solved analytically.
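The k-nearest-neighbour vote described earlier in this section can be sketched in a few lines of Python. This is an illustration rather than the Weka IBk implementation used in the experiments: the function name, the Euclidean distance and the two-dimensional toy vectors are assumptions, and the labels simply mirror the goodware/malware classes of the paper.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (vector, label) pairs; distance is Euclidean,
    which is an assumption of this sketch.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional opcode-frequency vectors.
train = [
    ([0.9, 0.1], "goodware"),
    ([0.8, 0.2], "goodware"),
    ([0.7, 0.3], "goodware"),
    ([0.3, 0.7], "malware"),
    ([0.2, 0.8], "malware"),
    ([0.1, 0.9], "malware"),
]

# Try the same neighbourhood sizes as in the text.
for k in (1, 3, 5):
    print(k, knn_predict(train, [0.25, 0.75], k))
```

Using odd values of k with two classes, as in the text, avoids tied votes.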
For the SVMs trained with SMO, we selected several kernels: a polynomial kernel, a normalized polynomial kernel, RBF and Pearson VII.
Simple Logistic. This algorithm predicts the outcome of a variable as a function of the independent, or predictor, variables. The logistic regression formula is:
\[ Y_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1,i} + \ldots + \beta_d x_{d,i})}} \tag{1} \]
Here $Y_i$ is the classification to be predicted by the model, in our case goodware or malware. The variable $X$ is the vector of opcodes generated for a specific application, and $x_{d,i}$ is the value assigned to the n-gram of opcodes in position $d$ for the application in row $i$. The parameters $\beta_0, \ldots, \beta_d$ are determined by the algorithm in the training phase.
Since we had, on the one hand, a downloaded dataset validated as goodware using the VirusTotal platform and, on the other, the Android Genome Project [13] dataset of malicious applications, it was easy to label each application as goodware or malware in order to generate the training datasets used in Section 5.
5 Experimental validation
The next step was to collect the information stored in each of the apps. First, a total of 1,494 applications were downloaded from Google Play. In particular, we took them from the lists of "the most free downloaded", "the most free downloaded in Spanish", applications that "have generated more revenue", "lifestyle" and "tools". These lists were chosen completely at random in order to obtain a heterogeneous dataset of applications from Google's store. To verify that the samples do not contain malicious code, we used the online platform VirusTotal. This platform uses 43 antivirus engines to scan each submitted sample. The analysis returns the number of engines that flagged the sample as malicious and the malware each engine identified.
Since the experiment requires the benign samples to be as clean as possible, we decided that if even a single antivirus engine flagged a sample as malicious, it would be removed from the dataset. Samples detected as adware⁷ were also removed. After the analysis with VirusTotal, we found that 14.59% of the applications in our dataset were considered malware or adware. Specifically, there were 218 apps which, according to the criteria above, could not remain in the benign dataset. Given that the malicious dataset was the one provided by the Android Genome Project, which consists of 1,259 samples, our benign dataset still had to be reduced by 17 more samples to be balanced. These samples were selected randomly, leaving the final dataset with 2,518 application samples, half benign and half malicious.
⁷ Adware is a type of behaviour hidden in applications that sends targeted advertisements to the device when an application is run.
Information was extracted from the samples using the Dedexer tool, a disassembler for the .dex files used on the Android platform. Using this tool, we obtained the operational codes of the different samples analysed. These codes are the basic instructions that the Dalvik virtual machine executes to run applications. Figure 1 shows a sequence of opcodes extracted from an Android application.

    const-string v0, acursor
    invoke-interface v3, v0, <ref Map.get(ref) _def_map_get@ll>
    move-result-object v0
    check-cast v0, <t: String>
    const-string v1, ahasmore
    invoke-interface v3, v1, <ref Map.get(ref) _def_map_get@ll>
    move-result-object v1
    check-cast v1, <t: String>
    const-string v3, atrue_0
    invoke-virtual v3, v1, <boolean String.equalsIgnoreCase(ref) _def_string_equalsignorecase@zl>
    move-result v7
    invoke-virtual this, v0, <boolean KiwiPurchaseUpdatesCommandTask.isNullOrEmpty(ref) _def_kiwipurchaseupdatescommandtask_isnullorempty@zl>
    move-result v1
    if-eqz v1, loc_c1ad2
    sget-object v6, Offset_BEGINNING

Fig. 1. Example of operational codes with variables used in an Android application.
From all the samples we disassembled, we generated opcode files ranging from 10 KB to 32 MB for the goodware and from 5 KB to 20 MB for the malware. These files contain the different operational codes used by both benign and malicious applications. From them, an ARFF file was generated for use with the Weka tool. For each of them, an experiment was run to demonstrate their validity as predictors, using the different classifiers detailed in Section 4. This section therefore details the results obtained when evaluating the different opcodes.
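The step from a disassembled listing such as the one in Figure 1 to per-application opcode counts can be sketched as follows. This is a minimal Python illustration, not the authors' C# plugin: it assumes one instruction per line with the opcode mnemonic as the first whitespace-separated token, and the function names and the shortened listing are hypothetical.

```python
from collections import Counter

# Shortened excerpt in the style of Figure 1 (one instruction per line).
LISTING = """\
const-string v0, acursor
invoke-interface v3, v0, <ref Map.get(ref) _def_map_get@ll>
move-result-object v0
check-cast v0, <t: String>
const-string v1, ahasmore
invoke-interface v3, v1, <ref Map.get(ref) _def_map_get@ll>
move-result-object v1
check-cast v1, <t: String>
if-eqz v1, loc_c1ad2
"""

def extract_opcodes(listing):
    """Return the opcode mnemonic (first token) of every non-empty line."""
    opcodes = []
    for line in listing.splitlines():
        line = line.strip()
        if line:
            opcodes.append(line.split(None, 1)[0])
    return opcodes

def opcode_frequencies(listing):
    """Count how often each opcode appears in one disassembled application."""
    return Counter(extract_opcodes(listing))

freqs = opcode_frequencies(LISTING)
for opcode, count in sorted(freqs.items()):
    print(opcode, count)
```

Counts of this kind, one row per application and one column per opcode, are the sort of numeric attributes that can then be written to an ARFF file for Weka.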
The evaluation uses the following parameters, which are commonly employed to compare the performance of different algorithms in the field of machine learning:
True Positive Ratio (TPR), which is calculated by dividing the number of benign apps correctly classified (TP) by the total number of benign samples (TP + FN).
\[ TPR = \frac{TP}{TP + FN} \tag{2} \]
False Positive Ratio (FPR), which is calculated by dividing the number of samples corresponding to malicious apps that were misclassified (FP) by the total number of malicious samples (FP + TN).
\[ FPR = \frac{FP}{FP + TN} \tag{3} \]
Precision (P), which is calculated by dividing the total number of hits by the total number of instances in the dataset.
\[ P = \frac{TP + TN}{TP + FP + TN + FN} \tag{4} \]
Area Under ROC Curve (AUC) [14], which establishes the relationship between the false negatives and the false positives. The ROC curve is usually used to generate statistics that represent the performance, or the effectiveness in a wider sense, of a classifier.

Table 1. Performance of the classifiers when categorizing applications using opcodes ([…] marks digits missing from this transcription).

Classifier              TPR      FPR      Precision (%)   ROC
IBk 1                   0.[…]    […]      […].761         0.97198
IBk 3                   0.[…]    […]      […].727         0.97924
IBk 5                   0.[…]    […]      […].495         0.98102
Simple Logistic         0.[…]    […]      […].379         0.97839
NaiveBayes              0.[…]    […]      […].561         0.88461
BayesNet K2             0.[…]    […]      […].006         0.87881
BayesNet TAN            0.[…]    […]      […].181         0.94768
SMO PolyKernel          0.[…]    […]      […].022         0.94039
SMO Norm. PolyKernel    0.[…]    […]      […].002         0.93078
J48                     0.[…]    […]      […].071         0.94434
RandomTree              0.[…]    […]      […].749         0.92654
RandomForest I=10       0.[…]    […]      […].953         0.98879
RandomForest I=50       0.[…]    […]      […].322         0.99208
RandomForest I=100      0.[…]    0.032    96.758          0.99255

The results are displayed in Table 1. The best results were obtained with the Random Forest classifier with 100 trees, with a classification accuracy of 96.758% and an area under the ROC curve of 0.99255. In contrast, the classifiers that performed worst were Naive Bayes (with an overall accuracy of […].561% and an AUC of 0.88461) and BayesNet with the K2 algorithm (with an overall accuracy of […].006% and an AUC of 0.87881).
Table 2 shows the results of classifying malicious applications on our dataset using application permissions, the approach of Sanz et al. [4].

Table 2. Performance of the classifiers when categorizing applications using permissions ([…] marks digits missing from this transcription).

Classifier              TPR      FPR      Precision (%)   ROC
IBk 1                   0.[…]    […]      […].103         0.98597
IBk 3                   0.[…]    […]      […].717         0.98737
IBk 5                   0.[…]    […]      […].127         0.98709
Simple Logistic         0.[…]    […]      […].288         0.9894
NaiveBayes              0.[…]    […]      […].854         0.95679
BayesNet K2             0.[…]    […]      […].463         0.96868
BayesNet TAN            0.[…]    […]      […].378         0.98233
SMO PolyKernel          0.[…]    […]      […].221         0.95329
SMO Norm. PolyKernel    0.9745   0.[…]    […].751         0.96009
J48                     0.[…]    […]      […].686         0.96116
RandomTree              0.[…]    […]      […].824         0.95412
RandomForest I=10       0.961    0.[…]    […].189         0.99148
RandomForest I=50       0.[…]    […]      […].511         0.99361
RandomForest I=100      0.[…]    […]      […].423         0.99382

In this table, one of the best classifiers using application permissions is Random Forest with 100 trees, which reaches an area under the ROC curve of 0.99382 and an accuracy of […].423%. One of the worst is SMO with the Normalized PolyKernel, with an area under the ROC curve of 0.96009 and an accuracy of […].751%. Comparing the two methods, the two approaches follow roughly the same trend in the values obtained. On the one hand, the area under the ROC curve tends to be 95% or higher for most of the classifiers. On the other hand, the best classifier for both approaches is Random Forest with 100 trees.
6 Conclusions and future work
Opcodes are the instructions performed by an application. All apps must use these codes to carry out the activities for which they are programmed. In this paper we use operational codes with machine learning techniques to classify malicious apps on the Android operating system, and compare the results with those obtained from application permissions. To validate our approach, we collected 1,259 samples of entirely benign applications from Google Play and 1,259 Android malware samples obtained from the Android Genome Project. The operational codes were then extracted and models were created to evaluate each configuration of classifiers by the area under the ROC curve.
Whereas application permissions are very quick to obtain, operational codes require more computation time, which penalizes the performance of the approach. In contrast, Android no longer works with individual permissions as before but with groups of permissions, which makes the permission-based approach less effective. Therefore, the opcode-based method can be used today as one of the first approaches to malware detection on Android.
As future work, we propose analysing the 639 samples that contain the opcode RSUB_INT, in order to see why this opcode appears only in benign samples. This study may result in the generation of a pattern that could be used to detect unknown malware faster, and it could also limit the amount of possible malicious code in many applications.
References
1. Waters, D.: Google bets on Android future (February 2008) co.uk/2/hi/technology/ stm.
2. Enck, W., Octeau, D., McDaniel, P., Chaudhuri, S.: A study of android application security. In: USENIX Security Symposium. (2011)
3. Fragkaki, E., Bauer, L., Jia, L., Swasey, D.: Modeling and enhancing android's permission system. In: Computer Security - ESORICS. Springer (2012)
4. Sanz, B., Santos, I., Laorden, C., Ugarte-Pedrero, X., Bringas, P.G., Álvarez, G.: Puma: Permission usage to detect malware in android. In: International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions, Springer (2013)
5. Shin, W., Kiyomoto, S., Fukushima, K., Tanaka, T.: Towards formal analysis of the permission-based security model for android. In: Wireless and Mobile Communications, ICWMC'09. Fifth International Conference on, IEEE (2009)
6. Shin, W., Kiyomoto, S., Fukushima, K., Tanaka, T.: A formal model to analyze the permission authorization and enforcement in the android framework. In: Social Computing (SocialCom), 2010 IEEE Second International Conference on, IEEE (2010)
7. Jacoby, G.A., Davis IV, N.J.: Battery-based intrusion detection. In: Global Telecommunications Conference, GLOBECOM'04. Volume 4., IEEE (2004)
8. Buennemeyer, T.K., Nelson, T.M., Clagett, L.M., Dunning, J.P., Marchany, R.C., Tront, J.G.: Mobile device profiling and intrusion detection using smart batteries. In: Hawaii International Conference on System Sciences, Proceedings of the 41st Annual, IEEE (2008)
9. Schmidt, A.D., Bye, R., Schmidt, H.G., Clausen, J., Kiraz, O., Yuksel, K.A., Camtepe, S.A., Albayrak, S.: Static analysis of executables for collaborative malware detection on android. In: Communications, ICC'09. IEEE International Conference on, IEEE (2009)
10. Shabtai, A., Fledel, Y., Elovici, Y.: Automated static code analysis for classifying android applications using machine learning. In: Computational Intelligence and Security (CIS), 2010 International Conference on, IEEE (2010)
11. Zhou, W., Zhou, Y., Jiang, X., Ning, P.: Detecting repackaged smartphone applications in third-party android marketplaces. In: Proceedings of the second ACM conference on Data and Application Security and Privacy, ACM (2012)
12. Santos, I., Brezo, F., Sanz, B., Laorden, C., Bringas, P.G.: Using opcode sequences in single-class learning to detect unknown malware. IET Information Security 5(4) (2011)
13. Zhou, Y., Jiang, X.: Dissecting android malware: Characterization and evolution. In: Security and Privacy (SP), 2012 IEEE Symposium on, IEEE (2012)
14. Singh, Y., Kaur, A., Malhotra, R.: Comparative analysis of regression and machine learning methods for predicting fault proneness models. International Journal of Computer Applications in Technology 35(2) (2009)
15. Breiman, L.: Random forests. Machine Learning 45 (2001)
16. Quinlan, J.: C4.5: Programs for machine learning. Morgan Kaufmann (1993)
17. Salzberg, S.L.: C4.5: Programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc. Machine Learning 16 (1994)
18. Jiang, L., Wang, D., Cai, Z., Yan, X.: Survey of improving naive bayes for classification. In Alhajj, R., Gao, H., Li, X., Li, J., Zaïane, O., eds.: Advanced Data Mining and Applications. Volume 4632 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2007)
19. Platt, J.C.: Sequential minimal optimization: A fast algorithm for training support vector machines (1998)
Index Terms: Smart phones, Malwares, security, permission violation, malware detection, mobile devices, Android, security
Permission Based Malware Detection Approach Using Naive Bayes Classifier Technique For Android Devices. Pranay Kshirsagar, Pramod Mali, Hrishikesh Bidwe. Department Of Information Technology G. S. Moze
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Introduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
D A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA
Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA [email protected] M. Chuah CSE Dept Lehigh University 19 Memorial
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 3, March 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Android Security - Common attack vectors
Institute of Computer Science 4 Communication and Distributed Systems Rheinische Friedrich-Wilhelms-Universität Bonn, Germany Lab Course: Selected Topics in Communication Management Android Security -
IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
Detection and Identification of Android Malware Based on Information Flow Monitoring
Detection and Identification of Android Malware Based on Information Flow Monitoring Radoniaina Andriatsimandefitra, Valérie Viet Triem Tong To cite this version: Radoniaina Andriatsimandefitra, Valérie
... Mobile App Reputation Services THE RADICATI GROUP, INC.
. The Radicati Group, Inc. 1900 Embarcadero Road, Suite 206 Palo Alto, CA 94303 Phone 650-322-8059 Fax 650-322-8061 http://www.radicati.com THE RADICATI GROUP, INC. Mobile App Reputation Services Understanding
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Detecting client-side e-banking fraud using a heuristic model
Detecting client-side e-banking fraud using a heuristic model Tim Timmermans [email protected] Jurgen Kloosterman [email protected] University of Amsterdam July 4, 2013 Tim Timmermans, Jurgen
How To Predict Web Site Visits
Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many
Mobile App Reputation
Mobile App Reputation A Webroot Security Intelligence Service Timur Kovalev and Darren Niller April 2013 2012 Webroot Inc. All rights reserved. Contents Rise of the Malicious App Machine... 3 Webroot App
Application of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware
Application of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware Cumhur Doruk Bozagac Bilkent University, Computer Science and Engineering Department, 06532 Ankara, Turkey
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
CSci 538 Artificial Intelligence (Machine Learning and Data Analysis)
CSci 538 Artificial Intelligence (Machine Learning and Data Analysis) Course Syllabus Fall 2015 Instructor Derek Harter, Ph.D., Associate Professor Department of Computer Science Texas A&M University - Commerce
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
Twitter Content-based Spam Filtering
Twitter Content-based Spam Filtering Igor Santos, Igor Miñambres-Marcos, Carlos Laorden, Patxi Galán-García, Aitor Santamaría-Ibirika, and Pablo G. Bringas DeustoTech-Computing, Deusto Institute of Technology
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts Nandini Govindarajan (61210556) The authors believe that data modelling can be used to predict if an
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Email Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7–9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
Football Match Winner Prediction
Football Match Winner Prediction Kushal Gevaria 1, Harshal Sanghavi 2, Saurabh Vaidya 3, Prof. Khushali Deulkar 4 Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai,
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
The Droid Knight: a silent guardian for the Android kernel, hunting for rogue smartphone malware applications
Virus Bulletin 2013 The Droid Knight: a silent guardian for the Android kernel, hunting for rogue smartphone malware applications Next Generation Intelligent Networks Research Center (nexgin RC) http://www.nexginrc.org/
Prediction of Software Development Modification Effort Enhanced by a Genetic Algorithm
Prediction of Software Development Modification Effort Enhanced by a Genetic Algorithm Gergő Balogh, Ádám Zoltán Végh, and Árpád Beszédes Department of Software Engineering University of Szeged, Szeged, Hungary
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Static Analysis of Executables for Collaborative Malware Detection on Android
Static Analysis of Executables for Collaborative Malware Detection on Android Aubrey-Derrick Schmidt, Rainer Bye, Hans-Gunther Schmidt, Jan Clausen, Osman Kiraz, Kamer Ali Yüksel, Seyit Ahmet Camtepe, and Sahin
Integration Misuse and Anomaly Detection Techniques on Distributed Sensors
Integration Misuse and Anomaly Detection Techniques on Distributed Sensors Shih-Yi Tu Chung-Huang Yang Kouichi Sakurai Graduate Institute of Information and Computer Education, National Kaohsiung Normal
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375–381, 2003 Copyright © 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
MACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Mobile Advertising Security Risk Assessment Model Based on AHP
International Symposium on Computers & Informatics (ISCI 2015) Mobile Advertising Security Risk Assessment Model Based on AHP Jing Li 1, Hui Fei 2,ei Jin 1 1 Computer Network Emergency Response Technical
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
How To Solve The KDD Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased
Detecting Internet Worms Using Data Mining Techniques
Detecting Internet Worms Using Data Mining Techniques Muazzam SIDDIQUI Morgan C. WANG Institute of Simulation & Training Department of Statistics and Actuarial Sciences University of Central Florida University
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbyterian University, Brazil, São Paulo [email protected],[email protected]
Machine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation
Mobile Monetization Scenario Design & Big Data Arther Wu Senior Director of Monetization and Business Operation Agenda Quick update of Cheetah Mobile Ad Scenario Design Big Data / Relation with Advertising
A Practical Analysis of Smartphone Security*
A Practical Analysis of Smartphone Security* Woongryul Jeon 1, Jeeyeon Kim 1, Youngsook Lee 2, and Dongho Won 1,** 1 School of Information and Communication Engineering, Sungkyunkwan University, Korea
Statistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Are free Android virus scanners any good?
Authors: Hendrik Pilz, Steffen Schindler Published: 10. November 2011 Version: 1.1 Copyright 2011 AV-TEST GmbH. All rights reserved. Postal address: Klewitzstr. 7, 39112 Magdeburg, Germany Phone +49 (0)
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Malware Detection in Android by Network Traffic Analysis
Malware Detection in Android by Network Traffic Analysis Mehedee Zaman, Tazrian Siddiqui, Mohammad Rakib Amin and Md. Shohrab Hossain Department of Computer Science and Engineering, Bangladesh University
IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 1, March, 2013 ISSN: 2320-8791 www.ijreat.
Intrusion Detection in Cloud for Smart Phones Namitha Jacob Department of Information Technology, SRM University, Chennai, India Abstract The popularity of smart phone is increasing day to day and the
Detection of Spyware by Mining Executable Files
Master Thesis Computer Science Thesis no: MSC-2009:5 Detection of Spyware by Mining Executable Files Raja M. Khurram Shahzad Syed Imran Haider School of Computing Blekinge Institute of Technology Box 520
Malware Detection Module using Machine Learning Algorithms to Assist in Centralized Security in Enterprise Networks
Malware Detection Module using Machine Learning Algorithms to Assist in Centralized Security in Enterprise Networks Priyank Singhal Student, Computer Engineering Sardar Patel Institute of Technology University
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
A fast multi-class SVM learning method for huge databases
www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt's (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM - 10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM - 11:30 AM (alternate weeks) Rm.
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Role of Customer Response Models in Customer Solicitation Center's Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center's Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
Data Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements for the Degree
Exploring Big Data in Social Networks
Exploring Big Data in Social Networks [email protected] ([email protected]) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
Android Application Analyzer
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume. 1, Issue 4, August 2014, PP 32-37 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Android
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Efficient Security Alert Management System
Efficient Security Alert Management System Minoo Deljavan Anvary IT Department School of e-learning Shiraz University Shiraz, Fars, Iran Majid Ghonji Feshki Department of Computer Science Qazvin Branch,
Predicting Critical Problems from Execution Logs of a Large-Scale Software System
Predicting Critical Problems from Execution Logs of a Large-Scale Software System Árpád Beszédes, Lajos Jenő Fülöp and Tibor Gyimóthy Department of Software Engineering, University of Szeged Árpád tér
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unsupervised Identification of right
A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2
UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
