Associative Classification Mining for Website Phishing Classification
- Jerome Barnett
- 10 years ago
Neda Abdelhamid (1), Aladdin Ayesh (1), Fadi Thabtah (2)
(1) Informatics Dept, De Montfort University, Leicester, LE1 9BH, p @my .dmu.ac.uk [email protected]
(2) E-Business Dept, CUD, Dubai, [email protected]

Abstract -- Website phishing is one of the crucial research topics for the internet community due to the massive number of daily online transactions. Predicting the phishing activity of a website is a typical classification problem in data mining, where different website features such as URL length, prefix and suffix, IP address, etc., are used to discover concealed correlations (knowledge) among these features that are useful for decision makers. In this article, an associative classification (AC) data mining algorithm, which uses association rule methods to build classification systems (classifiers), is developed and applied to the important problem of phishing classification. The proposed algorithm employs a classifier building method that discovers vital rules that can be utilised to detect phishing activity based on a number of significant website features. Experiments using the proposed algorithm and three other rule based algorithms have been conducted on real legitimate and fake websites collected from different sources. The results reveal that our algorithm is highly competitive in classifying websites when contrasted with the other rule based classification algorithms with respect to accuracy rate. Further, our algorithm normally extracts smaller classifiers than other AC algorithms because of its novel rule evaluation method, which reduces overfitting.

Keywords: Associative Classification, Data Mining, Phishing Detection, Web Security

1. INTRODUCTION

Associative classification (AC) in data mining is about constructing classification systems (classifiers) from input data called the training data set, aiming to accurately predict the class value of unseen data called the test data set [1]. One distinguishing feature of AC algorithms is their ability to discover new hidden knowledge and extract it as simple If-Then rules. In the last decade, research studies on AC mining have resulted in the dissemination of various algorithms including CBA [2], CMAR [3], LCA [4], ADA [5] and others. These studies have revealed that AC is able to construct more accurate classifiers than rule based classification approaches in data mining, including rule induction and decision trees. Nevertheless, the number of rules discovered by AC algorithms is normally huge, which sometimes limits their applicability in business domains. One primary reason for the large number of rules is inherited from association rule mining: all correlations among the attribute values and the class attribute are tested in the training phase, and many rules are derived. One way to control the exponential growth in the number of rules is to develop rule filtering methods that minimise rule redundancy while building the classifier.

Rule evaluation, sometimes called filtering or pruning, usually occurs while building the classifier in AC mining. Once the complete set of rules is found in the training phase and sorted based on certain conditions (e.g. the rule's confidence, support, body length, etc.), the AC algorithm has to decide how to choose a subset of effective rules to represent the classifier. There are different ways used in AC to choose the classifier's rules. For instance, CBA [2] utilises the database coverage rule, where rules that correctly cover a certain number of training cases are marked as accurate rules and the remaining rules are discarded.
Lazy AC algorithms like the L3G algorithms employ lazy pruning, which stores primary and secondary rules in the classifier.

In this paper, we first treat the problem of generating large classifiers in AC by proposing a new rule evaluation method for removing useless and redundant rules while constructing the classifier. The new rule evaluation method is an enhancement of a current AC algorithm called Multiclass Associative Classification (MAC) [7]. We have enhanced the MAC rule pruning method and classification procedure: rather than using one rule for prediction, the proposed algorithm utilises a group of rules to enhance the accuracy rate. Further, for building the classifier we developed a rule evaluation method that increases the training coverage per rule in order to reduce the classifier size, so that the end-user can control and understand the classifier easily. The proposed rule evaluation method ensures larger training data coverage per classifier rule by taking into account only the similarity between the rule's body and the training case attribute values while building the classifier. In contrast, other current AC algorithms like MCAR consider both the class similarity between the candidate rule and the training case, and the match between the attribute values in the candidate rule body and those of the training case. The two enhancements have resulted in a new algorithm that we call Enhanced Multiclass Associative Classification (EMAC). Thus, EMAC's rule evaluation method ensures fewer rules in the classifier.

We show the applicability of EMAC on a crucial domain related to web security, namely website phishing classification, which is normally criticised for having dense data because of the correlations among the website's features. Phishing is considered a form of web threat that is defined as the art of impersonating a website of an honest enterprise, aiming to acquire private information such as usernames, passwords and social security numbers [8]. Phishing websites are created by dishonest people to impersonate the webpages of genuine websites. Almost all of these websites have high visual similarity to the legitimate ones in an attempt to defraud innocent people, and some are designed to be almost identical to the genuine ones. Social engineering and technical tricks are commonly combined in order to start a phishing attack [8]. Phishing websites have become a serious problem not only because of their increasing number but also because of the smart strategies used to design them, so that even people with good experience of computers and the internet might be deceived.

The process of detecting the type of a website is a typical classification problem where different features such as URL length, sub-domains, and added prefixes and suffixes are utilised to learn important hidden knowledge among these features. This knowledge is in fact the classification system, which in turn is used to automatically guess the phishing activity of a website when a user browses it. The phishing problem is considered a vital issue in the .com industry, especially e-banking and e-commerce, given the number of online transactions involving payments.

This article deals with two problems:

1) Improvement of current AC algorithms, particularly with regard to the generation of a large number of rules, by proposing a new method that reduces the number of rules discovered without drastically impacting the predictive accuracy of the classifiers. In other words, while constructing the classifier we would like to minimise the number of rules derived by an AC algorithm. This helps decision makers in understanding, controlling and maintaining the final set of rules, primarily when making a prediction decision.
2) The applicability of AC mining to the website phishing problem, in order to learn important hidden knowledge from the correlations among the website's features. These correlations are extracted as If-Then rules to be used by the end-user for the automatic classification of websites.

A number of fake and legitimate websites collected from known sources like Phishtank and Millersmiles are used in the experimentation section to evaluate the performance of the proposed algorithm. Further, EMAC and three other AC and rule based algorithms have been contrasted with respect to different performance measures such as classification accuracy and number of rules. More details are given in Section 4.

This article is structured as follows: Section 2 presents the phishing problem and the definitions related to AC in data mining. The proposed algorithm and its main steps are explained in Section 3. Section 4 is devoted to experimentation, and finally conclusions are given in Section 5.

2. THE PHISHING PROBLEM AND ASSOCIATIVE CLASSIFICATION MINING

Typically, a phishing attack starts by sending an e-mail that appears to be from an authentic organisation to victims, urging them to update or validate their information by following a fake URL link within the e-mail body. E-mail remains the main spreading channel for phishing links, since 65% of phishing attacks start by visiting a link received within an e-mail (Kaspersky Lab, 2013). Two common approaches are used to detect phishing activities: blacklist methods and features methods [6]. In the blacklist approach, the website URL is compared with those in the blacklist to identify whether it is legitimate or fake. On the other hand, a more realistic approach, which is based on extracting the website features and using a heuristic method to identify phishing activities, has been successfully utilised [9]. Unlike the blacklist approach, the features based approach can identify newly created phishing websites in real time [8].
The effectiveness of the features methods depends on selecting a set of significant features that help in determining whether a website is phishy [9]. Phishing detection for websites is a typical classification problem in data mining, where the goal is to forecast the type of the website based on a number of features that can be stored in the training data set. For simplicity, we can consider website phishing detection a two class problem (binary classification) since the target class has only two possible values: Phishy or Legitimate. Once a webpage is loaded in the browser, a set of features is extracted from it. These features have an influence in determining the type of the webpage. Website features like IP address, long URL, https and SSL are examples of important features that are used for learning knowledge. An AC data mining model learns from the website features important knowledge (correlations between the feature values and the class attribute) to classify the webpage as either Phishy or Legitimate.

We start formulating the phishing detection problem in AC data mining with the definitions given in [4]. Let $T$ denote the domain of the training data containing phishing features and $C$ be a list of classes. Each training case $t \in T$ may be given a single class $c_k$, where $c_k \in C$, and is represented as a pair $(t, c_k)$, where $c_k$ is connected with the data instance $t$ in the training data. Let $H$ denote the set of classifiers for $T \rightarrow C$, where each case $t \in T$ is given a class, and the goal is to find a classifier $h \in H$ that maximises the probability that $h(t) = c$ for each test case. So, for the training data set $T$ with $m$ attributes $A_1, A_2, \ldots, A_m$ and a set of classes $C$:

Definition 1: An attribute value set (AttValSet) can be described as a set of disjoint attribute values contained in a training case, denoted $\langle (A_{i1}, a_{i1}), \ldots, (A_{ik}, a_{ik}) \rangle$.

Definition 2: A rule $r$ is of the form $\langle AttValSet, c \rangle$, where $c \in C$ is the class.
Definition 3: The actual occurrence (ActOccr) of $r$ in $T$ is the number of cases in $T$ that match $r$'s antecedent.

Definition 4: The support count (SuppCount) of $r$ is the number of cases in $T$ that match $r$'s antecedent and belong to a class $c_i$.

Definition 5: A rule $r$ passes the user minimum support threshold (minsupp) if $SuppCount(r)/|T| \geq minsupp$, where $|T|$ is the number of cases in $T$.

Definition 6: A rule $r$ passes the user minimum confidence threshold (minconf) if $SuppCount(r)/ActOccr(r) \geq minconf$.
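Definitions 3 to 6 can be illustrated with a minimal Python sketch. The function names (`matches`, `rule_stats`, `passes_thresholds`), the dictionary representation of a training case, and the toy data are assumptions made for illustration only; they are not taken from the paper's implementation.

```python
from typing import Dict, List, Tuple

# A training case is assumed to be (attribute values, class label).
Case = Tuple[Dict[str, str], str]


def matches(antecedent: Dict[str, str], attributes: Dict[str, str]) -> bool:
    """True when every attribute value of the rule body appears in the case."""
    return all(attributes.get(a) == v for a, v in antecedent.items())


def rule_stats(antecedent: Dict[str, str], rule_class: str, training: List[Case]):
    """Return (ActOccr, SuppCount, support, confidence) for <antecedent, rule_class>."""
    # Definition 3: cases whose attribute values match the antecedent.
    act_occr = sum(1 for attrs, _ in training if matches(antecedent, attrs))
    # Definition 4: matching cases that also carry the rule's class.
    supp_count = sum(
        1 for attrs, label in training
        if label == rule_class and matches(antecedent, attrs)
    )
    support = supp_count / len(training)                      # SuppCount(r) / |T|
    confidence = supp_count / act_occr if act_occr else 0.0   # SuppCount(r) / ActOccr(r)
    return act_occr, supp_count, support, confidence


def passes_thresholds(antecedent, rule_class, training, minsupp, minconf) -> bool:
    """Definitions 5 and 6: the rule must reach both user thresholds."""
    _, _, support, confidence = rule_stats(antecedent, rule_class, training)
    return support >= minsupp and confidence >= minconf


# Toy training data (hypothetical feature values, not from the paper's data set).
training = [
    ({"ip": "1", "https": "0"}, "phishy"),
    ({"ip": "1", "https": "1"}, "phishy"),
    ({"ip": "1", "https": "1"}, "legitimate"),
    ({"ip": "0", "https": "1"}, "legitimate"),
]

stats = rule_stats({"ip": "1"}, "phishy", training)
```

For the rule <(ip, 1), phishy> on this toy data, three cases match the antecedent and two of them are phishy, so the support is 2/4 and the confidence is 2/3.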
Generally, an AC algorithm operates in three main phases. Firstly, it discovers all frequent attribute values, i.e. those that hold enough support. Once all frequent attribute values are found, it transforms the subset of them that hold enough confidence into rules. In other words, the algorithm finds and extracts rules that pass the user defined thresholds denoted by minimum support (minsupp) and minimum confidence (minconf). In the second phase, rule pruning operates, where only rules of high quality (confidence and support values) are selected to represent the classifier. Lastly, the classifier is utilised to forecast the class values of new unseen data.

3. THE PROPOSED ALGORITHM

The proposed algorithm utilises AC learning strategies to generate the rules. It comprises three main steps: rule discovery, classifier building and the class assignment procedure (prediction step). In the first step, it iterates over the input training data set, in which the rules are found and extracted using the minsupp and minconf thresholds. In the second step, it tests the discovered rules on the training data set in order to select one subset to represent the classifier. The final step involves assigning classes to test data. The general description of the EMAC learning algorithm is depicted in Figure 1, and details are given in the next subsections. We assume that the input attributes are categorical or continuous. For continuous attributes, any discretisation measure is employed before the training phase. Missing attribute values are treated like any other value in the data set.

3.1. RULE DISCOVERY

EMAC uses a training method that employs a simple intersection among the locations of ruleitems in the training data set (TIDs) to discover the rules. The TID of a ruleitem holds the row numbers that contain the attribute values and their corresponding class labels in the training data set.
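The TID representation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `build_tid_lists`, `join` and `support_and_confidence` are hypothetical, and keeping the class row numbers in a separate dictionary from the attribute-value row numbers is a simplification chosen for the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_tid_lists(rows: List[Dict[str, str]], labels: List[str]):
    """One pass over the data: record the row numbers of each attribute value and class."""
    item_tids: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    class_tids: Dict[str, Set[int]] = defaultdict(set)
    for tid, (row, label) in enumerate(zip(rows, labels)):
        class_tids[label].add(tid)
        for attribute, value in row.items():
            item_tids[(attribute, value)].add(tid)
    return item_tids, class_tids


def join(tids_a: Set[int], tids_b: Set[int]) -> Set[int]:
    """Rows of a larger candidate body = intersection of the rows of its parts."""
    return tids_a & tids_b


def support_and_confidence(body_tids: Set[int], class_tids, label: str, n_rows: int):
    """Support and confidence computed from row numbers only, without rescanning the data."""
    hits = body_tids & class_tids[label]
    support = len(hits) / n_rows
    confidence = len(hits) / len(body_tids) if body_tids else 0.0
    return support, confidence


# Toy data (hypothetical feature values).
rows = [
    {"ip": "1", "https": "0"},
    {"ip": "1", "https": "1"},
    {"ip": "1", "https": "1"},
    {"ip": "0", "https": "1"},
]
labels = ["phishy", "phishy", "legitimate", "legitimate"]

item_tids, class_tids = build_tid_lists(rows, labels)
# Candidate body of size 2 obtained from two size-1 items on different attributes.
body = join(item_tids[("ip", "1")], item_tids[("https", "1")])
result = support_and_confidence(body, class_tids, "phishy", len(rows))
```

The point of the design is that the training data is scanned once; every later support or confidence value comes from set intersections over row numbers.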
The proposed algorithm discovers the frequent ruleitems of size 1 (F1) after iterating over the training data set. Then, it intersects the TIDs of the disjoint ruleitems in F1 to discover the candidate ruleitems of size 2, and after determining F2, the possible remaining frequent ruleitems of size 3 are obtained by intersecting the TIDs of the disjoint ruleitems of F2, and so forth. The TIDs of a ruleitem comprise useful information that is utilised to locate values easily in the training data set, especially when computing the support and confidence of rules. When frequent attribute values are identified, EMAC generates any of them as a rule when it passes the minconf threshold. When an attribute value is connected with more than one class and is frequent, EMAC considers only the class with the largest frequency associated with the attribute value and ignores the others. If the classes connected with the attribute value have similar frequencies in the training data set, the choice is random.

Input: Training data D, minsupp and minconf thresholds
Output: A classifier that comprises rules
Step One: Iterate over the training data set D with n columns to find all frequent ruleitems. Convert any frequent ruleitem that passes minconf to a single label rule. Sort the rules set according to Section 3.2.
Step Two: Evaluate the complete set of rules discovered in step (1) on the training data set in order to remove redundant rules or rules that have no training data coverage.
Step Three: Classify test cases.
Fig. 1. The proposed algorithm

3.2. RULE RANKING METHOD

There are several rule ranking formulas, containing different criteria, considered by scholars in AC. For instance, the CBA algorithm [2] and its successors consider the rule's confidence and support as the main criteria for favouring rules. The CMAR [3] and MCAR [4] algorithms add on top of that the rule's length and the majority class count, respectively, when rules have identical confidence and support. On the other hand, lazy AC algorithms [10] place specific rules first (rules with a large number of attribute values in their body), since it is claimed that these rules are often more accurate. However, this approach has been criticised for ending up with very large classifiers that are hard to maintain, understand and update.

We argue that the minority class frequency should be employed as a rule preference parameter, rather than the majority class count as in MCAR, when rules have similar confidence, support and length. This is because the number of rules for a lower frequency class is normally smaller than that for the largest frequency class. Therefore, ranking rules of a smaller frequency class higher gives them a better chance to survive rule evaluation and become part of the classifier, which results in more rule representation for each class with low frequency in the training data. We have favoured rules associated with the less frequent class in rule ranking since such a class is not well represented by rules in the classifier and usually has fewer rules.

3.3. CLASSIFIER CONSTRUCTION

After the rules are sorted, a subset of them is chosen to comprise the classifier. The classifier is built by EMAC as follows: for each training case, EMAC iterates over the set of discovered rules and selects the first rule that matches the training case as a classifier rule. The same process is repeated until all training cases are utilised or all candidate rules have been evaluated. If any training data remains uncovered, a default class rule is formed. This rule represents the majority class in the remaining uncovered training data. Finally, EMAC outputs all marked rules to form the classifier. The remaining unmarked rules are discarded by the proposed algorithm, since higher ranked rules have covered their training cases during the building of the classifier and therefore these unmarked rules become redundant and useless.

The rule pruning of the proposed algorithm differs from other pruning procedures in AC, such as those of CBA, CMAR and CPAR, in that it does not require the class of the evaluated rule and that of the training case to be similar as a condition of rule significance; rather, it only considers the match between the rule body and the training case. This reduces overfitting of the training data set, since most current AC algorithms mark the candidate rule as a classifier rule only if its body matches the training case and it has the same class as the training case. That may result in more accurate prediction on the training data set but not necessarily on new unseen test cases. We argue that the similarity test between the candidate rule class and the training case class has limited effect on the predictive power of the resulting classifiers during the prediction step. Lastly, one obvious advantage of the proposed rule evaluation method is that it ensures more data coverage per rule, which consequently often leads to fewer rules in the classifier.
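The ranking and classifier construction steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Rule` class and the function names are hypothetical, ranking shorter rule bodies ahead of longer ones is an assumption (the text does not state the direction of the length criterion), and falling back to the majority class of the whole training set when no case is left uncovered is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Case = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass(frozen=True)
class Rule:
    body: Tuple[Tuple[str, str], ...]  # ((attribute, value), ...)
    label: str
    confidence: float
    support: float


def covers(rule: Rule, attributes: Dict[str, str]) -> bool:
    """Body match only: the class of the training case is deliberately not checked."""
    return all(attributes.get(a) == v for a, v in rule.body)


def rank(rules: List[Rule], class_freq: Counter) -> List[Rule]:
    """Confidence, then support, then shorter body (assumed), then less frequent class."""
    return sorted(
        rules,
        key=lambda r: (-r.confidence, -r.support, len(r.body), class_freq[r.label]),
    )


def build_classifier(rules: List[Rule], training: List[Case]):
    class_freq = Counter(label for _, label in training)
    ranked = rank(rules, class_freq)
    marked = set()          # positions of rules that are the first match of some case
    uncovered_labels = []   # classes of cases that no rule body matches
    for attributes, label in training:
        first = next((i for i, r in enumerate(ranked) if covers(r, attributes)), None)
        if first is None:
            uncovered_labels.append(label)
        else:
            marked.add(first)
    classifier = [ranked[i] for i in sorted(marked)]  # unmarked rules are discarded
    # Default class: majority of the uncovered cases, else of all training cases (assumed).
    pool = uncovered_labels or [label for _, label in training]
    default_class = Counter(pool).most_common(1)[0][0]
    return classifier, default_class


# Toy data and candidate rules (hypothetical values).
training = [
    ({"ip": "1", "https": "0"}, "phishy"),
    ({"ip": "1", "https": "1"}, "phishy"),
    ({"ip": "1", "https": "1"}, "legitimate"),
    ({"ip": "0", "https": "1"}, "legitimate"),
    ({"ip": "0", "https": "0"}, "phishy"),
]
r1 = Rule((("ip", "1"),), "phishy", 0.67, 0.4)
r2 = Rule((("https", "1"),), "legitimate", 0.67, 0.4)
r3 = Rule((("ip", "1"), ("https", "1")), "phishy", 0.5, 0.2)

classifier, default_class = build_classifier([r1, r2, r3], training)
```

In this toy run, r1 and r2 tie on confidence, support and length, so r2 is ranked first because "legitimate" is the less frequent class. The second training case is phishy but is still counted as covered by r2, since only the rule body is compared; r3 never becomes the first match of any case and is dropped, and the one uncovered case sets the default class to "phishy".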
A smaller classifier also means that the end-user can control and understand it easily.

3.4. CLASSIFICATION OF TEST DATA

When a test case is to be classified, the prediction procedure of the EMAC algorithm works as follows. It iterates over the set of rules stored in the classifier and highlights all rules that are contained in the test case (the rule's body matches some attribute values in the test case). If only one rule is applicable to the test case, then the class of that rule is assigned to the test case. In cases where multiple rules are applicable, the algorithm categorises these rules into groups according to their classes and counts the number of rules in each group. The class belonging to the group that has the largest number of rules is assigned to the test case. If more than one group has the same number of rules, the choice is random. This method, which utilises more than one rule to make the class assignment of test data, improves upon single rule prediction procedures such as those of CBA and MCAR, which take the class of the highest ranked rule in the classifier matching the test case to make the prediction decision. Lastly, when no rule in the classifier is applicable to the test case, the default class (the majority class in the training data set) is assigned to that case.

Table 1. Sample of the websites features data (columns: URL Anchor, Request URL, URL Length, Prefix Suffix, IP, Sub Domain, HTTPs, Domain age, Class)

4. EXPERIMENTAL RESULTS

4.1. DATA AND PHISHING FEATURES

We have investigated a large number of different features, proposed in [8], that contribute to the classification of the type of a website. We selected nine effective features among them after applying the Chi-square feature selection metric in WEKA to 1228 different websites. The data set utilised in the experiments consists of 547 legitimate and 681 fake websites.
It has been collected from Yahoo directory, Starting Point directory, Phishtank and Millersmiles archives. Seven samples of the websites features data are shown in Table 1, where the class is either 1 (legitimate) or 0 (phishy). The -1 value in Table 1 denotes Suspicious, which can go either phishy or legitimate, so the end-user is unsure about the feature's value. The features that we consider are described below.

1. Using IP address: Using an IP address in the hostname part of the URL means the user can almost be sure someone is trying to steal his personal information.

2. Long URL: Phishers resort to long URLs to hide the suspicious part of the URL, which may redirect the information submitted by the users or redirect the uploaded page to a suspicious domain.

3. Adding Prefix and Suffix to URL: Phishers try to deceive users by reshaping the URL to look like a legitimate one. One technique used to do so is adding a prefix or suffix to the legitimate URL, so that the user may not notice any difference.

4. Sub-domain(s) in URL: Another technique used by phishers to deceive users is adding sub-domain(s) to the URL, so that the users may believe that they are dealing with a credited website.

5. Misuse of HTTPs protocol: The existence of the HTTPs protocol every time sensitive information is being transferred is normally taken as a sign that the user is connected to an honest website. However, phishers may use a fake HTTPs protocol so that the users may be deceived.

6. Request URL: A webpage consists of text and some objects such as images and videos. Typically, these objects are loaded into the webpage from the same domain where the webpage exists. If the objects are loaded from a domain different from the domain typed in the URL address bar, the webpage is potentially suspicious.

7. URL of Anchor: Similar to Request URL, but for this feature the links within the webpage might refer to a domain different from the domain typed in the URL address bar. This feature is treated exactly as Request URL.

8. Website Traffic: Legitimate websites have high web traffic since they are visited regularly. Phishing websites often have a short life, and thus their web traffic either does not exist or their rank is below the limit that gives a website the legitimate status.

9. Age of Domain: The website is considered Legitimate if the domain is aged more than 2 years. Otherwise, the website is considered Phishy.

4.2. EXPERIMENTS RESULTS

Ten-fold cross-validation was utilised to evaluate the classification models and to produce error rates in the experiments. Four dissimilar rule based classification algorithms, which utilise a variety of rule learning methodologies, have been considered for contrasting purposes with EMAC. These algorithms are CBA [2], PRISM [11], PART [11], and MCAR [4]. We selected these classification algorithms because, firstly, all of them generate rules in the form of If-Then rules, which allows a fair comparison. Secondly, the chosen algorithms use different learning methodologies in discovering and producing the rules. The learning strategy exploited by CBA is based on the Apriori association rule technique, where frequent ruleitems are produced iteratively based on the minsupp threshold inputted by the end-user. On the other hand, MCAR uses a vertical mining methodology to discover the rules.
Mainly, it utilises the ruleitems' locations in the training data set (tid-lists) to perform tid-list intersections that compute each ruleitem's support and confidence, which in turn are used to decide whether the ruleitem is a rule. PRISM is a covering algorithm that divides the data set into parts according to the available class labels and produces all rules for each class. For each class, it starts with an empty rule and adds the attribute value with the highest expected accuracy. It stops adding attribute values to the rule body when the expected accuracy of the candidate rule reaches 100%; at that point it generates the rule and removes all training data covered by the rule from the training data set. The algorithm repeats the same step until no data belonging to the selected class remains. Once this happens, PRISM begins generating rules for another class, and so forth. When the data in all parts are covered, PRISM merges all rules derived for all class labels and forms the classifier. Lastly, the PART algorithm is a combination of decision tree and rule induction algorithms that constructs partial decision trees.

The experiments were conducted on an i3 machine running at 2.0 GHz. The experiments of PRISM were carried out in the Weka software [11]. For the AC algorithms, the CBA source code was obtained from its respective authors, while EMAC and MCAR were implemented in Java. Several researchers in AC, i.e. [2] [3] [4], have revealed that the minsupp threshold usually controls the number of rules generated. Thus, we have followed them in setting the support threshold to 1%-5% in the experiments of CBA, MCAR and the proposed algorithm. The confidence threshold, however, has less impact on the general performance of AC algorithms, and we set it to 50% for CBA, MCAR and EMAC.

Figure 2 displays the classification accuracy of the compared algorithms on the nine-feature phishing detection data set.
It is obvious from the figure that the proposed algorithm is highly effective in predictive power when contrasted with other AC algorithms as well as rule based ones. Precisely, EMAC has outperformed PRISM and PART by 7.77% and 0.93% respectively. The MCAR algorithm has slightly outperformed the proposed algorithm on the selected nine features data set, by 0.21%. However, as we will see shortly, MCAR has produced 56 more rules in the classifier than EMAC, which is an approximately 38% larger classifier to accomplish just 0.21% higher accuracy. We believe that there should be a trade-off between the number of rules produced and classification accuracy, where one can accept a smaller classifier in exchange for slightly lower accuracy. One possible reason for the slight increase in accuracy for MCAR over the proposed algorithm is the way it builds the classifier. In particular, MCAR evaluates each candidate rule derived in the learning phase on the training data set, and a rule is considered significant if it correctly covers at least one training data instance.

Fig. 2. The classification accuracy (%) for the contrasted algorithms (MAC, MCAR, PRISM, PART) derived from the phishing data

The coverage requires that:

1) The candidate rule body (attribute values) must be contained within the training instance.
2) The class of the candidate rule and that of the training instance are similar.

This rule evaluation process limits the data coverage per rule, since the above two conditions must both be true in order to consider the rule part of the classifier. Alternatively, EMAC inserts the candidate rule into the classifier if only the first condition above is true, relaxing the second condition (class similarity). This normally reduces overfitting by allowing the rule to cover a larger portion of training cases, which explains the smaller classifiers produced by EMAC compared to MCAR. Figure 3 depicts the number of rules generated by the contrasted algorithms on the data set we consider; it clearly shows that the proposed algorithm extracts smaller classifiers than MCAR and PRISM.

Fig. 3. The classifier size of MCAR and EMAC derived from the phishing data (MAC, MCAR, PRISM, PART)

The main reason for the smaller number of rules in the EMAC classifier compared to MCAR is the way EMAC constructs the classifier: it considers the candidate rule part of the classifier when only its body is within the training instance, and thus no class check is performed by EMAC. This usually results in the candidate rule covering a large number of training instances, and therefore several redundant rules are discarded. In other words, some lower ranked rules end up having no training data coverage and are therefore deleted. The PRISM covering algorithm generates the largest classifier since it has no rule pruning at all. As a matter of fact, PRISM keeps producing rules per class label as long as training instances exist, which explains its very large classifiers. On the other hand, the PART algorithm utilises rule induction and decision tree pruning heuristics to cut down the possible number of rules. To be more precise, it employs the information gain approach from decision trees to build partial trees, and then pessimistic error and reduced error pruning methods are applied to remove candidate rules. This explains its small classifier size. Overall, AC algorithms such as MCAR and the proposed algorithm normally extract additional knowledge missed by classic rule based algorithms, and thus they end up with more rules in the classifiers.

5. CONCLUSIONS

Phishing detection is a vital problem in the online community due to the massive number of online transactions performed by users. In this paper, we take advantage of the simplicity of a data mining approach called associative classification, which extracts simple yet effective classifiers containing easy to understand chunks of knowledge, to solve the website phishing detection problem. Since phishing features are often correlated, we propose an algorithm that reduces the number of rules by using a novel evaluation method, which cuts down the number of rules by approximately 38% when contrasted with other AC algorithms like MCAR, without affecting classification accuracy. The new algorithm has been compared with one AC and two rule based classification algorithms with respect to accuracy rate and classifier size on a real websites data set. The data set contains 1228 websites and consists of nine significant features that have been collected from different online sources such as Phishtank and Yahoo directory. The features have been chosen after applying the Chi-square testing measure to a larger set of features. After experimentation, the results showed that the proposed algorithm scales well if compared to MCAR, PART and PRISM. Specifically, our algorithm has higher accuracy, by 7.77% and 0.93%, than PRISM and PART respectively. MCAR has slightly outperformed our algorithm, by 0.21%, yet derived 56 additional rules in the classifier.
In the near future we intend to plug our algorithm into a browser to determine phishing activity on the fly and alert users.

6. REFERENCES

[1] F Thabtah, Q Mahmood, L McCluskey, and H Abdeljaber, "A new Classification based on Association Algorithm," Journal of Information and Knowledge Management, vol. 9, no. 1, pp ,
[2] B Liu, W Hsu, and Y Ma, "Integrating Classification and Association Rule Mining," in Knowledge Discovery and Data mining (KDD), 1998, pp
[3] W Li, J Han, and J Pei, "CMAR: Accurate and efficient classification based on multiple-class association rule," in Proceedings of the ICDM 01, San Jose, CA., 2001, pp
[4] F Thabtah, C Peter, and Y Peng, "MCAR: Multi-class Classification based on Association Rule," in The 3rd ACS/IEEE International Conference on Computer Systems and Applications, 2005, p. 33.
[5] X Wang, K Yue, W Niu, and Z Shi, "An approach for adaptive associative classification," Expert Systems with Applications: An International Journal, vol. 38, no. 9, pp , 2011.
[6] W Liu, X Deng, G Huang, and A Y. Fu, "An Antiphishing Strategy Based on Visual Similarity Assessment," in IEEE Educational Activities Department Piscataway, NJ, USA, 2006, pp
[7] N Abdelhamid, A Ayesh, F Thabtah, S Ahmadi, and W Hadi, "MAC: A multiclass associative classification algorithm," Journal of Information and Knowledge Management (JIKM), pp ,
[8] R M Mohammad, F Thabtah, and L McCluskey, "An Assessment of Features Related to Phishing Websites using an Automated Technique," in The 7th International Conference for Internet Technology and Secured Transactions (ICITST-2012), London,
[9] M Aburrous, M A Hossain, K Dahal, and F Thabtah, "Intelligent phishing detection system for e-banking using fuzzy data mining," Expert Systems with Applications: An International Journal, pp , December
[10] E Baralis, S Chiusano, and P Graza, "support thresholds in associative classification," in Proceedings of the 2004 ACM Symposium on Applied Computing, Nicosia, Cyprus., 2004, pp
[11] E Frank and I Witten, "Generating accurate rule sets without global optimisation," in Proceedings of the Fifteenth International Conference on Machine Learning, Madison, Wisconsin., pp
