User Interest Analysis in Web Filtering

A-Ning DU and Bin-Xing FANG

Abstract

Web filtering can help people find the most interesting and valuable information. However, current web filtering techniques cannot retrieve results that accurately represent the user interest. This paper investigates the user interest in web filtering and analyzes the problems of current machine-learning-based web filters. According to the differences in user interest, the task of web filtering is divided into three levels: relativity-filter, similarity-filter and homology-filter. A Biased Support Vector Machine (BSVM) is used to make the filter adaptable to these differences. Experiments show that BSVM can greatly improve web filtering performance.

Index Terms: Web Filtering, User Interest, Biased Support Vector Machine.

I. INTRODUCTION

The growth of Internet resources brings up the problem of information overload and the need for quality enhancement: people want to read the most interesting messages and avoid having to read low-quality or uninteresting ones. Web filtering is the activity of classifying a stream of incoming web pages dispatched asynchronously by an information producer to an information consumer [1]. It helps people find the most interesting and valuable information and saves Internet users from being drowned by the flood of incoming information. In recent years, the machine learning (ML) paradigm [2] has become more popular than knowledge engineering by domain experts for solving this problem, because of its automatic learning and relativity analysis abilities. However, these ML algorithms are insufficiently accurate and do not adapt well to the ever-changing user interest, that is, the appropriateness of a web document to the user. For example, distinguishing Pornography from SexEd may be difficult, and distinguishing Pornography from Erotica is even harder, since the border is extremely subjective.

This paper studies how to adjust web filtering results to better fit the user interest. Based on a careful study of the user interest, the web filtering result is divided into three scopes (relativity, similarity and homology), which help describe the user interest more accurately. To obtain a more precise filtering result, the inductive process is improved so that its precision and recall can be adjusted according to the user interest. The improved algorithm in this paper is based on the Support Vector Machine (SVM) because, among the generic machine learning algorithms (Decision Tree, Rule Induction, Bayesian algorithms and SVM), SVM has been shown to be superior, with a solid foundation in Statistical Learning Theory (SLT). The improved algorithm is called the Biased Support Vector Machine (BSVM). It introduces a stimulant function that uses the training example distribution $n_+/n_-$ and a user-adaptable parameter $k$ to treat the different classes of pre-assigned pages unequally, so that the filtering result best fits the user interest.

A-Ning DU and Bin-Xing FANG are with the Research Center of Computer Network and Information Security Technology, Harbin Institute of Technology, People's Republic of China.

The remainder of the paper is organized as follows. Section 2 introduces web filtering, analyzes the user interest and the corresponding differences in filtering results, and discusses the failure of current machine learning approaches. Section 3 puts forward the Biased Support Vector Machine model and analyzes its efficiency in web filtering. Section 4 closes the paper with our conclusions and future work.

II. WEB FILTERING AND USER INTEREST

A. Web Filtering

Web filtering is the task of assigning a boolean value to each web page vector $d_i \in D$, where $D$ is a domain of web pages.
A value of TRUE assigned to $d_i$ indicates a decision that page $d_i$ is related to the user interest, while FALSE indicates that it is not. More formally, the task is to approximate the unknown target function $\Psi : D \to \{TRUE, FALSE\}$ (which describes how web pages ought to be assigned) by means of a function $\Phi : D \to \{TRUE, FALSE\}$ called the filter. How to improve the precision and recall of the filter $\Phi$ is the core problem of web filtering.

The general process of web filtering includes five steps:

1) User interest acquisition: acquire many user-assigned web pages as the training set.
2) Web page pre-processing: translate the assigned pages into a set of compact representations of page content. Usually a page $d_i$ is represented as a vector of term weights $d_i = \{w_{1i}, w_{2i}, \ldots, w_{|F|i}\}$, where $F$ is the set of features that occur at least once in at least one document of $D$, and $0 < w_{ki} < 1$ represents how much feature $f_k$ contributes to the semantics of page $d_i$.
3) Dimensionality reduction: select the features of high contribution to reduce the size of the feature set $F$.
4) Construction of web filters: build a filter that describes the user interest automatically.
5) Prediction on unfiltered web pages: use the filter to predict whether an unmarked web page is related to the user interest or not.

Representation of web pages is the basic step of the process, while the degree of dimensionality reduction is the key influencing factor. The effectiveness of a web filter is decided by the generalization and description ability of the web filtering algorithm.

Current implementations of web filtering mainly use four techniques: URL blocking, keyword filtering, rating systems, and intelligent content analysis. URL blocking restricts or allows access by comparing the requested web page's URL
(and equivalent IP address) with URLs in a stored list. The advantages are speed and efficiency, but this approach requires a URL list, and it is quite costly to generate and maintain the list. Keyword filtering blocks access to web sites on the basis of the occurrence of offensive words and phrases on those sites. However, many web sites that do not contain objectionable content will also be blocked. Rating systems let web publishers associate labels or metadata with web pages to limit certain web content to target audiences, but in general this approach cannot provide a reliable source of information. Intelligent content analysis systems automatically classify web content by use of ML algorithms. Such learning and adaptation programs can help give semantic meaning to context-dependent words, and they are therefore the dominant approach used in web filtering. Almost all existing filtering software uses URL blocking, while some also provide rating and keyword options.

The performance of a filtering system can be measured in terms of the blocking rate, which is the percentage of Web pages that are correctly blocked, and the overblocking rate, which is the percentage of legitimate pages that are blocked. The Netprotect project evaluated 50 commercially available filtering systems using 2,794 URLs with pornographic content and 1,655 URLs with normal content [3]. Their results, reproduced in Table I, show that the accuracy of existing systems is far from satisfactory.

TABLE I
NETPROTECT'S EVALUATION OF WEB FILTERING TOOLS [3]

Filtering Tool              Blocking Efficiency   Overblocking Rate
BizGuard                    55%                   10%
Cyber Patrol                52%                   2%
CYBER sitter                46%                   3%
CYBER Snoop                 65%                   23%
Internet Watcher            %                     0%
Net Nanny                   20%                   5%
Norton Internet Security    45%                   6%
Optenet                     79%                   25%
SurfMonkey                  65%                   11%
X-Stop                      65%                   4%

B. Analysis of User Interest

In practical web filtering applications, the set of web pages related to the user interest may be considerably large. However, what the user desires may be just several homologous pages.
In order to show the differences in user interest, we first give some examples and analyze the true requirements of the user.

Example 1: Problems in Pornographic Page Filtering

Nowadays, the Internet has become an important source of information. However, it is also host to pornographic, violent and other content that is inappropriate for most viewers. Web filtering can be used to block access to pages that violate a defined policy. If a page contains a certain number of forbidden keywords, it is considered undesirable. The problem is that the meanings of words depend on the context.

Different Page Subjects: For example, sites about breast cancer research or sexual harassment, or even the home page of someone named Sexton, could be blocked as forbidden pages of the Pornographic class.

Different Writer's Viewpoint: Articles on combating pornographic pages are harmless.

Different Expression Orientation: Pornographic pages also contain many sub-classes such as gambling, nudity, violence, drugs, alcohol and so on. For example, Itzin [4] classified pornography into three sets: the sexually explicit and violent; the sexually explicit and nonviolent, but subordinating and dehumanizing; and the sexually explicit, nonviolent, and non-subordinating, based upon mutuality. Research consistently shows that harmful effects are associated with the first two, but that the third is usually harmless.

Example 2: Problems in Personal Information Filtering

Information filtering deals with the timely delivery of information that is relevant to the user. An information filtering system assists users by filtering the data stream and delivering the relevant information to the user. The system selects the articles deemed to be interesting to the user and eliminates the rest. However, a filtering system might not be able to perfectly differentiate the articles that are actually relevant to the user from the ones that are not.
The proportion of irrelevant articles delivered to the user should be as low as possible, and so should the proportion of relevant articles eliminated.

Different Page Subjects: An information filtering agent assists the user with the task of finding interesting news articles. The articles may be in one particular domain or in many domains, such as academics, entertainment, migration, sports, etc.

Different Writer's Viewpoint: The user's task of finding interesting news articles may include only the articles supporting an event, or all the articles about the event.

Different Expression Orientation: For example, the user's task of finding news articles about a disaster may include articles about bailout, damage, etc., or only the articles about one aspect.

As shown in the examples above, the user may be interested in only portions of the web filtering result, depending on differences in page subjects, writer's viewpoints and expression orientations. So we can divide web filtering tasks into three levels according to the user interest:

relativity-filter: the filtering result contains all the web pages with the same key phrases or key sentences. These web pages express the same subject, but may not be consistent in viewpoint or orientation. Typical applications of relativity-filtering include erotic web page filtering and hot topic tracing, which aim to collect all the web pages related to the topic, regardless of whether they approve of it or not.

similarity-filter: the filtering result contains all the web pages that hold the same subject, viewpoint and orientation as the user. Typical applications of similarity-filtering include filtering of web pages on racism or separatism. Similarity-filtering is stricter than relativity-filtering, as not only key words or sentences but also orientation is taken into consideration.

homology-filter: the filtering result contains only the web
pages that share many identical sentences or paragraphs. The filtering results are almost the same as the user interest, usually because articles from an official or authoritative website are redistributed by other websites with little modification. An example of homology-filtering is counting which article is reprinted most often on Bulletin Board Systems.

We can define all the filtering results acquired by ML algorithms as relative results ($R_1$), and the filtering results to which the ML algorithms assign TRUE with probability near 1 as homologous results ($R_k$). The results of similarity-filtering $R_i$ then satisfy $R_k \subseteq R_i \subseteq R_1$. As illustrated on the left of Fig. 1, most filtering tasks can be described as applications of similarity-filtering with different degrees of similarity between the web pages acquired and the user interest.

[Figure 1 labels: Internet (U); General filtering result (R1); Adaptable filtering result (Ri); User interest of high similarity (Rk); User interest (Uk)]

Fig. 1. Analysis and demonstration of filtering result estimation. The area outside the biggest circle is the filtering scope $U$, and the smallest circle is the user interest $U_k$. The biggest circle $R_1$ is the filtering result of general ML algorithms (content relativity), and the smaller circle $R_k$ is the filtering result for content homology. The middle circle $R_i$ is the biased filtering result according to user demand (content similarity).

C. Current Machine Learning Approaches and Their Failure

Web filtering by ML techniques is widely discussed in the literature. A few major ML algorithms are often chosen to construct web filters because of their simplicity, flexibility and robustness:

Decision Trees are a ML approach to the automatic induction of filtering trees from training data [5], [6], [7]. A decision tree is a graph of nodes connected by arcs, with each internal node corresponding to a feature and each arc to a possible value of that feature.
A decision tree is easily interpretable by humans and has low computational complexity, which makes it a simple and practical idea in the field of ML.

Rule Induction methods [8], [9] try to find a proper set of DNF rules for the filtering task such that the error rate on the training set is minimal. Using local optimization techniques, rule induction methods dynamically evaluate rules and revise the covering rule set.

K-Nearest Neighbor (KNN) [10], [11] selects the k most similar documents from the training set and uses the categories of these documents to determine the category of the document being classified. Documents are represented by vectors of words, and the similarity between two documents is measured using the Euclidean distance or another function between these vectors.

Naïve Bayes has been applied to web page filtering in [12], [13], [10], [14], [15]. It uses the joint probabilities of words and categories to estimate the probabilities of categories given a document. Documents with a probability above a certain threshold are considered relevant.

Artificial Neural Networks, which learn patterns by modifying the weights among nodes based on learning examples, were applied by Lee et al. [2] to identify members of the forbidden class.

Support Vector Machines (SVM) [16], [17], [18] are also a major statistical method. SVM finds, among all the surfaces in $|F|$-dimensional space, the surface that separates the positives from the negatives with the widest possible margin. SVM performs well on large-scale training sets and needs no human or machine effort in parameter tuning.

As compared in [19], [20], [21], SVM achieved the best performance on different filtering corpora, with strong robustness and acceptable efficiency. In contrast, the Naïve Bayes precondition of ignoring feature dependence reduces its ability to analyze web content.
Artificial Neural Networks are computationally expensive, and the over-fitting of Decision Trees and Rule Induction that occurs when describing the user interest makes them unsatisfactory.

However, as shown above, web filters based on ML algorithms cannot achieve satisfactory results. This is because it is difficult to understand and express the true meaning of the user interest. Current ML algorithms acquire the user interest only by analyzing the arrangement patterns of words and expressions in the training examples. They neglect much information hidden in the training set, such as the distribution of the numbers of positive and negative examples, the maximum distribution radius of the positives, the maximum distribution radius of the negatives, and so on. In fact, such hidden information is quite valuable for expressing which portions of the web filtering result the user may be interested in. Therefore, this paper tries to give ML algorithms the ability to analyze this information. The improved ML algorithm is based on SVM because of its strong robustness and acceptable efficiency.

III. BIASED SUPPORT VECTOR MACHINE FOR WEB FILTERING

A. Biased Support Vector Machine Algorithm

To fit the user interest better, we must give the ML algorithm the ability to adjust. The approach proposed in this paper therefore introduces a stimulant function that uses the training example distribution $n_+/n_-$ and a user-adaptable parameter $k$ to treat the different classes of pre-assigned pages unequally, so as to best fit the user interest. The approach is called the Biased Support Vector Machine, and a detailed description and analysis are given in [22].

In the classical SVM, a penalty function $F = C \sum_i \xi_i$ is introduced as an additional capacity control function, where the non-negative variable $\xi_i$ is a measure of the misclassification error and the coefficient $C$ sets the degree of tolerance for misclassification errors. Consequently, the width of the margin decreases as $C$ increases.
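
As a rough illustration of the penalty term just described, the sketch below computes the slacks $\xi_i$ for a fixed linear separator and evaluates both the classical penalty $C \sum_i \xi_i$ and a class-dependent version using the coefficients $C + C_1$ for positives and $C - C_2$ for negatives that Equation 2 introduces. The function names, the toy data and the assumption $n = n_+ + n_-$ are ours for illustration and are not taken from the paper. It only evaluates the penalty for a given $w$ and $b$; it does not train an SVM.

```python
def slacks(w, b, xs, ys):
    """Slack xi_i = max(0, 1 - y_i (w . x_i - b)) for each example."""
    result = []
    for x, y in zip(xs, ys):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) - b)
        result.append(max(0.0, 1.0 - margin))
    return result


def classical_penalty(C, xi):
    """Classical SVM penalty F = C * sum_i xi_i."""
    return C * sum(xi)


def biased_coefficients(C, k, n_pos, n_neg):
    """Per-class error coefficients (C + C1, C - C2).

    Assumption: n = n_pos + n_neg is the total number of training examples.
    """
    n = n_pos + n_neg
    c1 = C * (k - 1) * n_neg / n
    c2 = C * n_pos / n
    return C + c1, C - c2


def biased_penalty(C, k, xi, ys):
    """Class-dependent penalty: positives weighted by C + C1, negatives by C - C2."""
    n_pos = sum(1 for y in ys if y == +1)
    n_neg = len(ys) - n_pos
    c_pos, c_neg = biased_coefficients(C, k, n_pos, n_neg)
    return sum((c_pos if y == +1 else c_neg) * s for s, y in zip(xi, ys))


if __name__ == "__main__":
    # Hypothetical toy data: two positives and two negatives in 2-D.
    w, b = [1.0, 0.0], 0.0
    xs = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [0.25, 0.0]]
    ys = [+1, +1, -1, -1]
    xi = slacks(w, b, xs, ys)
    print("slacks:", xi)
    print("classical penalty:", classical_penalty(2.0, xi))
    print("biased penalty (k=5):", biased_penalty(2.0, 5, xi, ys))
    c_pos, c_neg = biased_coefficients(2.0, 5, 2, 2)
    print("bias ratio:", c_pos / c_neg)
```

With these coefficients, errors on positive examples cost more than errors on negative ones whenever $k + n_+/n_- > 1$, which is the ratio of the two coefficients.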

BSVM introduces a stimulant function,

\[ F = \frac{C}{n}\Big[(k-1)\, n_- \sum_{y_i=+1} \xi_i \;-\; n_+ \sum_{y_i=-1} \xi_i\Big], \]

as an extension of the penalty function. In BSVM, we call the examples with $y_i = +1$ positives and the examples with $y_i = -1$ negatives, and we define $n_+ = |\{y_i = +1\}|$ and $n_- = |\{y_i = -1\}|$. The stimulant function uses both the training example distribution $n_+/n_-$ and a user-adaptable parameter $k$ to express the degree of user bias between the classes. Together with the effect of the penalty function, the bias is described in Equation 1, whose simplification relies on $n = n_+ + n_-$. The width of the margin on the positive side decreases as $n_+/n_-$ or $k$ increases. Thus BSVM can find a proper separating hyperplane with a filtering result $R_i$ between $R_1$ and $R_k$.

\[ \text{bias} = \frac{C + C(k-1)\, n_-/n}{C - C\, n_+/n} = \frac{1 + (k-1)\, n_-/n}{1 - n_+/n} = \frac{n_+/n + k\, n_-/n}{n_-/n} = k + \frac{n_+}{n_-} \tag{1} \]

BSVM is formulated as follows. The generalized optimal separating hyperplane is determined by the vector $w$ that minimizes the functional

\[ \min_{w,b,\xi}\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i + C_1 \sum_{y_i=+1} \xi_i - C_2 \sum_{y_i=-1} \xi_i, \quad \text{where } C_1 = C(k-1)\, n_-/n,\; C_2 = C\, n_+/n,\; k \ge 0 \tag{2} \]

subject to the constraints

\[ y_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad \text{where } \xi_i \ge 0,\; \forall i \tag{3} \]

Here $C_1$ and $C_2$ are the stimulant coefficients for classification errors, and $k \ge 0$ is an adaptable parameter. The solution to the optimization problem of Equation 2 under the constraints of Equation 3 is given by the saddle point of the Lagrangian:

\[ L(w,b,\xi,\alpha,\beta) = \frac{1}{2}\|w\|^2 + (C + C_1) \sum_{y_i=+1} \xi_i + (C - C_2) \sum_{y_i=-1} \xi_i - \sum_{i=1}^{n} \alpha_i \big( y_i [w^T x_i - b] - 1 + \xi_i \big) - \sum_{i=1}^{n} \beta_i \xi_i \tag{4} \]

where $\alpha$, $\beta$ are the Lagrange multipliers. The Lagrangian has to be minimized with respect to $w$, $b$, $\xi$ and maximized with respect to $\alpha$, $\beta$. Hence the solution to the problem is given by:

\[ \min_{\alpha}\; Q(\alpha) = \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \tag{5} \]

with the constraints

\[ \sum_{i=1}^{n} y_i \alpha_i = 0 \tag{6} \]

and

\[ 0 \le \alpha_i \le C + C_1 \;\text{ if } y_i = +1, \qquad 0 \le \alpha_i \le C - C_2 \;\text{ if } y_i = -1 \tag{7} \]

B. Experiments and Analysis

In our experiment, the forbidden pages belong to the category of Adult content. We collected a total of 500 web pages by searching with the keyword "porn". The corpus was reviewed by human editors and classified according to whether it contains adult content. It includes 100 non-pornographic web pages and 400 pornographic web pages. After taking 1/5 of each class as training examples, we measured the training accuracy of SVM and BSVM, as shown in Table II.

TABLE II
TRAINING ACCURACY OF SVM AND BIASED SVM (K=5)

Algorithm   Web Page            Correct        Incorrect     Total
SVM         Pornographic        378 (94.5%)    22 (5.5%)     400
            Non-pornographic    69 (69.0%)     31 (31.0%)    100
            Total               447 (89.4%)    53 (10.6%)    500
BSVM        Pornographic        396 (99.0%)    4 (1.0%)      400
            Non-pornographic    78 (78.0%)     22 (22.0%)    100
            Total               474 (94.8%)    26 (5.2%)     500

To show the impact of the adaptable parameters on BSVM, we experiment on a benchmark collection of Chinese web pages (see footnote 1) prepared by Fudan University. The collection includes 9804 training examples and 9833 evaluation documents, which consist of a set of Chinese newswire stories classified under 20 categories. In this paper, we experiment on a document set made of two related categories of the benchmark (history and politics). The document set contains 2800 web pages in total (2000 pages about politics as positives and 800 pages about history as negatives, with 1/10 of each as training examples). We compute the positive sentences filtering precision under different $C$, and show the influence of $d = n_+/n_-$ and $k$ in Fig. 2. The results show that the positive sentences filtering precision increases as $n_+/n_-$ and $k$ increase.

Fig. 2. BSVM filtering efficiency for different $k$ and $n_+/n_-$. The left figure shows the influence of the parameter $d = n_+/n_-$ on the positive sentences filtering precision ($k = 1$). The right figure shows the influence of the parameter $k$ on the positive sentences filtering precision ($n_+/n_- = 1$).

Footnote 1: The benchmark and a detailed description (in Chinese) are available at 16\&type=

IV. CONCLUSION AND FUTURE WORK

In this paper, we study the different scopes of the filtering result according to different filtering tasks and user interests. We find that the web filtering result can be divided into three sets, the relative pages set ($R_1$), the similar pages set ($R_i$) and the homologous pages set ($R_k$), with the relationship $R_k \subseteq R_i \subseteq R_1$. To adjust the web filtering result to be more fit for the user
interest, a Biased Support Vector Machine (BSVM) algorithm is introduced, which uses a stimulant function based on the training example distribution $n_+/n_-$ and a user-adaptable parameter $k$ to treat the different classes of pre-assigned pages unequally. Experiments show that BSVM can improve web filtering performance: in the adult-content experiment, it raised the overall training accuracy from 89.4% to 94.8% (Table II). However, the problems of describing user bias and of self-adapting the parameters are still open, and we leave them as future work.

REFERENCES

[1] N. J. Belkin and W. B. Croft, Information Filtering and Information Retrieval: Two Sides of the Same Coin? Communications of the ACM, vol. 35, no. 12, pp , Dec
[2] P. Y. Lee, S. C. Hui, and A. C. M. Fong, Neural networks for web content filtering, IEEE Intelligent Systems, vol. 17, no. 5, pp ,
[3] N. Project, Report on currently available COTS filtering tools, Technical report,
[4] O. B. Longe and F. A. Longe, The Nigerian web content: Combating pornographic using content filters, Journal of Information Technology Impact, vol. 5, no. 2, pp ,
[5] J. R. Quinlan, Discovering rules by induction from large collections of examples, Expert Systems in the Micro-Electronic Age, pp ,
[6] J. R. Quinlan, Induction of decision trees, Machine Learning, vol. 1, no. 1, pp ,
[7] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,
[8] F. D. Chid Apte and S. Weiss, Text mining with decision rules and decision trees, in Proceedings of the Conference on Automated Learning and Discovery, CMU, June
[9] P. Clark and T. Niblett, The CN2 induction algorithm, Mach. Learn., vol. 3, no. 4, pp ,
[10] M. Iwayama and T. Tokunaga, Cluster-based text categorization: a comparison of category search strategies, in Proceedings of SIGIR-95, 18th ACM International Conference on Research and Development in Information Retrieval, E. A. Fox, P. Ingwersen, and R. Fidel, Eds. Seattle, US: ACM Press, New York, US, 1995, pp
[11] B. Masand, G. Linoff, and D. Waltz, Classifying news stories using memory based reasoning, in SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM Press, 1992, pp
[12] S. Chakrabarti, B. E. Dom, and P. Indyk, Enhanced hypertext categorization using hyperlinks, in Proceedings of SIGMOD-98, ACM International Conference on Management of Data, L. M. Haas and A. Tiwary, Eds. Seattle, US: ACM Press, New York, US, 1998, pp
[13] K. M. A. Chai, H. L. Chieu, and H. T. Ng, Bayesian online classifiers for text classification and filtering, in SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM Press, 2002, pp
[14] A. McCallum and K. Nigam, A comparison of event models for naive Bayes text classification, in AAAI-98 Workshop on Learning for Text Categorization,
[15] A. McCallum, K. Nigam, J. Rennie, and K. Seymore, A machine learning approach to building domain-specific search engines, in The Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99),
[16] T. Joachims, Text categorization with support vector machines: Learning with many relevant features, in Proceedings of the European Conference on Machine Learning. Berlin, Germany: Springer, 1998, pp
[17] T. Joachims, N. Cristianini, and J. Shawe-Taylor, Composite kernels for hypertext categorisation, in Proceedings of ICML-01, 18th International Conference on Machine Learning, C. Brodley and A. Danyluk, Eds. Williams College, US: Morgan Kaufmann Publishers, San Francisco, US, 2001, pp
[18] V. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons,
[19] A. Du and B. Fang, Comparison of machine learning algorithms in Chinese web filtering, in Proceedings of the Third International Conference on Machine Learning and Cybernetics. Shanghai, China: IEEE Press, 2004, pp
[20] F. Sebastiani, Machine learning in automated text categorization, ACM Comput. Surv., vol. 34, no. 1, pp. 1-47,
[21] Y. Yang and X. Liu, A re-examination of text categorization methods, in SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM Press, 1999, pp
[22] A. Du and B. Fang, A biased support vector machine approach to web filtering, in ICAPR '05: Proceedings of the Third International Conference on Advances in Pattern Recognition, C. A. P. P. Sameer Singh, Maneesha Singh, Ed. Springer Verlag, Heidelberg, D-69121, Germany, 2005, pp
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
More informationReference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
More informationInvestigation of Support Vector Machines for Email Classification
Investigation of Support Vector Machines for Email Classification by Andrew Farrugia Thesis Submitted by Andrew Farrugia in partial fulfillment of the Requirements for the Degree of Bachelor of Software
More informationIDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationA MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2
UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,
More informationModeling Suspicious Email Detection Using Enhanced Feature Selection
Modeling Suspicious Email Detection Using Enhanced Feature Selection Sarwat Nizamani, Nasrullah Memon, Uffe Kock Wiil, and Panagiotis Karampelas Abstract The paper presents a suspicious email detection
More informationAn Imbalanced Spam Mail Filtering Method
, pp. 119-126 http://dx.doi.org/10.14257/ijmue.2015.10.3.12 An Imbalanced Spam Mail Filtering Method Zhiqiang Ma, Rui Yan, Donghong Yuan and Limin Liu (College of Information Engineering, Inner Mongolia
More informationMachine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
More informationCustomer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
More informationMachine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
More informationComparison of machine learning methods for intelligent tutoring systems
Comparison of machine learning methods for intelligent tutoring systems Wilhelmiina Hämäläinen 1 and Mikko Vinni 1 Department of Computer Science, University of Joensuu, P.O. Box 111, FI-80101 Joensuu
More informationA Proposed Algorithm for Spam Filtering Emails by Hash Table Approach
International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering
More informationFiltering Noisy Contents in Online Social Network by using Rule Based Filtering System
Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Bala Kumari P 1, Bercelin Rose Mary W 2 and Devi Mareeswari M 3 1, 2, 3 M.TECH / IT, Dr.Sivanthi Aditanar College
More informationSupport Vector Machine (SVM)
Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationNeural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
More informationTheme-based Retrieval of Web News
Theme-based Retrieval of Web Nuno Maria, Mário J. Silva DI/FCUL Faculdade de Ciências Universidade de Lisboa Campo Grande, Lisboa Portugal {nmsm, mjs}@di.fc.ul.pt ABSTRACT We introduce an information system
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationQuestion 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
More informationAn Efficient Two-phase Spam Filtering Method Based on E-mails Categorization
International Journal of Network Security, Vol.9, No., PP.34 43, July 29 34 An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization Jyh-Jian Sheu Department of Information Management,
More informationA MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS
A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationThree types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.
Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada
More informationA Study on the Comparison of Electricity Forecasting Models: Korea and China
Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison
More informationAUTOMATIC CLASSIFICATION OF QUESTIONS INTO BLOOM'S COGNITIVE LEVELS USING SUPPORT VECTOR MACHINES
AUTOMATIC CLASSIFICATION OF QUESTIONS INTO BLOOM'S COGNITIVE LEVELS USING SUPPORT VECTOR MACHINES Anwar Ali Yahya *, Addin Osman * * Faculty of Computer Science and Information Systems, Najran University,
More informationA Game Theoretical Framework for Adversarial Learning
A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,
More informationFinancial Trading System using Combination of Textual and Numerical Data
Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationLasso-based Spam Filtering with Chinese Emails
Journal of Computational Information Systems 8: 8 (2012) 3315 3322 Available at http://www.jofcis.com Lasso-based Spam Filtering with Chinese Emails Zunxiong LIU 1, Xianlong ZHANG 1,, Shujuan ZHENG 2 1
More informationOperations Research and Knowledge Modeling in Data Mining
Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp
More informationWeb Mining as a Tool for Understanding Online Learning
Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA jadb3@mizzou.edu James Laffey University of Missouri Columbia Columbia, MO USA LaffeyJ@missouri.edu
More informationInner Classification of Clusters for Online News
Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationMining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
More informationPredicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)
260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case
More informationClass-specific Sparse Coding for Learning of Object Representations
Class-specific Sparse Coding for Learning of Object Representations Stephan Hasler, Heiko Wersing, and Edgar Körner Honda Research Institute Europe GmbH Carl-Legien-Str. 30, 63073 Offenbach am Main, Germany
More informationSURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING
I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationData Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationMethod of Fault Detection in Cloud Computing Systems
, pp.205-212 http://dx.doi.org/10.14257/ijgdc.2014.7.3.21 Method of Fault Detection in Cloud Computing Systems Ying Jiang, Jie Huang, Jiaman Ding and Yingli Liu Yunnan Key Lab of Computer Technology Application,
More informationA Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
More informationData Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
More informationSupport Vector Machine. Tutorial. (and Statistical Learning Theory)
Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@nec-labs.com 1 Support Vector Machines: history SVMs introduced
More informationLearning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
More informationLearning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationEmail Classification Using Data Reduction Method
Email Classification Using Data Reduction Method Rafiqul Islam and Yang Xiang, member IEEE School of Information Technology Deakin University, Burwood 3125, Victoria, Australia Abstract Classifying user
More informationProjektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
More informationIMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT
IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT M.SHESHIKALA Assistant Professor, SREC Engineering College,Warangal Email: marthakala08@gmail.com, Abstract- Unethical
More informationFacilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
More informationDissecting the Learning Behaviors in Hacker Forums
Dissecting the Learning Behaviors in Hacker Forums Alex Tsang Xiong Zhang Wei Thoo Yue Department of Information Systems, City University of Hong Kong, Hong Kong inuki.zx@gmail.com, xionzhang3@student.cityu.edu.hk,
More informationENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
More informationA Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM
Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian
More informationHow To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationFlexible Neural Trees Ensemble for Stock Index Modeling
Flexible Neural Trees Ensemble for Stock Index Modeling Yuehui Chen 1, Ju Yang 1, Bo Yang 1 and Ajith Abraham 2 1 School of Information Science and Engineering Jinan University, Jinan 250022, P.R.China
More informationBusiness Lead Generation for Online Real Estate Services: A Case Study
Business Lead Generation for Online Real Estate Services: A Case Study Md. Abdur Rahman, Xinghui Zhao, Maria Gabriella Mosquera, Qigang Gao and Vlado Keselj Faculty Of Computer Science Dalhousie University
More informationRandom forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
More informationNaive Bayes Spam Filtering Using Word-Position-Based Attributes
Naive Bayes Spam Filtering Using Word-Position-Based Attributes Johan Hovold Department of Computer Science Lund University Box 118, 221 00 Lund, Sweden johan.hovold.363@student.lu.se Abstract This paper
More informationA NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
More informationBayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationModel Trees for Classification of Hybrid Data Types
Model Trees for Classification of Hybrid Data Types Hsing-Kuo Pao, Shou-Chih Chang, and Yuh-Jye Lee Dept. of Computer Science & Information Engineering, National Taiwan University of Science & Technology,
More informationTracking and Recognition in Sports Videos
Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer
More informationTo improve the problems mentioned above, Chen et al. [2-5] proposed and employed a novel type of approach, i.e., PA, to prevent fraud.
Proceedings of the 5th WSEAS Int. Conference on Information Security and Privacy, Venice, Italy, November 20-22, 2006 46 Back Propagation Networks for Credit Card Fraud Prediction Using Stratified Personalized
More informationAddressing the Class Imbalance Problem in Medical Datasets
Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,
More informationInternational Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
More informationWhitepaper: Understanding Web Filtering Technologies ABSTRACT
Whitepaper: Understanding Web Filtering Technologies ABSTRACT The Internet is now a huge resource of information and plays an increasingly important role in business and education. However, without adequate
More informationTerm extraction for user profiling: evaluation by the user
Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,
More informationApplication of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation
Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department
More information