Construction Algorithms for Index Model Based on Web Page Classification
Journal of Computational Information Systems 10: 2 (2014)

Yangjie ZHANG 1,2,*, Chungang YAN 1,2, Pengwei WANG 1,2, Haichun SUN 1,2
1 Department of Computer Science and Technology, Tongji University, Shanghai, China
2 The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai, China

Project supported by National Basic Research Program of China 973 (No. 2010CB328101), International Science & Technology Cooperation Program of China (No. 2013DFM10100), National Natural Science Funds of China (No ), Shanghai Science & Technology Research Plan (No ).
* Corresponding author. Email address: zyj177484@126.com (Yangjie ZHANG).
Copyright 2014 Binary Information Press. DOI: /jcis9017. January 15, 2014.

Abstract

Web pages on the Internet are massive, diverse, heterogeneous and redundant. How to organize and manage them effectively is an urgent problem. In this paper, we propose a method to index web pages and build an index model based on web page classification and hyperlink analysis. First, an initialization algorithm is given to construct an index model for an initial set of web pages. Then, considering the dynamics of web pages on the Internet, we propose an incremental updating algorithm which can update an index model incrementally. Theoretical analysis shows that the complexity of the proposed algorithms is linear in the number of web pages and in their growth. The experimental results show that the initialization algorithm can construct an index model for a fixed number of web pages relatively fast, while the incremental updating algorithm can keep pace with the update speed of web pages on the Internet. Thus, the proposed algorithms are feasible and effective. The constructed index model can support diversified information service systems, enabling them to make better use of web resources and provide more valuable services to users.

Keywords: Web Page Classification; Index Model; Hyperlinks; Naive Bayes

1 Introduction

With the rapid development of the Internet, network resources have become more and more abundant, and various kinds of information service systems have emerged to facilitate people's daily lives. However, owing to the openness, dynamics and complexity of the Internet, web pages on the Internet are massive, diverse, heterogeneous and redundant. Existing methods do not give these systems a suitable way to organize and manage web pages.

As a popular information service system, the search engine solves the problem of acquiring web pages rapidly through its main technologies, including inverted index, link analysis, and distributed storage [1]. A search engine helps users find relevant web pages through keyword matching. However, the rapid growth of web pages reduces search efficiency and increases the redundancy and inaccuracy of the returned results, and keyword matching alone cannot meet the personalized demands of information retrieval services.

The recommendation system is another popular representative. Since research on collaborative filtering appeared in the mid-1990s, diversified recommendation algorithms have been proposed and used in many research fields [2-4], including cognitive science [5], information retrieval [6] and management science [7]. Web page recommendation systems recommend similar web pages to users with common preferences, and improve the intelligence of Internet information service systems. In general, there are two types of recommender systems in this field: content-based filtering can only find web pages that are similar to a customer's interests, while collaborative filtering needs massive user comments, and its accuracy and speed decline as the number of web pages increases.

Due to the lack of an effective method for organizing and managing web pages, the performance of web-based information systems is restricted [8]. To this end, this paper presents an index model for web pages, which can organize web pages and find the relationships among them. We assume that hyperlinks among web pages reflect some kinds of business relationships in the real world. Based on web page classification, two construction algorithms for the index model are designed and implemented: one is the initialization algorithm for a given set of web pages, and the other is an updating algorithm for an existing index model and an incremental set of web pages.
The index model is then completed by analyzing the hyperlinks among web pages after web page classification.

Experimental results show that the proposed algorithms are feasible and effective. First, to validate effectiveness, an example index model is constructed from one million web pages crawled from the Internet; it shows that the generated relationships among web page classes can reflect real-world business associations. Second, for a given set of web pages, the initialization algorithm can complete the construction of the index model within a limited time. Last, the incremental updating algorithm can update an index model in step with the actual growth of web pages.

2 Index Model Framework

In this paper, $c$ denotes a web page class; $C$ denotes a set of web page classes; $p$ denotes a web page; $D$ denotes a sample set of web pages; $t$ denotes a feature, which is a word.

Definition 1. (Index Model) An index model is defined as a weighted directed graph $G = (V, E)$, where
a) $V = \{c_i \mid c_i \in C\}$, and $C$ is the set of all web page classes in the index model;
b) $E = \{\langle c_i, c_j \rangle \mid c_i, c_j \in V \text{ and } c_i \neq c_j\}$;
c) Let $w\langle c_i, c_j \rangle$ be the weight of $\langle c_i, c_j \rangle$, then
\[
w\langle c_i, c_j \rangle = \sum_{p_t \in c_j,\; p_k \in c_i,\; u_k = url_t} \bigl( W_u \cdot P(c_j \mid p_t) \cdot P(c_i \mid p_k) \bigr), \qquad W_u = \frac{1}{n}
\]
where $u_k$ is a hyperlink on web page $p_k$, $url_t$ is the Uniform Resource Locator of web page $p_t$, $n$ is the number of hyperlinks on web page $p_k$, and $P(c_j \mid p_t)$ is the probability that web page $p_t$ belongs to web page class $c_j$.

The entire construction process of the index model is as follows:
1. Web page pretreatment.
   (1) Extract content enclosed by HTML tags from a web page.
   (2) Extract content text.
   (3) Combine the contents extracted in steps (1) and (2).
   (4) Web page segmentation.
   (5) Remove stop words.
   (6) Select features to represent each web page.
   (7) Extract hyperlinks in web pages.
2. Training classifier and web page classification.
   (1) Generate a classifier by learning a sample set of web pages.
   (2) Classify all preprocessed web pages.
3. Compute link relationships among web page classes.
   (1) Use the index initialization algorithm to initialize an index model.
   (2) Use the index incremental updating algorithm to update an index model.

The building process of the index model is shown in Fig. 1.

Fig. 1: Building process of index model based on web page classification. (Inputs: a sample set of web pages, an initial set of web pages, updating web pages. Stages: web page pretreatment, training classifier, web page classification, and computing link relationships among web page classes with the index initialization and index incremental updating algorithms.)
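The edge weight in Definition 1 can be illustrated on a toy page set. The sketch below is only an illustration under stated assumptions: the dictionary layout (`cls`, `prob`, `out`), the function name `edge_weights`, and the toy URLs and probabilities are all hypothetical, and the membership probabilities are taken as given rather than produced by a classifier.

```python
from collections import defaultdict


def edge_weights(pages):
    """Compute w<c_i, c_j> of Definition 1 for a small in-memory page set.

    pages maps url -> {"cls": class name, "prob": P(class | page),
                       "out": list of hyperlink URLs on the page}.
    This layout is a hypothetical one chosen for the example.
    """
    weights = defaultdict(float)
    for p_k in pages.values():
        links = p_k["out"]
        if not links:
            continue
        w_u = 1.0 / len(links)  # W_u = 1/n, n = number of hyperlinks on p_k
        for u in links:
            p_t = pages.get(u)  # page whose URL equals the hyperlink, if known
            if p_t is None or p_t["cls"] == p_k["cls"]:
                continue  # target outside the set, or c_i = c_j (not in E)
            weights[(p_k["cls"], p_t["cls"])] += w_u * p_t["prob"] * p_k["prob"]
    return dict(weights)


# Toy data (made up): "x" is linked to but not part of the page set.
toy_pages = {
    "a": {"cls": "Estate", "prob": 0.9, "out": ["b", "c", "x"]},
    "b": {"cls": "House Renting", "prob": 0.8, "out": ["a"]},
    "c": {"cls": "Estate", "prob": 0.5, "out": ["b"]},
}
toy_weights = edge_weights(toy_pages)
for edge, w in sorted(toy_weights.items()):
    print(edge, round(w, 4))
```

With these toy values, the edge from Estate to House Renting receives (1/3)(0.8)(0.9) + (1)(0.8)(0.5) = 0.64, and the reverse edge receives (1)(0.9)(0.8) = 0.72. Links to pages outside the set contribute nothing, but they still count toward n and so lower the weight of the remaining links.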
3 Algorithms for Constructing Index Model

3.1 Web page preprocessing

A web page contains various HTML tags, and contents within different tags make different contributions to its subject. In order to classify a web page and obtain relationships among web classes, we first preprocess each web page, which includes extracting tags and body contents, content segmentation, extracting hyperlinks, etc.

Step 1 Extract the contents enclosed in the Title, META, H1-H6 and A tags of an HTML file.
Step 2 Extract the content text of a web page using a statistical approach [9].
Step 3 Content segmentation. Process the contents extracted in Step 1 and Step 2 using the Chinese word segmentation tool IKAnalyzer [10], and remove stop words.
Step 4 Weight each feature using TF*IDF [11].
Step 5 Extract the feature vector of each web page. Based on the values obtained in Step 4, select the top $n$ feature items to constitute the feature vector ($n$ is an experimental value). The feature vector of a web page is then expressed as $p = (t_1, t_2, \ldots, t_n)$.
Step 6 Extract the hyperlinks of web pages. All hyperlinks extracted from a web page are denoted as a vector $P_{out} = (u_1, u_2, \ldots, u_n)$, where $u_i$ is a hyperlink on this web page.
Step 7 Weight each hyperlink. The weight of a hyperlink $u$ is calculated as $W_u = 1/n$, where $n$ is the total number of hyperlinks on this web page. The basic idea derives from the Random Surfer Model [12], which assumes that all the hyperlinks on a web page are equally important.

3.2 Training classifier and web page classification

This paper uses the Naive Bayesian method [13] to train the classifier and classify all the preprocessed web pages. Compared with KNN [14] and other methods for dealing with massive web pages, the Naive Bayesian method is simple and can guarantee better classification accuracy and faster classification speed.

Step 1 Generate a classifier by learning from a sample set of web pages.
In this process, we calculate the probability that each feature item belongs to each web page class based on the sample pages. Various methods exist for this calculation. McCallum and Nigam proposed a multinomial model [13] whose classification error is lower than that of other models when the feature set is relatively large. In this paper, we use their multinomial model to calculate $P(t_j \mid c)$, which represents the probability of $t_j$ belonging to $c$. Following the Laplace-smoothed estimate of [13], the formula is:
\[
P(t_j \mid c) = \frac{1 + TF(t_j, c)}{|V| + \sum_{k=1}^{|V|} TF(t_k, c)}
\]
where $|V|$ represents the total number of feature items in a sample class $c$, and $TF(t_j, c)$ represents the total frequency of $t_j$ appearing in $c$.

Step 2 Classify all preprocessed web pages. Calculate the probability that $p$ belongs to each web page class; $p$ is then classified into the class $c_i$ with the greatest probability:
\[
i = \arg\max_{c_j \in C} P(c_j \mid p)
\]
\[
P(c_j \mid p) = \frac{P(c_j) \prod_{i=1}^{n} P(t_i \mid c_j)}{P(p)}
\]
where $P(c_j)$ is the prior probability, whose value is the proportion of web pages in $c_j$ within the sample set, $P(p)$ is a constant, and $n$ is the number of feature items in $p$.

3.3 Computing link relationships among web classes

Web page representation

After the preprocessing and classification of a web page, we obtain the elements of a web page listed in Table 1.

Table 1: The representation of a web page p

Element                                     Symbol
feature vector                              $(t_1, t_2, \ldots, t_n)$
web page class                              $c$
probability of p belonging to c             $P(c \mid p)$
a set of hyperlinks on p                    $P_{out} = (u_1, u_2, \ldots, u_m)$
a set of URLs which have hyperlinks to p    $P_{in} = (u_1, u_2, \ldots, u_o)$

Here $P_{in}$ is initialized to empty before computing link relationships.

Computing relationships among web classes

A directed edge $\langle c_i, c_j \rangle$ in an index model represents the direct link relationship from $c_i$ to $c_j$, and its weight $w\langle c_i, c_j \rangle$ is calculated as a conditional probability $P(c_j \mid c_i)$. It indicates the probability of users visiting the web page class $c_j$ after they browsed $c_i$. In this article, we propose a way to build link relationships among web classes based on weights of hyperlinks and probabilities of web pages belonging to classes. We suppose that a web page only belongs to one class. Then, $P(c_j \mid c_i)$ is calculated as follows:
\[
P(c_j \mid c_i) = \sum_{p_t \in c_j} P(c_j \mid p_t) \cdot P(p_t \mid c_i) \tag{1}
\]
where $P(c_j \mid p_t)$ and $P(p_t \mid c_i)$ are computed as:
\[
P(c_j \mid p_t) = \frac{P(C_j \mid p_t)}{1 + \sum_{p_k \in c_j} P(C_j \mid p_k)} \tag{2}
\]
\[
P(p_t \mid c_i) = \sum_{p_k \in c_i} P(p_t \mid p_k) \sum_{u_k = url_t} W_u \tag{3}
\]
In Eq. (2), $P(c_j \mid p_t)$ is the probability of $p_t$ belonging to $c_j$ after normalization. In Eq. (3), $P(p_t \mid c_i)$ is the probability of users visiting $p_t$ after they browsed $c_i$, which is calculated by summing over all hyperlinks from web page class $c_i$ to page $p_t$; $u_k$ represents a hyperlink on $p_k$, $url_t$ represents the URL of $p_t$, and $W_u$ is the weight of hyperlink $u_k$. To facilitate the calculation, Eq. (3) can be transformed into the following form:
\[
P(p_t \mid c_i) = \sum_{p_k \in c_i,\; u_k = url_t} \bigl( W_u \cdot P(c_i \mid p_k) \bigr) \tag{4}
\]
Then, $P(c_j \mid c_i)$ is calculated as follows:
\[
P(c_j \mid c_i) = \sum_{p_t \in c_j,\; p_k \in c_i,\; u_k = url_t} \bigl( W_u \cdot P(c_j \mid p_t) \cdot P(c_i \mid p_k) \bigr) \tag{5}
\]

Two algorithms for constructing index model

For a given set of web pages, Algorithm 1 can complete the initial construction of an index model as defined in Def. 1.

Algorithm 1 Index Initialization Algorithm
Input: a set of web classes C, a set of web pages S.
Output: index model matrix IN_CC, hash table HT.
 1: for c ∈ C
 2:   for p ∈ c
 3:     calculate 1 + Σ_{p ∈ c} P(c | p)
 4:     HT.put(url, c)   // url is the URL of p
 5: for p ∈ S
 6:   for u ∈ P_out   // P_out is the set of hyperlinks on p
 7:     if (HT.contains(u))
 8:       calculate P(c_j | p_k) and P(c_i | p)   // u is the URL of p_k
 9:       P(c_j | c_i) = P(c_j | c_i) + (1 / |P_out|) · P(c_j | p_k) · P(c_i | p)
10:     else P_in.put(url)   // P_in is the set of URLs which have hyperlinks to p

Two main factors influence the complexity of Algorithm 1: the number of web pages and the number of hyperlinks on each web page. We assume that the number of web pages is $|S|$ and the average number of hyperlinks on each web page is $|P_{out}|$. The time complexity of Steps 1-4 is $O(|S|)$. Steps 5-10 calculate the link relationships, and their time complexity is $O(|S| \cdot |P_{out}|)$. Thus, the total time complexity of Algorithm 1 is $O(|S| + |S| \cdot |P_{out}|) = O(|S| \cdot |P_{out}|)$.
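The two passes of Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the tuple layout `(class, raw probability, out-links)`, the names `build_index` and `pending_in`, and the toy data are hypothetical. It also assumes that both probabilities in Step 9 are normalized as in Eq. (2), and it skips links within one class because Definition 1 excludes edges with $c_i = c_j$.

```python
from collections import defaultdict


def build_index(pages):
    """Sketch of Algorithm 1. pages: url -> (cls, raw_prob, out_links)."""
    # Steps 1-4: per-class normalizer 1 + sum of P(c | p), and URL -> class table.
    norm = defaultdict(lambda: 1.0)
    ht = {}
    for url, (cls, prob, _out) in pages.items():
        norm[cls] += prob
        ht[url] = cls

    # Steps 5-10: accumulate P(c_j | c_i) over every hyperlink of every page.
    index = defaultdict(float)
    pending_in = defaultdict(list)  # unknown target URL -> pages linking to it
    for url, (c_i, prob_i, out) in pages.items():
        for u in out:
            if u in ht:
                c_j, prob_j, _ = pages[u]
                if c_i == c_j:
                    continue  # assumption: no self-loops, as in Definition 1
                index[(c_i, c_j)] += (
                    (1.0 / len(out)) * (prob_j / norm[c_j]) * (prob_i / norm[c_i])
                )
            else:
                pending_in[u].append(url)  # kept for a later incremental update
    return dict(index), ht, dict(pending_in)


# Toy data (made up): page "x" is linked to but has not been crawled.
toy = {
    "a": ("Estate", 0.9, ["b", "x"]),
    "b": ("House Renting", 0.6, ["a"]),
}
index, ht, pending = build_index(toy)
```

Each page is visited once in the first pass and each hyperlink once in the second, which matches the $O(|S| \cdot |P_{out}|)$ bound derived above. Recording unresolved targets in `pending_in` is one way to read Step 10: it preserves the incoming links that an incremental update would need once the missing page arrives.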
Algorithm 1 can complete the initial construction of an index model for a given set of web pages, but it cannot update the model in real time as web pages change. Considering the dynamic changes of web pages on the Internet, we further give an index incremental updating algorithm for updating an existing index model.
Algorithm 2 Index Incremental Updating Algorithm
Input: a set of web classes C, web page p, index model matrix IN_CC, hash table HT.
Output: index model matrix IN_CC, hash table HT.
 1: update 1 + Σ_{p_k ∈ c} P(c | p_k)   // p belongs to web page class c
 2: HT.put(url, c)   // url is the URL of p
 3: for c_i ∈ C
 4:   P(c_i | c) = P(c_i | c) · (1 + Σ_{p_k ∈ c} P(c | p_k) − P(c | p)) / (1 + Σ_{p_k ∈ c} P(c | p_k))
 5:   P(c | c_i) = P(c | c_i) · (1 + Σ_{p_k ∈ c} P(c | p_k) − P(c | p)) / (1 + Σ_{p_k ∈ c} P(c | p_k))
 6: for u ∈ P_in   // P_in is the set of URLs which have hyperlinks to p
 7:   if (HT.contains(u))
 8:     calculate P(c_j | p_k) and P(c | p)   // u is the URL of p_k
 9:     P(c | c_j) = P(c | c_j) + (1 / |P_in|) · P(c_j | p_k) · P(c | p)
10: for u ∈ P_out   // P_out is the set of hyperlinks on p
11:   if (HT.contains(u))
12:     calculate P(c_j | p_k) and P(c | p)   // u is the URL of p_k
13:     P(c_j | c) = P(c_j | c) + (1 / |P_out|) · P(c_j | p_k) · P(c | p)
14:   else P_in.put(url)   // P_in is the set of URLs which have hyperlinks to u

The main factors affecting the complexity of Algorithm 2 are the number of web classes, the number of hyperlinks on web page $p$, and the number of web pages that have hyperlinks pointing to $p$. The time complexity of Steps 3-5 is $O(|C|)$, and that of Steps 6-14 is $O(|P_{in}| + |P_{out}|)$. Thus, the total time complexity of Algorithm 2 is $O(|C| + |P_{in}| + |P_{out}|)$.

4 Experimental Evaluation and Analysis

4.1 Experimental environment

The experiments in this paper run on nine Sugon servers, each with 4 cores and 8 GB of memory. The experimental set contains one million Chinese web pages crawled from the Internet. Eight servers are used to classify web pages with the classification tool Mahout [15] on the Hadoop distributed system, and one server is used to run the proposed construction algorithms for the index model.
4.2 Experimental results and analysis

In this paper, the result of web page classification directly affects the construction of the index model, so the accuracy of the classification algorithm should be guaranteed. We use 282 web page classes and their sample web pages from the Open Directory Project (dmoz) [16] as the training data, and manually select some relevant web pages as a supplement. There are about 500 sample web pages in each web page class.
We randomly select 80% of the web pages in each sample class to train the classifier, and use the remaining 20% as the test set. The experimental results show that the classification accuracy is 71.7%.

In order to evaluate and compare the efficiency of the two proposed algorithms, the one million crawled web pages are classified into 282 web classes and then used to construct an index model by each algorithm separately. For the updating algorithm, we use it to update an empty model with these web pages. The comparative running time is shown in Fig. 2(a).

Fig. 2: Time comparisons of two construction algorithms: (a) building an index model; (b) updating an index model. (Axes: time in hours against web pages in millions; series: Initialization, Updating.)

As can be seen from Fig. 2(a), the time efficiency of the index initialization algorithm is better than that of the index incremental updating algorithm. To build an index model with $|S|$ pages, the index incremental updating algorithm requires $O(|S| \cdot |C| + |S| \cdot (|P_{in}| + |P_{out}|))$ time, while the initialization algorithm requires $O(|S| \cdot |P_{out}|)$ time. As previously mentioned, the time complexity of both algorithms increases linearly with the number of web pages. The experimental set of web pages is not self-contained with respect to hyperlinks, i.e., many web pages that are linked to from this set are not within it, which explains the shape of the growth curves in Fig. 2(a). When the number of web pages is sufficiently large, the running time of both algorithms stabilizes and tends to grow linearly.

In order to evaluate the efficiency of the two proposed algorithms when updating an existing index model, we update an index model built from half a million web pages using another half a million web pages. The comparative running time is shown in Fig. 2(b).
Compared with the index incremental updating algorithm, the index initialization algorithm builds an index model in a relatively short period of time. But as shown in Fig. 2(b), when an index model needs to be updated, the index initialization algorithm has to rebuild the whole index model, while the updating algorithm only needs to add the new pages to the existing index model. Therefore, the two algorithms apply to different situations: the initialization algorithm is used to initialize an index model, and the incremental updating algorithm is used to update an existing one. According to the annual statistical report of the China Internet Network Information Center (CNNIC), the number of web pages increases steadily. By increasing the number of devices, the updating algorithm can keep up with the update speed of web pages on the Internet, which illustrates the feasibility of the algorithm.

Further, we analyze the structure of the index model constructed above. As the edge weights are generally small, each edge weight is multiplied by a constant factor; the statistics of the resulting edge weights are shown in Fig. 3. Since the test data contains only one million web pages, and many web pages that are linked to from this experimental set are not within it, the edge weights are generally small, and edges whose weight is less than 1 may be caused by the heterogeneity of the Internet. The weight distribution shows that the
index model can distinguish the strength of link relationships among web page classes. Therefore, the index model constructed by the algorithm satisfies the basic conditions of validity.

Fig. 3: Statistics of the index model edge weight. (Axes: percentage of all edges, from 0% to 50%, against edge weight.)

4.3 Exhibition example

The index model we have built is too big to show completely. In this section, we select 4 web page classes and their edges from the index model built above as an exhibition example. As shown in Fig. 4, an index model is a directed graph. Each outgoing edge of a web page class represents the probability of browsing from that class to the class it points to. For example, in Fig. 4, the directed edge from Estate to House Renting means that if users are browsing web pages in Estate now, the probability that they will visit web pages in House Renting in the next step is 22.1%. Based on the index model, we can know which classes are more closely related to a specified class: Estate is more closely related to Retail Market than to House Renting, while House Renting is more closely related to Estate than to Retail Market. As the web pages in those classes change, the corresponding edge values change too. When most web pages on the Internet have been divided into their corresponding classes, the edge values of the index model tend to be stable.

Fig. 4: An index model instance. (Classes shown: Shopping, Estate, Retail Market, House Renting.)

5 Conclusion

This paper presents an index model based on web page classification, and designs its initial construction algorithm by hyperlink analysis. Moreover, in order to reflect the dynamic process of web pages on the
Internet, we give an index incremental updating algorithm; the time complexity of both algorithms grows only linearly. Experimental results show that the index incremental updating algorithm can dynamically update an existing index model and can keep up with the update speed of web pages on the Internet. The index model can organize and manage web pages, and the link relationships among classes can reflect some real business connections. However, this article only gives a preliminary construction algorithm. We will further study how to make better use of an index model to serve users, and how to build an index model based on other web classification methods such as KNN.

References

[1] F. S. Hong and H. Kun, Research on search engine technology and service and its enlightenment, Journal of the China Society for Scientific and Technical Information, 19(6) (2002).
[2] U. Shardanand and P. Maes, Social information filtering: Algorithms for automating "word of mouth", in: Proc. on Human Factors in Computing Systems, ACM Press, New York, 1995.
[3] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, Recommending and evaluating choices in a virtual community of use, in: Proc. on Human Factors in Computing Systems, ACM Press, New York, 1995.
[4] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An open architecture for collaborative filtering of netnews, in: Proc. the Computer Supported Cooperative Work Conf., ACM Press, New York, 1994.
[5] E. Rich, User modeling via stereotypes, Cognitive Science, 3(4) (1979).
[6] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley Publishing, New York.
[7] B. P. S. Murthi and S. Sarkar, The role of the management sciences in research on personalization, Management Science, 49(10) (2003).
[8] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. on Knowledge and Data Engineering, 17(6) (2005).
[9] S. C. Jie and G. Yi, A statistical approach for content extraction from web page, Journal of Chinese Information Processing, 18(5) (2004).
[10] H. Y. Biao, The comparative study of Chinese word segmentation of Lucene interface, Science & Technology Information, 12 (2012).
[11] Y. Yang and J. O. Pedersen, A comparative study on feature selection in text categorization, in: Proc. the 14th International Conference on Machine Learning (ICML 97), 1997.
[12] L. Page, S. Brin, and R. Motwani, The PageRank citation ranking: Bringing order to the web.
[13] A. McCallum and K. Nigam, A comparison of event models for naive Bayes text classification, in: Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, AAAI Press, Menlo Park, CA, 1998.
[14] T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, 13(1) (1967).
[15] S. Owen, R. Anil, T. Dunning, and E. Friedman, Mahout in Action, Manning Publications, Shelter Island.
[16] M. Grobelnik, J. Brank, D. Mladenić, B. Novak, and B. Fortuna, Using dmoz for constructing ontology from data stream, in: Proc. the 28th International Conference on Information Technology Interfaces, Dubrovnik, 2006.
More informationDetecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
More informationParallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationPredict the Popularity of YouTube Videos Using Early View Data