Link Processing for Fuzzy Web Pages Clustering and Classification
- Marianna Arlene Daniels
- 8 years ago
European Journal of Scientific Research
ISSN X Vol.27 No.4 (2009), pp. …
EuroJournals Publishing, Inc.

Link Processing for Fuzzy Web Pages Clustering and Classification

Amir Masoud Rahmani
Islamic Azad University, Science and Research Branch, Tehran, Iran

Zahra Hossaini
Islamic Azad University, Science and Research Branch, Tehran, Iran

Saeed Setayeshi
Islamic Azad University, Science and Research Branch, Tehran, Iran

Abstract

Clustering and classification are two ways of arranging objects into related groups according to their similarities, where different groups have different characteristics. The large volume of web pages on the World Wide Web needs to be arranged in a way that is convenient for users, and clustering is an efficient way of grouping them. Fuzzy logic, as a flexible method, can be used to find similarities by means of membership functions. For web page clustering and classification, fixed-size vectors of words and weights are usually extracted from the HTML form of web pages. Such vectors are ordinarily long and take a lot of time to process. To avoid this, words are gathered here from the <a> tag, which gives shorter vectors that still carry most of the useful information that could be obtained from other parts of the page. These vectors have variable size. The method gives acceptable clusters and fairly precise classes, with an 89.76% precision rate, and also reduces processing time.

Keywords: Clustering, Classification, Fuzzy, <a> tag

1. Introduction

Clustering is an unsupervised method that can find hidden relations between data and arrange them into internally related groups; classification is a supervised method of grouping data so that more similar elements come together in the same group [1], [2]. The huge number of web documents needs to be ordered into related groups to make them easier to use.
Clustering and classification have long been used, in different approaches, to order uncategorized data. K-means is a popular clustering method, but it needs to know the number of clusters a priori [8], [10]. Most clustering approaches use fixed-size vectors in their classifiers [12], [13], which costs a long time processing unnecessary elements that have zero weight in the representative vector. To avoid this problem, variable-size vectors are used in this application, so the time spent processing useless elements is avoided.
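To make the contrast between fixed-size and variable-size vectors concrete, here is a minimal Python sketch (an illustration, not the authors' code). It assumes a variable-size vector can be held as a dict of term to weight; the vocabulary and page words are made-up examples. The fixed-size vector spends one slot on every vocabulary word, while the dict stores, and the dot product visits, only the terms that actually occur.

```python
from collections import Counter

def fixed_vector(words, vocabulary):
    """Fixed-size vector: one slot per vocabulary word, most of them zero."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]

def variable_vector(words):
    """Variable-size vector: only the terms that actually occur are stored."""
    return dict(Counter(words))

def sparse_dot(a, b):
    """Similarity term that only visits words present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[t] for t, w in a.items() if t in b)

# Hypothetical vocabulary and page words, for illustration only.
vocabulary = ["fuzzy", "cluster", "football", "election", "weather", "vector"]
page = ["fuzzy", "cluster", "fuzzy"]

print(fixed_vector(page, vocabulary))  # [2, 1, 0, 0, 0, 0]
print(variable_vector(page))           # {'fuzzy': 2, 'cluster': 1}
```

With a realistic vocabulary of thousands of words, the zero entries dominate the fixed vector, which is the processing cost the variable-size representation avoids.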
A web page is usually represented by a vector structure extracted from the HTML form of the document [13]. The vector consists of a series of words, each with a coefficient that is the frequency of the word in that document. The coefficient can be 1, or it can take a weight according to the tags the word appears in [11], [13], or even depend on the size of the document, and so on. This is a good method, but the final vectors are still too long, so in order to obtain shorter but still effective vectors, the <a> tag is used. Since links most of the time point to web pages whose concepts are close to those of the current page, they can give useful vectors that are shorter without reducing discriminating ability. Finally, a fast and precise clustering method is needed, and fuzzy logic fits this need well. Fuzzy logic is used for clustering in [5], [6], but those works use fixed-size vectors; in [3] variable-size vectors are used, but the cluster centers are too long, and the importance of where a word appears is not considered. In this paper, variable-size vectors extracted from the <a> tag are used in a fuzzy clustering method, which gives acceptable clusters and a low percentage of misclassified pages. A data dictionary is used to avoid unrelated words, such as those that appear in advertisement links. The algorithm is explained in detail in the following sections. The sections are arranged as follows: Section 2 explains related concepts, Section 3 describes the algorithm and the steps of the method, Section 4 presents the experimental results, and Section 5 concludes the paper.

2. Related Concepts

2.1. K-means and Clustering

In pattern recognition, a group of related data is called a cluster. This relationship is usually specified by a distance function: a data point belongs to a group when its distance to the center of that group is the minimum.
The K-means algorithm is a simple and effective method of clustering, and most other methods are inspired by it. As described here, the algorithm starts by randomly choosing a data point as the first cluster center, then computes the distances of the other data points from this center and places each of them inside or outside the cluster according to its distance from the center. Outer data points are taken as new centers, and the algorithm continues with those new centers until some condition is met, such as a time limit or an iteration limit.

2.2. Data Dictionary

A data dictionary is a pool of words that prevents the use of words unrelated to a class. It is useful for eliminating irrelevant words, such as those that appear in advertisements and in parts of the page that link elsewhere. In this process, a sub-dictionary is provided for each class. Each sub-dictionary starts with some words related to its category, and during the learning process high-weight words are added to it.

2.3. Fuzzy Logic

A fuzzy system is a rule-based system with an internal knowledge base; a set of if/else rules makes this knowledge base effective for different applications, so defining those rules is an important step. The rules use linguistic terms whose values must be defined by membership functions, because real-world values need to be mapped onto them. Fuzzy logic can be used for clustering because the data to be clustered are usually not well separated, so their structures cannot be defined exactly. In addition, a data point can be assigned to different groups with different membership values.
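The per-class sub-dictionaries described above can be sketched as a small Python class. This is a minimal sketch under stated assumptions: the class name `DataDictionary`, its methods `keep_related` and `learn`, and the `min_weight` cut-off for "high weight" words are all hypothetical, since the paper does not specify how the sub-dictionaries are stored or what counts as high weight.

```python
class DataDictionary:
    """Per-class sub-dictionaries of related words (hypothetical structure)."""

    def __init__(self, seeds):
        # seeds: class name -> initial words related to that category
        self.sub = {cls: set(words) for cls, words in seeds.items()}

    def all_words(self):
        """Union of every sub-dictionary."""
        return set().union(*self.sub.values())

    def keep_related(self, words, cls=None):
        """Drop words that are not in the given sub-dictionary (or in any of them)."""
        allowed = self.sub[cls] if cls is not None else self.all_words()
        return [w for w in words if w in allowed]

    def learn(self, cls, weights, min_weight):
        """During training, add the high-weight words of a document to its class."""
        self.sub[cls].update(w for w, v in weights.items() if v >= min_weight)

# Hypothetical seed words for two classes.
dd = DataDictionary({"sport": {"football", "match"}, "science": {"fuzzy", "cluster"}})
print(dd.keep_related(["fuzzy", "buy", "now", "cluster"]))  # ['fuzzy', 'cluster']
dd.learn("science", {"membership": 0.9, "click": 0.1}, min_weight=0.5)
```

After the `learn` call, "membership" belongs to the science sub-dictionary while the low-weight "click" does not, so advertisement-style words stay filtered out.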
3. Proposed Method

The clustering method consists of two main parts: the first is the link processor and the second is the fuzzy cluster maker.

[Figure 1: Diagram of clustering and classification steps. Blocks: Web Doc, Conversion to Vector form, DATA DICTIONARY, Fuzzy clustering and classification.]

Figure 1 shows a simple diagram of the method; each part is explained in this section.

3.1. Link Processor

First of all, a web document must be converted into a processable structure; a vector is a common and simple form used in many applications. Such vectors consist of words and their related weights, both extracted from the HTML form of the document. The weight is usually the number of occurrences of the word in the page (the term frequency, or TF), or TF divided by the total number of words in the document, or other possible values []. It can also be multiplied by a factor reflecting the importance of the tag the word appears in; for example, words in the <title> tag are more important than those in a <p> tag. Vectors built this way can be too long and take a lot of time to process, so it is sensible to use one or more tags that give a shorter vector which still contains most of the necessary words. Most of the time, <a> tags link to other pages that have the same content as the current page, or that give extra information about some important concepts, and the tag itself carries valuable information about the subject of the link. By extracting this information, a shorter but still useful vector can be constructed without a perceptible decrease in precision. A link can be written in several ways: as an inner anchor with a text, as a link to another page with a text, or as a link with a picture:

1. <a href="#end"> This text is a link to a part on this page </a>.
2. <a href="…"> This text is a link to a page on the World Wide Web </a>.
3. <a href="…"><img border="0" src="buttonnext.gif" width="65" height="38" alt="a link"></a>.

In the first two forms, the link text states the subject of and reason for the link and contains useful keywords; in the third, the alt attribute carries extra information about the subject of the link. In addition, the <a> tag can have explanatory attributes such as name, title or id, which can supply many keywords, for example:

<a href="../images/fuzzy.gif" title="gif image of a fuzzifier"> fuzzifing data instance for inference engine </a>.
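Collecting the keyword-bearing parts of links can be sketched with Python's standard `html.parser` module. This is an illustrative sketch, not the authors' link processor: the class name `LinkTextExtractor` and the part labels are assumptions. It records the title, name, id and rel attributes of each <a> tag, the link text between the opening and closing tags, and the alt text of any image inside the link.

```python
from html.parser import HTMLParser

class LinkTextExtractor(HTMLParser):
    """Collect (part, text) pairs from <a> tags and images nested inside them."""

    ATTRS = ("title", "name", "id", "rel")

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside an <a> element
        self.parts = []  # list of (part, text) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.depth += 1
            for key in self.ATTRS:
                if attrs.get(key):
                    self.parts.append((key, attrs[key]))
        elif tag == "img" and self.depth and attrs.get("alt"):
            self.parts.append(("alt", attrs["alt"]))

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only text inside a link counts; the rest of the page is ignored.
        if self.depth and data.strip():
            self.parts.append(("text", data.strip()))

html = ('<p>Intro</p>'
        '<a href="../images/fuzzy.gif" title="gif image of a fuzzifier">'
        ' fuzzifing data instance for inference engine </a>'
        '<a href="#end"><img src="buttonnext.gif" alt="a link"></a>')
extractor = LinkTextExtractor()
extractor.feed(html)
extractor.close()
print(extractor.parts)
```

Text outside links (here "Intro") is skipped, which is what keeps the resulting vectors short.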
After the keywords are found, the TF of each word is calculated and weighted. The weight of word/term $i$ ($t_i$) in document $j$ is defined as follows:

$$W_{ij} = \begin{cases} \alpha, & \text{if } t_i \text{ is in the title or name part of the anchor tag} \\ \beta, & \text{if } t_i \text{ is in the id part of the anchor tag} \\ \gamma, & \text{if } t_i \text{ is in the rel part of the anchor tag} \\ 1, & \text{if } t_i \text{ is in the descriptive part (the link text)} \end{cases} \tag{1}$$

where $\alpha, \beta, \gamma > 1$. For example, if the word fuzzy appears three times in the title part and five times in the descriptive part, its weight becomes $3\alpha + 5$.

Before the TFs are computed, a data dictionary can be used to eliminate unrelated words. The data dictionary is built from the documents in the training set and is completed during the extraction process. After this, all weighted TFs are normalized. Suppose $p$ different words have been extracted from a document, $r_i$ is the number of appearances of word $i$ in that document (its TF), and $R$ is the vector of the $r_i$ values; then $\mathrm{Norm}(r_i)$ is defined as:

$$\mathrm{Norm}(r_i) = \frac{r_i - \min(R)}{\max(R) - \min(R)}, \quad i = 1, \dots, p \tag{2}$$

The final vector is then ready to enter the fuzzy part to be clustered; its structure is shown in Figure 2.

[Figure 2: Document vector structure after analysis: terms t_1j, t_2j, …, t_pj with weights W_1j, W_2j, …, W_pj.]

The length of each vector can vary, so words that are related to a category but do not appear in a given document need not be stored or processed.

3.2. Clustering and Classification by Fuzzy Logic (CCFL)

K-means, as a clustering method, chooses one or more random data points as its first cluster centers and then computes the similarity between the other data points and these centers; each data point is placed in the cluster whose center is most similar to it (has the minimum distance). If the maximum similarity a point achieves is smaller than a threshold, the point becomes a center in the next run.
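The link-term weighting of Eq. (1) and the min-max normalization of Eq. (2) can be sketched as follows. This is a sketch under assumptions, not the authors' implementation: the values of ALPHA, BETA and GAMMA are placeholders (the paper only requires them to be greater than 1), tokenization is a naive whitespace split, image alt text is treated as descriptive text, and the optional `allowed` set stands in for the data dictionary. The input is a list of (part, text) pairs taken from <a> tags.

```python
from collections import defaultdict

# Placeholder weights: the paper only requires alpha, beta, gamma > 1.
ALPHA, BETA, GAMMA = 3.0, 2.0, 1.5
PART_WEIGHT = {"title": ALPHA, "name": ALPHA, "id": BETA, "rel": GAMMA, "text": 1.0}

def weighted_tf(parts, allowed=None):
    """Eq. (1): add the weight of the link part for every occurrence of a term.

    parts   -- (part, text) pairs taken from <a> tags
    allowed -- optional set of data-dictionary words; other words are dropped
    """
    tf = defaultdict(float)
    for part, text in parts:
        weight = PART_WEIGHT.get(part, 1.0)  # assumption: other parts (e.g. img alt) count as descriptive
        for term in text.lower().split():     # naive whitespace tokenizer
            if allowed is None or term in allowed:
                tf[term] += weight
    return dict(tf)

def normalize(tf):
    """Eq. (2): min-max normalization of the weighted TFs."""
    if not tf:
        return {}
    lo, hi = min(tf.values()), max(tf.values())
    if hi == lo:  # Eq. (2) is undefined here; mapping every term to 1.0 is an assumption
        return {term: 1.0 for term in tf}
    return {term: (r - lo) / (hi - lo) for term, r in tf.items()}

# The paper's example: "fuzzy" three times in the title part, five times in the link text.
parts = [("title", "fuzzy")] * 3 + [("text", "fuzzy")] * 5
print(weighted_tf(parts))  # {'fuzzy': 14.0}, i.e. 3*ALPHA + 5
```

Note that min-max normalization maps the least frequent term of each document to 0, so it contributes nothing in the later distance computation.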
The similarity value can be computed as a fuzzy membership value, because it can be regarded as an averaging function with values between zero and one, and it can be written in the form of fuzzy if/else rules. Suppose there are $n$ web pages in the data set, $d_i$ is the $i$-th document in vector form with $m_i$ terms, and $c_j$ is the $j$-th center with $l_j$ terms; then the distance between these two vectors is given by Equation 3 [11]:

$$\mathrm{dist}(d_i, c_j) = 1 - \frac{n_c \left[\sum_{k=1}^{m_i} x(t_k)\,\mu(t_k)\right]^{r}}{l_j} \tag{3}$$

where $n_c$ is the number of common words between document $i$ and center $j$, $x(t_k)$ is the importance degree of the $k$-th term in document $i$, $\mu(t_k)$ is the frequency of word $t_k$ in cluster $c_j$, and $r > 0$. Equation 3 yields an S-shaped membership function for the output values, so the rules can be written as follows:
If $n_c/l_j$ = high and $x(t_k)$ = high and $\mu(t_k)$ = high, then distance is very low.
If $n_c/l_j$ = high and $x(t_k)$ = high and $\mu(t_k)$ = medium, then distance is low.
If $n_c/l_j$ = low and $x(t_k)$ = low and $\mu(t_k)$ = low, then distance is very high.

The question now is what $x(t)$ and $\mu(t)$ are. Suppose the $m$-th term of document $i$ is $t$, with frequency $f_t$; then the importance degree of $t$ for cluster $j$ is:

$$x(t) = \begin{cases} \left(\dfrac{f_t}{f_{avg,j}}\right)^{p}, & f_t < f_{avg,j} \\[2ex] 1 - \dfrac{(f_t - f_{avg,j})(1 - q)}{f_{max,j} - f_{avg,j}}, & f_t \ge f_{avg,j},\ f_{max,j} > f_{avg,j} \\[2ex] \left(\dfrac{f_{avg,j}}{f_t}\right)^{p}, & f_t \ge f_{avg,j},\ f_{max,j} = f_{avg,j} \end{cases} \tag{4}$$

This is a triangular-shaped membership function, where $p > 0$, $0 < q < 1$, and $f_{avg,j}$ is the average weight of term $t$ over all documents of cluster $j$ that contain $t$. In other words, if $p$ of the $l$ documents in cluster $j$ contain term $t_1$ (here $p$ is a count, not the exponent of Eq. (4)) and $v(1{:}p)$ is the vector of the corresponding weights, then

$$f_{avg,j} = \frac{1}{p}\sum_{i=1}^{p} v(i) \tag{5}$$

and $f_{max,j}$ is the maximum weight of term $t$ in these documents. A cluster center therefore has the structure shown in Figure 3.

[Figure 3: Cluster center structure: terms t_1j, t_2j, …, t_lj, with average weights W_1avg, W_2avg, …, W_lavg and maximum weights W_1max, W_2max, …, W_lmax.]

Another important factor is the percentage of documents in the cluster that contain a certain term; it helps to increase the internal compactness of a cluster:

$$\mu(t) = \frac{\text{number of documents in } c_j \text{ that contain } t}{\text{number of documents in } c_j} \tag{6}$$

This is an S-shaped membership function. Now suppose there are $K$ clusters into which $n$ documents should be clustered. Document $i$ ($d_i$) belongs to cluster $j$ ($c_j$) if

$$d_i \in c_j \quad \text{if} \quad \mathrm{dist}(d_i, c_k) > \mathrm{dist}(d_i, c_j)\ \ \forall k \in \{1, \dots, K\},\ k \ne j, \quad \text{and} \quad \mathrm{dist}(d_i, c_j) < \text{threshold} \tag{7}$$

When a document becomes a member of a cluster, the cluster center must be updated for the following runs: the average (and possibly the maximum) weight of each word common to the center and the document is recalculated with the new weight. Then, to make the new member more influential without increasing the center size too much, some high-weight words of the document that are not yet in the center are added to it. The best number of such words was found by trial and error.
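Equations (3) to (7) and the center update can be pulled together in a short Python sketch. It is an illustration under assumptions, not the authors' implementation: documents are dicts of normalized term weights, a center maps each term to a tuple (f_avg, f_max, mu), the guards for zero-weight terms and empty centers are additions for safety rather than part of the paper, and the `extra` count of new words per member is a hypothetical parameter (the paper tuned it by trial and error). The values p=2, q=0.5, r=1 in the usage are placeholders.

```python
def importance(ft, f_avg, f_max, p, q):
    """Eq. (4): importance degree x(t) of a document term for a cluster."""
    if ft == 0:
        return 0.0  # assumption: zero-weight terms contribute nothing
    if ft < f_avg:
        return (ft / f_avg) ** p
    if f_max > f_avg:
        return 1 - (ft - f_avg) * (1 - q) / (f_max - f_avg)
    return (f_avg / ft) ** p  # case f_max == f_avg

def distance(doc, center, p, q, r):
    """Eq. (3). doc: term -> weight; center: term -> (f_avg, f_max, mu)."""
    if not center:
        return 1.0  # assumption: an empty center matches nothing
    common = [t for t in doc if t in center]
    # mu(t) = 0 for terms absent from the cluster, so summing over the common
    # terms gives the same value as summing over all m_i document terms.
    s = 0.0
    for t in common:
        f_avg, f_max, mu = center[t]
        s += importance(doc[t], f_avg, f_max, p, q) * mu
    return 1 - len(common) * s ** r / len(center)

def assign(doc, centers, threshold, p, q, r):
    """Eq. (7): index of the closest center, or None if it is not within threshold."""
    dists = [distance(doc, c, p, q, r) for c in centers]
    if not dists:
        return None
    j = min(range(len(dists)), key=dists.__getitem__)  # ties go to the first center
    return j if dists[j] < threshold else None

def center_stats(members, terms):
    """Eqs. (5) and (6): (f_avg, f_max, mu) for each center term over the members."""
    stats = {}
    for t in terms:
        vals = [d[t] for d in members if t in d]
        if vals:
            stats[t] = (sum(vals) / len(vals), max(vals), len(vals) / len(members))
    return stats

def add_member(center, members, doc, extra=2):
    """Add doc to a cluster, refresh the statistics and add `extra` high-weight new words."""
    members.append(doc)
    new = sorted((t for t in doc if t not in center), key=doc.get, reverse=True)[:extra]
    return center_stats(members, list(center) + new)

# Hypothetical centers and document.
A = {"fuzzy": (0.5, 1.0, 1.0), "cluster": (0.5, 0.5, 0.5)}
B = {"football": (0.5, 0.5, 1.0)}
doc = {"fuzzy": 1.0, "cluster": 0.25, "ad": 0.0}
print(distance(doc, A, p=2, q=0.5, r=1))  # 0.375
print(assign(doc, [A, B], threshold=0.5, p=2, q=0.5, r=1))  # 0
```

Here "fuzzy" gets x = 0.5 and "cluster" gets x = 0.25, so the weighted sum is 0.5 * 1.0 + 0.25 * 0.5 = 0.625, and with two common terms out of two center terms the distance is 1 - 0.625 = 0.375.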
Another important factor in clustering is the threshold on the membership value (or distance). In the first runs, choosing this value too close to zero may increase the number of cluster centers needlessly; the value can be made smaller in every run. The whole process is as follows:

1. Choose one or more data points as the first cluster center(s); usually the first data point is chosen.
2. Find the distance between the other points and these centers; if the minimum distance is smaller than the threshold value, the data point belongs to the corresponding cluster; otherwise it becomes a center in the next run.
3. Update the center of each cluster (its average and, if required, maximum values).
4. Change the threshold value.
5. Check the stop condition; if it is met, stop the process, otherwise go to step 1 with the new centers.

A cluster center that has no members is not used as a center in the next run.
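Steps 1 to 5 can be sketched as a generic Python loop. This is a minimal sketch, not the authors' code: the function name `threshold_clustering` and its parameters are hypothetical, and the stop rule used (no new centers and unchanged cluster membership, or a run limit) is only one of the acceptable conditions the text allows. To keep the example self-contained, the usage plugs in stand-ins rather than the fuzzy distance of Eq. (3): pages are sets of link words, a center is the union of its members' words, and distance is the Jaccard distance.

```python
def threshold_clustering(docs, distance, make_center,
                         threshold=0.2, step=0.01, max_runs=50):
    """Steps 1-5 of the clustering loop; returns clusters as lists of doc indices."""
    if not docs:
        return []
    centers = [make_center([docs[0]])]  # step 1: the first data point is the first center
    previous = None
    clusters = []
    for _ in range(max_runs):
        members = [[] for _ in centers]
        new_centers = []
        for i, doc in enumerate(docs):  # step 2
            dists = [distance(doc, c) for c in centers]
            j = min(range(len(dists)), key=dists.__getitem__)
            if dists[j] < threshold:
                members[j].append(i)
            else:
                new_centers.append(make_center([doc]))  # becomes a center in the next run
        clusters = [m for m in members if m]  # centers without members are dropped
        # Step 3: rebuild each surviving center from its members.
        centers = [make_center([docs[i] for i in m]) for m in clusters] + new_centers
        threshold = max(threshold - step, 0.0)  # step 4
        if not new_centers and clusters == previous:  # step 5 (one possible stop rule)
            break
        previous = clusters
    return clusters

# Stand-ins for illustration only (not the paper's fuzzy distance).
def jaccard_distance(doc, center):
    return 1 - len(doc & center) / len(doc | center)

def union_center(member_docs):
    return set().union(*member_docs)

pages = [{"fuzzy", "cluster"}, {"fuzzy", "cluster", "logic"},
         {"football", "match"}, {"football", "match"}]
print(threshold_clustering(pages, jaccard_distance, union_center, threshold=0.5))
# [[0, 1], [2, 3]]
```

In the first run the two football pages both become new centers; in the second run both join the first of those centers, so the second one has no members and is dropped, which mirrors the last rule above.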
The stop condition can be a fixed number of runs, running until the cluster boundaries no longer change, or any other acceptable condition. For classification, the classes of the training data must be known. Each data point gets the class of its center, and the correctness of this classification becomes part of the stop condition: if the assigned class is the same as the real class, the cluster gets a positive point, and if the number of misclassifications stays below a given value, the clustering is acceptable and test data can be classified with it.

4. Experimental Results

For the experiments, over thirty hundred and fifty related web pages were downloaded from BBC.com, Yahoo.com and google.com. The pages belong to four classes. About 12 pages do not have a normal HTML structure, and some are in XML format. Ignoring them, about thirty-three hundred web pages remain. Two hundred and fifty of these pages were chosen as training data, and the remainder were used for testing. The best values for p, q and r are …, … and 1-2, respectively. In the best configuration, the algorithm continues until the classification error falls below 9% and the cluster boundaries no longer change, or change only minimally, compared with the previous run. This minimum is not always the same, because sometimes it does not coincide with the first condition. The threshold value is 0.2 for the first run and decreases by 0.01 in every run. Table 1 shows the results of this clustering and classification; the value of r was fixed at 1 throughout.

[Table 1: Accuracy table for training data. Columns: p, q, # misclassified data, # found clusters, classifying accuracy (%).]

Table 1 shows that increasing the values of p and q makes clusters more compact, because it decreases the similarity value, and, obviously, increasing both values makes clusters looser.
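The classification check used in the stop condition (each member takes its center's class, each correct member earns its cluster a positive point, and training error must fall below 9%) and the threshold schedule from the experiments (0.2 at the first run, minus 0.01 per run) can be sketched as follows. The function names and the input format, a list of (center class, true classes of members) pairs, are assumptions made for illustration.

```python
def positive_points(clusters):
    """One point per member whose true class equals its center's class."""
    return [sum(y == c for y in members) for c, members in clusters]

def classification_error(clusters):
    """Fraction of training members whose true class differs from their center's class."""
    total = sum(len(members) for _, members in clusters)
    return 1 - sum(positive_points(clusters)) / total if total else 0.0

def threshold_at(run, start=0.2, step=0.01):
    """Distance threshold for a run (0-based), as in the experiments."""
    return max(start - step * run, 0.0)

def should_stop(error, boundaries_changed, max_error=0.09):
    """Stop once the training error is below 9% and the cluster boundaries are stable."""
    return error < max_error and not boundaries_changed

# Hypothetical training result: two clusters labelled by the classes of their centers.
clusters = [("sport", ["sport", "sport", "science"]), ("science", ["science"])]
print(positive_points(clusters))       # [2, 1]
print(classification_error(clusters))  # 0.25
```

With one of four training pages misclassified the error is 25%, so under this rule training would continue.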
Table 2 shows the same algorithm applied to vectors extracted from other tags of the HTML documents: the <title> tag, the <meta> tags and the <p> tags inside the body text (TMP).

[Table 2: Accuracy table for training data obtained from other tags. Columns: p, q, # misclassified data, # found clusters, classifying accuracy (%).]

The average precision rate is 91.6% with vectors extracted from the TMP tags and 89.76% with the <a> tag, so the <a> tag can be a good representative structure for document clustering; it also reduces processing time, because its vectors are much shorter. Even TMP vectors need less processing time than fixed-size vectors, making the processing time more acceptable without a decrease in precision.
Figure 4 compares the behavior of these two kinds of vectors by showing the number of clusters produced at each iteration until the algorithm reaches the stop criterion.

[Figure 4: Comparing results of <a>-extracted vectors with TMP-extracted vectors]

Vectors obtained from the <a> tag not only reduce the number of iterations but also reduce the total processing time. Figure 5 shows the distances of one data point from the centers in a single iteration; at that point 17 cluster centers are available, and the data point belongs to the 14th cluster.

[Figure 5: Distance values of a vector from centers]

Table 3 shows the accuracy results on the testing data for the corresponding rows of Table 1.

[Table 3: Accuracy table for testing data. Columns: # misclassified, accuracy (%). Surviving entries: 8, 90%.]
5. Conclusion

Clustering is a way of arranging related objects into the same groups without any prior knowledge, and classification is another method of grouping according to known patterns. In this paper, the <a> tag and fuzzy clustering are used together to find good clusters in a short time, suitable for real-time applications, without much loss of precision. The <a> tag clearly cannot hold all the related data, but it can still act as a useful structure, and fuzzy clustering can improve the correctness of the results. To increase the correctness and diversity of the results, the proposed method could also be combined with genetic algorithms to form a genetic-fuzzy clustering method.

References

[1] Chin Wen Cheong, V. Ramachandran, "Genetic Based Web Cluster Dynamic Load Balancing in Fuzzy Environment," IEEE conference, pp. …, 2000.
[2] Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, pp. …, 2005.
[3] Dehu Qi, Bo Sun, "A Genetic K-means Approaches for Automated Web Page Classification," Information Reuse and Integration Conference, pp. ….
[4] G.S. Tomar, Shekhar Verma, Ashish Jha, "Web Page Classification using Modified Naïve Bayesian Approach," 2006.
[5] Esteban Meneses, "Vectors and Graphs, Two Representations to Cluster Web Sites Using Hyperstructure," Fourth Latin American Web Congress (LA-WEB'06).
[6] Ho Tu Bao, Introduction to Knowledge Discovery and Data Mining, Institute of Information Technology, National Center for Natural Science and Technology, pp. …, 6-79.
[7] Hui Zhang, Han-Tao Song, "Fuzzy Related Classification Approach Based on Semantic Measurement for Web Document," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), 2006.
[8] Jiu-zhen Liang, "Chinese Web Page Classification Based on Self-Organizing Mapping Neural Networks," Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'03).
[9] Lin Yu Tseng and Shiueng Bien Yang, "Genetic Algorithms for Clustering, Feature Selection and Classification," International Conference on Data Publication, vol. 3, pp. ….
[10] Manish Sarkar, B. Yegnanarayana, "A Clustering Algorithm Using Evolutionary Programming," Neural Net Conference, vol. 2, pp. ….
[11] Menahem Friedman, Abraham Kandel, Moti Schneider, Mark Last, Bracha Shapka, Yuval Elovici, Omer Zaafrany, "A Fuzzy-Based Algorithm for Web Document," pp. ….
[12] Weimin Xue, Hong Bao, Weitong Huang and Yuchang Lu, "Web Page Classification Based on SVM," 6th World Congress on Intelligent Control and Automation, pp. ….
[13] Yaqin Zhao, Guizhong Tang, Dakuan Wei, Xianzhong Zhou, Guangming Zhang, "A Clustering Algorithm Based on Probabilistic Crowding and K-means," 6th World Congress on Intelligent Control and Automation, pp. ….
[14] Zhang Mao-yuan, Lu Zheng-ding, "A Fuzzy Classification Based on Feature Selection for Web Pages," International Conference on Web Intelligence (WI'04), 2004.
More informationA New Method for Traffic Forecasting Based on the Data Mining Technology with Artificial Intelligent Algorithms
Research Journal of Applied Sciences, Engineering and Technology 5(12): 3417-3422, 213 ISSN: 24-7459; e-issn: 24-7467 Maxwell Scientific Organization, 213 Submitted: October 17, 212 Accepted: November
More informationK-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationContinuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information
Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationMonitoring and Warning System for Information Technology (IT) Outsource Risk in Commercial Banks Based on Nested Theory of Excel Logical Function
Advance Journal of Food Science and Technology 9(4): 302-307, 2015 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2015 Submitted: March 3, 2015 Accepted: March 14, 2015 Published:
More informationThe Use of Data Mining Classification Techniques to Predict and Diagnose of Diseases
205, TextRoad Publication ISSN: 2090-4274 Journal of Applied Environmental and Biological Sciences www.textroad.com The Use of Data Mining ification Techniques to Predict and Diagnose of Diseases Sajjad
More informationAnalecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
More informationIDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationSegmentation of stock trading customers according to potential value
Expert Systems with Applications 27 (2004) 27 33 www.elsevier.com/locate/eswa Segmentation of stock trading customers according to potential value H.W. Shin a, *, S.Y. Sohn b a Samsung Economy Research
More informationTracking and Recognition in Sports Videos
Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer
More informationData Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationWeb Service Monitoring Scheduler based on evaluated QoS in Dynamic Environment
Quest Journals Journal of Software Engineering and Simulation Volume 2 ~ Issue 3 (2014) pp: 01-08 ISSN(Online) :2321-3795 ISSN (Print):2321-3809 www.questjournals.org Research Paper Web Service Monitoring
More informationA QoS-Aware Web Service Selection Based on Clustering
International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,
More informationOpen Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition
Send Orders for Reprints to reprints@benthamscience.ae The Open Electrical & Electronic Engineering Journal, 2014, 8, 599-604 599 Open Access A Facial Expression Recognition Algorithm Based on Local Binary
More informationLife Insurance Customers segmentation using fuzzy clustering
Available online at www.worldscientificnews.com WSN 21 (2015) 38-49 EISSN 2392-2192 Life Insurance Customers segmentation using fuzzy clustering Gholamreza Jandaghi*, Hashem Moazzez, Zahra Moradpour Faculty
More informationKnowledge Acquisition Approach Based on Rough Set in Online Aided Decision System for Food Processing Quality and Safety
, pp. 381-388 http://dx.doi.org/10.14257/ijunesst.2014.7.6.33 Knowledge Acquisition Approach Based on Rough Set in Online Aided ecision System for Food Processing Quality and Safety Liu Peng, Liu Wen,
More informationPreprocessing Web Logs for Web Intrusion Detection
Preprocessing Web Logs for Web Intrusion Detection Priyanka V. Patil. M.E. Scholar Department of computer Engineering R.C.Patil Institute of Technology, Shirpur, India Dharmaraj Patil. Department of Computer
More informationClustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
More informationTIETS34 Seminar: Data Mining on Biometric identification
TIETS34 Seminar: Data Mining on Biometric identification Youming Zhang Computer Science, School of Information Sciences, 33014 University of Tampere, Finland Youming.Zhang@uta.fi Course Description Content
More informationA FUZZY LOGIC APPROACH FOR SALES FORECASTING
A FUZZY LOGIC APPROACH FOR SALES FORECASTING ABSTRACT Sales forecasting proved to be very important in marketing where managers need to learn from historical data. Many methods have become available for
More informationDesign of Prediction System for Key Performance Indicators in Balanced Scorecard
Design of Prediction System for Key Performance Indicators in Balanced Scorecard Ahmed Mohamed Abd El-Mongy. Faculty of Systems and Computers Engineering, Al-Azhar University Cairo, Egypt. Alaa el-deen
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationNatural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
More informationEmail Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
More informationAn Information Retrieval using weighted Index Terms in Natural Language document collections
Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia
More informationAn Automated Guided Model For Integrating News Into Stock Trading Strategies Pallavi Parshuram Katke 1, Ass.Prof. B.R.Solunke 2
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue - 12 December, 2015 Page No. 15312-15316 An Automated Guided Model For Integrating News Into Stock Trading
More informationARTIFICIAL INTELLIGENCE METHODS IN STOCK INDEX PREDICTION WITH THE USE OF NEWSPAPER ARTICLES
FOUNDATION OF CONTROL AND MANAGEMENT SCIENCES No Year Manuscripts Mateusz, KOBOS * Jacek, MAŃDZIUK ** ARTIFICIAL INTELLIGENCE METHODS IN STOCK INDEX PREDICTION WITH THE USE OF NEWSPAPER ARTICLES Analysis
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationA comparative study of bankruptcy prediction models of Fulmer and Toffler in firms accepted in Tehran Stock Exchange
Journal of Novel Applied Sciences Available online at www.jnasci.org 2013 JNAS Journal-2013-2-10/522-527 ISSN 2322-5149 2013 JNAS A comparative study of bankruptcy prediction models of Fulmer and Toffler
More informationMonitoring Web Browsing Habits of User Using Web Log Analysis and Role-Based Web Accessing Control. Phudinan Singkhamfu, Parinya Suwanasrikham
Monitoring Web Browsing Habits of User Using Web Log Analysis and Role-Based Web Accessing Control Phudinan Singkhamfu, Parinya Suwanasrikham Chiang Mai University, Thailand 0659 The Asian Conference on
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationKEITH LEHNERT AND ERIC FRIEDRICH
MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They
More informationA NEW HYBRID ALGORITHM FOR BUSINESS INTELLIGENCE RECOMMENDER SYSTEM
A NEW HYBRID ALGORITHM FOR BUSINESS INTELLIGENCE RECOMMENDER SYSTEM PPrabhu 1 and NAnbazhagan 2 1 Directorate of Distance Education, Alagappa University, Karaikudi, Tamilnadu, INDIA 2 Department of Mathematics,
More informationDevelopment of an Enhanced Web-based Automatic Customer Service System
Development of an Enhanced Web-based Automatic Customer Service System Ji-Wei Wu, Chih-Chang Chang Wei and Judy C.R. Tseng Department of Computer Science and Information Engineering Chung Hua University
More informationA Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value. Malaysia
A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value Amir Hossein Azadnia a,*, Pezhman Ghadimi b, Mohammad Molani- Aghdam a a Department of Engineering, Ayatollah Amoli
More informationAmerican International Journal of Research in Science, Technology, Engineering & Mathematics
American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-349, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629
More informationThe Multi-Item Capacitated Lot-Sizing Problem With Safety Stocks In Closed-Loop Supply Chain
International Journal of Mining Metallurgy & Mechanical Engineering (IJMMME) Volume 1 Issue 5 (2013) ISSN 2320-4052; EISSN 2320-4060 The Multi-Item Capacated Lot-Sizing Problem Wh Safety Stocks In Closed-Loop
More informationPrototype-based classification by fuzzification of cases
Prototype-based classification by fuzzification of cases Parisa KordJamshidi Dep.Telecommunications and Information Processing Ghent university pkord@telin.ugent.be Bernard De Baets Dep. Applied Mathematics
More information5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2
Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on
More informationRole of Neural network in data mining
Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)
More informationA Survey on Intrusion Detection System with Data Mining Techniques
A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,
More informationNTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling
1 Forecasting Women s Apparel Sales Using Mathematical Modeling Celia Frank* 1, Balaji Vemulapalli 1, Les M. Sztandera 2, Amar Raheja 3 1 School of Textiles and Materials Technology 2 Computer Information
More informationWeb Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
More informationEvaluation of Forest Road Network Planning According to Environmental Criteria
American-Eurasian J. Agric. & Environ. Sci., 9 (1): 91-97, 2010 ISSN 1818-6769 IDOSI Publications, 2010 Evaluation of Forest Road Network Planning According to Environmental Criteria Amir Hosian Firozan,
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationA Novel Binary Particle Swarm Optimization
Proceedings of the 5th Mediterranean Conference on T33- A Novel Binary Particle Swarm Optimization Motaba Ahmadieh Khanesar, Member, IEEE, Mohammad Teshnehlab and Mahdi Aliyari Shoorehdeli K. N. Toosi
More informationA Comparative Approach to Search Engine Ranking Strategies
26 A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh Collage of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab
More informationAutomated Medical Citation Records Creation for Web-Based On-Line Journals
Automated Medical Citation Records Creation for Web-Based On-Line Journals Daniel X. Le, Loc Q. Tran, Joseph Chow Jongwoo Kim, Susan E. Hauser, Chan W. Moon, George R. Thoma National Library of Medicine,
More informationEffect of Using Neural Networks in GA-Based School Timetabling
Effect of Using Neural Networks in GA-Based School Timetabling JANIS ZUTERS Department of Computer Science University of Latvia Raina bulv. 19, Riga, LV-1050 LATVIA janis.zuters@lu.lv Abstract: - The school
More information