A Tool for Web Usage Mining
8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07), December 2007, Birmingham, UK.

Jose M. Domenech (1) and Javier Lorenzo (2)
(1) Hospital Juan Carlos I, Real del Castillo, Las Palmas, Spain
(2) Inst. of Intelligent Systems and Num. Applic. in Engineering, Univ. of Las Palmas, Campus Univ. de Tafira, Las Palmas, Spain

Abstract. This paper presents a tool for web usage mining. The aim is to provide a tool that facilitates the mining process rather than to implement elaborate algorithms and techniques. The tool covers several phases of the CRISP-DM methodology: data preparation, data selection, modeling and evaluation. The algorithms used in the modeling phase are those implemented in the Weka project. The tool has been tested on a web site to find access and navigation patterns.

1 Introduction

Discovering knowledge from large databases has received great attention during the last decade, with data mining being the main tool to achieve it [1]. The World Wide Web has been considered the largest repository of information, but it lacks a well-defined structure. Thus the World Wide Web is a good environment for data mining, an activity that has received the name of Web Mining [2, 3].

Web mining can be divided into three main topics: Content Mining, Structure Mining and Usage Mining. This work focuses on Web Usage Mining (WUM), which has been defined as the application of data mining techniques to discover usage patterns from Web data [4]. Web usage mining can provide organizations with usage patterns from which they can obtain customer profiles, and therefore make browsing the website easier or present specific products/pages.
The latter is of great interest to businesses because offering customers only appealing products can increase sales, although, as pointed out in (Anand et al., 2004), it is difficult to present a convincing case for Return on Investment. The success of data mining applications, like that of many other applications, depends on the development of a standard. CRISP-DM (CRoss-Industry Standard Process for Data Mining) (CRISP-DM, 2000) is a data mining process model defined and validated by a consortium of companies, and it can be used in different data mining projects such as web usage mining. CRISP-DM divides the life cycle of a data mining project into six stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. The Business Understanding phase is closely tied to the problem to be solved because it defines the business objectives of the application. The last one, Deployment, is not easy to automate because each organization has its own way of managing information processing. For the remaining stages, a tool can be designed to facilitate the work of web usage mining practitioners and reduce the effort of developing new applications.

In this work we implement the WEBMINER architecture [5], which divides the WUM process into three main parts: preprocessing, pattern discovery and pattern analysis. These three parts correspond to the data preparation, modeling and evaluation phases of the CRISP-DM model.

In this paper we present a tool, based on the WEBMINER architecture, that facilitates Web Usage Mining. The tool is conceived as a framework in which different techniques can be used at each stage, which facilitates experimentation and eliminates the need to program the whole application when we want to study the effect of a new method on the mining process. The architecture of the tool is shown in Figure 1, and its elements are described in the following sections. The paper is organized as follows. Section 2 describes the data preprocessing. Sections 3 and 5 present different approaches to user session and transaction identification, and Section 4 describes web page classification. Finally, Sections 6 and 7 present the models to be generated and the results.

Fig. 1. WUM tool architecture (components: server logs, web site crawler and site map, data preprocessing, session identification, feature extraction, classifier training, site page classification, clustering into access and browsing patterns, association rule discovery)

2 Web Log Processing

Data for Web Usage Mining come from different sources, such as proxies, web log files, the web site structure and even packet sniffer logs. Normally, the most widely used sources are the web log files.
These files record user accesses to the site, and several formats exist: NCSA (Common Log Format), W3C Extended, Sun™ ONE Web Server (iPlanet), IBM Tivoli Access Manager WebSEAL or WebSphere Application Server logs. Most web servers record accesses using an extension of the CLF (ECLF). In ECLF, the information recorded for each access is basically:

remote host: Remote hostname (or IP address if the DNS hostname is not available or was not provided).
rfc931: The remote login name of the user (if not available, a minus sign is typically placed in the field).
authuser: The username with which the user has authenticated. This is available when using password-protected WWW pages (if not available, a minus sign is typically placed in the field).
date: Date and time of the request.
request: The request line exactly as it came from the client (i.e., the file name and the method used to retrieve it, typically GET).
status: The HTTP response code returned to the client. It indicates whether or not the file was successfully retrieved and, if not, what error message was returned.
bytes: The number of bytes transferred.
referer: The URL the client was on before requesting your URL (if it could not be determined, a minus sign is placed in this field).
user agent: The software the client claims to be using (if it could not be determined, a minus sign is placed in this field).

Web server logs record all user accesses, including, for each visited page, all the elements that compose it, such as GIF images, style sheets or scripts. Other entries in the log refer to failed requests to the server, such as 404 Error: Object not found. So a first phase in data preparation consists of filtering the log to remove all useless entries. Other entries that must be removed are those that correspond to search robots, because they do not correspond to a real user.
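A minimal Python sketch of this cleaning step, assuming the log follows the combined (ECLF) layout listed above. The resource-extension list, the robot user-agent keywords and every function name are illustrative assumptions rather than the tool's actual code; the 2-second gap between consecutive requests is the robot threshold this paper uses. Clients that request robots.txt are flagged before failed requests are discarded, so a 404 on that file still identifies the robot.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional

# ECLF line: host rfc931 authuser [date] "request" status bytes "referer" "user agent"
ECLF = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)
# Assumed list of embedded-resource extensions (images, styles, scripts).
STATIC_EXT = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


class Entry(NamedTuple):
    host: str
    time: datetime
    url: str
    status: int
    referer: str
    agent: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one ECLF line; return None for malformed lines."""
    m = ECLF.match(line)
    if m is None:
        return None
    parts = m["request"].split()  # e.g. "GET /index.html HTTP/1.0"
    url = parts[1] if len(parts) >= 2 else "-"
    when = datetime.strptime(m["date"], "%d/%b/%Y:%H:%M:%S %z")
    return Entry(m["host"], when, url, int(m["status"]),
                 m["referer"] or "-", m["agent"] or "-")


def is_noise(e: Entry) -> bool:
    """Embedded resources and failed requests (4xx/5xx) are not page views."""
    path = e.url.split("?", 1)[0].lower()
    return path.endswith(STATIC_EXT) or e.status >= 400


def clean(lines: Iterable[str], min_gap: timedelta = timedelta(seconds=2),
          robot_agents=("bot", "crawler", "spider")) -> List[Entry]:
    parsed = [e for e in map(parse_line, lines) if e is not None]
    # Robot clients: they fetched robots.txt or announce themselves as robots.
    robots = {(e.host, e.agent) for e in parsed
              if e.url == "/robots.txt"
              or any(r in e.agent.lower() for r in robot_agents)}
    entries = sorted((e for e in parsed if not is_noise(e)), key=lambda e: e.time)
    kept, last_seen = [], {}
    for e in entries:
        client = (e.host, e.agent)
        prev = last_seen.get(client)
        last_seen[client] = e.time
        if client in robots:
            continue
        # Timing heuristic: very quick consecutive page requests look automated.
        if prev is not None and e.time - prev < min_gap:
            continue
        kept.append(e)
    return kept
```

This sketch drops only the fast requests themselves; discarding every entry of a client that shows such bursts is a stricter alternative.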
To filter robot entries, the plain-text file robots.txt (which well-behaved robots request before crawling) and the list of known search robots can be used. In addition, we have introduced a heuristic that filters very quick consecutive requests, since a characteristic of search robots is the short delay between page requests. With a threshold of 2 seconds between two consecutive requests, the entries that correspond to robots can be eliminated.

The structure of the site has been used as another data source. This structure is obtained with a web crawler starting from the root, so all the pages that can be reached from the root make up the structure of the site. For non-static sites the structure must be introduced by hand.

3 User Session Identification

Once the web log file has been processed and all the irrelevant entries have been removed, it is necessary to identify the users that visit the site. Visits are concurrent, so the entries of different users are interleaved in the log file, and the file must be processed to collect the entries that belong to the same user.
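The user and session identification described in this section can be sketched as follows, assuming cleaned entries are available as hypothetical (ip, user_agent, time, url) tuples. Users are approximated by the pair of IP address and browser string (the first heuristic), and each user's time-ordered entries are cut wherever the gap between consecutive accesses exceeds the threshold. The 30-minute default matches the value used in the experiments; the 25.5 minutes reported by Catledge and Pitkow can be passed instead. The referer/site-map heuristic is not modelled, and all names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (ip, user_agent, timestamp, url): a cleaned log entry (hypothetical layout).
Hit = Tuple[str, str, datetime, str]
UserKey = Tuple[str, str]


def identify_users(hits: List[Hit]) -> Dict[UserKey, List[Hit]]:
    """Group interleaved entries per user, approximated by (IP, user agent)."""
    users: Dict[UserKey, List[Hit]] = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h[2]):
        users[(hit[0], hit[1])].append(hit)
    return dict(users)


def sessionize(user_hits: List[Hit],
               timeout: timedelta = timedelta(minutes=30)) -> List[List[Hit]]:
    """Start a new session when consecutive accesses are more than `timeout` apart."""
    sessions: List[List[Hit]] = []
    for hit in user_hits:  # already time-ordered by identify_users
        if sessions and hit[2] - sessions[-1][-1][2] <= timeout:
            sessions[-1].append(hit)
        else:
            sessions.append([hit])
    return sessions
```

Because the browser string is part of the key, two people behind the same proxy with different browsers are separated, while a user whose ISP changes their IP mid-visit is still split in two, as noted in the text.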
A first approach to identifying a user is to use the IP address and assign all the entries with the same IP to the same user. This approach has some drawbacks. Some users access the Internet through a proxy, so many users share the same IP. In other cases the same user appears with different IPs because their ISP assigns IP addresses dynamically. In order to minimize these effects, some heuristics have been applied. A first heuristic is to detect changes in the browser or operating system fields of the entries that come from the same IP. Another heuristic makes use of the referer field and the site map obtained with the site crawler mentioned previously: if a page is not directly linked to the pages previously visited by the user, this is evidence that another user shares the same IP and browser. Even with these heuristics some false positives remain, that is, a single user is assumed when there are actually several.

After identifying the users that have visited the site, the next phase is to obtain the user sessions. A session is made up of all the pages visited by a user during one visit. The technique is based on establishing a time threshold: if two consecutive accesses are separated by more than the threshold, a new session is considered to start [6, 7]. Many commercial products use a threshold of 30 minutes, whereas Catledge and Pitkow [8] set it at 25.5 minutes based on empirical studies.

4 Web Page Classification

After data cleaning, the next stage in the data preparation phase is to compute the features of each page in the site. The following features have been defined:

Size: size of the page in bytes.
Num. incoming links.
Num. outgoing links.
Frequency: number of times the page was requested in a period of time.
Source: number of times the page is the starting point of a session.
Similarity: similarity of a page with its sibling pages based on a tbd computation.
Depth: average depth of the sibling pages.
The depth of a page is measured as the number of "/" characters in its URL. From these features a model of the different page types can be learned, which saves the webmaster from annotating every page in the site. In this work we have defined the following pages of interest:

Home page: the first page visited by users.
Content page: it contains part of the site information.
Auxiliary page: users can use this page to visit other pages in the site.
Reference page: it explains a concept or contains references.
Personal page: it contains biographical information about the organization staff.

To avoid the computational cost of training a classifier with the whole set of features, a prior feature selection stage is performed. The initial feature set is filtered using the GD measure [9], which is based on information-theory concepts, in order to select the most informative features. This measure ranks the features according to their relevance to the concept and also detects redundant features that can be removed from the initial feature set.

In a small web site, pages can be manually tagged as home page, content page and so on, but in a medium or large web site this is not affordable. Therefore an automatic or semi-automatic method is needed to tag the pages. This proposal includes a page classification phase (Figure 1) based on a model learned for the different categories of pages using the features defined above. Yu et al. [10] propose using SVMs trained with positive examples to classify web pages. Xu et al. [11] also use SVMs to deal with the heterogeneous data that appear in a web page, such as links, plain text, page title or anchor text. Holden and Freitas [12] use the Ant Colony paradigm to find a set of rules that classify web pages into several categories. The study of complex web page classification algorithms is beyond the scope of this paper, so two well-known learning methods have been included: naive Bayes and C4.5.

The tool includes a supervised learning stage. The user selects and manually tags a set of pages that make up the initial training set. With this initial training set, the learning process is launched and the results are checked by the user. If some pages are misclassified, the user can add them to the training set with the correct tag. After some cycles, a satisfactory model is obtained and the pages of the site are classified.

5 Transaction Identification

A transaction is defined as a set of homogeneous pages that have been visited in a user session. Each user session can be considered as a single transaction composed of all the visited pages, or it can be divided into smaller sets of visited pages.
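Two of the splitting strategies described in this section depend only on the page sequence and the timestamps, so they are easy to sketch: maximal forward reference, following Chen et al.'s formulation in which a backward reference is a revisit of a page on the current forward path, and the time window, which bounds the span of each transaction as in equation (4). The function names and the (url, time) input layout are assumptions for illustration, not the tool's code.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def maximal_forward_references(pages: List[str]) -> List[List[str]]:
    """Split a session into maximal forward reference paths (after Chen et al. [13])."""
    result: List[List[str]] = []
    path: List[str] = []
    moving_forward = True
    for page in pages:
        if page in path:  # backward reference: the page is already on the current path
            if moving_forward:
                result.append(list(path))  # the path just completed is a transaction
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:  # forward reference: extend the current path
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        result.append(path)
    return result


def time_window(hits: List[Tuple[str, datetime]],
                window: timedelta) -> List[List[Tuple[str, datetime]]]:
    """Greedy split so that, in every transaction, last time - first time <= window."""
    transactions: List[List[Tuple[str, datetime]]] = []
    for url, when in hits:
        if transactions and when - transactions[-1][0][1] <= window:
            transactions[-1].append((url, when))
        else:
            transactions.append([(url, when)])
    return transactions
```

For the session A B C D C B E G H G W A O U O V, `maximal_forward_references(list("ABCDCBEGHGWAOUOV"))` yields the paths ABCD, ABEGH, ABEGW, AOU and AOV; the last page of each path is the candidate content page.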
The transaction identification process is based on a split-and-merge process that looks for a suitable set of transactions to be used in a data mining task. Formally, a transaction is composed of an IP address, a user identification and a set of visited pages, each identified by its URL and access time:

$$ t = \langle ip_t,\ uid_t,\ \{(l^t_1.url,\ l^t_1.time), \ldots, (l^t_m.url,\ l^t_m.time)\} \rangle, $$
$$ \text{for } 1 \le k \le m:\quad l^t_k \in L,\quad l^t_k.ip = ip_t,\quad l^t_k.uid = uid_t \qquad (1) $$

where L is the set of log entries. There are different strategies to perform the split stage of transaction identification.

Transaction Identification by Reference Length. This strategy, proposed by Cooley et al. [2], is based on the assumption that the time a user spends on an auxiliary page is shorter than the time spent on a content page. Obtaining a time t by maximum likelihood estimation and defining a threshold C, pages are added to an auxiliary-content transaction when

$$ 1 \le k \le m-1:\quad l^{trl}_k.length \le C \quad\text{and}\quad k = m:\quad l^{trl}_k.length > C \qquad (2) $$

while content-only transactions satisfy

$$ 1 \le k \le m:\quad l^{trl}_k.length > C \qquad (3) $$

Transaction Identification by Maximum Forward Reference. This strategy is based on the idea proposed by Chen et al. [13]. A transaction is the set of pages from the first visited page up to the page preceding a backward reference. A backward reference occurs when the user accesses again a page already visited in the current session, while a forward reference is an access to a page not previously visited in the current session. So the maximal forward references are the content pages, and the path to each maximal forward reference is composed of index pages.

Transaction Identification by Time Window. This strategy divides a user session into time intervals shorter than a fixed threshold. In this strategy, unlike the previous ones, the last visited page does not normally correspond to a content page. If W is the size of the time window, the accesses included in the transaction (1) are those that fulfill:

$$ l^t_m.time - l^t_1.time \le W \qquad (4) $$

This strategy is normally combined with the previous ones.

6 Model Generation

To characterize the visitors of the web site, it is interesting to detect access patterns, that is, what type of pages are visited, and also navigation patterns, that is, how visitors browse the pages of the web site. Both patterns are of interest because they can help the web site designer improve the usability or visibility of the site. To obtain these patterns, a clustering stage is included in the tool; although many works have addressed this problem [14-16], three well-known methods have been used in the tool: Expectation Maximization, K-means and Cobweb.
Both the identified sessions and the transactions are used as input to these methods. Another piece of information that is useful for the web site designer is whether there are unusual relations among the pages visited by users. This information can easily be extracted from the transactions and user sessions by means of an association rule discovery module, for which the Apriori method proposed by Agrawal and Srikant [17] has been used.
7 Experiments

The tool was implemented in Java, and the learning methods are those implemented in Weka [18]; for now we are focused only on the development of the framework, which will allow us to introduce new learning methods. The appearance of the tool is shown in Figure 2.

Fig. 2. DCW tool

To test the approach, we selected a log file corresponding to one week of accesses to the site of the Department of Computer Science (DIS) of the Univ. of Las Palmas de Gran Canaria. The log file has entries that, after the preprocessing phase (Sec. 2), are reduced to entries. Setting the session identification time threshold (Sec. 3) to 30 minutes, 9460 sessions were identified, the main page of the DIS being the most visited page with 1571 accesses.

After session identification, the next stage is to train the classifier to tag the pages. In this experiment the pages were divided into two categories: content and auxiliary. The C4.5 algorithm was used to induce a decision tree model, and an accuracy of 85% was achieved.

Once the pages of the site are classified into the auxiliary or content category, pattern extraction is carried out. To obtain the access patterns of the visitors, a clustering phase with the EM algorithm is performed. The results are shown in Table 1. Two clusters are obtained: one corresponds to users who mainly visit content pages, while the other represents visitors who browse auxiliary pages. The first cluster could correspond to students and staff, while the second could correspond to curious visitors, because they only browse auxiliary pages.

Table 1. Access patterns results with EM clustering (attributes: content pages, auxiliary pages)

To obtain the navigation patterns, only sessions of 3, 4 or 5 accesses (pages visited) are considered. Table 2 shows the clusters obtained for 3-access sessions. The largest cluster corresponds to sessions that end up in auxiliary pages, which means that the user abandons the site before reaching a page that gives useful information.

            access 0                  access 1                  access 2
Cluster 0   Auxiliary page (376.03)   Auxiliary page (376.03)   Auxiliary page (376.03)
Cluster 1   Content page (175.97)     Content page (175.97)     Content page (175.97)
Table 2. Navigation patterns for 3-access sessions

Table 3 shows the navigation patterns of sessions with 4 accesses. Here, two of the three clusters, including the largest one, correspond to sessions that finish in content pages, and only a smaller number of sessions end up in auxiliary pages, which may imply that those visitors did not find the information they were looking for.

            access 0                 access 1                      access 2                      access 3
Cluster 0   Content page (93.95)     Content page 92.9 (93.95)     Content page 87.82 (93.95)    Content page 86.71 (93.95)
Cluster 1   Auxiliary page (36.51)   Auxiliary page 35.49 (36.51)  Auxiliary page 35.51 (36.51)  Auxiliary page 35.49 (36.51)
Cluster 2   Auxiliary page (11.54)   Auxiliary page 8.46 (11.54)   Auxiliary page 10.35 (11.54)  Content page 9.27 (11.54)
Table 3. Navigation patterns for 4-access sessions

Table 4 shows the association rules obtained with the Apriori algorithm. The rules do not generate new knowledge because they are very obvious. For example, the first rule expresses that if the second visited page is the page of the Informatics Engineering studies (studies=itis), the first page was the root of the site. As the aim of this work is to present the framework for a WUM tool, the results have not been compared with other techniques, because such comparisons can be found in the literature.

Rule                                                      Support   Confidence
access1=/subject/index.asp?studies=itis ==> access0=/     16        1
access1=/subject/index.asp?studies=ii ==> access0=/       14        1
access2=/staff/ ==> access0=/                             16        1
access2=/student/ ==> access0=/                           15        1
Table 4. Association rules

Compared with other open-source tools, the most similar one we have found is WUMprep [19], which covers only part of the Data Preparation stage and, unlike DCW, which has a GUI, is based on Perl scripts. For model generation and validation there are two well-known tools, Weka [18] and RapidMiner [20]. They are oriented to data mining in general, so the preceding web log cleaning stage must be done with other tools.

8 Conclusions

In this work a tool for Web Usage Mining has been presented. It supports all the phases needed to obtain access and navigation patterns, as well as association rules. It was implemented in Java using the Weka inducers, which makes it possible to test new model induction algorithms. To test the tool, some experiments were carried out with a log file of more than entries. They reveal some visitor behaviors that the web designers were not aware of, which can help them redesign the web site to offer a better service to the students and staff of the Department of Computer Science of the ULPGC.

Future work is twofold. On the one hand, some elements of the proposed tool need to be improved, for example to cope with dynamic web sites. On the other hand, other methods can be tested in the classification and clustering phases; in the page classification phase, this includes computing new features and using SVM as the classifier.

Acknowledgements

This work has been partially supported by the Spanish Ministry of Education and Science and FEDER funds under research project TIN. Thanks to Miguel Garcia from La Laguna University for his Java implementation of the GD measure.

References

1. Hand, D.J., Mannila, H., Smyth, P.: Principles of Data Mining. The MIT Press (2001)
2. Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining World Wide Web browsing patterns. Journal of Knowledge and Information Systems 1 (1999) 5-32
3. Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers (2003)
4. Srivastava, J., Cooley, R., Deshpande, M., Tan, P.N.: Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations 1 (2000)
5. Cooley, R., Srivastava, J., Mobasher, B.: Web mining: Information and pattern discovery on the World Wide Web. In: Proc. of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97) (1997)
6. Berendt, B., Mobasher, B., Nakagawa, M., Spiliopoulou, M.: The impact of site structure and user environment on session reconstruction in web usage analysis. In: WEBKDD: Mining Web Data for Discovering Usage Patterns and Profiles. LNAI 2703 (2003)
7. Spiliopoulou, M., Mobasher, B., Berendt, B., Nakagawa, M.: A framework for the evaluation of session reconstruction heuristics in web usage analysis. INFORMS Journal on Computing 15 (2003)
8. Catledge, L.D., Pitkow, J.E.: Characterizing browsing strategies in the World-Wide Web. Computer Networks and ISDN Systems 27 (1995)
9. Lorenzo, J., Hernández, M., Méndez, J.: Detection of interdependences in attribute selection. Lecture Notes in Computer Science 1510 (1998)
10. Yu, H., Han, J., Chang, K.C.C.: PEBL: Web page classification without negative examples. IEEE Transactions on Knowledge and Data Engineering 16 (2004)
11. Xu, Z., King, I., Lyu, M.R.: Web page classification with heterogeneous data fusion. In: Proceedings of the Sixteenth International World Wide Web Conference (WWW2007), Alberta, Canada (2007)
12. Holden, N., Freitas, A.: Web page classification with an ant colony algorithm. In: Yao, X., et al. (eds.): Parallel Problem Solving from Nature - PPSN VIII. LNCS 3242, Springer-Verlag (2004)
13. Chen, M.S., Park, J.S., Yu, P.S.: Data mining for path traversal patterns in a Web environment. In: Proceedings of the 16th International Conference on Distributed Computing Systems (1996)
14. Xiao, J., Zhang, Y.: Clustering of web users using session-based similarity measures. In: 2001 International Conference on Computer Networks and Mobile Computing (ICCNMC'01) (2001)
15. Bianco, A., Mardente, G., Mellia, M., Munafo, M., Muscariello, L.: Web user session characterization via clustering techniques. In: Global Telecommunications Conference, GLOBECOM '05. IEEE (2005)
16. Chen, L., Bhowmick, S.S., Li, J.: COWES: Clustering web users based on historical web sessions. In: 11th International Conference on Database Systems for Advanced Applications, Singapore (2006)
17. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: 20th Int. Conference on Very Large Data Bases (1994)
18. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. 2nd edn. Morgan Kaufmann, San Francisco (2005)
19. Pohle, C., Spiliopoulou, M.: Building and exploiting ad hoc concept hierarchies for web log analysis. In: Data Warehousing and Knowledge Discovery, Proceedings of the 4th International Conference, DaWaK 2002
20. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE (now: RapidMiner): Rapid prototyping for complex data mining tasks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006)
On the design concepts for CRM system
Jeong Yong Ahn Department of Computer Science and Informatics, Seonan University, Namwon, Korea Seok Ki Kim Division of Mathematics and Statistical Informatics, Chonbuk National University, Chonju, Korea
AN OVERVIEW OF PREPROCESSING OF WEB LOG FILES FOR WEB USAGE MINING
AN OVERVIEW OF PREPROCESSING OF WEB LOG FILES FOR WEB USAGE MINING N. M. Abo El-Yazeed Demonstrator at High Institute for Management and Computer, Port Said University, Egypt [email protected]
Web Usage Mining for a Better Web-Based Learning Environment
Web Usage Mining for a Better Web-Based Learning Environment Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, Alberta, Canada email: zaianecs.ualberta.ca ABSTRACT Web-based
Analysis of Server Log by Web Usage Mining for Website Improvement
IJCSI International Journal of Computer Science Issues, Vol., Issue 4, 8, July 2010 1 Analysis of Server Log by Web Usage Mining for Website Improvement Navin Kumar Tyagi 1, A. K. Solanki 2 and Manoj Wadhwa
Web Log Mining: A Study of User Sessions
UNIVERSITY OF PADUA Department of Information Engineering PersDL 2007 10th DELOS Thematic Workshop on Personalized Access, Profile Management, and Context Awareness in Digital Libraries Corfu, Greece,
Generalization of Web Log Datas Using WUM Technique
Generalization of Web Log Datas Using WUM Technique 1 M. SARAVANAN, 2 B. VALARAMATHI, 1 Final Year M. E. Student, 2 Professor & Head Department of Computer Science and Engineering SKP Engineering College,
Cloud Mining: Web usage mining and user behavior analysis using fuzzy C-means clustering
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 7, Issue 2 (Nov-Dec. 2012), PP 09-15 Cloud Mining: Web usage mining and user behavior analysis using fuzzy C-means
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
Arti Tyagi Sunita Choudhary
Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Web Usage Mining
Web Usage mining framework for Data Cleaning and IP address Identification
Web Usage mining framework for Data Cleaning and IP address Identification Priyanka Verma The IIS University, Jaipur Dr. Nishtha Kesswani Central University of Rajasthan, Bandra Sindri, Kishangarh Abstract
Identifying User Behavior by Analyzing Web Server Access Log File
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 327 Identifying User Behavior by Analyzing Web Server Access Log File K. R. Suneetha, Dr. R. Krishnamoorthi,
LiDDM: A Data Mining System for Linked Data
LiDDM: A Data Mining System for Linked Data Venkata Narasimha Pavan Kappara Indian Institute of Information Technology Allahabad Allahabad, India [email protected] Ryutaro Ichise National Institute of
Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis
, 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Web Mining Functions in an Academic Search Application
132 Informatica Economică vol. 13, no. 3/2009 Web Mining Functions in an Academic Search Application Jeyalatha SIVARAMAKRISHNAN, Vijayakumar BALAKRISHNAN Faculty of Computer Science and Engineering, BITS
Why Google Analytics Cannot Be Used For Educational Web Content
Why Google Analytics Cannot Be Used For Educational Web Content Sanda-Maria Dragoş Chair of Computer Systems, Department of Computer Science Faculty of Mathematics and Computer Science Babes-Bolyai University
Web Usage Mining: Identification of Trends Followed by the user through Neural Network
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 617-624 International Research Publications House http://www. irphouse.com /ijict.htm Web
College information system research based on data mining
2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
IBM WEBSPHERE LOAD BALANCING SUPPORT FOR EMC DOCUMENTUM WDK/WEBTOP IN A CLUSTERED ENVIRONMENT
White Paper IBM WEBSPHERE LOAD BALANCING SUPPORT FOR EMC DOCUMENTUM WDK/WEBTOP IN A CLUSTERED ENVIRONMENT Abstract This guide outlines the ideal way to successfully install and configure an IBM WebSphere
An Enhanced Framework For Performing Pre- Processing On Web Server Logs
An Enhanced Framework For Performing Pre- Processing On Web Server Logs T.Subha Mastan Rao #1, P.Siva Durga Bhavani #2, M.Revathi #3, N.Kiran Kumar #4,V.Sara #5 # Department of information science and
Ontology-Based Filtering Mechanisms for Web Usage Patterns Retrieval
Ontology-Based Filtering Mechanisms for Web Usage Patterns Retrieval Mariângela Vanzin, Karin Becker, and Duncan Dubugras Alcoba Ruiz Faculdade de Informática - Pontifícia Universidade Católica do Rio
How To Evaluate Web Applications
A Framework for Exploiting Conceptual Modeling in the Evaluation of Web Application Quality Pier Luca Lanzi, Maristella Matera, Andrea Maurino Dipartimento di Elettronica e Informazione, Politecnico di
Introduction to Data Mining
Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Systems Informáticos y Computación Universidad Politécnica de Valencia, Spain [email protected] Horsens, Denmark, 26th September 2005
Implementation of a New Approach to Mine Web Log Data Using Mater Web Log Analyzer
Implementation of a New Approach to Mine Web Log Data Using Mater Web Log Analyzer Mahadev Yadav 1, Prof. Arvind Upadhyay 2 1,2 Computer Science and Engineering, IES IPS Academy, Indore India Abstract
A Survey on Web Mining Tools and Techniques
A Survey on Web Mining Tools and Techniques 1 Sujith Jayaprakash and 2 Balamurugan E. Sujith 1,2 Koforidua Polytechnic, Abstract The ineorable growth on internet in today s world has not only paved way
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS
MINING CLICKSTREAM-BASED DATA CUBES
MINING CLICKSTREAM-BASED DATA CUBES Ronnie Alves and Orlando Belo Departament of Informatics,School of Engineering, University of Minho Campus de Gualtar, 4710-057 Braga, Portugal Email: {alvesrco,obelo}@di.uminho.pt
ABSTRACT The World MINING 1.2.1 1.2.2. R. Vasudevan. Trichy. Page 9. usage mining. basic. processing. Web usage mining. Web. useful information
SSRG International Journal of Electronics and Communication Engineering (SSRG IJECE) volume 1 Issue 1 Feb Neural Networks and Web Mining R. Vasudevan Dept of ECE, M. A.M Engineering College Trichy. ABSTRACT
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
Journal of Informetrics
Journal of Informetrics 4 (2010) 331 337 Contents lists available at ScienceDirect Journal of Informetrics journal homepage: www.elsevier.com/locate/joi Differences between web sessions according to the
Integrating Web Content Mining into Web Usage Mining for Finding Patterns and Predicting Users Behaviors
International Journal of Information Science and Management Integrating Web Content Mining into Web Usage Mining for Finding Patterns and Predicting Users Behaviors S. Taherizadeh N. Moghadam Group of
Applying Web Mining Application for User Behavior Understanding
Applying Web Mining Application for User Behavior Understanding ZAKARIA SULIMAN ZUBI Computer Science Department Faculty of Science Sirte University P.O Box 727 Sirte, Libya Email: [email protected] MUSSAB
Subject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
How To Analyze Web Server Log Files, Log Files And Log Files Of A Website With A Web Mining Tool
International Journal of Advanced Computer and Mathematical Sciences ISSN 2230-9624. Vol 4, Issue 1, 2013, pp1-8 http://bipublication.com ANALYSIS OF WEB SERVER LOG FILES TO INCREASE THE EFFECTIVENESS
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Blog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
Visualizing e-government Portal and Its Performance in WEBVS
Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR [email protected] Abstract An e-government
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,
Optimization of Image Search from Photo Sharing Websites Using Personal Data
Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search
E-CRM and Web Mining. Objectives, Application Fields and Process of Web Usage Mining for Online Customer Relationship Management.
University of Fribourg, Switzerland Department of Computer Science Information Systems Research Group Seminar Online CRM, 2005 Prof. Dr. Andreas Meier E-CRM and Web Mining. Objectives, Application Fields
Integrating Web Content Clustering into Web Log Association Rule Mining
Integrating Web Content Clustering into Web Log Association Rule Mining Jiayun Guo, Vlado Kešelj, and Qigang Gao Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, NS,
V.Chitraa Lecturer CMS College of Science and Commerce Coimbatore, Tamilnadu, India [email protected]
(IJCSIS) International Journal of Computer Science and Information Security, A Survey on Preprocessing Methods for Web Usage Data V.Chitraa Lecturer CMS College of Science and Commerce Coimbatore, Tamilnadu,
