Web Usage Mining Structuring semantically enriched clickstream data


Web Usage Mining
Structuring semantically enriched clickstream data

by Peter I. Hofgesang
Stud.nr

A thesis submitted to the Department of Computer Science in partial fulfilment of the requirements for the degree of Master of Computer Science at the Vrije Universiteit Amsterdam, The Netherlands

August 2004


supervisor
Dr. Wojtek Kowalczyk
Faculty of Sciences, Vrije Universiteit Amsterdam
Department of Computer Science

second reader
Dr. Elena Marchiori
Faculty of Sciences, Vrije Universiteit Amsterdam
Department of Computer Science

Abstract

Web servers worldwide generate a vast amount of information on web users' browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. We discuss techniques and processes required for preparing, structuring and enriching web access logs. Furthermore, we present several web usage mining methods for extracting useful features. Finally, we employ all these techniques to cluster the users of the domain and to study their behaviours comprehensively. The contributions of this thesis are a data enrichment that is content and origin based and a tree-like visualization of frequent navigational sequences. This visualization gives an easily interpretable view of patterns in which relevant information is highlighted. The results of this project can be applied to diverse purposes, including marketing, web content advising, (re-)structuring of web sites and several other E-business processes, like recommendation and advertiser systems.

Content

1 Introduction
2 Related research
3 Data preparation
    Data description
    Cleaning access log data
    Data integration
    Storing the log entries
    An overall picture
4 Data structuring
    User identification
    User groups
    Session identification
    An overall picture
5 Profile mining models
    Mining frequent itemsets
    The mixture model
    The global tree model
6 Analysing log files of the web server
    Input data
    Distribution of content-types within the VU-pages and access log entries
    Experiments on data structuring
    Mining frequent itemsets
    The mixture model
    The global tree model
7 Conclusion and future work
Acknowledgements
Bibliography
APPENDIX
    APPENDIX A. The uniform resource locator (URL)
    APPENDIX B. Input file structures
    APPENDIX C. Experimental details
    APPENDIX D. Implementation details
    APPENDIX E. Content of the CD-ROM

Structure

This Master Thesis is organized as follows:

Chapter 1, Introduction
This chapter provides a high-level overview of the related research and the main goals of this project.

Chapter 2, Related research
Chapter 2 gives a comprehensive overview of the related research known so far.

Chapter 3, Data preparation
This chapter follows through all steps of the data preparation process. It starts by describing the main characteristics of the input data, followed by a description of the data cleaning process. The section on data integration explains how the different data sources are merged for data enrichment, while the next section concerns data loading. Finally an overall scheme and an experiments section are laid out.

Chapter 4, Data structuring
In chapter 4 we explain how the semantically enriched data is combined to form user sessions. The chapter also discusses the process of user identification and describes groups of users, both of which are prerequisites for the identification of sessions. It ends with an overall scheme of data structuring followed by a section of experiments.

Chapter 5, Profile mining models
This chapter provides an overview of the theoretical background of the applied data mining models. First it explains the widely used algorithm for mining frequent itemsets. The following section describes the recently researched mixture model architecture. Finally a tree model is proposed for exploiting the hierarchical structure of session data.

Chapter 6, Analysing log files of the web server
Chapter 6 discusses experimental results of the mining models applied to the semantically enriched data. All the input data are related to a specific web domain:

Chapter 7, Conclusion and future work
Finally in chapter 7 we present the conclusions of our research and explore avenues of future work.

1 Introduction

The rapid growth of the information reachable via the Internet makes it increasingly difficult to manage. Numerous companies face the problem of publishing their product range or information online in an efficient, easily manageable way. Exploring web users' customs and behaviours plays a key role in dissecting and understanding this problem.

Web mining is the application of data mining techniques to web data sets. The three major web mining methods are web content mining, web structure mining and web usage mining. Content mining applies methods to web documents. Structure mining reveals hidden relations in web site and web document structures. In this thesis we employ web usage mining, which provides methods to discover useful usage patterns from web data.

Web servers are responsible for providing the available web content on user requests. They collect all the information on request activities into so-called log files. Log data are a rich source for web usage mining.

Much scientific research is aimed at the field of web usage mining and especially at the exploration of user behaviour. Besides, there is a great demand in the business sector for personalized, custom-designed systems that conform closely to the requirements of users.

There is also a substantial amount of prior scientific work on modelling web user characteristics. Some of it presents a complete framework for the whole web usage mining task (e.g., Mobasher et al. (1996) [18] proposed WEBMINER). Many works present page access frequency based models and modified association rule mining algorithms, such as [1, 31, 23]. Xing and Shen (2003) [30] proposed two algorithms (UAM and PNT) for predicting user navigational preferences, both based on page visit frequency and page viewing time. UAM is a URL-URL matrix providing page-to-page transition probabilities computed from the statistics of all users. PNT is a tree based algorithm for mining preferred navigation paths. Nanopoulos and Manolopoulos (2001) [21] present a graph based model for finding traversal patterns in web page access sequences. They introduce one level-wise and two non-level-wise algorithms for large paths that exploit the graph structure.

While most of the models work on the global session level, a growing body of research shows that the exploration of user groups or clusters is essential for better characterisation. Hay et al. (2003) [14] suggest the Sequence Alignment Method (SAM) for measuring the distance between sessions while taking structural information into account. The proposed distance reflects the number of operations required to transform sessions into one another. SAM distance based clusters form the basis of further examinations. Chevalier et al. (2003) [8] suggest rich navigation patterns consisting of frequent page set groups and web user groups based on demographic patterns. They show the correlation between the two types of data.

Other research goes far beyond frequency based models. Cadez et al. (2003) [4] propose a finite mixture of Markov models on sequences of URL categories traversed by users. This complex probability based structure models the data generation process itself.

In this thesis we discuss techniques and processes required for further analysis. Furthermore, we present several web usage mining methods for extracting useful features. An overall process workflow can be seen in figure 1.

[Figure 1: The overall process workflow. Stages shown: INPUT DATA (web server's access log data; content type mapping table of URL / content type pairs; geographical and organizational information), DATA PREPARATION (data filtering, data integration, database, user selection), SESSION IDENTIFICATION (identified sessions, written in AR, MM and GTM formats) and PROFILE MINING (association rules, mixture model, tree model).]

This thesis considers three separate data sets as input data. Access log data are generated by the web server of the specified domain and contain user access entries. The content-type mapping table contains relations between documents and their category in the form of URL / content type pairs. Mapping tables can be generated either by classifier algorithms or by content providers. In the latter case, the contents of pages are given explicitly in the form of content categories (e.g., news, sport, weather, etc.). Geographical and organizational information makes it possible to determine different categories of users.

All data mining tasks start with data preparation, which prepares the input data for further examination. It consists of four main steps, as can be seen in figure 1. Data filtering strips out irrelevant entries, data integration enriches log data with content labels, and the enriched data are stored in a database. The user selection process sorts out the appropriate user entries of a specified group for session identification.

The following step in the whole process is session identification. Related log entries are identified as unique user navigational sequences. Finally these sequences are written to output files in different formats depending on the application.

The profile mining step applies several web usage mining methods to discover relevant patterns. It uses an association rules mining algorithm [1] for mining frequent page sets and for generating interesting rules. It also applies the mixture model proposed by Cadez et al. (2001) [5] to build a predictive model of the navigational behaviour of users. Finally it presents a tree model for representing and visualizing visiting patterns in a clear and natural way.

In the experimental part of this thesis we employ all these techniques to address the problem of defining clusters on the users of the web domain, and we study their behaviours comprehensively.

The contributions of this thesis are content based data enrichment and visualization of frequent navigational sequences. Data enrichment extends users' transactional data with the content types of visited pages and documents and makes distinctions among users based on geographical and organizational information. The visualization presents a tree-like view of patterns that highlights relevant information and can be interpreted easily.

2 Related research

There are numerous commercial software packages that can be used to obtain statistical patterns from web logs, such as [11, 22, 37]. They focus mostly on highlighting log data statistics and frequent navigation patterns, but in most cases do not explore relationships among relevant features.

Some research aims at proposing data structures to facilitate web log mining processes. Punin et al. (2001) [24] defined the XGMML and LOGML XML languages. XGMML is for graph description, while LOGML is for web log description. Other papers focus only (or mostly) on data preparation [6, 13, 15]. Furthermore, complete frameworks have been presented for the whole web usage mining task (e.g., Mobasher et al. (1996) [18] proposed WEBMINER).

Many works, such as [1, 23, 31], present page access frequency based models and modified apriori [1] (frequent itemset mining) algorithms.

Some papers (e.g., [9, 10, 32]) present online recommender systems to assist the user's browsing or purchasing activity. Yao et al. (2000) [32] use standard data mining and machine learning techniques (e.g., frequent itemset mining, C4.5 classifier, etc.) combined with agent technologies to provide an agent based recommendation system for web pages. Cho et al. (2002) [10] suggest a product recommendation method based on data mining techniques and product taxonomy. This method employs decision tree induction to select the users likely to buy the recommended products.

Hay et al. (2003) [14] apply the sequence alignment method (SAM) for clustering user navigational paths. SAM is a distance-based measuring technique that considers the order of sequences. The SAM distance of two sequences reflects the number of transformations (i.e., delete, insert, reorder) required to make them equal. Clustering requires a distance matrix that holds the SAM distance scores for all session pairs. The analysis of the resulting clusters showed that the SAM based method outperforms the conventional association distance based measure.

Runkler and Bezdek (2003) [27] use the relational alternating cluster estimation (RACE) algorithm for clustering web page sequences. RACE finds the centers for a specified number of clusters based on a page sequence distance matrix. In each iteration the algorithm alternately computes the distance matrix and one of the cluster centers. They propose the Levenshtein (a.k.a. edit) distance for measuring the distance between members (i.e., textual representations of the visited page number sequences within sessions). The Levenshtein distance counts the number of delete, insert or change steps necessary to transform one word into the other.

Pei et al. (2000) [23] propose a data structure called web access pattern tree (WAP-tree) for efficient mining of access patterns from web logs. WAP-trees store all the frequent candidate sequences that have a support higher than a preset threshold. The only information stored by a WAP-tree is the labels and frequency counts of its nodes. In order to mine useful patterns in WAP-trees they present the WAP-mine algorithm, which applies conditional search for finding frequent events. The WAP-tree structure and the WAP-mine algorithm together offer an alternative to apriori-like algorithms.

Smith and Ng (2003) [28] present a self-organizing map framework (LOGSOM) to mine web log data and present a visualization tool for user assistance.

Jenamani et al. (2003) [16] use a semi-Markov process model for understanding e-customer behaviour. The keys of the method are a transition probability matrix (P) and a mean holding time matrix (M). P is a stochastic matrix whose elements store the transition probabilities between states. M stores the average lengths of time for processes to remain in state i before moving to state j. In this way the probabilistic model is able to capture the time elapsed between transitions.

Some papers present methods based on content assumptions. Baglioni et al. (2003) [2] use URL syntax to determine page categories and to explore the relation between users' sex and navigational behaviour. Cadez et al. (2003) [4] experiment on categorized data from Msnbc.com.

Visualization of frequent navigational patterns makes them easier for humans to perceive. Cadez et al. (2003) [4] present the WebCanvas tool for visualizing Markov chain clusters. This tool represents all user navigational paths for each cluster, colour coded by page categories. Youssefi et al. (2003) [33] present 3D visualizations of superimposed web log patterns and extracted web structure graphs.

3 Data preparation

Preparing the input data is the first step of all data and web usage mining tasks. The data in this case are, as mentioned above, the access log files of the web server of the examined domain and the content types mapping table of the HTML pages within this domain.

Data preparation consists of three main steps: data cleaning/filtering, data integration and data storing. Data cleaning is the task of removing all irrelevant entries from the access log data set. Data integration establishes the relation between log entries and content mappings. The last step is to store the enriched data in a convenient database. A comprehensive study of all these preprocessing tasks has been made by Cooley et al. (1999) [13].

This chapter starts with the description of the input data and its generation procedure, followed by the details of access log cleaning and of integrating the log entries with the mapping data. Finally it presents the database scheme for data storing and an overall picture and description of the data preparation process.

3.1 Data description

This section describes the details of the access log and content type mapping data.

Access log files

Visitors to a web site click on links and their browser in turn requests pages from the web server. Each request is recorded by the server in so-called access log files¹. Access logs contain requests for a given period of time. The time interval used is normally an attribute of the web server. There is a log file for each period, and the old ones are archived or erased depending on their usage and importance.

Most log files of web servers are stored in the common log file format (CLFF) [34] or in the extended log file format (ELFF) [35]. An extended log file contains a sequence of lines containing ASCII characters terminated by either the sequence LF or CRLF. Entries consist of a sequence of fields relating to a single HTTP transaction. Fields are separated by white space. If a field is unused in a particular entry, a dash ("-") marks the omitted field.

Web servers can be configured to write different fields into the log file in different formats. The most common fields used by web servers are the following: remotehost, rfc931, authuser, date, request, status, bytes, referer, user_agent.

¹ There are other types of log files generated by the web server as well, but this project does not consider them.

The meanings of all these fields are explained in the table below with given examples:

The most commonly used fields of access log file entries by web servers

Field name      Description of the field (with example)

remotehost      Remote hostname (or IP number if DNS hostname is not available).
                example:
rfc931          The remote login name of the user.
                example: -
authuser        The username with which the user has authenticated himself.
                example: -
[date]          Date and time of the request with the web server's time zone.
                example: [20/Jan/2004:23:17: ]
"request"       The request line exactly as it came from the client. It consists of three subfields: the request method, the resource to be transferred, and the protocol used.
                example: "GET / HTTP/1.1"
status          The HTTP status code returned to the client.
                example: 200
bytes           The content-length of the document transferred.
                example:
"referer"       The URL the client was on before requesting the current URL.
                example: "-"
"user_agent"    The software the client claims to be using.
                example: "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

Table

Content types mapping table

A content types mapping table is a table containing URL/content type pair entries. URLs are file locator paths referring to documents, and content types are labels giving the types of documents (for more details about URLs refer to APPENDIX A).

Content types can be generated either by an algorithm or by content providers, where the contents of pages are given explicitly (e.g., sport pages refer to sport content, etc.). Generator algorithms can be further distinguished depending on whether they produce the content types automatically or are driven by human interaction.
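To make the field layout concrete, here is a minimal sketch of how one entry could be split into the fields listed in the table. It is an illustration only and not the thesis's LogParser: the regular expression assumes that the date is bracketed and that the request, referer and user_agent fields are quoted, as the field names in the table suggest, and the function name `parse_log_line` is hypothetical.

```python
import re
from typing import Optional

# Assumed layout: remotehost rfc931 authuser [date] "request" status bytes
# "referer" "user_agent". The pattern is an illustration, not the thesis code.
LOG_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Split one access log entry into named fields; None if it does not fit."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The request field has three subfields: method, resource and protocol.
    parts = entry["request"].split()
    if len(parts) != 3:
        return None
    entry["method"], entry["resource"], entry["protocol"] = parts
    entry["status"] = int(entry["status"])
    return entry
```

Entries that do not match the assumed layout are returned as `None`, so a caller can simply skip them instead of failing on malformed lines.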

We use an external algorithm [3], which attaches labels to all HTML documents in a collection of HTML pages based on their contents. The algorithm is based on the naive Bayes classifier supplemented by a smart example selector algorithm. It uses only the textual content of the HTML pages, stripping out the control tags. Some parts of the text enclosed within special tags (e.g., title or header tags) are biased. The algorithm chooses the first 100 pages randomly to be categorized by humans. This initialization step is followed by an active learning method, which chooses further examples by considering the ones already selected.

This thesis deals with other documents besides HTML as well (e.g., pdf, ps, doc, rtf, etc.). However, it would be a difficult process to attach labels to each of them based on their content, because the structure of these files is format specific and most of the time very complex, and their size is usually very large. For these reasons a very simple technique is used to identify such documents. The label "documents" is attached to all pdf and ps files, which refer to scientific papers, e-books, documentation, etc., while the label "other documents" is attached to all other document types (e.g., doc, rtf, ppt, etc.). Other documents are, e.g., administrative papers, forms, etc. Accordingly, the mapping table is completed with entries for the two labels.

The following table presents an example of a content types mapping table:

An example of content-type mapping table

URL                             content type identifier
bi/courses-en.html              4
ci/datamine/diana/index.html    6

Table

Cleaning access log data

As described above, raw access log files contain a vast amount of varied request entries. Each log entry can be informative for some application, but this project excludes most of them. Processing certain types of requests would lead to wrong conclusions (e.g., requests generated by spider engines). Besides, stripping the data has a positive effect on processing time and on the required storage space.

Since this project focuses only on documents themselves (like html, pdf, ps, doc files), all the request entries for other file types should be stripped out. Furthermore, as the main goal is the characterization of users, robot transactions, i.e. web traffic generated automatically by robot programs, must also be filtered out. There are several other criteria for filtering. Detailed descriptions of the filtering criteria and methods follow below.

Filtering unsupported extensions

A typical web page is made up of many individual files. Beyond the HTML page it consists of graphical elements, code styles, mappings etc., all in separate files. Each user request for an HTML file evokes hidden requests for all the files required for displaying that specific page. In this manner access log files contain the traces of all the hidden requests as well.

Extension filtering strips out all the request entries for file types other than the predefined ones (for the structure of the extension list file refer to APPENDIX B4 Extension filter list file). The extensions of requested files can be extracted from the request field of log entries.

An example of such a request field:
"GET /ai/kr/imgs/ibrow.jpg HTTP/1.0"

Filtering spider transactions

A significant portion of log file entries is generated by robot programs. These robots, also known as spider or crawler engines, automatically search through a specific range of the web. They index web content for search engines, prepare content for offline browsing or serve several other purposes. What all crawlers have in common is that, although they are mostly supervised by humans, they generate systematic, algorithmic requests. So without eliminating spider entries from log files, the characteristics of real users would be distorted by features of machines.

Spiders can be identified by searching for specific spider patterns in the "user_agent" field of log entries. Most well-behaved spiders put their name, or some kind of pattern that identifies them, into this field. Once a pattern has been identified, the filter method ignores the examined log entry. Spider patterns can be looked up by browsing the web for spiders. There are several pages dealing with spider activities and patterns, and there are lots of professional forums on the subject (mostly discussing how to avoid them) [29]. Spider patterns are collected in a separate spider list file (refer to APPENDIX B5).

An example of such a user_agent field:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;.NET CLR )"

Filtering dynamic pages

Web pages generated dynamically on user requests are called dynamic pages. These pages cannot be located on the web server as individual files, since they are built by a specific engine using several data sources. For this reason dynamic pages cannot be analyzed in a simple way. However, with the application of several tricks it is still possible to obtain useful information. Jacobs et al. (2001) [15] use an inductive logic programming (ILP) framework to reveal usage patterns based on the dynamic page link parameters that are passed to the server. Since it is not an objective of this thesis to apply sophisticated methods for information recovery on dynamic pages, the filtering process simply eliminates all such references.
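The extension and spider filters described above can be sketched as follows. This is a minimal illustration under stated assumptions: the thesis reads its extension and spider lists from separate list files (APPENDIX B4 and B5), whereas the two hard-coded lists and all function names here are hypothetical stand-ins. Keeping requests that carry no extension at all (directory requests) is also an assumption of the sketch.

```python
import posixpath

# Hypothetical stand-ins for the extension and spider list files.
SUPPORTED_EXTENSIONS = {".html", ".htm", ".pdf", ".ps", ".doc", ".rtf", ".ppt"}
SPIDER_PATTERNS = ("googlebot", "crawler", "spider", "slurp")


def requested_extension(request: str) -> str:
    """Return the lower-cased file extension of the resource in a request field."""
    parts = request.split()
    if len(parts) < 2:
        return ""
    # Ignore any query string or anchor when looking at the extension.
    path = parts[1].split("?", 1)[0].split("#", 1)[0]
    return posixpath.splitext(path)[1].lower()


def has_supported_extension(request: str) -> bool:
    """Keep document requests; requests without an extension are kept as well
    (assumption: these are directory requests handled by a later step)."""
    ext = requested_extension(request)
    return ext == "" or ext in SUPPORTED_EXTENSIONS


def is_spider(user_agent: str) -> bool:
    """True if the user_agent field contains one of the known spider patterns."""
    agent = user_agent.lower()
    return any(pattern in agent for pattern in SPIDER_PATTERNS)
```

Matching on lower-cased substrings keeps the spider check tolerant of differences in capitalisation, at the cost of occasionally rejecting a legitimate client whose user agent happens to contain one of the patterns.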

There is no standard for the structure of URL requests for dynamic pages, except that parameters, which consist of name/value pairs, appear after the ? (question mark) in the URL. Therefore, dynamic pages can basically be filtered out by searching for the question mark in the request fields of log entries. Note that requests for a single dynamic page without any parameters, and thus without the delimiting question mark, would be stripped out during extension filtering (e.g., *.jsp, *.php, *.asp pages).

An example of such a dynamic page's request field:
"GET /obp/overview.php?lang=en HTTP/1.0"

Filtering HTTP request methods

HTTP/1.0 [25, 26] allows several methods to be used to indicate the purpose of a request. The most often used methods are GET, HEAD and POST. Since using the GET method is the only way of requesting a document that could be useful for this project, the request method filter ignores any other requests. The filter examines the request field of the log entry for the GET method identifier.

An example of such a request field:
"POST /modules/coppermine/themes/default/theme.php HTTP/1.0"

Filtering and replacing escape characters

URL escape characters are special character sequences made up of a leading % character and two hexadecimal characters. They substitute special characters in URL requests that could be problematic while transferring requests to different types of servers. Special characters are simply replaced by sequences of standard characters. In most cases the task is only to replace these escape sequences with the characters they represent, but in certain instances URLs contain corrupted sequences that cannot be interpreted. In these cases the entries should be ignored. Corrupt sequences can be caused by typing errors of the users, automatically generated robot requests, etc.

Filtering unsuccessful requests

If a user requests a page that does not exist, the server replies with the well-known 404 "page not found" error message, which the browser displays. In this case the user has to use the back button to navigate back to the previous page or type a different URL manually. Either way the user does not navigate through the requested page, since the error page does not provide any link to follow. For this reason log entries of erroneous requests should also be ignored. These entries can be filtered by examining the status field. The status of failed requests is mostly 404. In special cases the status field can take other values as well, such as 503.
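The dynamic page, request method, escape character and status filters can be sketched together. This is only an illustration with hypothetical function names. Two assumptions go beyond the text: any status code of 400 or above is treated as unsuccessful (the text names 404 and 503 only as examples), and a % that is not followed by two hexadecimal digits is taken as the sign of a corrupt escape sequence.

```python
import re
from typing import Optional
from urllib.parse import unquote

# A '%' not followed by two hexadecimal digits is treated as corrupt.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def is_dynamic(resource: str) -> bool:
    """Dynamic page requests carry parameters after a question mark."""
    return "?" in resource


def decode_escapes(resource: str) -> Optional[str]:
    """Replace %XX escape sequences; None if a sequence cannot be interpreted."""
    if _BAD_ESCAPE.search(resource):
        return None
    return unquote(resource)


def is_successful(status: int) -> bool:
    """Assumption: every 4xx/5xx status marks an unsuccessful request."""
    return status < 400


def clean_request(method: str, resource: str, status: int) -> Optional[str]:
    """Return the decoded resource of a usable request, or None to drop it."""
    if method != "GET" or is_dynamic(resource) or not is_successful(status):
        return None
    return decode_escapes(resource)
```

Returning `None` for every rejected entry mirrors the way the filters are described: an entry that fails any criterion is simply ignored.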

An example of a log entry for an unsuccessful request:
[16/May/2004:08:07: ] "POST /modules/coppermine/include/init.inc.php HTTP/1.0" "-" "Mozilla 4.0 (Linux)"

Filtering request URLs for a domain name

The URL of a page request consists of a domain name and the path of the requested document relative to the public directory of the domain. Since the domain name is unambiguous to the responsible web server, the server stores only the relative path of the request in the access log files, without the domain name. In a few cases, however, log file entries contain the whole absolute path. This leads to mapping errors during data integration, since the mapping table contains only relative paths and the comparison is based on the similarity of paths. For these reasons a URL in the request field has to be transformed to the relative format.

An example of such a request field:
"GET /www.cs.vu.nl/fb/generated/wrk_units/120.html HTTP/1.1"

Path completion

When a user requests a public directory instead of a specific file, the web server tries to find the default page in that directory. The default page is index.html in most cases, but it varies between web servers. Thus the task is to complete the URL with the name of the default page whenever a log entry contains a directory request. It is possible that the server does not contain the default page in the requested directory. In this case the log entry in question will be filtered out when it is looked up in the content type mapping table (refer to the section Content types mapping table).

An example of such a request field:
original request field: "GET /pub/minix/ HTTP/1.1"
completed request field: "GET /pub/minix/index.html HTTP/1.1"

Filtering anchors

Anchors are special qualifiers for HTML link references. They act as reference points within a single web page. If a named anchor is placed somewhere in the body of an HTML page, a link that refers to the page and is followed by a hash mark and the name of the anchor (i.e., link + # + anchor name) scrolls directly to the place where the anchor is put. Anchors should be stripped from URLs, otherwise the HTML document cannot be found in the mapping table.

An example of such a request field:
"GET /vakgroepen/ai/education/courses/micd/opgave_1.html#1c HTTP/1.1"
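The three URL clean-up steps above (domain name removal, path completion and anchor removal) can be sketched as small string functions. The function names are hypothetical, the domain is taken from the example request field above, and index.html as the default page is an assumption, since the text notes that the default page varies between web servers.

```python
DOMAIN = "www.cs.vu.nl"       # taken from the example request field above
DEFAULT_PAGE = "index.html"   # assumed default page; varies between servers


def to_relative(resource: str, domain: str = DOMAIN) -> str:
    """Strip a leading domain name so that only the relative path remains."""
    for prefix in ("http://" + domain, "/" + domain, domain):
        if resource.startswith(prefix):
            rest = resource[len(prefix):]
            # Only accept the match if the domain name ends exactly here.
            if rest == "" or rest.startswith("/"):
                return rest or "/"
    return resource


def strip_anchor(resource: str) -> str:
    """Drop the '#anchor' part of a URL."""
    return resource.split("#", 1)[0]


def complete_path(resource: str, default_page: str = DEFAULT_PAGE) -> str:
    """Append the default page name to directory requests."""
    return resource + default_page if resource.endswith("/") else resource


def normalize(resource: str) -> str:
    """Apply domain removal, anchor removal and path completion in turn."""
    return complete_path(strip_anchor(to_relative(resource)))
```

The anchor is stripped before path completion so that a request such as a directory followed by an anchor is still recognised as a directory request.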

We do not filter frame pages. Frames are supported by the HTML specification and make it possible to split an HTML document into several sub-documents (e.g., a frame for the navigation menu, a frame for the content, etc.). Each frame refers to a specific HTML document, resulting in a separate page request. The main frame page contains mostly special tags for controlling all the subframes. This page is either labelled miscellaneous or labelled the same as its subframes by the text mining algorithm [3]. Either way there is no need to pay special attention to such pages while preparing the data.

3.3 Data integration

A novel approach in this project is to use the content types of the visited pages rather than URL references. Content types, as described earlier, are given in a special mapping table where each entry consists of a URL/content type pair (refer to the section Content types mapping table). Data integration in this context means that a content type label should be attached to every single stored log entry.

The simplest and most convenient method is to attach content labels to transactions during data cleaning². This saves time, since the same cycle is used for both processes. After cleaning and filtering a log entry, the data integration step looks up the entry's request URL in the mapping table. If the URL is present, the corresponding type label is attached to the entry. Otherwise the extension of the URL is checked for a valid document type other than HTML (refer to the section Filtering unsupported extensions), and the extension is looked up in the table again. If the extension indicates an HTML page, the entry is deleted³.

Storing the log entries

The final step of the data preparation is to store the data in a convenient database. MySQL was chosen as the database server in spite of the fact that the current version does not support stored procedures. In most cases it would be easier and faster to use internal methods for manipulating the data inside the database, but no insurmountable difficulties arose from this during the project. The advantages of MySQL are that it is fast, easy to maintain, free to use for research purposes and widely accepted. The database scheme for storing cleaned log entries can be seen in table 3.

² Depending on the application. For continuous streaming data, a better solution would be to attach labels to entries online, probably using the content identification model in addition to a preset mapping table in order to identify unknown contents.

³ This step could be improved by using the original classifier model in case of a missing URL.
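The label lookup described in the data integration section can be sketched as a two-stage dictionary lookup. The two URL entries reuse the example mapping table given earlier; the extension entries and their identifiers (90 and 91) are invented for illustration, and the function name `content_type` is hypothetical.

```python
import posixpath
from typing import Optional

# Hypothetical mapping table: URL or extension -> content type identifier.
# The identifiers 90 and 91 are invented for this sketch.
MAPPING_TABLE = {
    "bi/courses-en.html": 4,
    "ci/datamine/diana/index.html": 6,
    ".pdf": 90, ".ps": 90,                # label "documents"
    ".doc": 91, ".rtf": 91, ".ppt": 91,   # label "other documents"
}


def content_type(resource: str, table: dict = MAPPING_TABLE) -> Optional[int]:
    """Return the content type identifier of a request, or None to drop it."""
    url = resource.lstrip("/")
    # First stage: look up the full URL.
    if url in table:
        return table[url]
    # Second stage: non-HTML documents are labelled by their extension.
    # Unmapped HTML pages have no extension entry and therefore yield None.
    ext = posixpath.splitext(url)[1].lower()
    return table.get(ext)
```

An HTML page that is missing from the table falls through both stages and yields `None`, which corresponds to deleting the entry.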

Database scheme of the cslog table

column name     type name
id              bigint
remotehost      varchar
rfc931          varchar
authuser        varchar
transdate       datetime
request         text
content_type    tinyint
status          smallint
bytes           int
referer         text
user_agent      text

Table 3

The column names correspond to the log field names mentioned in section Access log files, except for the content_type field, which holds the attached content type described in the previous paragraph, and id, which is the unique identifier of the entries.

3.5 An overall picture

The following figure gives an overall picture of our data preparation scheme.

[Figure 2: An overall picture of the data preparation. Diagram title: Loading/filtering/mapping access log data. Elements shown: RAW LOG (cslog.txt), LogParser, TransactionFilter, MappingTable (mapping_table.mtd), Log2Database, DATABASE; files shown: extension.flt, spider.flt, datahandling.prop. Transaction objects pass between the components.]

The first step in the data preparation process is to load the raw log files into memory line by line with the LogParser object. This object transforms each entry into a Transaction object, which contains all the fields of the log entry. Once a Transaction has been parsed, it passes through the TransactionFilter, which filters out useless entries (by simply ignoring them). After this step the MappingTable object attaches a content type label to every remaining transaction. Finally, Log2Database loads the filtered transactions into the specified database.
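The four stages above can be illustrated with a short sketch. It rests on several assumptions that are not taken from the thesis: the log is assumed to be in the Combined Log Format, the filter lists are hypothetical stand-ins for extension.flt and spider.flt, the labelling step is passed in as a callable, and SQLite (from the Python standard library) replaces MySQL so that the example stays self-contained.

```python
import re
import sqlite3
from datetime import datetime

# Assumed Combined Log Format:
# host rfc931 authuser [date] "request" status bytes "referer" "user_agent"
LOG_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<transdate>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)
IGNORED_EXTENSIONS = (".gif", ".jpg", ".css", ".js")  # hypothetical filter list
SPIDER_MARKERS = ("bot", "crawler", "spider")         # hypothetical filter list


def parse_line(line):
    """LogParser stage: turn one raw line into a dict of fields, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    parsed = datetime.strptime(entry["transdate"], "%d/%b/%Y:%H:%M:%S %z")
    entry["transdate"] = parsed.isoformat()
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


def keep_entry(entry):
    """TransactionFilter stage: drop malformed, embedded-object and spider requests."""
    parts = entry["request"].split()
    if len(parts) < 2:
        return False
    path = parts[1].split("?", 1)[0].lower()
    if path.endswith(IGNORED_EXTENSIONS):
        return False
    agent = entry["user_agent"].lower()
    return not any(marker in agent for marker in SPIDER_MARKERS)


def load_log(lines, label_for, connection):
    """Parse, filter, label and store log lines; return the number of stored rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS cslog ("
        "id INTEGER PRIMARY KEY, remotehost TEXT, rfc931 TEXT, authuser TEXT, "
        "transdate TEXT, request TEXT, content_type INTEGER, status INTEGER, "
        "bytes INTEGER, referer TEXT, user_agent TEXT)"
    )
    stored = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None or not keep_entry(entry):
            continue
        # MappingTable stage: label_for is any callable mapping a URL to a label.
        entry["content_type"] = label_for(entry["request"].split()[1])
        # Log2Database stage.
        connection.execute(
            "INSERT INTO cslog (remotehost, rfc931, authuser, transdate, request, "
            "content_type, status, bytes, referer, user_agent) VALUES "
            "(:remotehost, :rfc931, :authuser, :transdate, :request, "
            ":content_type, :status, :bytes, :referer, :user_agent)",
            entry,
        )
        stored += 1
    connection.commit()
    return stored
```

A call such as load_log(open("cslog.txt"), my_label_function, sqlite3.connect(":memory:")) would run the whole chain, where my_label_function stands for whatever labelling step is in use.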

4 Data structuring

Sessions, a.k.a. transactions (footnote 4), constitute the basis of most web log mining processes. They are related to users and are composed of the pages visited during a single browsing activity. This chapter starts with a description of user identification, which is essential for session identification. This is followed by details on the grouping of users, which is also relevant because characterizing users is the main goal of this project. The next paragraph deals with session identification methods and types, and also discusses how the selection method is restricted to groups of users. The final section presents a comprehensive overview of the data structuring process.

4.1 User identification

Identification of users is essential for efficient data mining. It makes it possible to distinguish user-specific data within the whole data set. Identifying users is straightforward in intranet applications, since users are required to identify themselves by following a login process. It is much more complicated in the case of public domains, because Internet protocols (e.g., HTTP, TCP/IP) do not require user authentication from client applications (e.g., web browsers). The only identifying information exchanged is the IP address of the client machine. Identification based on this information is unreliable, because multiple users may use the same machine (and thus the same IP address) to connect to the Internet. Conversely, a single user may use several machines to access the same service. In addition, proxy servers and firewalls hide the true IP address of the client. There are several ways to address this problem. Content providers can force users to register for their services, so that users have to follow a login process each time they want to browse the content. To avoid explicit user authentication, servers can use so-called cookies. Cookies are user-specific files stored on client machines.
Each time a user visits the same service, the server can obtain user information from the stored cookies. The most accurate identification based solely on access log files uses the IP address and the browser agent type together as a unique user identification pair [13]. However, some papers use IP/cookie pairs [2]. The identification procedure proposed in this thesis takes place inside the database as a select query that fills the users table from the cslog table. Table 4 shows the data scheme of the users table.

Footnote 4: Market basket analysis terminology uses transaction for the items purchased at once. The information technology (IT) sector, meanwhile, uses transaction for a single client-server request-response exchange. IT terminology also uses the term session (which is analogous to a market basket) to denote consecutive page visits by a user, a.k.a. navigation sequences. To resolve the conflict, this thesis uses both terms for navigation sequences, except in chapter 3 Data preparation, where transaction means a page access.
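The select-based identification described above can be sketched as follows. The sketch is illustrative only: the column names of the users table are hypothetical (they are not taken from Table 4), and it uses SQLite from the Python standard library in place of MySQL, including the SQLite-specific INSERT OR IGNORE form.

```python
import sqlite3


def identify_users(connection):
    """Fill the users table with distinct (IP address, user agent) pairs from cslog.

    Returns the number of identified users. The users columns are hypothetical.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "user_id INTEGER PRIMARY KEY, remotehost TEXT, user_agent TEXT, "
        "UNIQUE (remotehost, user_agent))"
    )
    # Each distinct pair becomes one user; pairs already present are skipped,
    # so the function can be rerun after new log entries have been loaded.
    connection.execute(
        "INSERT OR IGNORE INTO users (remotehost, user_agent) "
        "SELECT DISTINCT remotehost, user_agent FROM cslog"
    )
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

With this pairing, two requests from the same IP address but with different browser agents count as two users, which is the behaviour the IP/agent heuristic is meant to provide for shared machines and proxies.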

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Arti Tyagi Sunita Choudhary

Arti Tyagi Sunita Choudhary Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Web Usage Mining

More information

Introduction to Web Analytics Terms

Introduction to Web Analytics Terms Introduction to Web Analytics Terms N10014 Introduction N40002 This glossary provides definitions for common web analytics terms and discusses their use in Unica Web Analytics. The definitions are arranged

More information

LabVIEW Internet Toolkit User Guide

LabVIEW Internet Toolkit User Guide LabVIEW Internet Toolkit User Guide Version 6.0 Contents The LabVIEW Internet Toolkit provides you with the ability to incorporate Internet capabilities into VIs. You can use LabVIEW to work with XML documents,

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Sheetal A. Raiyani 1, Shailendra Jain 2 Dept. of CSE(SS),TIT,Bhopal 1, Dept. of CSE,TIT,Bhopal 2 sheetal.raiyani@gmail.com

More information

Users Interest Correlation through Web Log Mining

Users Interest Correlation through Web Log Mining Users Interest Correlation through Web Log Mining F. Tao, P. Contreras, B. Pauer, T. Taskaya and F. Murtagh School of Computer Science, the Queen s University of Belfast; DIW-Berlin Abstract When more

More information

W3Perl A free logfile analyzer

W3Perl A free logfile analyzer W3Perl A free logfile analyzer Features Works on Unix / Windows / Mac View last entries based on Perl scripts Web / FTP / Squid / Email servers Session tracking Others log format can be added easily Detailed

More information

Working With Virtual Hosts on Pramati Server

Working With Virtual Hosts on Pramati Server Working With Virtual Hosts on Pramati Server 13 Overview Virtual hosting allows a single machine to be addressed by different names. There are two ways for configuring Virtual Hosts. They are: Domain Name

More information

Secure Web Appliance. SSL Intercept

Secure Web Appliance. SSL Intercept Secure Web Appliance SSL Intercept Table of Contents 1. Introduction... 1 1.1. About CYAN Secure Web Appliance... 1 1.2. About SSL Intercept... 1 1.3. About this Manual... 1 1.3.1. Document Conventions...

More information

Web Analytics Definitions Approved August 16, 2007

Web Analytics Definitions Approved August 16, 2007 Web Analytics Definitions Approved August 16, 2007 Web Analytics Association 2300 M Street, Suite 800 Washington DC 20037 standards@webanalyticsassociation.org 1-800-349-1070 Licensed under a Creative

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm

Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm International Journal of Engineering Inventions e-issn: 2278-7461, p-issn: 2319-6491 Volume 2, Issue 5 (March 2013) PP: 16-21 Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm

More information

Model-Based Cluster Analysis for Web Users Sessions

Model-Based Cluster Analysis for Web Users Sessions Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece gpallis@ccf.auth.gr

More information

HTTP. Internet Engineering. Fall 2015. Bahador Bakhshi CE & IT Department, Amirkabir University of Technology

HTTP. Internet Engineering. Fall 2015. Bahador Bakhshi CE & IT Department, Amirkabir University of Technology HTTP Internet Engineering Fall 2015 Bahador Bakhshi CE & IT Department, Amirkabir University of Technology Questions Q1) How do web server and client browser talk to each other? Q1.1) What is the common

More information

INTERNET DOMAIN NAME SYSTEM

INTERNET DOMAIN NAME SYSTEM INTERNET DOMAIN NAME SYSTEM http://www.tutorialspoint.com/internet_technologies/internet_domain_name_system.htm Copyright tutorialspoint.com Overview When DNS was not into existence, one had to download

More information

Advanced Preprocessing using Distinct User Identification in web log usage data

Advanced Preprocessing using Distinct User Identification in web log usage data Advanced Preprocessing using Distinct User Identification in web log usage data Sheetal A. Raiyani 1, Shailendra Jain 2, Ashwin G. Raiyani 3 Department of CSE (Software System), Technocrats Institute of

More information

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data Identifying the Number of to improve Website Usability from Educational Institution Web Log Data Arvind K. Sharma Dept. of CSE Jaipur National University, Jaipur, Rajasthan,India P.C. Gupta Dept. of CSI

More information

Chapter 5. Regression Testing of Web-Components

Chapter 5. Regression Testing of Web-Components Chapter 5 Regression Testing of Web-Components With emergence of services and information over the internet and intranet, Web sites have become complex. Web components and their underlying parts are evolving

More information

Using TestLogServer for Web Security Troubleshooting

Using TestLogServer for Web Security Troubleshooting Using TestLogServer for Web Security Troubleshooting Topic 50330 TestLogServer Web Security Solutions Version 7.7, Updated 19-Sept- 2013 A command-line utility called TestLogServer is included as part

More information

ANALYSING SERVER LOG FILE USING WEB LOG EXPERT IN WEB DATA MINING

ANALYSING SERVER LOG FILE USING WEB LOG EXPERT IN WEB DATA MINING International Journal of Science, Environment and Technology, Vol. 2, No 5, 2013, 1008 1016 ISSN 2278-3687 (O) ANALYSING SERVER LOG FILE USING WEB LOG EXPERT IN WEB DATA MINING 1 V. Jayakumar and 2 Dr.

More information

SiteCelerate white paper

SiteCelerate white paper SiteCelerate white paper Arahe Solutions SITECELERATE OVERVIEW As enterprises increases their investment in Web applications, Portal and websites and as usage of these applications increase, performance

More information

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development

More information

Using Logon Agent for Transparent User Identification

Using Logon Agent for Transparent User Identification Using Logon Agent for Transparent User Identification Websense Logon Agent (also called Authentication Server) identifies users in real time, as they log on to domains. Logon Agent works with the Websense

More information

USER GUIDE MANTRA WEB EXTRACTOR. www.altiliagroup.com

USER GUIDE MANTRA WEB EXTRACTOR. www.altiliagroup.com USER GUIDE MANTRA WEB EXTRACTOR www.altiliagroup.com Page 1 of 57 MANTRA WEB EXTRACTOR USER GUIDE TABLE OF CONTENTS CONVENTIONS... 2 CHAPTER 2 BASICS... 6 CHAPTER 3 - WORKSPACE... 7 Menu bar 7 Toolbar

More information

DEPLOYMENT GUIDE Version 1.1. Deploying the BIG-IP LTM v10 with Citrix Presentation Server 4.5

DEPLOYMENT GUIDE Version 1.1. Deploying the BIG-IP LTM v10 with Citrix Presentation Server 4.5 DEPLOYMENT GUIDE Version 1.1 Deploying the BIG-IP LTM v10 with Citrix Presentation Server 4.5 Table of Contents Table of Contents Deploying the BIG-IP system v10 with Citrix Presentation Server Prerequisites

More information

Bitrix Site Manager 4.1. User Guide

Bitrix Site Manager 4.1. User Guide Bitrix Site Manager 4.1 User Guide 2 Contents REGISTRATION AND AUTHORISATION...3 SITE SECTIONS...5 Creating a section...6 Changing the section properties...8 SITE PAGES...9 Creating a page...10 Editing

More information

CA Nimsoft Monitor. Probe Guide for URL Endpoint Response Monitoring. url_response v4.1 series

CA Nimsoft Monitor. Probe Guide for URL Endpoint Response Monitoring. url_response v4.1 series CA Nimsoft Monitor Probe Guide for URL Endpoint Response Monitoring url_response v4.1 series Legal Notices This online help system (the "System") is for your informational purposes only and is subject

More information

Exploitation of Server Log Files of User Behavior in Order to Inform Administrator

Exploitation of Server Log Files of User Behavior in Order to Inform Administrator Exploitation of Server Log Files of User Behavior in Order to Inform Administrator Hamed Jelodar Computer Department, Islamic Azad University, Science and Research Branch, Bushehr, Iran ABSTRACT All requests

More information

User Identification and Authentication

User Identification and Authentication User Identification and Authentication Vital Security 9.2 Copyright Copyright 1996-2008. Finjan Software Inc.and its affiliates and subsidiaries ( Finjan ). All rights reserved. All text and figures included

More information

DEPLOYMENT GUIDE Version 2.1. Deploying F5 with Microsoft SharePoint 2010

DEPLOYMENT GUIDE Version 2.1. Deploying F5 with Microsoft SharePoint 2010 DEPLOYMENT GUIDE Version 2.1 Deploying F5 with Microsoft SharePoint 2010 Table of Contents Table of Contents Introducing the F5 Deployment Guide for Microsoft SharePoint 2010 Prerequisites and configuration

More information

DEPLOYMENT GUIDE Version 1.2. Deploying F5 with Oracle E-Business Suite 12

DEPLOYMENT GUIDE Version 1.2. Deploying F5 with Oracle E-Business Suite 12 DEPLOYMENT GUIDE Version 1.2 Deploying F5 with Oracle E-Business Suite 12 Table of Contents Table of Contents Introducing the BIG-IP LTM Oracle E-Business Suite 12 configuration Prerequisites and configuration

More information

PORTAL ADMINISTRATION

PORTAL ADMINISTRATION 1 Portal Administration User s Guide PORTAL ADMINISTRATION GUIDE Page 1 2 Portal Administration User s Guide Table of Contents Introduction...5 Core Portal Framework Concepts...5 Key Items...5 Layouts...5

More information

UH CMS Basics. Cascade CMS Basics Class. UH CMS Basics Updated: June,2011! Page 1

UH CMS Basics. Cascade CMS Basics Class. UH CMS Basics Updated: June,2011! Page 1 UH CMS Basics Cascade CMS Basics Class UH CMS Basics Updated: June,2011! Page 1 Introduction I. What is a CMS?! A CMS or Content Management System is a web based piece of software used to create web content,

More information

Ultimus and Microsoft Active Directory

Ultimus and Microsoft Active Directory Ultimus and Microsoft Active Directory May 2004 Ultimus, Incorporated 15200 Weston Parkway, Suite 106 Cary, North Carolina 27513 Phone: (919) 678-0900 Fax: (919) 678-0901 E-mail: documents@ultimus.com

More information

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall. Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com

More information

WebSpy Vantage Ultimate 2.2 Web Module Administrators Guide

WebSpy Vantage Ultimate 2.2 Web Module Administrators Guide WebSpy Vantage Ultimate 2.2 Web Module Administrators Guide This document is intended to help you get started using WebSpy Vantage Ultimate and the Web Module. For more detailed information, please see

More information

Preprocessing Web Logs for Web Intrusion Detection

Preprocessing Web Logs for Web Intrusion Detection Preprocessing Web Logs for Web Intrusion Detection Priyanka V. Patil. M.E. Scholar Department of computer Engineering R.C.Patil Institute of Technology, Shirpur, India Dharmaraj Patil. Department of Computer

More information

Pre-Processing: Procedure on Web Log File for Web Usage Mining

Pre-Processing: Procedure on Web Log File for Web Usage Mining Pre-Processing: Procedure on Web Log File for Web Usage Mining Shaily Langhnoja 1, Mehul Barot 2, Darshak Mehta 3 1 Student M.E.(C.E.), L.D.R.P. ITR, Gandhinagar, India 2 Asst.Professor, C.E. Dept., L.D.R.P.

More information

Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining

Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining Jaswinder Kaur #1, Dr. Kanwal Garg #2 #1 Ph.D. Scholar, Department of Computer Science & Applications Kurukshetra University,

More information

Research and Development of Data Preprocessing in Web Usage Mining

Research and Development of Data Preprocessing in Web Usage Mining Research and Development of Data Preprocessing in Web Usage Mining Li Chaofeng School of Management, South-Central University for Nationalities,Wuhan 430074, P.R. China Abstract Web Usage Mining is the

More information

Contents WEKA Microsoft SQL Database

Contents WEKA Microsoft SQL Database WEKA User Manual Contents WEKA Introduction 3 Background information. 3 Installation. 3 Where to get WEKA... 3 Downloading Information... 3 Opening the program.. 4 Chooser Menu. 4-6 Preprocessing... 6-7

More information

Integrating VoltDB with Hadoop

Integrating VoltDB with Hadoop The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.

More information

IaaS Configuration for Cloud Platforms

IaaS Configuration for Cloud Platforms vrealize Automation 6.2.3 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions

More information

Load testing with WAPT: Quick Start Guide

Load testing with WAPT: Quick Start Guide Load testing with WAPT: Quick Start Guide This document describes step by step how to create a simple typical test for a web application, execute it and interpret the results. A brief insight is provided

More information

User Manual. Onsight Management Suite Version 5.1. Another Innovation by Librestream

User Manual. Onsight Management Suite Version 5.1. Another Innovation by Librestream User Manual Onsight Management Suite Version 5.1 Another Innovation by Librestream Doc #: 400075-06 May 2012 Information in this document is subject to change without notice. Reproduction in any manner

More information

Presentation Reporting Quick Start

Presentation Reporting Quick Start Presentation Reporting Quick Start Topic 50430 Presentation Reporting Quick Start Websense Web Security Solutions Updated 19-Sep-2013 Applies to: Web Filter, Web Security, Web Security Gateway, and Web

More information

Internet Technologies. World Wide Web (WWW) Proxy Server Network Address Translator (NAT)

Internet Technologies. World Wide Web (WWW) Proxy Server Network Address Translator (NAT) Internet Technologies World Wide Web (WWW) Proxy Server Network Address Translator (NAT) What is WWW? System of interlinked Hypertext documents Text, Images, Videos, and other multimedia documents navigate

More information

Web. Services. Web Technologies. Today. Web. Technologies. Internet WWW. Protocols TCP/IP HTTP. Apache. Next Time. Lecture #3 2008 3 Apache.

Web. Services. Web Technologies. Today. Web. Technologies. Internet WWW. Protocols TCP/IP HTTP. Apache. Next Time. Lecture #3 2008 3 Apache. JSP, and JSP, and JSP, and 1 2 Lecture #3 2008 3 JSP, and JSP, and Markup & presentation (HTML, XHTML, CSS etc) Data storage & access (JDBC, XML etc) Network & application protocols (, etc) Programming

More information

P Principles of Network Forensics P Terms & Log-based Tracing P Application Layer Log Analysis P Lower Layer Log Analysis

P Principles of Network Forensics P Terms & Log-based Tracing P Application Layer Log Analysis P Lower Layer Log Analysis Agenda Richard Baskerville P Principles of P Terms & -based Tracing P Application Layer Analysis P Lower Layer Analysis Georgia State University 1 2 Principles Kim, et al (2004) A fuzzy expert system for

More information

AIMMS The Network License Server

AIMMS The Network License Server AIMMS The Network License Server AIMMS AIMMS 4.0 July 1, 2014 Contents Contents ii 1 The Aimms Network License Server 1 1.1 Software requirements........................ 1 1.2 Installing and deploying

More information

ithenticate User Manual

ithenticate User Manual ithenticate User Manual Updated November 20, 2009 Contents Introduction 4 New Users 4 Logging In 4 Resetting Your Password 5 Changing Your Password or Username 6 The ithenticate Account Homepage 7 Main

More information

Secure Web Appliance. Reverse Proxy

Secure Web Appliance. Reverse Proxy Secure Web Appliance Reverse Proxy Table of Contents 1. Introduction... 1 1.1. About CYAN Secure Web Appliance... 1 1.2. About Reverse Proxy... 1 1.3. About this Manual... 1 1.3.1. Document Conventions...

More information

Chapter 6 Using Network Monitoring Tools

Chapter 6 Using Network Monitoring Tools Chapter 6 Using Network Monitoring Tools This chapter describes how to use the maintenance features of your Wireless-G Router Model WGR614v9. You can access these features by selecting the items under

More information

Click stream reporting & analysis for website optimization

Click stream reporting & analysis for website optimization Click stream reporting & analysis for website optimization Richard Doherty e-intelligence Program Manager SAS Institute EMEA What is Click Stream Reporting?! Potential customers, or visitors, navigate

More information

Bitrix Site Manager ASP.NET. Installation Guide

Bitrix Site Manager ASP.NET. Installation Guide Bitrix Site Manager ASP.NET Installation Guide Contents Introduction... 4 Chapter 1. Checking for IIS Installation... 5 Chapter 2. Using An Archive File to Install Bitrix Site Manager ASP.NET... 7 Preliminary

More information

Urchin E-Commerce. User Guide

Urchin E-Commerce. User Guide 2005 Linux Web Host. All rights reserved. The content of this manual is furnished under license and may be used or copied only in accordance with this license. No part of this publication may be reproduced,

More information

Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

More information

IP Phone Service Administration and Subscription

IP Phone Service Administration and Subscription CHAPTER 6 IP Phone Service Administration and Subscription Cisco CallManager administrators maintain the list of services to which users can subscribe. These sections provide details about administering

More information

ithenticate User Manual

ithenticate User Manual ithenticate User Manual Version: 2.0.2 Updated March 16, 2012 Contents Introduction 4 New Users 4 Logging In 4 Resetting Your Password 5 Changing Your Password or Username 6 The ithenticate Account Homepage

More information

Introduction to Directory Services

Introduction to Directory Services Introduction to Directory Services Overview This document explains how AirWatch integrates with your organization's existing directory service such as Active Directory, Lotus Domino and Novell e-directory

More information

CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS

CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS 3.1 Introduction In this thesis work, a model is developed in a structured way to mine the frequent patterns in e-commerce domain. Designing and implementing

More information

EMC Documentum Webtop

EMC Documentum Webtop EMC Documentum Webtop Version 6.5 User Guide P/N 300 007 239 A01 EMC Corporation Corporate Headquarters: Hopkinton, MA 01748 9103 1 508 435 1000 www.emc.com Copyright 1994 2008 EMC Corporation. All rights

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

DiskPulse DISK CHANGE MONITOR

DiskPulse DISK CHANGE MONITOR DiskPulse DISK CHANGE MONITOR User Manual Version 7.9 Oct 2015 www.diskpulse.com info@flexense.com 1 1 DiskPulse Overview...3 2 DiskPulse Product Versions...5 3 Using Desktop Product Version...6 3.1 Product

More information

Network Forensics: Log Analysis

Network Forensics: Log Analysis Network Forensics: Analysis Richard Baskerville Agenda P Terms & -based Tracing P Application Layer Analysis P Lower Layer Analysis Georgia State University 1 2 Two Important Terms PPromiscuous Mode

More information

Data Mining of Web Access Logs

Data Mining of Web Access Logs Data Mining of Web Access Logs A minor thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in Information Technology Anand S. Lalani School of Computer

More information

2/24/2010 ClassApps.com

2/24/2010 ClassApps.com SelectSurvey.NET Training Manual This document is intended to be a simple visual guide for non technical users to help with basic survey creation, management and deployment. 2/24/2010 ClassApps.com Getting

More information

Monitoring Replication

Monitoring Replication Monitoring Replication Article 1130112-02 Contents Summary... 3 Monitor Replicator Page... 3 Summary... 3 Status... 3 System Health... 4 Replicator Configuration... 5 Replicator Health... 6 Local Package

More information

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING L.K. Joshila Grace 1, V.Maheswari 2, Dhinaharan Nagamalai 3, 1 Research Scholar, Department of Computer Science and Engineering joshilagracejebin@gmail.com

More information

Digital media glossary

Digital media glossary A Ad banner A graphic message or other media used as an advertisement. Ad impression An ad which is served to a user s browser. Ad impression ratio Click-throughs divided by ad impressions. B Banner A

More information

Web Browsing Quality of Experience Score

Web Browsing Quality of Experience Score Web Browsing Quality of Experience Score A Sandvine Technology Showcase Contents Executive Summary... 1 Introduction to Web QoE... 2 Sandvine s Web Browsing QoE Metric... 3 Maintaining a Web Page Library...

More information

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING AN EFFIIENT APPROAH TO PERFORM PRE-PROESSING S. Prince Mary Research Scholar, Sathyabama University, hennai- 119 princemary26@gmail.com E. Baburaj Department of omputer Science & Engineering, Sun Engineering

More information

www.novell.com/documentation Policy Guide Access Manager 3.1 SP5 January 2013

www.novell.com/documentation Policy Guide Access Manager 3.1 SP5 January 2013 www.novell.com/documentation Policy Guide Access Manager 3.1 SP5 January 2013 Legal Notices Novell, Inc., makes no representations or warranties with respect to the contents or use of this documentation,

More information

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey 1 Google Analytics for Robust Website Analytics Deepika Verma, Depanwita Seal, Atul Pandey 2 Table of Contents I. INTRODUCTION...3 II. Method for obtaining data for web analysis...3 III. Types of metrics

More information

IriScene Remote Manager. Version 4.8 FRACTALIA Software

IriScene Remote Manager. Version 4.8 FRACTALIA Software IriScene Remote Manager Version 4.8 FRACTALIA Software 2 A. INTRODUCTION...3 B. WORKING DESCRIPTION...3 C. PLATFORM MANUAL...3 1. ACCESS TO THE PLATFORM...3 2. AUTHENTICATION MODES...5 3. AUTHENTICATION

More information

Carisbrooke. End User Guide

Carisbrooke. End User Guide Carisbrooke Contents Contents... 2 Introduction... 3 Negotiate Kerberos/NTLM... 4 Scope... 4 What s changed... 4 What hasn t changed... 5 Multi-Tenant Categories... 6 Scope... 6 What s changed... 6 What

More information

v6.1 Websense Enterprise Reporting Administrator s Guide

v6.1 Websense Enterprise Reporting Administrator s Guide v6.1 Websense Enterprise Reporting Administrator s Guide Websense Enterprise Reporting Administrator s Guide 1996 2005, Websense, Inc. All rights reserved. 10240 Sorrento Valley Rd., San Diego, CA 92121,

More information

Load Balancing IBM WebSphere Servers with F5 Networks BIG-IP System

Load Balancing IBM WebSphere Servers with F5 Networks BIG-IP System Load Balancing IBM WebSphere Servers with F5 Networks BIG-IP System Introducing BIG-IP load balancing for IBM WebSphere Server Configuring the BIG-IP for load balancing WebSphere servers Introducing BIG-IP

More information

Chapter 6 Using Network Monitoring Tools

Chapter 6 Using Network Monitoring Tools Chapter 6 Using Network Monitoring Tools This chapter describes how to use the maintenance features of your RangeMax Wireless-N Gigabit Router WNR3500. You can access these features by selecting the items

More information
