A Data Warehouse/Online Analytic Processing Framework for Web Usage Mining and Business Intelligence Reporting

Xiaohua Hu (1,*) and Nick Cercone (2)
(1) College of Information Science, Drexel University, Philadelphia, PA
(2) Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstream, purchase information, customer information, etc.) in order to understand and serve e-commerce customers better and improve the online business. In this article, we present a general data warehouse/online analytic processing (OLAP) framework for web usage mining and business intelligence reporting. Tightly integrating web data warehouse construction, data mining, and OLAP into the e-commerce system dramatically reduces the time and effort needed for web usage mining, business intelligence reporting, and mining deployment. Our data warehouse/OLAP framework consists of four phases: data capture, webhouse construction (clickstream marts), pattern discovery and cube construction, and pattern evaluation and deployment. We discuss data transformation operations for web usage mining and business reporting at the clickstream, session, and customer levels; describe the problems and challenging issues in each phase in detail; provide plausible solutions to these issues; and demonstrate the framework with examples from real web sites. Our data warehouse/OLAP framework has been integrated into some commercial e-commerce systems. We believe this framework would be very useful for developing real-world web usage mining and business intelligence reporting systems. © 2004 Wiley Periodicals, Inc.

1. INTRODUCTION

Knowledge about customers and an understanding of customer needs are essential for customer retention in an online web store, because competitors are just one click away.
To maintain a successful e-commerce solution, it is necessary to collect and analyze customer click behavior at the web store.

*Author to whom all correspondence should be addressed: thu@cis.drexel.edu. nick@cs.dal.ca.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 19 (2004). © 2004 Wiley Periodicals, Inc. Published online in Wiley InterScience (DOI /int.20012).

[Figure 1. The data warehouse/OLAP data flow diagram.]

A web site generates a large amount of reliable data and is a killer domain for data mining applications. Web usage mining can help an e-commerce solution improve up-selling, cross-selling, personalized ads, click-through rates, etc., by analyzing clickstream and customer purchase data with data mining techniques. Web usage mining has recently attracted much attention from researchers and e-business professionals, and it offers an e-commerce web site the following benefits:

- Targeting customers based on usage behavior or profile (personalization)
- Adjusting web content and structure dynamically based on the page access patterns of users (adaptive web site)
- Enhancing service quality and delivery to the end user (cross-selling, up-selling)
- Improving web server performance based on web traffic analysis
- Identifying the hot areas and killer areas of the web site

We present a general data warehouse/OLAP framework for web usage mining and business intelligence reporting. In our framework, data mining is tightly integrated into the e-commerce system. The framework consists of four phases, as shown in Figure 1: data capture, webhouse construction (clickstream marts), pattern discovery, and pattern evaluation. It provides the appropriate data transformations [also called extraction, transformation, and loading (ETL)] from the online transactional processing (OLTP) system to the data warehouse, builds data cubes from the data warehouse, mines the data for business analysis, and, finally, deploys the mining results to improve the online business. We describe the problems and challenging issues in each phase in detail and provide a general approach and guidelines for web usage mining and business intelligence reporting in e-commerce. In Section 2, we discuss various data capture methods and some of their pitfalls and challenges.
In Section 3, we describe the data transformation operations for web data at different levels of granularity (clickstream, session, and customer levels) and show how to organize the dimension and fact tables of the webhouse, which is the data source for web usage mining and business intelligence reporting. We discuss cube construction and various data mining methods for web usage mining in Section 4 and pattern evaluation (mining rule evaluation) in Section 5. In Section 6, we conclude with some discussion.

2. DATA CAPTURE

Capturing the necessary data in the data collection stage is an essential step for a successful data mining task. A large part of web data is represented in the web log collected by the web server. A web log records the interactions between the web server and web users (web browsers). A typical web log in the common log format contains information such as the Internet Protocol (IP) address, the user identification (ID) or password for access to a restricted area, a time stamp of the uniform resource locator (URL) request, the method of the transaction, the status (error) code, and the size in bytes of the transaction. The extended log format adds fields such as the referrer and the agent. Web logs were initially designed to help debug the web server. One of the fundamental flaws of analyzing web log data is that log files contain information about the files transferred from the server to the client, not about the people visiting the web site [1–4]. Some of these fields are useless for data mining and are filtered out in the data preprocessing step. Others, such as the IP address, referrer, and agent, can reveal much about the site's visitors and the web site itself.

Mining the web store often starts with the web log data, which must go through a set of transformations before data mining algorithms can be applied. To have a complete picture of the customers, web usage data should include the web server access log, browser logs, user profiles, registration data, user sessions, cookies, user search keywords, and user business events [1, 5–8]. Based on our practice and experience in web usage mining, we believe that web usage mining requires the conflation of multiple data sources. The data needed to perform the analysis should come from five main sources:

(1) The web server logs recording the visitors' clickstream behavior (page template, cookie, transfer log, time stamp, IP address, agent, referrer, etc.)
(2) Product information (product hierarchy, manufacturer, price, color, size, etc.)
(3) Content information of the web site (images, GIFs, video clips, etc.)
(4) The customer purchase data (quantity of the products, payment amount and method, shipping address, etc.)
(5) Customer demographic information (age, gender, income, education level, lifestyle, etc.)

Data collected at a typical web site fall into different levels of granularity: page view, session, order item, order header, and customer. A page view carries information such as the type of the page and the time spent on it. A session consists of a sequence of page views; an order contains one or more order items. In the data collection phase, it is best to collect the finest-grained, most detailed data possible describing the clicks on the web server and the items sold at the web store. Each web server will potentially report different details, but at the lowest level we should be able to obtain a record for every page hit and every item sold if we want a complete portfolio of the click behavior and sales of the web store.

There are various methods to capture and collect valuable information about visitors for e-commerce at the server level, proxy level, and client level through the Common Gateway Interface (CGI), Java application program interfaces (APIs), and JavaScript [1, 5, 7]. Most of them use web log data or packet sniffers as the data source for the clickstream. For the purpose of data mining, web log data are not sufficient, for the following main reasons:

(1) Unable to identify the sessions

(2) Lack of web store transaction data. The web store transactions record all sale-related information of a web store, which is necessary for business analysis and data mining to answer basic and important business questions such as "Which referrer site leads to more product sales at my site?", "What is the conversion rate of the web site?", and "Which parts of my web site are more attractive to purchasers?"
(3) Lack of web store business events. Business events such as adding an item to the shopping cart, a research key event, and abandoning the shopping cart are very useful for analyzing user shopping and browsing behavior in a web store.

In our framework, we believe that collecting data at the web application server layer is the most effective approach, as suggested by some commercial vendors [1, 7]. The web application server controls all user activities, such as registration and logging in/out, and can create a unified database to store web log data, sales transaction data, and business events of the web site. A detailed discussion of these methods is beyond the scope of this study; interested readers may refer to Refs. 1 and 7.

There are challenging issues in the data capture phase for web usage mining. We focus on three problems: (1) how to sessionize the clickstream data, (2) how to filter crawler sessions, and (3) how to gather customer information. These challenges are encountered in almost all web usage mining projects, and they have a huge impact on the projects' success or failure. We discuss each of them in detail in the following sections.

2.1. Session Data

A user web session is a sequence of consecutive page views (hits) before the user explicitly logs out or times out. A user who visits a site in the morning and then again in the evening would count as two user visits (sessions).
Because of the statelessness of the hypertext transfer protocol (HTTP), clickstream data are just a sequence of page hits, and a page hit may be an isolated event that is hard to analyze without context. To make raw clickstream data usable in web usage mining, the clickstream needs to be collected and transformed so that it has a session perspective. Thus, the first task after data collection is to identify the sessions in the clickstream (sessionizing the clickstream). In some web usage mining systems, individual log entries are aggregated during preprocessing into server sessions according to the IP address and agent information, and new sessions are identified using a 30-minute intersession time-out period [9, 10]. Within each session, the log entries are grouped into separate requests, where each request may correspond to an individual user click or a search event. Nonetheless, there are serious problems with this approach. Many Internet users connect through an Internet service provider (ISP) that may assign IP addresses dynamically, so the same user is very likely to have a different address in different sessions [7, 11–14]. Conversely, users behind a firewall can all share the same IP address, so the IP address is not a suitable session identification variable in that case either.
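The IP-plus-agent heuristic with a 30-minute time-out can be sketched as follows. This is a minimal illustration, not the implementation of the systems cited above: the `Hit` record, the `sessionize` function, and its `key_fn` parameter are hypothetical names, and an in-memory list stands in for a parsed web log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Hit:
    """One parsed web log entry (hypothetical record type)."""
    ip: str
    agent: str
    timestamp: datetime
    url: str


def sessionize(hits, timeout=timedelta(minutes=30),
               key_fn=lambda h: (h.ip, h.agent)):
    """Group hits into sessions.

    Hits are processed in time order; a hit joins the open session of its
    key unless the gap since that key's previous hit exceeds `timeout`,
    in which case it starts a new session.
    """
    sessions = []      # each session is a list of hits
    current = {}       # key -> index of that key's open session in `sessions`
    last_seen = {}     # key -> timestamp of that key's previous hit
    for hit in sorted(hits, key=lambda h: h.timestamp):
        k = key_fn(hit)
        if k in last_seen and hit.timestamp - last_seen[k] <= timeout:
            sessions[current[k]].append(hit)
        else:
            current[k] = len(sessions)
            sessions.append([hit])
        last_seen[k] = hit.timestamp
    return sessions


t0 = datetime(2004, 1, 5, 9, 0)
hits = [
    Hit("10.0.0.1", "Mozilla/4.0", t0, "/home"),
    Hit("10.0.0.1", "Mozilla/4.0", t0 + timedelta(minutes=5), "/product/42"),
    Hit("10.0.0.1", "Mozilla/4.0", t0 + timedelta(hours=9), "/home"),  # evening visit
    Hit("10.0.0.2", "Mozilla/4.0", t0 + timedelta(minutes=1), "/home"),
]
sessions = sessionize(hits)
print(len(sessions))  # 3
```

The morning and evening visits from the same address become two sessions, as in the definition above. Because the grouping key is a parameter, a function returning a cookie ID or login ID can replace the unreliable IP-plus-agent key without changing the time-out logic.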

Realizing the limitations of relying on the IP address, many web sites use cookies as a work-around to sessionize the clickstream. A cookie is a mechanism that allows the web server to store its own information about a user on the user's hard drive: a small file that the web server sends to the user's browser and that is stored on the user's computer so that the server can remember something about the user later. The location of the cookies depends on the browser. Internet Explorer stores each cookie as a separate file under a Windows subdirectory; Netscape stores all cookies in a single cookies.txt file. Sites often use cookies to store customization information or user demographic data. The main purpose of cookies is to identify users and possibly prepare customized web pages for them. If cookies are enabled, the browser sends the cookie back to the web server each time it requests one of the site's pages, so the web server can identify the requesting user's computer unambiguously. All hits with the same cookie are therefore grouped into one session until the user explicitly logs out or times out. Because of privacy concerns, some users turn off cookies; the web site then needs to use the login ID and, if possible, referrer and agent information to identify users and server sessions [1, …].

2.2. Crawler Sessions

A crawler is a software agent that traverses web sites by following the links in web pages. Search engines use crawlers to index web pages; crawlers can also help users gather information, such as prices for certain products, and help web designers diagnose web site problems (such as response time, isolated web pages, etc.). Most crawlers adopt a breadth-first retrieval strategy to increase their coverage of the web site.
In our experience with some web site data, up to 30% of a site's clickstream session traffic may at times come from crawlers; these sessions are called crawler sessions. If they are not filtered out, crawler sessions may mislead data mining analysis into generating inaccurate or incorrect results. For example, when an association rule mining algorithm is used to find page click orders in sessions, as pointed out in Refs. 5, 9, 15, and 16, it may inadvertently generate frequent itemsets involving web pages from different page categories. Such spurious patterns may lead an analyst of an e-commerce site to believe that web surfers are interested in products from various categories when, in fact, crawlers induced the patterns [1, 9, 17]. This problem can be avoided if crawler sessions are removed from the data set during data preprocessing, so identifying crawler sessions is very important for web usage mining.

There are a few ways to identify a crawler session. In Ref. 9, the authors build a classification model to identify crawler sessions. Crawler sessions may have some of the following characteristics: images turned off, empty referrers, a visit to the robots.txt file, very short page duration, a depth-first or breadth-first traversal of the site, and/or no purchases [11]. Some web sites create an invisible link on a page; because only crawlers follow invisible links (regular users cannot click them), sessions containing the invisible link are considered crawler sessions.
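The characteristics above can be turned into simple rules. The sketch below uses hand-written rules rather than the classification model of Ref. 9; the `Session` fields, the trap URL, the 2-second duration threshold, and the "at least three signals" cut-off are all illustrative assumptions, and breadth-/depth-first traversal detection is omitted.

```python
from dataclasses import dataclass

# Hypothetical URL of an invisible "trap" link that only crawlers follow.
TRAP_URL = "/hidden-trap-link"


@dataclass
class Session:
    urls: list[str]             # requested URLs, in order
    referrers: list[str]        # referrer of each request ("" if empty)
    page_seconds: list[float]   # time spent on each page view
    image_requests: int         # number of image files requested
    purchased: bool


def crawler_signals(s, short_page_seconds=2.0):
    """Return the crawler characteristics that session `s` exhibits."""
    signals = []
    if s.image_requests == 0:
        signals.append("images_off")
    if all(r == "" for r in s.referrers):
        signals.append("empty_referrers")
    if "/robots.txt" in s.urls:
        signals.append("visited_robots_txt")
    if s.page_seconds and sum(s.page_seconds) / len(s.page_seconds) < short_page_seconds:
        signals.append("very_short_page_duration")
    if not s.purchased:
        signals.append("no_purchase")
    return signals


def is_crawler(s, min_signals=3):
    """A session that follows the trap link is a crawler outright; otherwise
    flag it when it shows at least `min_signals` characteristics."""
    return TRAP_URL in s.urls or len(crawler_signals(s)) >= min_signals


bot = Session(["/robots.txt", "/a", "/b"], ["", "", ""], [0.5, 0.4, 0.6], 0, False)
human = Session(["/home", "/product/42"], ["http://search.example.com/", "/home"],
                [35.0, 80.0], 12, True)
print(is_crawler(bot), is_crawler(human))  # True False
```

Requiring several signals rather than one is a design choice: a single signal such as "no purchase" describes most human sessions too, so on its own it would remove far too much legitimate traffic.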

2.3. Customer Demographics (Offline Data)

Retaining customers and increasing sales is the only way for an e-commerce web store to survive in this very competitive online market. To retain customers, you need to understand their needs and preferences. As pointed out in Refs. 12, 18, and 19, fostering and promoting repeat sales requires knowledge about customers' preferences, consumption rate, behavior, and lifestyle. This knowledge generally requires knowing a customer's income, age, gender, lifestyle, etc. To find the best way to reach its customers and increase sales, a company needs to enrich the clickstream with this offline information. Demographics, psychographics, property information, household characteristics, individual characteristics, and lifestyle have been used by database marketing professionals for decades to improve sales, retain customers, and acquire new customers for bricks-and-mortar stores. This information should also be used in a web store to enhance the vast amount of customer and clickstream behavior data already captured at the web site.

In the web store, customer information can be collected through a registration form, which is often limited. Some web sites offer incentives to encourage users to register or answer a set of questions. The problem is that users tend not to give the information or provide inaccurate information on registration forms. Fortunately, many commercial marketing database vendors collect this information based on zip code or physical address. This information should be integrated with web data for additional insight into the identity, attributes, lifestyles, and behaviors of the web site's visitors and customers [19]. There are several sources of demographic information at various levels, such as CACI, Acxiom, and Experian.
CACI provides neighborhood demographics, Acxiom gives household-level psychographics, and Experian provides the MOSAIC targeting system, which classifies consumers according to the type of neighborhood in which they live [19]. These external offline demographics can tell who online visitors and customers are, where they live, and, subsequently, how they think, behave, and are likely to react to your online offers and incentives. Database marketers have used this information for years to segment their customers and potential prospects. The demographic and socioeconomic profiles are aggregated from several sources, including credit card issuers, county recorder offices, census records, and other cross-referenced statistics [19]. When analyzing and mining customer demographic data together with web data, customer privacy should always be kept in mind: profiling customers is harmful when web sites fail to do it anonymously.

3. DATA WEBHOUSE CONSTRUCTION

A data warehouse provides the data source for OLAP and data mining. Designing a proper data warehouse schema and populating it with data from the OLTP system is very time-consuming and complex. A well-designed data warehouse feeds the business the right information at the right time to make the right decisions in the e-commerce system [1, 20, 21]. In Section 2, we discussed data capture methods for the web site, which collect the clickstream, sales, customers, shipments, payments, product information, etc. These data are online transaction data and are stored in the transaction database system (OLTP). The database schemas of the OLTP system are based on entity-relationship (E-R) modeling, normalized to reduce redundancy, and designed to maintain atomicity, consistency, and integrity while keeping day-to-day business operations, such as inserting, updating, and deleting transactions, fast and efficient. An OLTP query normally needs to access only a small set of records in the database, but it demands a very quick response. For web usage mining, we need a database schema (called a data warehouse) designed to support decision making and data analysis (OLAP). Typical relational databases are designed for OLTP and do not meet the requirements for effective OLAP; as a result, data warehouses are designed differently from traditional relational databases. Data warehouses use OLTP data for historical, read-only analysis. The data in a data warehouse are normally organized with multidimensional modeling in a star schema (fact tables plus surrounding dimension tables).

The need to include clickstream data in the data warehouse makes the schema design even more complicated. The web challenges the current view of the data warehouse with multiple new requirements [18]. The data warehouse is required to make the customer clickstream available for analysis; hence Ralph Kimball coined the term webhouse [18, 22]. A webhouse plays an integral role in the web revolution as the analysis platform for all the behavior data arriving from the clickstream, as well as for the many web sites that rely on the data warehouse to customize and drive the end user's web experience in real time [18]. We use the term webhouse to refer to the data warehouse system for web usage mining.
The webhouse is the data source for data mining and business intelligence reporting in the data warehouse/OLAP framework, and it contains the fundamental business content of what a web store sells with web services and capabilities. A webhouse should allow you to analyze all hits on a web site and all products sold in the web store from many viewpoints. Many systems have been developed for web log mining, which finds association patterns and sequential patterns in web access; but to understand customers, such as repeat versus single visitors and single-purchase versus multiple-purchase customers, it is necessary to include additional information: order information from the web store, product information, user browsing sequences from the clickstream, and customer information from user tables. In the following sections, we discuss the requirement analysis and the dimensional modeling technique used to design the webhouse.

3.1. Requirement Analysis of the Webhouse

It is necessary to build a comprehensive view of the immense stream of clicks arriving at web sites, including the items sold through the site. We want to build a webhouse that provides insightful information and answers the important business questions for e-commerce. The design of a webhouse starts with requirement analysis. We spent significant time interviewing our clients, business analysts, engineers/developers, and end users to gather their requirements and learn what kinds of business problems they hope to solve with the webhouse. Their questions cover a wide range of areas:

- Web site activity (hourly, daily, weekly, monthly, quarterly, etc.)
- Product sales (by region, by brand, by domain, by browser type, by time, etc.)
- Customers (by type, by age, by gender, by region, buyer versus visitor, heavy buyer versus light buyer, etc.)
- Vendors (by type, by region, by price range, etc.)
- Referrers (by domain, by sale amount, by number of visits, etc.)
- Navigational behavior patterns (top entry page, top exit page, killer page, hot page, etc.)
- Click conversion ratio
- Shipments (by regular mail, by express mail, etc.)
- Payments (by cash, by credit card, by e-money, etc.)

Some of the important questions are:

- Who are my most profitable customers?
- What is the difference between buyers and nonbuyers at my site?
- Which parts of my site attract the most visits?
- Which part of my site is a session killer?
- Which parts of the site lead to the most purchases?
- What is the typical click path that leads to a purchase?
- What is the typical path of customers who abandon the shopping cart?
- What percentage of customers visit the product section?
- What is the new-visitor click profile?
- What are the top/bottom products?
- What are the peak traffic hours?

We analyzed these questions to determine which dimensions need to be constructed and which fact measures the business analysts need. After identifying the dimensions and measures, we can move to the next step: webhouse schema design.

3.2. Webhouse Schema Design

In the webhouse, there are one or several fact tables and a set of small tables called dimension tables. The fact table is where the numerical measurements of the business are stored; each measurement is taken at the intersection of all dimensions. The dimension tables are where the textual descriptions of the dimensions of the business are stored [18].

There are several methodologies for designing a data warehouse, such as the architecture-based methodology proposed by Anahory and Murray [23] and the four-step methodology used by Ralph Kimball. To construct the webhouse for the data warehouse/OLAP framework, we adopted Kimball's methodology and built the webhouse through dimensional modeling techniques. The four steps are (1) defining the source of data, (2) choosing the grain of the fact tables, (3) choosing the dimensions appropriate for the grain, and (4) choosing the facts appropriate for that grain. We discuss each step in detail in the following sections.
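The outcome of the four steps can be illustrated with a deliberately small clickstream star schema. This is a sketch under stated assumptions, not the authors' schema (Figure 2): the table and column names are hypothetical choices that mirror the dimensions discussed in this section, the order item fact table is omitted, and SQLite is used only so that the example runs stand-alone.

```python
import sqlite3

DDL = """
CREATE TABLE page_dim (
    page_key      INTEGER PRIMARY KEY,
    url           TEXT,
    page_type     TEXT,              -- e.g. 'product info', 'order form'
    page_category TEXT
);
CREATE TABLE session_dim (
    session_key   INTEGER PRIMARY KEY,
    cookie_id     TEXT,
    referrer      TEXT,
    user_agent    TEXT,
    session_type  TEXT               -- e.g. 'with purchase', 'crawler'
);
CREATE TABLE date_dim (
    date_key      INTEGER PRIMARY KEY,   -- YYYYMMDD
    day_of_week   TEXT,
    quarter       INTEGER
);
CREATE TABLE time_of_day_dim (
    time_key      INTEGER PRIMARY KEY,   -- minute of the day, 0..1439
    hour          INTEGER,
    day_part      TEXT               -- e.g. 'early morning', 'lunch break'
);
-- Grain: one row per page hit.
CREATE TABLE clickstream_fact (
    page_key        INTEGER REFERENCES page_dim(page_key),
    session_key     INTEGER REFERENCES session_dim(session_key),
    date_key        INTEGER REFERENCES date_dim(date_key),
    time_key        INTEGER REFERENCES time_of_day_dim(time_key),
    seconds_on_page REAL             -- the fact measure
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO page_dim VALUES (?, ?, ?, ?)", [
    (1, "/home", "home", "navigation"),
    (2, "/product/42", "product info", "catalog"),
])
conn.execute("INSERT INTO session_dim VALUES "
             "(1, 'c-001', 'search.example.com', 'Mozilla/4.0', 'with purchase')")
conn.execute("INSERT INTO date_dim VALUES (20040105, 'Monday', 1)")
conn.execute("INSERT INTO time_of_day_dim VALUES (545, 9, 'morning')")
conn.executemany("INSERT INTO clickstream_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 20040105, 545, 12.0),
    (2, 1, 20040105, 545, 95.0),
    (2, 1, 20040105, 545, 40.0),
])

# Site-area analysis: hits and total viewing time per page type.
rows = conn.execute("""
    SELECT p.page_type, COUNT(*) AS hits, SUM(f.seconds_on_page) AS seconds
    FROM clickstream_fact f JOIN page_dim p ON f.page_key = p.page_key
    GROUP BY p.page_type
    ORDER BY seconds DESC
""").fetchall()
print(rows)  # [('product info', 2, 135.0), ('home', 1, 12.0)]
```

Each step maps onto the DDL: the source data fill the tables, the grain comment fixes one fact row per page hit, the foreign keys name the dimensions, and `seconds_on_page` is the fact chosen for that grain.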

Define the Source Data

Because we wish to analyze the click behavior and the sales of an online web store, we need data for every hit on the web site and every item sold on it. Data are collected at the page request level (clickstream) and at the order item level (purchases), and all of this information is already available in the transaction database (OLTP). In the data collection phase, we collected every web page view, purchase, and customer record in the web database system, which is the data source for our webhouse. We need to extract these data from the transaction database and transform them into the webhouse according to the design of the dimension and fact tables described later. In addition, we also need product, user, page, time, payment, shipping, and promotion information.

Choose the Grain of the Fact Tables

The fact table is the center of the webhouse. It contains a list of all measures and points to the key value of the lowest level of each dimension. The lowest level of each dimension table, the business problems, and the domain determine the granularity of the fact table. Before the fact tables can be designed in detail, a decision must be made as to what an individual low-level record in that fact table means; this is the grain of the fact table [18]. To analyze the clickstream, every page hit should have a row in the clickstream fact table, which is the grain of the clickstream. To analyze the sales business of the web store, every item sold should have a row in the order item fact table; thus, the grain of the order item fact table in the webhouse is every item sold.

Choose the Dimensions Appropriate for the Grain

Dimensions are qualifiers that give meaning to measures. They organize the data based on the what, when, and where components of a business question. Dimensions are stored in dimension tables made up of dimensional elements and attributes. Each dimension is composed of related items or elements organized as a hierarchy, in which each element represents a different level of summarization. For example, products roll up to subcategories, which roll up to categories (which in turn roll up to departments, etc.). The lowest level in the hierarchy is determined by the lowest level of detail required for the analysis. Levels higher than the base level store redundant data. This denormalized table reduces the number of joins required for a query and makes it easier for users to start querying at a higher level and drill down to lower levels of detail as needed. For example, all of the elements relating to the product would make up the product dimension, allowing the user to query for all categories and drill down to the subcategory or product level for more detailed information. In the following sections, we discuss some of the important dimensions in the webhouse. (Some other dimensions, such as the business event dimension and the promotion dimension, are omitted because of space limitations; these dimensions are almost the same in the webhouse as in a traditional data warehouse.)
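Rolling up and drilling down along a denormalized product hierarchy can be sketched as below. The SKUs, hierarchy values, and revenue figures are invented for illustration, and the `roll_up` and `drill_down` helpers are hypothetical names; in a real webhouse these would be GROUP BY queries over the order item fact table joined to the product dimension.

```python
from collections import defaultdict

# Denormalized product dimension: each row carries its whole hierarchy,
# so any level can be reached without further joins. Values are invented.
PRODUCT_DIM = {
    "SKU-1": {"subcategory": "Novels",    "category": "Books",     "department": "Media"},
    "SKU-2": {"subcategory": "Cookbooks", "category": "Books",     "department": "Media"},
    "SKU-3": {"subcategory": "Jazz",      "category": "Music",     "department": "Media"},
    "SKU-4": {"subcategory": "Laptops",   "category": "Computers", "department": "Electronics"},
}


def roll_up(order_items, level):
    """Sum revenue from (sku, revenue) fact rows at one hierarchy level:
    'product', 'subcategory', 'category', or 'department'."""
    totals = defaultdict(float)
    for sku, revenue in order_items:
        member = sku if level == "product" else PRODUCT_DIM[sku][level]
        totals[member] += revenue
    return dict(totals)


def drill_down(order_items, parent_level, parent_value, child_level):
    """Restrict to one member of a higher level, then summarize one level lower."""
    subset = [(sku, rev) for sku, rev in order_items
              if PRODUCT_DIM[sku][parent_level] == parent_value]
    return roll_up(subset, child_level)


items = [("SKU-1", 12.5), ("SKU-2", 20.0), ("SKU-3", 9.0), ("SKU-4", 899.0), ("SKU-1", 12.5)]
print(roll_up(items, "department"))  # {'Media': 54.0, 'Electronics': 899.0}
print(drill_down(items, "category", "Books", "subcategory"))  # {'Novels': 25.0, 'Cookbooks': 20.0}
```

The redundancy of storing every level in each product row is what makes both operations single passes over the facts; a normalized design would need one join per hierarchy level.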

Session Dimension. The session dimension is more than just a tag that groups together all the page events that constitute a single user's session. It is the place where we label the session and trace its activity [18] to describe the characteristics of the session. These characteristics should include the session length, the total page requests of the session, the first and last pages of the session, the referrer of the session, the cookie ID, the user agent, the client host, the first and last request times, the total browsing time of the session, the average viewing time of each page in the session, and the session visit count. We may also need to characterize sessions as sessions with purchase, sessions without purchase, random browsing, crawler sessions, etc. With this information, we can answer business questions such as which page customers arrive at on my site (top first page of the sessions), where they leave (top last page of the sessions), and what the characteristics of the sessions that lead to a purchase are.

Page Dimension. Site area analysis is very important in order to understand which part of the web site attracts most of the hits, which part leads to a purchase, which part of the site is a killer, and which part is rarely visited and superfluous. The page dimension should contain meaningful context that tells the analyst the user's location on the web site. Each web page must contain some simple descriptors identifying the location and type of page, such as log in, registration, hot product, product info, company info, frequently asked questions, and order form [22]. A large web site should have a hierarchical description associated with each page that gives progressively more detail about what constitutes the page. This information needs to be stored in the page dimension and maintained consistently as we update and modify the web site.
A page dimension should also contain information such as the page template, page category, and the number of images and banners on the page.

Time Dimension. The time dimension is very important in every data warehouse because every fact table in the data warehouse is a time series of observations of some sort. In traditional data warehouses, the time dimension has a daily grain, but in the webhouse the granularity is finer; we have seen webhouses that record at the hourly or even minute level. A date column in a relational table normally has the format year, month, day, hour, minute, and second (YYYYMMDD:HHMMSS), from which we need to derive new attributes such as day of week, day of year, and quarter. Because in a web environment we analyze both clickstream behavior and sales, it makes sense to have two time hierarchies. One is more or less the traditional time dimension of the data warehouse: date rolled up to day, week, month, quarter, and year (data transformation functions may be needed to construct new attributes such as weekday, weekend, and holiday season), which is useful for comparing sales across days, months, quarters, or years. The other is the time of day, identifying a specific spot within the day by hour or minute (useful derived attributes include early morning, late afternoon, evening, working hours, lunch break, etc.). This time hierarchy is useful for site traffic analysis.

User Dimension. To obtain good customer profiles, variables describing the characteristics of the customer should be added. If available, this information is kept in a data warehouse where all customer characteristics and historical information about click behavior are stored. To combine this information with the transaction data, users must identify themselves when visiting the web site so that the cookie ID can be matched with their names and the transactional data can be merged with customer-relevant data. The customer dimension should contain information such as name, address, gender, age, demographics, and lifestyle. Identifying the user is very important for distinguishing different types of visitors to the web site. In the user dimension, we need to label users as single visitors, repeat visitors, visitors with a single purchase, visitors with multiple purchases, or most profitable customers based on the amount they spend. Based on the user dimension information, we should be able to answer business questions related to different user types.

Product Dimension. The product dimension describes the complete portfolio of what the web site sells online, and its content varies across online stores. For example, Amazon.com has a larger product dimension than an online bank. Normally, the product dimension should contain information such as the product key, stock-keeping unit (SKU) description, product properties (weight, size, color, package type, etc.), brand, subcategory, department, price, manufacturer, and warranty information.

Choose the Facts Appropriate for That Grain

Choosing the appropriate fact measures for the grain of the fact table depends on the business objectives and analysis purposes. For the clickstream fact table, we can choose the time (number of seconds) the user spent on each page. For the order fact table, we can choose revenue, profit, cost, quantity, and other measures.
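Two derivations described in this part of the design start from the raw request time stamp: the attributes of the calendar and time-of-day hierarchies, and the seconds-on-page measure of the clickstream fact. The sketch below shows both under stated assumptions: the function names are hypothetical, the day-part hour boundaries are illustrative choices, and the dwell time of a page is assumed to be the gap until the next request in the same session.

```python
from datetime import datetime


def date_attributes(ts):
    """Attributes for the calendar (date) hierarchy."""
    return {
        "date_key": int(ts.strftime("%Y%m%d")),
        "day_of_week": ts.strftime("%A"),
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "is_weekend": ts.weekday() >= 5,     # Saturday=5, Sunday=6
    }


def day_part(ts):
    """Label for the time-of-day hierarchy; hour boundaries are assumptions."""
    h = ts.hour
    if 5 <= h < 9:
        return "early morning"
    if h == 12:
        return "lunch break"
    if 9 <= h < 17:
        return "working hours"
    if 17 <= h < 22:
        return "evening"
    return "night"


def seconds_on_page(session_hits):
    """Clickstream fact measure from time-ordered (timestamp, url) pairs:
    seconds from each request to the next one in the same session. The last
    page has no following request, so its duration is unknown (None)."""
    result = []
    for i, (ts, url) in enumerate(session_hits):
        if i + 1 < len(session_hits):
            result.append((url, (session_hits[i + 1][0] - ts).total_seconds()))
        else:
            result.append((url, None))
    return result


ts = datetime(2004, 1, 3, 12, 30, 15)   # a Saturday
print(date_attributes(ts)["date_key"], date_attributes(ts)["quarter"], day_part(ts))
# 20040103 1 lunch break
session = [
    (datetime(2004, 1, 3, 12, 30, 15), "/home"),
    (datetime(2004, 1, 3, 12, 30, 45), "/product/42"),
    (datetime(2004, 1, 3, 12, 32, 45), "/cart"),
]
print(seconds_on_page(session))
# [('/home', 30.0), ('/product/42', 120.0), ('/cart', None)]
```

Leaving the last page's duration as `None` rather than zero keeps exit pages from looking like pages nobody reads; how to impute it is a separate, business-specific decision.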
The star schema for the webhouse is constructed as shown in Figure 2.

Data Transformation
Creating a warehouse is not enough, because a lot of important information is not yet in the data warehouse. For a session, for example, it is essential to know the number of pages, the time spent, and whether the session led to a purchase. For a customer, it is necessary to create attributes such as whether the customer is a repeat visitor, a heavy spender, or an occasional shopper. These new attributes need to be created or derived from existing database columns to make data mining and reporting easier, or even possible. Two sets of transformations need to take place: (1) data must be transferred from the OLTP systems to the OLAP systems, and (2) data may need to go through some transformation or conversion to create new values that are not explicitly represented in the data warehouse. The first set of transformations is relatively stable and straightforward, and there are many ETL tools on the market for this purpose. 16 The second set poses a significant challenge for web usage mining because many of these transformations depend on the application domain and the business goals. Typically, data in the webhouse are collected at the clickstream level. For data mining and business intelligence reporting, the data need to be transformed or aggregated to different levels of granularity (session level, order-header level, or customer level) depending on the mining and reporting goals.

Figure 2. Star schema of webhouse.

For example, if the analyst is interested in the difference between sessions with and without a purchase, transformation/aggregation operations must convert the clickstream data to the session level. If the analyst wants to understand the characteristics of the most profitable customers, the data must be further transformed/aggregated from the session level to the customer level. There are three types of transformations in the web usage mining context:
(1) Generalizing/extracting primitive values into higher level values. For example, the referrer column of each click session has too many distinct values, but useful information is embedded in it, so it is useful to create new columns from it, such as the host and the domain of the referrer; similarly, new columns such as domain and host can be created from ISPs and customer's

(2) Grouping/summarizing information from multiple columns. For example, a customer preference survey has columns such as prefer basketball, prefer football, and prefer baseball, corresponding to the customer's first, second, and third preferred sports. For mining or reporting purposes, it is better to increase the granularity by generating new columns that indicate the customer's overall preference philosophy.
(3) Inferring information not directly available from existing database columns. For example, to get a picture of a customer's product page views, we need to know whether a click record is a product page view from brands, which is not directly available. This information can be inferred from the template and referrer columns.

Based on our experience, the following are some typical data transformation operations that we found popular and useful for web usage mining and reporting.

(1) Click level transformation

Transformation name | Transformation description | Result type
Referrer indicator for a product page | Creates an indicator variable for the referrer of an arbitrary product page. Product detail page views are important information for a web store. Within a web store, a visitor can reach a product page from different places depending on how the store is designed (e.g., ViewfromHotArea, ViewfromGifts). To analyze the clickstream data, it is helpful to know which area each product page view comes from; the indicator is defined from the type, template, and referrer columns | Boolean
Page view time | The number of seconds a person spends viewing a page | Double
Credit card indicator (MasterCard, Visa, AMX, etc.) | Indicates which type of credit card was used to complete the transaction | Boolean
Decode the query string | Returns the search arguments the customers typed while they surfed the web site | String
Path of session | Concatenates the templates of the session into one long string | String
Detailed path of the session | Similar to the foregoing operation, except that it returns the detailed dynamically generated pages | String
Last page of the session | Returns the last page of the session | String
First page of the session | Returns the first page of the session | String
Click area | Tells which area a click is on, or none for a nonclick | Boolean
Click tags | Checks whether a click is a gift-box view or a shopping-cart view | String
Purchases of products that appear on Whats_hot pages | It is very useful to know who bought products from the Whats_hot pages, or what products were bought from them. However, this is very hard to do without an event log. What can be done is to find purchases of products that appear on Whats_hot pages. Note that these products may also appear on other pages, and customers can buy them there | Boolean
When did a customer fill in the registration (survey) form? | Web sites normally have an optional registration form that contains survey questions. Knowing the answers to these questions, and whether they were given before or after any purchases, can help develop a better understanding of customers | Boolean

The foregoing transformations capture a lot of essential information for reporting; they help business analysts understand and improve the web site's performance and function and increase customer satisfaction. For example, the query-string decoding transformation can capture the top 10 failed search keywords, as shown in Table I for a real online motor store. "Fat boy" and "chrome" are the items customers searched for most often. The store manager can then decide to add these items to the web store if many customers show interest in them.

Table I. Top 10 failed searches.
Search string | No. of searches
Fat boy | 1566
Chrome | 791
Motorclothes | 443
G-type fuel tank | 325
G-sportster | 280
Maintenance | 260
C-sidecar | 210
Sissy bar | 175
Seat | 169
Touring | 163

(2) Session level transformation

Transformation name | Transformation description | Result type
Customer browser name | Returns a string containing the browser's name from the user agent if the flag is true; otherwise, groups all unknown browser names into "others" | String
Browser release | The release number of the browser, given the user agent string. The main release number is for Mozilla; the string contains the release number for MS Internet Explorer inside ( ) if the browser is an IE browser, and AOL and its release number inside [ ] if it is an AOL browser | String
Browser OS | The operating system (OS) running the browser | String
Returned visitor | True if the user is a returning visitor | Boolean
Session length | The total number of clicks in this session | Integer
Long session | Indicates whether the session is a long one ( 6 clicks) | Boolean
Short session | Indicates whether the session is a short one (1 or 2 clicks) | Boolean
Session duration | The total time spent in this session | Double
Referrer host | Host of the referrer | String
Referrer domain | Domain of the referrer | String
URL site | Returns the URL site, such as Yahoo, Excite, etc. | String
ISP host | Internet service provider host | String
What day it is of the first visit | A number indicating the day of the first visit | Double
What day it is of the last visit | A number indicating the day of the last visit | Double
Is the visit on a weekend | Indicates whether the visit happens on a weekend | Boolean
Is the visit on a weekday | Indicates whether the visit happens on a weekday | Boolean
Any purchase in this session | Indicates whether the session leads to any purchase | Boolean
Purchase amount in different areas | In addition to the number of product detail page views that came from different areas, it is also important to know the customers' purchase amount from each area. Because it is hard to trace precisely where a purchase comes from, it can be estimated by distributing the total purchase amount to each area in proportion to the number of product detail page views from that area | Double
Purchase quantity in different areas | Similar to the purchase amount from each area, it is necessary to know the customers' purchase quantity from each area | Double
Hour of day of the server | Shows the visitor's time from the first request date, based on the location of the server | Double
Time period of the day | Based on the hour of day, more columns can be added to indicate whether the visit time is early morning, lunch time, late evening, etc. | Boolean

Table II shows the top 10 paths through a web site that ended without any purchase. These paths help the web site owners understand customer click behavior and reveal many reasons why customers left without making a purchase. For example, the top path is main.jsp → splash.jsp (14,622 sessions): the customers visited main.jsp and then left the web site after clicking splash.jsp. Further analysis by the web designer found that splash.jsp took a while to compile and download, and its animation made a large portion of the store's contents invisible, which frustrated many customers, so they left. After splash.jsp was removed, the conversion rate improved significantly.

Table II. Top 10 paths leading to nonpurchased sessions.
Web path | Count
main.jsp → splash.jsp | 14,622
main.jsp → main.jsp | 3731
main.jsp → main.jsp → main.jsp | 790
main.jsp → login.jsp | 329
main.jsp → hot.jsp → registration.jsp | 303
Login.jsp | 274
main.jsp → survey.jsp | 216
product.jsp | 212
main.jsp → product.jsp | 192
main.jsp → search.jsp | 180
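Several of the session-level transformations above (session length, long/short session, session duration, path of session, first/last page, referrer host and domain, any purchase), as well as a Table II style count of nonpurchase paths, reduce to grouping click records by session. The Python sketch below is one possible implementation under stated assumptions: the click record layout (session ID, timestamp in seconds, page template, referrer URL, order amount) and the search parameter name `q` are hypothetical, the long-session threshold is taken as 6 or more clicks, paths use an ASCII " -> " separator, and the domain rule (last two host labels) is a simplification that mishandles suffixes such as co.uk.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple
from urllib.parse import parse_qs, urlparse


class Click(NamedTuple):
    """Hypothetical click-level record; field names are illustrative."""
    session_id: str
    ts: int              # seconds since epoch
    page: str            # page template, e.g. "main.jsp"
    referrer: str        # referrer URL, "" if none
    order_amount: float  # > 0 only on a click that completes an order


def referrer_host_domain(url: str) -> Tuple[str, str]:
    """Generalize a referrer URL to its host and (naive) domain."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Naive rule: keep the last two labels ("www.yahoo.com" -> "yahoo.com").
    domain = ".".join(labels[-2:]) if len(labels) >= 2 else host
    return host, domain


def search_terms(url: str, param: str = "q") -> str:
    """Decode the search argument a visitor typed, from the URL query string."""
    values = parse_qs(urlparse(url).query).get(param, [])
    return values[0] if values else ""


def summarize_sessions(clicks: List[Click]) -> Dict[str, dict]:
    """Aggregate click-level rows into one row of derived attributes per session."""
    by_session: Dict[str, List[Click]] = defaultdict(list)
    for click in clicks:
        by_session[click.session_id].append(click)

    sessions = {}
    for sid, rows in by_session.items():
        rows.sort(key=lambda c: c.ts)
        pages = [c.page for c in rows]
        # The first click's referrer is treated as the session's entry referrer.
        host, domain = referrer_host_domain(rows[0].referrer)
        sessions[sid] = {
            "session_length": len(rows),
            "long_session": len(rows) >= 6,   # assumed threshold
            "short_session": len(rows) <= 2,  # 1 or 2 clicks
            # Time from first to last click; the view time of the last page is unknown.
            "session_duration": rows[-1].ts - rows[0].ts,
            "path": " -> ".join(pages),
            "first_page": pages[0],
            "last_page": pages[-1],
            "referrer_host": host,
            "referrer_domain": domain,
            "any_purchase": any(c.order_amount > 0 for c in rows),
        }
    return sessions


def top_nonpurchase_paths(sessions: Dict[str, dict], n: int = 10) -> List[Tuple[str, int]]:
    """Count the most frequent paths among sessions without a purchase."""
    counts = Counter(s["path"] for s in sessions.values() if not s["any_purchase"])
    return counts.most_common(n)
```

Passing the output of `summarize_sessions` to `top_nonpurchase_paths` yields (path, count) pairs in the same shape as Table II; counting `search_terms` values for searches that returned no results would give a Table I style report.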

(3) Customer level transformation

Transformation name | Transformation description | Result type
Domain | The domain name is the portion of the address after the last period, such as COM, NET, EDU, etc. | String
Host name | The host name is the portion of the address after the at sign (@) and before the last period (.) | String
Time zone | Time zone of the customer |
Area code | Area code of the customer's phone number | String
Country region | Country region of the customer | String
Repeat buyer | Indicates whether the visitor is a repeat buyer | Boolean
Single-visit customer | The customer visited only once and made no purchase | Boolean
Multiple-visit customer | The customer visited multiple times but made no purchase | Boolean
Single-visit buyer | The customer visited once and made a purchase | Boolean
Multiple-visit buyer | The customer visited multiple times and made at least one purchase | Boolean
Profit ratio (average revenue per visit) | Defined as the total sales divided by the total number of visits | Double
Propensity-to-purchase ratio | Indicates the likelihood that a visit results in a purchase | Double
Things preferred and things really bought | The survey form contains questions such as preferred brands, preferred products, and special needs. Knowing the correlation between what a customer prefers and what he/she buys is valuable information | String

The customer level transformation creates many new columns in the data warehouse that make reporting and data mining at the customer level easier and more meaningful. For example, identifying whether a customer is a single visitor, a buyer, a repeat buyer, etc., is very important for the web store. Table III reveals how many customers are loyal customers, occasional shoppers, or just pure visitors.

Table III. Single/multiple visitors/buyers.
Type | Count
Single visit | 1823
Multiple visit | 37
Single-visit buyer | 269
Multiple-visit buyer | 58
Unknown | 2846
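The customer-level columns above (e-mail host and domain, visitor/buyer type, repeat buyer, profit ratio, propensity to purchase) can be derived once each customer's visits have been aggregated. In the Python sketch below, the input layout (an e-mail address plus the revenue of each visit) is hypothetical; the host/domain split and the profit ratio follow the definitions in the table, while treating a repeat buyer as someone who bought in at least two visits, and estimating propensity as the share of visits with a purchase, are assumptions made for illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class CustomerHistory(NamedTuple):
    """Hypothetical customer-level input: e-mail plus revenue of each visit."""
    email: str
    visit_revenues: List[float]  # one entry per visit; 0.0 if nothing was bought


def email_host_domain(email: str) -> Tuple[str, str]:
    """Host = text between '@' and the last '.', domain = text after the last '.'."""
    after_at = email.rsplit("@", 1)[-1]
    host, _, domain = after_at.rpartition(".")
    return host, domain.upper()


def visitor_type(h: CustomerHistory) -> str:
    """Classify a customer as a single/multiple-visit customer or buyer."""
    visits = len(h.visit_revenues)
    buying_visits = sum(1 for r in h.visit_revenues if r > 0)
    if visits == 0:
        return "unknown"
    if buying_visits == 0:
        return "single-visit customer" if visits == 1 else "multiple-visit customer"
    return "single-visit buyer" if visits == 1 else "multiple-visit buyer"


def customer_row(h: CustomerHistory) -> dict:
    """Build one row of derived customer-level columns."""
    visits = len(h.visit_revenues)
    buying_visits = sum(1 for r in h.visit_revenues if r > 0)
    host, domain = email_host_domain(h.email)
    return {
        "host_name": host,
        "domain": domain,
        "visitor_type": visitor_type(h),
        # Assumption: a repeat buyer purchased in at least two different visits.
        "repeat_buyer": buying_visits >= 2,
        # Profit ratio as defined in the table: total sales / total visits.
        "profit_ratio": sum(h.visit_revenues) / visits if visits else 0.0,
        # Assumption: propensity estimated as the share of visits with a purchase.
        "propensity_to_purchase": buying_visits / visits if visits else 0.0,
    }


def visitor_type_counts(histories: List[CustomerHistory]) -> Counter:
    """Count customers per visitor type, in the shape of Table III."""
    return Counter(visitor_type(h) for h in histories)
```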
After the data transformations are done, the data in the webhouse are organized into different levels. Some of the most useful summary tables and fact tables for web usage mining and reporting are shown in Table IV.

Table IV. Summary and fact tables in the webhouse.
CLICK_LINES | A row for each web page viewed
SESSIONS | A row for each web session
CUSTOMERS | A row for each customer
GIFT_LINES | A row for each gift registry item of each customer
ORDER_LINE | A row for each order line of each order
ORDER_HEADERS | A row for each order of each customer
PROMOTIONS | A row for each promotion folder and promotion defined in the system
LINE_ITEMS | ORDER_LINES joined with CUSTOMER, ORDER_HEADERS, PRODUCTS, ASSORTMENT, and PROMOTIONS

4. PATTERN DISCOVERY: A DATA WAREHOUSE/OLAP APPROACH
Data warehouse/OLAP is an approach that integrates data mining, data warehousing, and OLAP technologies. OLAP systems precalculate summary information (data cubes) to enable drilling, pivoting, slicing and dicing, and filtering, so that the business can be analyzed from multiple angles or views (dimensions). Web mining your site in the webhouse can reveal actionable and meaningful patterns about users and useful click sequences for web site design.

4.1. Construct Cubes from the Webhouse
A data cube is precalculated summary data organized so that the cells of the cube contain measured values and the edges of the cube define the natural dimensions of the data. (A data cube may have more than three dimensions, so technically it should be called a hypercube.) The dimension elements in the cube are organized in hierarchies, and you can roll up and/or drill down a dimension hierarchy to get a different view of the cube data. A data cube offers benefits for data analysis such as an immediate response to a business query and the ability to drill down and roll up multidimensional data to analyze business measures such as profit, revenue, and quantity from different angles and perspectives and across various ancillary factors. We can create two cubes from the webhouse shown in Figure 2: one for the clickstream and another for the ordered items, based on the clickstream and ordered-item fact tables and the session, product, user, page, and time dimension tables. In the webhouse, the data are already organized in a multidimensional model; all that is required is to plug them into the OLAP software.
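To make "precalculated summary data" concrete, the Python sketch below precomputes a SUM measure for every combination of dimension values, including every roll-up (all 2^n cuboids), and then reads a top-N report off the rolled-up cells. It is a toy, in-memory illustration with hypothetical column names; production OLAP servers materialize and store cubes far more selectively and efficiently.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

ALL = "*"  # marks a dimension that has been rolled up to "all"


def build_cube(facts: List[dict], dimensions: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """Precompute SUM(measure) for every combination of dimension values,
    including all 2**n roll-ups (cuboids)."""
    cube: Dict[Tuple, float] = defaultdict(float)
    for row in facts:
        for k in range(len(dimensions) + 1):
            for kept in combinations(dimensions, k):
                # Keep the chosen dimensions, roll the others up to ALL.
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)


def top_n(cube: Dict[Tuple, float], dimensions: Sequence[str], dim: str, n: int = 10):
    """Top members of one dimension with every other dimension rolled up,
    e.g. top products by revenue or top domains by sales."""
    i = list(dimensions).index(dim)
    cells = [
        (key[i], value)
        for key, value in cube.items()
        if key[i] != ALL and all(x == ALL for j, x in enumerate(key) if j != i)
    ]
    return sorted(cells, key=lambda kv: kv[1], reverse=True)[:n]
```

A slice such as "revenue of shirts in March 2002 across all domains" is then a single lookup, `cube[("shirt", "*", "2002-03")]`, and rolling up a dimension means replacing its value with `"*"` in the key.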
There are many OLAP tools, such as MS OLAP, Cognos, and Essbase, to choose from to build large cubes. Slicing and dicing these cubes reveals significant information about your web site and sales. For example, we can find the top pages of the site, top domains, top browsers, the view time of the top pages, top exit pages, top referrers of the site, top products by sales and quantity, top referrers by page requests, sales, quantity, and users, and web site activity by day, month, or even hour and minute. We can also find out who our visitors are, how much they spend, the sales cycles, etc. Many business intelligence reports can be derived from the OLAP cubes. Business reports are the most important tool for business analysts but are underappreciated by many companies. Business intelligence reports can provide much insightful information about the web store, such as sales of products across different referrers, best-selling/worst-selling products, top/bottom domains, and top searched keywords.

4.2. Mining the Webhouse Data
OLAP is a key component of this approach, but OLAP alone is not good enough for e-commerce applications. Some challenging questions cannot be answered by examining the measured values in the cubes. For example, for the question "Given a set of page views, will the visitor view another page on the site or will the visitor leave?" it is very difficult, if not impossible, to find a satisfactory answer from the OLAP cube data of the webhouse. Many mining algorithms and methods, such as association rules, decision trees, neural networks, Bayesian algorithms, and clustering, can be applied in web usage mining to derive insightful knowledge rules for understanding the business and customers, to build prediction models for classification, and to generate campaign scores for product promotion. Next, we discuss how these algorithms can help solve some of the challenging problems in e-commerce.

Association Rules
Association rule algorithms were initially designed to analyze market basket data and find correlations among items purchased together, e.g., if a customer buys product A, what is the likelihood that he/she will also buy product B? In web usage mining, association rule algorithms can be used for two purposes. The first is analyzing online purchase data to determine which products online customers buy together (similar to traditional supermarket basket analysis). Online shopping databases contain historical data on prior customer choices, where each customer has selected a subset of products. These data can be used to generate dynamic recommendations of new items to a customer who is in the process of choosing items. The second use of association rule algorithms is to analyze the page view hits in a session.
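Both uses of association rules described above rest on the same three measures: support (how often two items occur together), confidence (how often the consequent appears given the antecedent), and lift (confidence divided by the consequent's baseline frequency). The Python sketch below computes them for one-item ⇒ one-item rules; a transaction can be a basket of products or the set of pages in a session. It is a plain frequent-pair computation with toy data, not the authors' hybrid renewal/associative model discussed in the following paragraphs, and it omits the candidate pruning a full Apriori implementation would use for larger itemsets.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# (antecedent, consequent, support, confidence, lift)
Rule = Tuple[str, str, float, float, float]


def pair_rules(transactions: Iterable[Iterable[str]],
               min_support: float = 0.0,
               min_confidence: float = 0.0) -> List[Rule]:
    """One-item => one-item association rules with support, confidence, and lift.

    A transaction can be a market basket (products bought together) or a
    session (pages viewed together)."""
    baskets = [sorted(set(t)) for t in transactions]
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for items in baskets:
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules: List[Rule] = []
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n
        if support < min_support:
            continue
        # Each co-occurring pair yields two directed rules.
        for x, y in ((a, b), (b, a)):
            confidence = n_ab / item_counts[x]        # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence, lift))
    # Strongest associations first.
    rules.sort(key=lambda r: (r[4], r[3]), reverse=True)
    return rules
```

A lift above 1 means the two items (or pages) co-occur more often than independence would predict, which is the signal used for cross-selling recommendations or for adding direct links between pages.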
An adapted association algorithm can find related pages that are often visited together, even when there are no hyperlinks between them. As a result of association rule analysis, it is possible to optimize the web site structure and detect drawbacks that had not been obvious in the past. This information may help web designers redesign their web site (e.g., add direct links between strongly correlated pages); it may also help web servers prefetch or precompile web pages (many web sites now generate pages dynamically) to reduce user waiting time. Web sites also display a dynamically changing set of links to related sites depending on the browsing pattern during a surfing session. However, we feel that recommendation is inherently a different problem, mainly because preferences are driven largely by taste and interest. When a customer surfs the web store, whether purchasing or just visiting, not all actions (putting items into the shopping cart or clicking through web pages) are chosen because of their association with previous actions (items already in the cart or pages already visited). 24 We believe there are two behaviors: renewal choice and associative choice. Starting from scratch, some need drives the customer to click the first page or select the first item; because this choice stems from an independent need, we call it the renewal choice. After the first move, a customer may stop, or click another page/select another item either by association or by another renewal choice, iteratively. We propose a hybrid approach (a statistical association rule approach) that computes the probability of a new move becoming the next choice given the current status and builds a recommendation list by ranking these probabilities. What makes this approach different from the usual association rule approaches is that it accounts not only for choices associated with the items already in the shopping cart (associative buying) but also for the fact that a customer may exercise an independent choice unrelated to the existing items in the cart (renewal buying). Given the items in the shopping cart, we compute the probabilities of renewal choice and associative choice, and then the probability of each item under each of these two buying modes given the partial basket content. The results of this analysis are very useful for promoting cross-selling and up-selling in the online web store. We tested this approach on one client's site, and the association rules in Table V reveal that it generates more meaningful and actionable associations.

Table V. Associations in a beauty-supply web store.
Rule
Bloom ⇒ Dirty_Girl
Dirty_Girl ⇒ Bloom
Philosophy ⇒ Bloom
Bloom ⇒ Philosophy
Dirty_Girl ⇒ Blue_Q
Blue_Q ⇒ Dirty_Girl
Tony_And_Tina ⇒ Girl
Philosophy ⇒ Tony_And_Tina
Tony_And_Tina ⇒ Philosophy
Demeter_Fragrances ⇒ Smell_This
Girl ⇒ Tony_And_Tina
Smell_This ⇒ Demeter_Fragrances

Classification/Prediction
Classification/prediction is a very popular data mining technique: a model is built from training data and then applied to assign new items to classes.
There are many algorithms for classification, such as decision trees, neural networks, Bayesian networks, and probability theory. For example, to understand the customers who spend more than $12 on the web site, you can use a decision tree algorithm to build a model, which may reveal patterns such as: customers who spend more than $12 are single women aged between 25 and 35 years who make more than $35,000 a year. Another application of classification/prediction is target-oriented campaigns. A mass campaign has a very low response rate, typically 2-3%. In a target-oriented campaign, the company sends campaign messages only to the small portion of customers who are most likely to respond.
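The target-oriented campaign workflow (learn from past responders versus nonresponders, score current customers, sort by probability) can be sketched with any probabilistic classifier. The Python sketch below swaps in categorical naive Bayes with add-one smoothing, one of the Bayesian methods the text lists, because it fits in a few lines of standard-library code; it is not the decision tree from the $12 example, and the feature names and labels in any data you feed it are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple


def train_naive_bayes(rows: List[Dict[str, str]], labels: List[Hashable], alpha: float = 1.0):
    """Categorical naive Bayes with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (label, feature) -> Counter of values
    feature_values = defaultdict(set)     # feature -> set of observed values
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values, alpha


def response_probability(model, row: Dict[str, str], positive: Hashable = True) -> float:
    """Posterior P(positive | row), computed in log space for numerical stability."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for feature, value in row.items():
            k = len(feature_values[feature])
            count = value_counts[(label, feature)][value]
            score += math.log((count + alpha) / (n_label + alpha * k))
        log_scores[label] = score
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return math.exp(log_scores[positive] - top) / norm


def rank_customers(model, customers: List[Tuple[str, Dict[str, str]]]) -> List[Tuple[str, float]]:
    """Score current customers and sort so the likeliest responders come first."""
    scored = [(cid, response_probability(model, feats)) for cid, feats in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Sending the campaign to only the head of the ranked list is what keeps the number of off-target messages low.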

Even though sending e-mails to all online customers is very cheap, it is still important to target effectively, as suggested by Berry and Linoff, because customers who might read one targeted e-mail are less likely to read a lot of junk messages. 11,15 Another important reason is that customers who are fed up with off-target messages can revoke their permission to be contacted by the web store. To identify who is most likely to respond to a campaign, avoid generating too many off-target e-mails, and improve service quality, we can build a prediction model from historical data (responders versus nonresponders in past campaigns), apply it to the current customers, and sort the customer list by probability score; the customers at the top of the sorted list are those most likely to respond to the campaign.

Clustering
Clustering techniques are useful when there are no predefined classes to classify or predict. Clustering algorithms group a set of objects based on some similarity measure, so that objects in the same group are similar to each other and objects in different groups are different. In web usage mining, clustering algorithms can be used in several ways:
(1) Profiling customers based on features such as purchase amount, region, and purchased products. For example, we can group customers by purchase amount into heavy spenders, light spenders, and browsers. We can then extract common features from each cluster and find, for example, that heavy spenders are mostly young technical professionals and single men. The results of clustering web data can help online stores identify customer segments with common characteristics, target these segments with campaigns or product promotions, and make special offers tailored to their needs and requirements.
(2) Clustering navigational paths of web hits. As shown in Refs. 12 and 25, clustering navigational paths is very important for user segmentation; the results can help web designers understand or predict visitors' navigation patterns and make the web site more efficient or closer to visitors' preferences. For example, if the clustering results show that Pages P1, P2, and P3 are in the same cluster, the web server can prefetch or precompile Pages P2 and P3 while the user is still viewing Page P1, reducing loading or compile time and thus user waiting latency. Another potential use is to find subsets of users who would benefit from sharing a single web cache rather than using individual ones.

5. PATTERN EVALUATIONS AND DEPLOYMENT
In the data warehouse/OLAP framework, the last step is to evaluate the mining results and then adopt the actionable ones. After the mining algorithms are applied, many patterns may be identified, but not all of them are interesting or actionable. Unlike most pattern evaluation approaches, which rely on structured query language (SQL) statements to query the database and evaluate the results, in our data warehouse/OLAP framework the data cube is an essential component of the mining procedure, and we can dice and roll up the data cube to



Demystifying Digital Introduction to Google Analytics. Mal Chia Digital Account Director Demystifying Digital Introduction to Google Analytics Mal Chia Digital Account Director @malchia @communikateetal Slides will be emailed after the session 2 Workshop Overview 1. Introduction 2. Getting

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong gaocong@cs.aau.dk Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

Dimensional Data Modeling for the Data Warehouse

Dimensional Data Modeling for the Data Warehouse Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Dimensional Data Modeling for the Data Warehouse Prerequisites Students should

More information

HOW DOES GOOGLE ANALYTICS HELP ME?

HOW DOES GOOGLE ANALYTICS HELP ME? Google Analytics HOW DOES GOOGLE ANALYTICS HELP ME? Google Analytics tells you how visitors found your site and how they interact with it. You'll be able to compare the behavior and profitability of visitors

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 Cleveland State University Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 SS Chung 14 Build a Data Mining Model using Data

More information

EVALUATION OF E-COMMERCE WEB SITES ON THE BASIS OF USABILITY DATA

EVALUATION OF E-COMMERCE WEB SITES ON THE BASIS OF USABILITY DATA Articles 37 Econ Lit C8 EVALUATION OF E-COMMERCE WEB SITES ON THE BASIS OF USABILITY DATA Assoc. prof. Snezhana Sulova, PhD Introduction Today increasing numbers of commercial companies are using the electronic

More information

Mario Guarracino. Data warehousing

Mario Guarracino. Data warehousing Data warehousing Introduction Since the mid-nineties, it became clear that the databases for analysis and business intelligence need to be separate from operational. In this lecture we will review the

More information

GOOGLE ANALYTICS 101

GOOGLE ANALYTICS 101 GOOGLE ANALYTICS 101 Presented By Adrienne C. Dupree Please feel free to share this report with anyone who is interested in the topic of building a profitable online business. Simply forward it to them

More information

CHAPTER 4: BUSINESS ANALYTICS

CHAPTER 4: BUSINESS ANALYTICS Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the

More information

Internet Advertising Glossary Internet Advertising Glossary

Internet Advertising Glossary Internet Advertising Glossary Internet Advertising Glossary Internet Advertising Glossary The Council Advertising Network bring the benefits of national web advertising to your local community. With more and more members joining the

More information

Google Analytics Health Check Laying the foundations for successful analytics and optimisation

Google Analytics Health Check Laying the foundations for successful analytics and optimisation Google Analytics Health Check Laying the foundations for successful analytics and optimisation Google Analytics Property [UA-1234567-1] Domain [Client URL] Date of Review MMM YYYY Consultant [Consultant

More information

Sterling Business Intelligence

Sterling Business Intelligence Sterling Business Intelligence Release Note Release 9.0 March 2010 Copyright 2010 Sterling Commerce, Inc. All rights reserved. Additional copyright information is located on the documentation library:

More information

INFO 1400. Koffka Khan. Tutorial 6

INFO 1400. Koffka Khan. Tutorial 6 INFO 1400 Koffka Khan Tutorial 6 Running Case Assignment: Improving Decision Making: Redesigning the Customer Database Dirt Bikes U.S.A. sells primarily through its distributors. It maintains a small customer

More information

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING L.K. Joshila Grace 1, V.Maheswari 2, Dhinaharan Nagamalai 3, 1 Research Scholar, Department of Computer Science and Engineering joshilagracejebin@gmail.com

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Alexander Nikov. 7. ecommerce Marketing Concepts. Consumers Online: The Internet Audience and Consumer Behavior. Outline

Alexander Nikov. 7. ecommerce Marketing Concepts. Consumers Online: The Internet Audience and Consumer Behavior. Outline INFO 3435 E-Commerce Teaching Objectives 7. ecommerce Marketing Concepts Alexander Nikov Identify the key features of the Internet audience. Discuss the basic concepts of consumer behavior and purchasing

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

web analytics ...and beyond Not just for beginners, We are interested in your thoughts:

web analytics ...and beyond Not just for beginners, We are interested in your thoughts: web analytics 201 Not just for beginners, This primer is designed to help clarify some of the major challenges faced by marketers today, such as:...and beyond -defining KPIs in a complex environment -organizing

More information

CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS

CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS 3.1 Introduction In this thesis work, a model is developed in a structured way to mine the frequent patterns in e-commerce domain. Designing and implementing

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Google Analytics Guide. for BUSINESS OWNERS. By David Weichel & Chris Pezzoli. Presented By

Google Analytics Guide. for BUSINESS OWNERS. By David Weichel & Chris Pezzoli. Presented By Google Analytics Guide for BUSINESS OWNERS By David Weichel & Chris Pezzoli Presented By Google Analytics Guide for Ecommerce Business Owners Contents Introduction... 3 Overview of Google Analytics...

More information

Regain Your Privacy on the Internet

Regain Your Privacy on the Internet Regain Your Privacy on the Internet by Boris Loza, PhD, CISSP from SafePatrol Solutions Inc. You'd probably be surprised if you knew what information about yourself is available on the Internet! Do you

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

Web Analytics and the Importance of a Multi-Modal Approach to Metrics

Web Analytics and the Importance of a Multi-Modal Approach to Metrics Web Analytics Strategy Prepared By: Title: Prepared By: Web Analytics Strategy Unilytics Corporation Date Created: March 22, 2010 Last Updated: May 3, 2010 P a g e i Table of Contents Web Analytics Strategy...

More information

How We Use Your Personal Information On An Afinion International Ab And Afion International And Afinion Afion Afion

How We Use Your Personal Information On An Afinion International Ab And Afion International And Afinion Afion Afion AFFINION INTERNATIONAL AB COMPANY PRIVACY AND COOKIES POLICY The privacy and cookies policy sets out how we use any personal information that you give to us, or that we may collect or otherwise process

More information

Adobe Insight, powered by Omniture

Adobe Insight, powered by Omniture Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before

More information

Data Driven Success. Comparing Log Analytics Tools: Flowerfire s Sawmill vs. Google Analytics (GA)

Data Driven Success. Comparing Log Analytics Tools: Flowerfire s Sawmill vs. Google Analytics (GA) Data Driven Success Comparing Log Analytics Tools: Flowerfire s Sawmill vs. Google Analytics (GA) In business, data is everything. Regardless of the products or services you sell or the systems you support,

More information

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved

More information

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many

More information

Database Design Patterns. Winter 2006-2007 Lecture 24

Database Design Patterns. Winter 2006-2007 Lecture 24 Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each

More information

Multidimensional Modeling - Stocks

Multidimensional Modeling - Stocks Bases de Dados e Data Warehouse 06 BDDW 2006/2007 Notice! Author " João Moura Pires (jmp@di.fct.unl.pt)! This material can be freely used for personal or academic purposes without any previous authorization

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management 6.1 2010 by Prentice Hall LEARNING OBJECTIVES Describe how the problems of managing data resources in a traditional

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management

More information

Increase Conversion and Sales, Not your Marketing Budget

Increase Conversion and Sales, Not your Marketing Budget Increase Conversion and Sales, Not your Marketing Budget How to Optimize your Shopify Store for the Holiday Season! Dust off those jingle bells! The holiday season is just around the corner for online

More information

Setting Up Solar Web Commerce. Release 8.6.9

Setting Up Solar Web Commerce. Release 8.6.9 Setting Up Solar Web Commerce Release 8.6.9 Legal Notices 2011 Epicor Software Corporation. All rights reserved. Unauthorized reproduction is a violation of applicable laws. Epicor and the Epicor logo

More information

Google Analytics Basics

Google Analytics Basics Google Analytics Basics Contents Google Analytics: An Introduction...3 Google Analytics Features... 3 Google Analytics Interface... Changing the Date Range... 8 Graphs... 9 Put Stats into Context... 10

More information

PRIVACY POLICY. The Policy is incorporated into Terms of Use and is subject to the terms laid down therein.

PRIVACY POLICY. The Policy is incorporated into Terms of Use and is subject to the terms laid down therein. PRIVACY POLICY This Privacy Policy ( Policy ) applies to the website Creditseva.com which is an online internet portal ( Creditseva ), offering credit repair, credit monitoring and credit consulting services

More information

SKoolAide Privacy Policy

SKoolAide Privacy Policy SKoolAide Privacy Policy Welcome to SKoolAide. SKoolAide, LLC offers online education related services and applications that allow users to share content on the Web more easily. In addition to the sharing

More information

Web Usage Mining. from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher

Web Usage Mining. from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Web Usage Mining from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher,

More information

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS

More information

SAS BI Dashboard 3.1. User s Guide

SAS BI Dashboard 3.1. User s Guide SAS BI Dashboard 3.1 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2007. SAS BI Dashboard 3.1: User s Guide. Cary, NC: SAS Institute Inc. SAS BI Dashboard

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on

More information

Dimensional Modeling for Data Warehouse

Dimensional Modeling for Data Warehouse Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or

More information

BUSINESS IMPACT OF POOR WEB PERFORMANCE

BUSINESS IMPACT OF POOR WEB PERFORMANCE WHITE PAPER: WEB PERFORMANCE TESTING Everyone wants more traffic to their web site, right? More web traffic surely means more revenue, more conversions and reduced costs. But what happens if your web site

More information

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information

More information

Friends Asking Friends 2.94. New Features Guide

Friends Asking Friends 2.94. New Features Guide Friends Asking Friends 2.94 New Features Guide 8/10/2012 Friends Asking Friends 2.94 Friends Asking Friends US 2012 Blackbaud, Inc. This publication, or any part thereof, may not be reproduced or transmitted

More information

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey 1 Google Analytics for Robust Website Analytics Deepika Verma, Depanwita Seal, Atul Pandey 2 Table of Contents I. INTRODUCTION...3 II. Method for obtaining data for web analysis...3 III. Types of metrics

More information

Analytics case study

Analytics case study Analytics case study Carer s Allowance service (DWP) Ashraf Chohan Performance Analyst Government Digital Service (GDS) Contents Introduction... 3 The Carer s Allowance exemplar... 3 Meeting the digital

More information

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija. The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.si ABSTRACT Health Care Statistics on a state level is a

More information

Sterling Business Intelligence

Sterling Business Intelligence Sterling Business Intelligence Concepts Guide Release 9.0 March 2010 Copyright 2009 Sterling Commerce, Inc. All rights reserved. Additional copyright information is located on the documentation library:

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Our Data & Methodology. Understanding the Digital World by Turning Data into Insights

Our Data & Methodology. Understanding the Digital World by Turning Data into Insights Our Data & Methodology Understanding the Digital World by Turning Data into Insights Understanding Today s Digital World SimilarWeb provides data and insights to help businesses make better decisions,

More information

Advanced Preprocessing using Distinct User Identification in web log usage data

Advanced Preprocessing using Distinct User Identification in web log usage data Advanced Preprocessing using Distinct User Identification in web log usage data Sheetal A. Raiyani 1, Shailendra Jain 2, Ashwin G. Raiyani 3 Department of CSE (Software System), Technocrats Institute of

More information

The Fundamentals of B2C Marketing Automation for Effective Marketing Communications

The Fundamentals of B2C Marketing Automation for Effective Marketing Communications The Fundamentals of B2C Marketing Automation for Effective Marketing Communications Mark Patron February 2013 Email and Website Optimisation Introduction Marketing automation is a process that uses insight

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP

More information

Logging In From your Web browser, enter the GLOBE URL: https://bms.activemediaonline.net/bms/

Logging In From your Web browser, enter the GLOBE URL: https://bms.activemediaonline.net/bms/ About GLOBE Global Library of Brand Elements GLOBE is a digital asset and content management system. GLOBE serves as the central repository for all brand-related marketing materials. What is an asset?

More information

2. An E-commerce Value Chain and Data Requirements

2. An E-commerce Value Chain and Data Requirements IEEE Data Engineering Bulletin, March 2000, Vol. 23, No., pp. 23-28. Database Design for Real-World E-Commerce Systems Il-Yeol Song College of Information Science Technology Drexel University Philadelphia,

More information

Digital media glossary

Digital media glossary A Ad banner A graphic message or other media used as an advertisement. Ad impression An ad which is served to a user s browser. Ad impression ratio Click-throughs divided by ad impressions. B Banner A

More information

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28 Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT

More information

Business Intelligence, Analytics & Reporting: Glossary of Terms

Business Intelligence, Analytics & Reporting: Glossary of Terms Business Intelligence, Analytics & Reporting: Glossary of Terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ad-hoc analytics Ad-hoc analytics is the process by which a user can create a new report

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Usage Analysis Tools in SharePoint Products and Technologies

Usage Analysis Tools in SharePoint Products and Technologies Usage Analysis Tools in SharePoint Products and Technologies Date published: June 9, 2004 Summary: Usage analysis allows you to track how websites on your server are being used. The Internet Information

More information

Research and Development of Data Preprocessing in Web Usage Mining

Research and Development of Data Preprocessing in Web Usage Mining Research and Development of Data Preprocessing in Web Usage Mining Li Chaofeng School of Management, South-Central University for Nationalities,Wuhan 430074, P.R. China Abstract Web Usage Mining is the

More information

MINING CLICKSTREAM-BASED DATA CUBES

MINING CLICKSTREAM-BASED DATA CUBES MINING CLICKSTREAM-BASED DATA CUBES Ronnie Alves and Orlando Belo Departament of Informatics,School of Engineering, University of Minho Campus de Gualtar, 4710-057 Braga, Portugal Email: {alvesrco,obelo}@di.uminho.pt

More information

Chapter 12: Web Usage Mining

Chapter 12: Web Usage Mining Chapter 12: Web Usage Mining By Bamshad Mobasher With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volumes of clickstream and user data collected

More information