A Data Warehouse/OLAP Framework for Web Usage Mining and Business Intelligence Reporting


A Data Warehouse/OLAP Framework for Web Usage Mining and Business Intelligence Reporting

Xiaohua Hu, College of Information Science, Drexel University, Philadelphia, PA, USA
Nick Cercone, Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

Abstract

Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstreams, purchase records, customer information, etc.) in order to understand and serve e-commerce customers better and improve the online business. In this paper we present a general Data Warehouse/OLAP framework for web usage mining and business intelligence reporting. We integrate web data warehouse construction, data mining, and On-Line Analytical Processing (OLAP) into the e-commerce system; this tight integration dramatically reduces the time and effort required for web usage mining, business intelligence reporting, and mining deployment. Our Data Warehouse/OLAP framework consists of four phases: data capture, webhouse construction (clickstream marts), pattern discovery and cube construction, and pattern evaluation and deployment. We discuss the data transformation operations needed for web usage mining and business reporting at the clickstream, session, and customer levels, describe the problems and challenging issues in each phase in detail, propose plausible solutions, and illustrate them with examples from real websites. Our Data Warehouse/OLAP framework has been integrated into several commercial e-commerce systems, and we believe it will be very useful for developing real-world web usage mining and business intelligence reporting systems.

1. Introduction

Knowledge about customers and an understanding of customer needs are essential for customer retention in a web store for online e-commerce applications, since competitors are just one click away.
To maintain a successful e-commerce solution, it is necessary to collect and analyze customer click behavior at the web store. A web site generates a large amount of reliable data and is a killer domain for data mining applications. Web usage mining can help an e-commerce solution improve up-selling, cross-selling, personalized ads, clickthrough rates, and so on by analyzing clickstream and customer purchase data with data mining techniques. Web usage mining has recently attracted much attention from researchers and e-business professionals, and it offers many benefits to an e-commerce web site, such as:
- Targeting customers based on usage behavior or profile (personalization)
- Adjusting web content and structure dynamically based on the page access patterns of users (adaptive web site)

- Enhancing service quality and delivery to the end user (cross-selling, up-selling)
- Improving web server performance based on web traffic analysis
- Identifying hot areas and killer areas of the web site
We present a general Data Warehouse/OLAP framework for web usage mining and business intelligence reporting in which data mining is tightly integrated into the e-commerce system. Our Data Warehouse/OLAP framework consists of four phases: data capture, webhouse construction (clickstream marts), pattern discovery, and pattern evaluation, as shown in Figure 1. The framework provides the appropriate data transformations (also called ETL: Extraction, Transformation and Loading) from the OLTP system to the data warehouse, builds data cubes from the data warehouse, mines the data for business analysis, and finally deploys the mining results to improve the online business. We describe the problems and challenging issues in each phase in detail and provide a general approach and guidelines for web usage mining and business intelligence reporting in e-commerce. The rest of the paper is organized as follows: in Section 2, we discuss the various data capture methods and some of their pitfalls and challenges. In Section 3, we describe the data transformation operations for web data at different levels of granularity (clickstream, session, and customer level) and show how to organize the dimension and fact tables of the webhouse, which is the data source for web usage mining and business intelligence reporting. We discuss cube construction and various data mining methods for web usage mining in Section 4 and pattern evaluation (mining rule evaluation) in Section 5. We conclude in Section 6 with some insightful discussion.
Data Capture (clickstream, sale, customer, product, etc.) → Data Webhouse Construction (dimensions, fact tables, aggregation tables, etc.) → Mining, OLAP (rules, prediction models, cubes, reports, etc.) → Pattern Evaluation & Deployment
Figure 1: The Data Warehouse/OLAP Data Flow Diagram

2. Data Capture

Capturing the necessary data in the data collection stage is a key step for a successful data mining task. A large part of web data is represented in the web logs collected at the web server. A web log records the interactions between the web server and web users (web browsers). A typical web log (Common Log Format) contains information such as the Internet provider's IP address, the ID or password for access to a restricted area, a time stamp of the URL request, the method of the transaction, the status or error code, and the size in bytes of the transaction. The Extended Log Format includes extra information such as the referrer and agent. Web logs were originally designed to help debug web servers. One of the fundamental flaws of analyzing web log data is that log files contain information about the files transferred from the server to the client, not information about the people visiting the web site

[9,19]. Some of these fields are useless for data mining and are filtered out in the data preprocessing step. Others, such as the IP address, referrer, and agent, can reveal much about the site visitors and the web site. Mining a web store often starts with the web log data, which must go through a set of transformations before data mining algorithms can be applied. In order to have a complete picture of the customers, web usage data should include the web server access log, browser logs, user profiles, registration data, user sessions, cookies, user search keywords, and user business events [1,9,14]. Based on our practice and experience in web usage mining, we believe that web usage mining requires the conflation of multiple data sources. The data needed to perform the analysis should come from five main sources: (1) the web server logs recording the visitors' clickstream behavior (page template, cookie, transfer log, time stamp, IP address, agent, referrer, etc.); (2) product information (product hierarchy, manufacturer, price, color, size, etc.); (3) content information of the web site (images, GIFs, video clips, etc.); (4) customer purchase data (quantity of products, payment amount and method, shipping address, etc.); (5) customer demographic information (age, gender, income, education level, lifestyle, etc.). Data collected at a typical web site fall into different levels of granularity: page view, session, order item, order header, and customer. A page view carries information such as the type of the page and the duration spent on it; a session consists of a sequence of page views; an order contains a number of order items. It is best practice in the data collection phase to collect the finest-grained and most detailed data possible describing the clicks on the web server and the items sold at the web store.
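As a concrete illustration, a Common Log Format record of the kind described above can be parsed with a regular expression before any filtering or transformation. This is a minimal Python sketch; the log line and field names are illustrative, and real logs vary between servers:

```python
import re

# A hypothetical Common Log Format line with Extended Log fields appended
# (referrer and agent); all values are made up.
LOG_LINE = (
    '203.0.113.7 - jdoe [10/Oct/2000:13:55:36 -0700] '
    '"GET /product/view.html HTTP/1.0" 200 2326 '
    '"http://referrer.example/index.html" "Mozilla/4.08"'
)

# host, ident, user, timestamp, request, status, bytes, then the optional
# Extended Log Format fields: referrer and agent.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return the log fields as a dict, or None if the line is malformed."""
    m = CLF_PATTERN.match(line)
    return m.groupdict() if m else None

hit = parse_log_line(LOG_LINE)
print(hit["host"], hit["status"], hit["referrer"])
```

Each parsed record then becomes one candidate row at the page-view grain, to be filtered and sessionized in the later steps.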
Each web server will potentially report different details, but at the lowest level we should be able to obtain a record for every page hit and every item sold if we want a complete portfolio of the click behavior and sales situation of the web store. There are various methods to capture and collect valuable visitor information for e-commerce at the server level, proxy level, and client level, through CGI interfaces, Java APIs, or JavaScript [1,9,14]. Most of them use web log data or packet sniffers as the data source for the clickstream. Web log data are not sufficient for data mining purposes for the following main reasons: (1) they cannot identify sessions; (2) they lack web store transaction data; the web store transaction log records all sale-related information of a web store and is necessary for business analysis and data mining in order to answer basic and important business questions such as "Which referrer site leads to more product sales at my site?", "What is the conversion rate of the web site?", and "Which parts of my web site are most attractive to purchasers?"; (3) they lack the business events of the web store; business events such as adding an item to the shopping cart, search key events, and abandoning the shopping cart are very useful for analyzing the shopping and browsing behavior of users. In our framework, we believe that collecting data at the web application server layer is the most effective approach, as suggested by some commercial vendors [9,14]. The web application server controls all user activities, such as registration and logging in/out, and can create a unified database to store web log data, sale transaction data, and the business events of

the web site. A discussion of these methods is beyond the scope of this paper; interested readers may refer to [9,14]. There are challenging issues in the data capture phase for web usage mining. Three problems are the most common, encountered in almost all web usage mining projects, and they have a huge impact on the success or failure of such projects: (1) how to sessionize the clickstream data; (2) how to filter crawlers' sessions; and (3) how to gather customers' information. Below we discuss each of them in detail.

2.1 Session Data

A user web session is a sequence of consecutive page views (hits) before the user explicitly logs out or times out. A user who visits a site in the morning and then again in the evening counts as two user visits (sessions). Because of the statelessness of HTTP, clickstream data is just a sequence of page hits, and a page hit may be an isolated event that is hard to analyze without its context. To make raw clickstream data usable in web usage mining, the clickstream needs to be collected and transformed in such a way that it has a session perspective. Thus the first task after data collection is to identify the sessions in the clickstream (sessionizing the clickstream). In some web usage mining systems, individual log entries are aggregated during preprocessing into server sessions according to the IP address and agent information, and new sessions are identified using a 30-minute intersession timeout period [23,24]. Within each session, the log entries are grouped into separate requests, where each request may correspond to an individual user click or a search event. Nonetheless, there are some serious problems with processing the data this way.
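The timeout-based sessionization just described can be sketched as follows. This is an illustrative Python sketch, not a production implementation; it assumes each hit carries an IP address, user agent, timestamp, and URL under the hypothetical field names shown:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the intersession timeout from [23,24]

def sessionize(hits):
    """Group page hits into sessions keyed by (IP, agent), starting a new
    session whenever the gap between consecutive hits of the same visitor
    exceeds the 30-minute timeout. Each hit is a dict with 'ip', 'agent',
    'time' (a datetime), and 'url'; the field names are illustrative."""
    sessions = []
    open_session = {}   # (ip, agent) -> index of that visitor's open session
    for hit in sorted(hits, key=lambda h: h["time"]):
        key = (hit["ip"], hit["agent"])
        idx = open_session.get(key)
        if idx is None or hit["time"] - sessions[idx][-1]["time"] > SESSION_TIMEOUT:
            sessions.append([hit])              # start a new session
            open_session[key] = len(sessions) - 1
        else:
            sessions[idx].append(hit)           # continue the open session
    return sessions
```

For example, two hits five minutes apart fall into one session, while a third hit from the same visitor hours later opens a second session. The problems discussed next (dynamic IPs, shared firewall addresses) are exactly the cases where this (ip, agent) key breaks down.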
Many Internet users connect through an Internet Service Provider (ISP), so their IP address may be assigned dynamically and it is very likely that the same user will have a different address in different sessions [6,7,14]. Another problem is that users behind a firewall can all share the same IP address, which makes the IP address unsuitable as an identification variable for such sessions. Recognizing the limitations of relying on the IP address, many web sites use cookies as a workaround to sessionize the clickstream. A cookie is a mechanism that allows the web server to store its own information about a user on the user's hard drive. It is a small file that a web server sends to a web user and stores on his computer so that it can remember something about that user at a later time. The location of the cookies depends on the browser: Internet Explorer stores each cookie as a separate file under a Windows subdirectory, while Netscape stores all cookies in a single cookies.txt file. Sites often use cookies to store customization settings or user demographic data. The main purpose of cookies is to identify users and possibly prepare customized web pages for them. If cookies are turned on, the user sends the cookie back to the web server each time his browser opens one of its web pages, and the web server can identify the requesting user's computer unambiguously. All hits with the same cookie are then treated as one session until the user explicitly logs out or times out. In some situations, out of privacy concerns, users choose to turn cookies off; the web site then needs to use the login id, referrer, and agent information, if possible, to identify users and server sessions [9,23].
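The fallback order of identifiers described in this section (cookie first, then login id, then IP address plus agent) can be sketched as a small helper; the field names are hypothetical:

```python
def visitor_key(hit):
    """Pick the most reliable visitor identifier available for a hit, in
    order of preference: cookie id, then login id, then IP address plus
    user agent. The field names are illustrative; cookies may be absent
    when the user has turned them off."""
    if hit.get("cookie_id"):
        return ("cookie", hit["cookie_id"])
    if hit.get("login_id"):
        return ("login", hit["login_id"])
    # Weakest identifier: breaks for dynamic IPs and shared firewalls.
    return ("ip_agent", hit["ip"], hit["agent"])
```

A sessionizer can then key its open sessions on `visitor_key(hit)` instead of the bare (IP, agent) pair, so that cookie-enabled visitors are tracked reliably even when their IP address changes.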

2.2 Crawler Sessions

A crawler is a software agent that traverses web sites by following the links in web pages. Search engines use crawlers to index web pages; crawlers can also help users gather information such as prices for certain products, and help web designers diagnose web site problems (such as response time or isolated web pages). Most crawlers adopt a breadth-first retrieval strategy to increase their coverage of the web site. In our experience with data from some web sites, at times up to 30% of site clickstream session traffic may come from crawlers; these sessions are called crawler sessions. Crawler sessions may mislead data mining analysis and generate inaccurate or incorrect results if they are not filtered out. For example, when an association rule mining algorithm is used to find the page click orders in a session, as pointed out in [1,4,10,23], it may inadvertently generate frequent item sets involving web pages from different page categories. Such spurious patterns may lead an analyst of an e-commerce site to believe that web surfers are interested in products from various categories when in fact crawlers induce those patterns [9,23]. This problem can be avoided if crawler sessions are removed from the data set during data preprocessing; identifying crawler sessions is therefore very important for web usage mining. There are a few ways to identify a crawler session. In [23], a classification model is built to identify such sessions. Crawler sessions tend to have some of the following characteristics: images turned off, empty referrers, visits to the robots.txt file, very short page duration times, a depth-first or breadth-first traversal pattern of the site, and no purchases [6]. Some web sites adopt the approach of placing an invisible link on a page; since only crawlers follow invisible links (regular users cannot click them), sessions containing invisible links are considered crawler sessions.
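The heuristic characteristics listed above can be combined into a simple rule-based filter. This is an illustrative sketch, not the trained classification model of [23]; the feature names and the two-signal threshold are assumptions:

```python
def is_crawler_session(session):
    """Flag a session as a crawler based on the rule-of-thumb signals
    described in the text. `session` is a dict of precomputed per-session
    features; names and thresholds are illustrative, not tuned."""
    if session.get("followed_invisible_link"):
        return True  # only crawlers follow invisible links
    signals = [
        session.get("visited_robots_txt", False),
        session.get("empty_referrer", False),
        session.get("images_off", False),
        session.get("avg_page_seconds", 999.0) < 1.0,  # very short durations
    ]
    # Treat two or more crawler signals as sufficient evidence.
    return sum(signals) >= 2

# Filtering during preprocessing then reduces to a comprehension:
# human_sessions = [s for s in sessions if not is_crawler_session(s)]
```

In practice such a filter would be validated against labeled sessions, and a proper classifier (as in [23]) used where the heuristics are too coarse.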
2.3 Customer Demographics (Offline Data)

Retaining customers and increasing sales is the only way for an e-commerce web store to survive in this very competitive online market. To retain customers, you need to understand their needs and preferences. As pointed out in [7,11,17], fostering and promoting repeated sales requires knowledge about customers' preferences, consumption rate, behavior, and lifestyle. This knowledge generally requires knowing items such as a customer's income, age, gender, and life style. To find the best way to reach its customers and increase sales, a company needs to enrich the clickstream with this offline information. The use of demographics, psychographics, property information, household characteristics, individual characteristics, and lifestyle data has helped database marketing professionals improve sales, retain customers, and acquire new customers for bricks-and-mortar stores for decades. This information should also be used in a web store to enhance the vast amount of customer and clickstream behavior already captured at the website. In the web store, customer information can be collected through a registration form, which is often limited. Some web sites offer incentives to users to encourage them to register or answer a set of questions. The problem is that users tend not to give the information, or provide inaccurate information, in registration forms. Fortunately, there are many commercial marketing database vendors that collect this information based on zip codes or physical addresses. This information should be integrated with the web data for additional insight into the identity, attributes, lifestyles, and behaviors of the web site's visitors and customers [17]. There are several sources of demographic information at various levels, such as CACI, Acxiom, and Experian, to name a few. CACI provides neighborhood demographics;

Acxiom gives household-level psychographics; and Experian provides the MOSAIC targeting system, which identifies consumers according to the type of neighborhood in which they live [17]. These external offline demographics can tell you who your online visitors and customers are, where they live, and consequently how they think, behave, and are likely to react to your online offers and incentives. Database marketers have used this information for years to segment their customers and potential prospects. The demographic and socioeconomic profiles are aggregated from several sources, including credit card issuers, county recorder offices, census records, and other cross-referenced statistics [17]. When analyzing and mining customer demographics from web data, the privacy of the customers should always be kept in mind: profiling customers is harmful when web sites fail to do it anonymously.

3. Data Webhouse Construction

A data warehouse provides the data source for online analytical processing and data mining. Designing a proper data warehouse schema and populating it with data from the OLTP system is very time consuming and complex. A well-designed data warehouse feeds the business with the right information at the right time so that the right decisions can be made in an e-commerce system [20,21,9]. In Section 2, we discussed data capture methods for the web site, which collect the clickstream, sales, customer, shipment, payment, and product information. These are online transaction data and are stored in the transaction database system (OLTP). The database schemas of the OLTP system are based on E-R modeling, normalized to reduce redundancy, and designed to maintain atomicity, consistency, and integrity so as to preserve speed and efficiency for day-to-day business operations such as inserting, updating, and deleting a transaction.
An OLTP query normally accesses only a small set of records in the database but demands a very quick response. For web usage mining purposes, we need a database schema (called a data warehouse) designed to support decision making and data analysis (On-Line Analytical Processing). Typical relational databases are designed for on-line transaction processing (OLTP) and do not meet the requirements of effective on-line analytical processing; as a result, data warehouses are designed differently from traditional relational databases. Data warehouses use OLTP data for historical, read-only analysis. The data in a data warehouse are normally organized with multidimensional modeling in a star schema (fact tables plus the surrounding dimension tables). The requirements of clickstream data make the warehouse schema design even more complicated: the web challenges the current view of the data warehouse with multiple new requirements [11]. The data warehouse is required to make the customer clickstream available for analysis, so a new term, webhouse, was coined by Ralph Kimball [11,12]. A webhouse plays an integral role in the web revolution as the analysis platform for all the behavioral data arriving from the clickstream, as well as for the many web sites that rely on the data warehouse to customize and drive the end user's web experience in real time [11]. We use webhouse to refer to the data warehouse system for web usage mining. The webhouse is the data source for data mining and business intelligence reporting in the Data Warehouse/OLAP framework, and it contains the fundamental business content of what a web store sells, together with its web services and capabilities. A webhouse should allow you to analyze all hits on a web site and all products sold in the web store from many viewpoints.
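To make the star schema idea concrete, the following sketch builds a toy fact table with two dimension tables in SQLite (via Python's sqlite3) and runs one OLAP-style roll-up query. All table and column names are illustrative, not the actual webhouse schema, which is developed in Section 3.2:

```python
import sqlite3

# A minimal star schema: one fact table with foreign keys into two
# dimension tables. Names and sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_dim (page_id INTEGER PRIMARY KEY, page_category TEXT);
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, day TEXT, weekend_flag INTEGER);
CREATE TABLE clickstream_fact (
    page_id INTEGER REFERENCES page_dim(page_id),
    date_id INTEGER REFERENCES date_dim(date_id),
    page_view_time INTEGER            -- seconds spent on the page
);
INSERT INTO page_dim VALUES (1, 'Product Info'), (2, 'Order Form');
INSERT INTO date_dim VALUES (10, '2024-01-06', 1), (11, '2024-01-08', 0);
INSERT INTO clickstream_fact VALUES (1, 10, 30), (1, 11, 45), (2, 11, 120);
""")

# An OLAP-style roll-up: total viewing time per page category.
rows = conn.execute("""
    SELECT p.page_category, SUM(f.page_view_time)
    FROM clickstream_fact f JOIN page_dim p ON f.page_id = p.page_id
    GROUP BY p.page_category ORDER BY p.page_category
""").fetchall()
print(rows)   # [('Order Form', 120), ('Product Info', 75)]
```

The fact table holds only keys and numeric measures at the chosen grain (one row per page hit here); the textual descriptions live in the dimension tables, so analytical queries are joins from the fact table out to the dimensions followed by a GROUP BY roll-up.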
Many systems have been developed to mine web log records and can find association patterns and sequential patterns in web accesses, but in order to understand customers (for example, repeat visitors vs. single visitors, or single-purchase vs. multiple-purchase customers), it is necessary to include additional information such as order information from

the web store, product information about the products, user browsing sequences from the clickstream, and customer information from the user table. Below we discuss the requirement analysis and the dimensional modeling technique used to design the webhouse.

3.1 Requirement Analysis of the Webhouse

It is necessary to build a comprehensive view of the immense stream of clicks arriving at the web site, including the items sold through the site. We want to build a webhouse that provides insightful information and answers the important business questions of e-commerce. The design of a webhouse starts with requirement analysis. We spent significant time interviewing our clients, business analysts, engineers/developers, and end users to gather their requirements and the kinds of business problems they hope the webhouse will answer. Their questions cover a wide range of areas:
- Web site activity (hourly, daily, weekly, monthly, quarterly, etc.)
- Product sales (by region, by brand, by domain, by browser type, by time, etc.)
- Customers (by type, by age, by gender, by region, buyer vs. visitor, heavy buyer vs. light buyer, etc.)
- Vendors (by type, by region, by price range, etc.)
- Referrers (by domain, by sale amount, by visit numbers, etc.)
- Navigational behavior patterns (top entry page, top exit page, killer page, hot page, etc.)
- Click conversion ratio
- Shipments (by regular or express mail, etc.)
- Payments (by cash, by credit card, by e-money, etc.)
Some of the important questions are:
- Who are my most profitable customers?
- What is the difference between buyers and non-buyers at my site?
- Which parts of my site attract the most visits?
- Which part of my site is a session killer?
- Which parts of the site lead to the most purchases?
- What is the typical click path that leads to a purchase?
- What is the typical path of customers who abandoned the shopping cart?
- What percentage of customers visit the product section?
- What is the new-visitor click profile?
- What are the top/bottom products?
- What are the peak traffic hours?
We analyze these questions and determine the dimensions that need to be constructed and the fact measures the business analysts are interested in. After identifying the dimensions and measures, we can move to the next step: webhouse schema design.
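The output of this requirement analysis can be captured as a simple mapping from business questions to candidate dimensions and measures. The entries below are illustrative examples drawn from the question lists above, not a complete mapping:

```python
# Hypothetical requirement-analysis output: each business question mapped
# to the dimensions and fact measures needed to answer it.
QUESTION_MAP = {
    "Which parts of my site attract the most visits?":
        {"dimensions": ["Page"], "measures": ["hit count"]},
    "What are the peak traffic hours?":
        {"dimensions": ["Time"], "measures": ["hit count"]},
    "Who are my most profitable customers?":
        {"dimensions": ["User"], "measures": ["profit", "revenue"]},
    "Which referrer site leads to more product sales?":
        {"dimensions": ["Session (referrer)", "Product"],
         "measures": ["quantity sold", "revenue"]},
}

# Collect the distinct dimensions the webhouse must therefore provide.
needed = sorted({d for q in QUESTION_MAP.values() for d in q["dimensions"]})
print(needed)
```

Walking every gathered question through such a table gives the complete set of dimensions and measures that the schema design in the next section must support.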

3.2 Webhouse Schema Design

In the webhouse there are one or a few fact tables and a set of smaller tables called dimension tables. The fact table is where the numerical measurements of the business are stored; each of these measurements is taken at the intersection of all dimensions. The dimension tables are where the textual descriptions of the dimensions of the business are stored [11]. There are several methodologies for designing a data warehouse, such as the architecture-based methodology proposed by Anahory and Murray [2] and the four-step methodology used by Ralph Kimball. To construct the webhouse for the Data Warehouse/OLAP framework, we adopted Kimball's methodology and built the webhouse with dimensional modeling techniques. The four steps are: (1) define the source of data, (2) choose the grain of the fact tables, (3) choose the dimensions appropriate for the grain, and (4) choose the facts appropriate for that grain. Below we discuss each step in detail.

Define the Source Data

Since we wish to analyze the click behavior and sales situation of an online web store, we need data for every web hit on the site and every item sold there. Data are collected at the page request level (clickstream) or, for purchases, at the order item level, and all of this information is already available in the transaction database (OLTP). In the data collection phase, we have collected every page view, purchase record, and customer record in the web database system, which is the data source for our webhouse. We need to extract these from the transaction database and transform them into the webhouse according to the design of the dimension and fact tables described later. In addition, we also need product, user, page, time, payment, shipping, and promotion information.

Choose the Grain of the Fact Tables

The fact table is the center of the webhouse.
It contains a list of all measures and points to the key value of the lowest level of each dimension. The lowest level of each dimension table, together with the business problems and domain, determines the granularity of the fact table. Before the fact tables can be designed in detail, a decision must be made as to what an individual low-level record in each fact table means; this is the grain of the fact table [11]. In order to analyze the clickstream, every page hit should have a row in the clickstream fact table, which defines the grain of the clickstream. To analyze the sales business of the web store, every item sold should have a row in the order item fact table; thus the grain of the item fact table in the webhouse is every item sold.

Choose the Dimensions Appropriate for the Grain

Dimensions are qualifiers that give meaning to measures. They organize the data based on the what, when, and where components of a business question. Dimensions are stored in dimension tables made up of dimensional elements and attributes. Each dimension is composed of related items, or elements, arranged in a hierarchy, and each element represents a different level of summarization. For example, products roll up to subcategories, which roll up to categories (which in turn roll up to departments, etc.). The lowest level in the hierarchy is determined by the lowest level of detail required for the analysis; levels higher than the base level store redundant data. This denormalized table reduces the number of joins required for a query, and makes it easier for users to start querying at a higher level and to drill down to lower levels of detail as needed. All of the elements relating to the product, for example, would comprise the product dimension. This

allows the user to query for all categories, and drill down to the subcategory or product level for more detailed information. Below we discuss some of the important dimensions in the webhouse. (Discussion of some other dimensions, such as the business event dimension and the promotion dimension, is omitted because of space limitations; these dimensions are almost the same in the webhouse as in a traditional data warehouse.)

Session Dimension

The session dimension is more than just a tag that groups together all the page events constituting a single user's session; it is the place where we label the session and trace its activity [11] in order to describe the session's characteristics. The characteristics of a session should include: session length, the total page requests of the session, the first and last pages of the session, the referrer of the session, cookie id, user agent, client host, the first and last request times, total browsing time of the session, average viewing time per page, and session visit count. We may also need to characterize sessions as sessions with purchase, sessions without purchase, random browsing sessions, crawler sessions, etc. With this information, we are able to answer business questions such as: on which pages do customers enter my site (top first pages of sessions) and where do they leave (top last pages of sessions)? What are the characteristics of sessions that lead to a purchase?

Page Dimension

Site area analysis is very important in order to understand which parts of the web site attract most of the hits, which parts lead to a purchase, which part of the site is a session killer, and which parts are less visited and superfluous. The page dimension should contain meaningful context that tells the analyst the user's web site location.
Each web page must contain some simple descriptors identifying the location and type of the page, such as "Login," "Registration," "Hot Product," "Product Info," "Company Info," "Frequently Asked Questions," and "Order Form" [12]. A large web site should have a hierarchical description associated with each page that gives progressively more detail about what constitutes the page. This information needs to be stored in the page dimension and maintained consistently as the web site is updated and modified. A page dimension should also contain information such as the page template, page category, number of images, and banners on the page.

Time Dimension

The time dimension is very important in every data warehouse, because every fact table in the data warehouse is a time series of observations of some sort. In traditional data warehouses the time dimension is at a daily grain, but for the webhouse the granularity is finer; we have seen webhouses that record at the hourly grain or even the minute level. A date column in a relational table normally has the format year, month, day, hour, minute, and seconds (YYYYMMDD::HHMMSS), and we need to create new attributes representing day of week, day of year, and quarter from the date column. Since, in a web environment, we are analyzing both clickstream behavior and sales, it makes perfect sense to have two time hierarchies. One is more or less the traditional time dimension in the data warehouse: the date, related to day, week, month, quarter, and year (data transformation functions may be needed to construct new attributes and properties such as weekday, weekend, and holiday season), which is useful for comparing sales across days, months, quarters, or years. The other time hierarchy is the time of day, related to a specific spot within a day: the hour and minute in the

day (some useful derived attributes are early morning, late afternoon, evening, working hours, lunch break, etc.). This time hierarchy is useful for site traffic analysis.

User Dimension

To obtain good customer profiles, variables describing the characteristics of the customer should be added. If available, this information comes from a data warehouse where all customer characteristics and historical information about click behavior are stored. To combine this information with the transaction data, users must identify themselves when visiting the web site so that the cookie id can be matched with their names and the transactional data can be merged with customer-relevant data. The customer dimension should contain information such as name, addresses, gender, age, demographics, and lifestyle. Identifying the user is very important for distinguishing different types of visitors to the web site. In the user dimension we need to label users as single visitors, repeat visitors, visitors with a single purchase, visitors with multiple purchases, or most profitable customers based on the amount they spend. Based on the user dimension information, we should be able to answer business questions related to the different user types.

Product Dimension

The product dimension describes the complete portfolio of what the web site sells online, and the information varies between online stores; for example, Amazon.com has a much larger product dimension than an online bank. Normally the product dimension should contain information such as the product key, SKU description, product properties (weight, size, color, package type, etc.), brand, subcategory, department, price, manufacturer, and warranty information.

Choose the Facts Appropriate for That Grain

Choosing the appropriate fact measures for the grain of the fact table depends on the business objective and analysis purposes. For the clickstream fact, we can choose the time (number of seconds) the user spent on each page.
For the order fact table, we can choose revenue, profit, cost, quantity, and other measures. The star schema for the webhouse is constructed as shown in Figure 2.

Data Transformation

Creating a warehouse is not enough, because much important information is not in the data warehouse yet. For example, for a session, it is essential to know the number of pages, the time spent, and whether the session leads to a purchase or not. For a customer, it is necessary to create attributes such as whether the customer is a repeat visitor, a heavy spender, or an occasional shopper. These new attributes need to be created/derived from existing database columns to make data mining and reporting easier or even possible. There are two sets of transformations that need to take place: (1) data must be transferred from the OLTP systems to the OLAP systems; (2) data may need to go through some transformation or conversion to create new values which are not explicitly represented in the data warehouse. The first set of transformations is relatively stable and straightforward; there are many ETL tools on the market for this purpose [10]. The second set of transformations presents a significant challenge for web usage mining, since many of these transformations are tied to the application domain and business goals. Typically in the web warehouse, the data are collected at the clickstream level. For data mining and business intelligence reporting purposes, the data in the data warehouse need to be transformed or aggregated to different levels of granularity (session level, order-header level, or customer level) depending on the

Clickstream Fact Table: BusinessEvent_id, Session_id, Time_id, User_id, Page_id, Date_id, Product_id, Page_view_time, Quantity_ordered, and many more

Order Item Fact Table: Order_item_id, Session_id, Time_id, User_id, Page_id, Date_id, Product_id, Promotion_id, Order_item_price, QuantitySold, TotalCost, Profit, Revenue, and many more

Session Dimension: Session_id, Session_length, Referrer, Agent, Session_host_name, Session_IPAddress, Cookie_id, Client_host, First_request_time, Last_request_time, Total_time_spent, Average_time_per_page, Session_customer_id, Session_visit_count, SessionWPurchase_flag, RandomBrowsing_flag, CrawlerSession_flag, Sessiontimeout_flag, and many more

User Dimension: User_id, City, State, Country, Gender, Age, Profession, Education_level, Marital_status, Phone_#, Repeat_visitor_flag, Frequent_purchase_flag, Heavy_spender_flag, Reader/Browser_flag, #OfKids, House_income, and many more

Date Dimension: Date_id, Day, Week, Month, Quarter, Year, Day_#_in_month, Day_#_in_quarter, Day_#_in_year, Week_#_in_month, Week_#_in_quarter, Week_#_in_year, Weekday_flag, Weekend_flag, Holiday_flag, Season, and many more

Time Dimension: Time_id, Second, Minute, Hour, EarlyMorning_flag, LateAfternoon_flag, LunchTime_flag, DinnerTime_flag, LateEvening_flag, and many more

Business Event Dimension: BusinessEvent_id, BusinessEventType, BusinessEventDesc, Search_key_flag, Shopping_cart_flag, and many more

Page Dimension: Page_id, PageTemplate, PageLocation, PageType, PageCategory, PageDescription, Registration_page_flag, Shipping_page_flag, Checkout_page_flag, NumOfProducts, NumOfImage, NumberOfBanner, and many more

Product Dimension: Product_id, SKUDescription, Brand, SubCategory, Dept, Size, Color, Weight, Price, Manufacturer, Warranty_info, and many more

Promotion Dimension: Promotion_id, PromotionName, PriceReductionPct, AdvType, CouponType, BeginDate, EndDate, Promotion_cost, Promotion_region, and many more

Figure 2: Star Schema of the Webhouse

mining and reporting goals. For example, if the analyst is interested in the difference between sessions with and without purchases, then transformation/aggregation operations need to be performed to convert clickstream data to the session level. If she wants to understand the customers, such as what the characteristics of the most profitable customers are, then the data need to be transformed/aggregated further from the session level to the customer level. There are 3 types of transformations in the web usage mining context:

1. Generalizing/extracting primitive values into higher-level values. For example, the referrer column for each click has too many different values, but some useful information is embedded in it, so it is useful to create new columns from it, such as the host of the referrer and the domain of the referrer. Similarly, new domain and host columns can be created from ISPs and customers' e-mail addresses.

2. Grouping/summarizing information from multiple columns. For example, in the customer preference survey, there are columns such as Prefer Basketball, Prefer Football, and Prefer Baseball corresponding to the customer's first, second, and third preferred sports. For mining or reporting purposes, it is better to generate a new column that summarizes the customer's sport preference.

3. Inferring information not directly available from existing database columns. For example, to build a picture of a customer's product page views, we need to know whether a click record is a product page view from Brands, which is not directly available. This information can be inferred from the Template and Referrer columns.

Based on our experience, below are some of the typical data transformation operations we have found to be popular and useful for web usage mining and reporting.
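As a concrete sketch of the first transformation type (generalizing a primitive referrer URL into higher-level host and domain columns) — the example URL is hypothetical:

```python
from urllib.parse import urlparse

def referrer_features(referrer: str) -> dict:
    # extract the host of the referrer, e.g. "www.yahoo.com"
    host = urlparse(referrer).hostname or ""
    # take the last dot-separated component as the domain, e.g. "COM"
    domain = host.rsplit(".", 1)[-1].upper() if "." in host else ""
    return {"referrer_host": host, "referrer_domain": domain}

print(referrer_features("http://www.yahoo.com/search?p=fat+boy"))
```

The same pattern applies to deriving domain and host columns from ISP or e-mail address strings.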
(1) Click Level Transformation

Transformation Name | Transformation Description | Result Type
Referrer indicator for a product page | Creates an indicator variable for the referrer of an arbitrary product page. Product detail page views are important information for a webstore. Within a webstore, a visitor can reach a product page from different places depending on how the webstore is designed (e.g., ViewfromHotArea, ViewfromGifts). To analyze the clickstream data, it is helpful to know which area each product page view comes from; this is defined based on the Type, Template, and Referrer columns. | Boolean
Page view time | The number of seconds that a person spends viewing a page | Double
Credit card indicator (MasterCard, Visa, AMX, etc.) | Indicates which type of credit card the transaction was completed with | Boolean
Decode the query string | Returns the search arguments the customers typed while they surfed the web site | String
Path of session | Pulls the templates of the session into a long string | String
Detailed path of the session | Similar to the operation above, except it returns the detailed dynamically generated pages | String
Last page of the session | Returns the last page of the session | String
First page of the session | Returns the first page of the session | String
Clickarea | Tells which area a click is on, or None for a non-click | Boolean

Click tags | Checks whether a click is a giftbox view or shopping cart view | String
Purchases of products that appear on Whats_hot pages | It is very useful to know who bought products from the Whats_hot pages, or what products were bought from the Whats_hot pages. However, this is very hard to do without an event log. What can be done is to find purchases of products that appear on Whats_hot pages; note that these products may appear on other pages and customers can buy them there. | Boolean
When did a customer fill the registration (survey) form? | A web site normally has an optional registration form that contains some survey questions. Knowing the answers to these questions can help to understand customers better (before any purchases, or after purchases). | Boolean

The above transformations can capture a lot of essential information for reporting, help business analysts understand and improve the website's performance and function, and increase customer satisfaction. For example, using the decode-query-string transformation, we can capture the top 10 failed search keywords from the customers, as shown in Table 1 from a real online motor store. "Fat boy" and "Chrome" are the most popular items the customers are looking for. The store manager can then decide to add these items to the webstore if a lot of customers show interest in them.

Search String | # Of Searches
Fat boy | 1566
Chrome | 791
Motorclothes | 443
Gtype Fuel tank | 325
G-sportster | 280
maintenance | 260
C-sidecar | 210
sissy bar | 175
seat | 169
touring | 163

Table 1: Top 10 Failed Searches

(2) Session Level Transformation

Transformation Name | Transformation Description | Result Type
Customer browser name | Returns a string containing the browser's name from the useragent; all unknown browser names are grouped into "Others" | String
Browser release | The release number of the browser given the useragent string. The main release number is for Mozilla. | String
It will contain the release number for MS Internet Explorer inside ( ) if the browser is an IE browser, and AOL with its release number inside [ ] if the browser is an AOL browser.
Browser OS | The OS running the browser | String
Returned visitor | True if the user is a returned visitor | Boolean
Session Length | The total clicks of this session | Integer

Long session | Indicates whether the session is a long one or not (more than 6 clicks) | Boolean
Short session | Indicates whether the session is a short one or not (1 or 2 clicks) | Boolean
Session duration | The total time spent on this session | Double
Referrer host | Host of the referrer | String
Referrer domain | Domain of the referrer | String
Url site | Returns the url site such as YAHOO, Excite, and so on | String
ISP host | Internet Service Provider host | String
What day it is of the first visit | A number to indicate which day it is for the first visit | Double
What day it is of the last visit | A number to indicate which day it is for the last visit | Double
Is the visit a weekend | Indicates whether the visit happens on a weekend or not | Boolean
Is the visit a weekday | Indicates whether the visit happens on a weekday or not | Boolean
Any purchase on this session | Indicates whether the session leads to any purchase or not | Boolean
Purchase amount in different areas | In addition to the numbers of product detail page views that came from different areas, it is also important to know customers' purchase amounts from each area. Since it is hard to trace precisely where a purchase comes from, it can be estimated by distributing the total purchase amount to each area using the proportion of the number of product detail page views from that area. | Double
Purchase quantity in different areas | Similar to customers' purchase amount from each area, it is necessary to know customers' purchase quantity from each area. | Double
HourofDay of the server | Shows the visitor's time from the first request date based on the location of the server | Double
Time period of the day | Based on the HourofDay, more columns can be added to indicate whether the visit time is in the morning, evening, and so on (such as early morning, late evening, lunch time, etc.) | Boolean

Table 2 below shows the top 10 paths of a website without any purchase.
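Many of the session-level attributes above (session length, duration, long/short-session flags, purchase flag, first/last page) amount to one aggregation pass over a session's clicks. A sketch with illustrative field names:

```python
from datetime import datetime

def session_features(clicks: list[dict]) -> dict:
    # clicks: clickstream rows of one session, ordered by time;
    # field names ("time", "page", "order_amount") are assumptions.
    times = sorted(datetime.fromisoformat(c["time"]) for c in clicks)
    n = len(clicks)
    return {
        "session_length": n,
        "session_duration": (times[-1] - times[0]).total_seconds(),
        "long_session_flag": n > 6,    # more than 6 clicks
        "short_session_flag": n <= 2,  # 1 or 2 clicks
        "purchase_flag": any(c.get("order_amount", 0) > 0 for c in clicks),
        "first_page": clicks[0]["page"],
        "last_page": clicks[-1]["page"],
    }

clicks = [
    {"time": "2024-07-02T12:30:00", "page": "main.jsp"},
    {"time": "2024-07-02T12:31:10", "page": "product.jsp"},
    {"time": "2024-07-02T12:35:00", "page": "checkout.jsp", "order_amount": 49.0},
]
print(session_features(clicks))
```

Running this per session converts clickstream-level data into the session-level rows that the reports below are built on.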
These paths can help the website understand customer click behavior and reveal many of the reasons why customers left the website without purchasing. For example, the top path is main.jsp->splash.jsp (14622 sessions): the customers visited main.jsp and then left the website after they clicked splash.jsp. Further analysis by the web designer found that splash.jsp took a while to compile and download, and the animation effect of splash.jsp made a huge portion of the store's contents invisible. This caused a lot of frustration among the customers, so they left. After splash.jsp was removed, the conversion rate improved significantly.

Web Path | Count
main.jsp->splash.jsp | 14622
main.jsp->main.jsp | 3731
main.jsp->main.jsp->main.jsp | 790
main.jsp->login.jsp | 329
main.jsp->hot.jsp->registration.jsp | 303
Login.jsp | 274
main.jsp->survey.jsp | 216
product.jsp | 212
main.jsp->product.jsp | 192
main.jsp->search.jsp |

Table 2: Top 10 Paths Leading to Non-Purchase Sessions

(3) Customer Level Transformation

Transformation Name | Transformation Description | Result Type
Domain | The domain name is the portion of the address after the last period, such as COM, NET, EDU, etc. | String
Hostname | The hostname is the portion of the address after the at sign (@) and before the last period (.) | String
Time zone | Time zone of the customer |
Areacode | Area code of the customer's phone number | String
Country region | Country region of the customer | String
Repeat buyers | Indicates whether the visitor is a repeat buyer or not | Boolean
Single visit customer | Customer visited only once with no purchase | Boolean
Multiple visit customer | Customer visited multiple times but made no purchase | Boolean
Single visit buyer | Customer visited once and made a purchase | Boolean
Multiple visit buyer | Customer visited multiple times and made at least one purchase | Boolean
Profit ratio (average revenue per visit) | Profit ratio is defined as the total number of sales divided by the total number of visits | Double
Propensity-to-purchase ratio | Indicates the likelihood that the visitor is going to purchase something | Double
Things preferred and things really bought | In the survey form, there are questions like "preferred brands," "preferred products," and "special needs." It is valuable information to know the correlation between what a customer prefers and what he/she buys. | String

The customer level transformations create many new columns in the data warehouse to make reporting and data mining easier and more meaningful at the customer level. For example, identifying whether a customer is a single visitor, a buyer, a repeat buyer, etc. is very important for the webstore. Table 3 can reveal how many customers are loyal customers, occasional shoppers, or just pure visitors.
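The visitor/buyer labels feeding such a breakdown can be derived from per-customer visit and purchase counts. A minimal sketch (label names follow the table above; the decision logic is an illustrative assumption):

```python
def customer_label(visits: int, purchases: int) -> str:
    # Derive a customer-level label from session-level aggregates.
    if visits == 0:
        return "Unknown"
    if purchases == 0:
        return "Single Visit" if visits == 1 else "Multiple Visit"
    return "Single Visit Buyer" if visits == 1 else "Multiple Visit Buyer"

print(customer_label(1, 0))
print(customer_label(5, 2))
```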
Type | Count
Single Visit | 1823
Multiple Visit | 37
Single Visit Buyer | 269
Multiple Visit Buyer | 58
Unknown | 2846

Table 3: Single/Multiple Visitors/Buyers

After the data transformations are done, data in the webhouse are organized into different levels. Below are some of the most useful summary tables and fact tables for web usage mining and reporting.

CLICK_LINES | A row for each Web page viewed
SESSIONS | A row for each Web session
CUSTOMERS | A row for each customer
GIFT_LINES | A row for each gift registry item of each customer
ORDER_LINES | A row for each order line of each order
ORDER_HEADERS | A row for each order of each customer
PROMOTIONS | A row for each promotion folder and promotion defined in the system

LINE_ITEMS | ORDER_LINES joined with CUSTOMERS, ORDER_HEADERS, PRODUCTS, ASSORTMENT, PROMOTIONS

Table 4: Some Summary and Fact Tables in the Webhouse

4. Pattern Discovery: A Data Warehouse/OLAP Approach

Data Warehouse/OLAP (On-Line Analytical Processing) is an approach that integrates data mining, data warehousing, and OLAP technologies. OLAP systems pre-calculate summary information (data cubes) to enable drilling, pivoting, slicing and dicing, and filtering, so that the business can be analyzed from multiple angles or views (dimensions). Web mining your site in the webhouse can reveal actionable and meaningful patterns about users and useful click sequences for web site design. Below we discuss each of them in detail.

4.1 Construct Cubes from the Webhouse

A data cube is pre-calculated summary data organized so that the cells of the cube contain measured values and the edges of the cube define the natural dimensions of the data. (The data cube may have more than 3 dimensions, so technically it should be called a hypercube.) The dimensional elements in the cube are organized in a hierarchy, and you can roll up and/or drill down the dimension hierarchy to get a different view or understanding of the cube data. A data cube offers benefits for data analysis such as an immediate response to a business query and the ability to drill down and roll up the multi-dimensional data in the cube, analyzing business measures such as profit, revenue, and quantity from different angles and perspectives and against various ancillary factors. We can create two cubes from the webhouse as shown in Figure 2: one cube for the clickstream and another for the order items, based on the clickstream and order item fact tables and the session, product, user, page, and time dimension tables. In the webhouse, we already have data organized based on a multi-dimensional model; all that is required is to plug into the OLAP software.
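For intuition, the pre-aggregation an OLAP engine performs can be sketched as summing a measure over every subset of the chosen dimensions, so that roll-up and drill-down become simple lookups. This is a toy cube; the rows and column names are made up:

```python
from itertools import combinations
from collections import defaultdict

def build_cube(rows, dims, measure):
    # Aggregate the measure for every combination of grouping dimensions,
    # from the fully detailed cells down to the grand total.
    cube = defaultdict(float)
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            for row in rows:
                key = (group, tuple(row[d] for d in group))
                cube[key] += row[measure]
    return cube

rows = [
    {"product": "seat", "month": "Jan", "revenue": 100.0},
    {"product": "seat", "month": "Feb", "revenue": 50.0},
    {"product": "tank", "month": "Jan", "revenue": 200.0},
]
cube = build_cube(rows, ["product", "month"], "revenue")
print(cube[(("product",), ("seat",))])                 # roll up over months
print(cube[(("product", "month"), ("tank", "Jan"))])   # detailed cell
print(cube[((), ())])                                  # grand total
```

A real OLAP engine adds dimension hierarchies and sparse storage on top of exactly this kind of group-by aggregation.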
There are many OLAP tools, such as MS OLAP, Cognos, and Essbase, to choose from to build large cubes. Slicing and dicing these cubes reveals significant information about the web site and sales situation. For example, we can find out the top pages of the site, top domains, top browsers, the view time of top pages, top exit pages of the site, top referrers of the site, top products by sales or quantity, top referrers by page request, sale, quantity, or users, and web site activity by day, month, or even hour and minute. We can also find out who our visitors are, how much they spend, the sales cycles, etc. From the OLAP cubes, many business intelligence reports can be derived. Business reports are the most important tool for business analysts but are underappreciated by a lot of companies. Business intelligence reports can provide much insightful information about the web store, such as sales of products across different referrers, best-selling/worst-selling products, top/bottom domains, top searched keywords, etc.

4.2 Mining the Webhouse Data

OLAP is a key component of this approach, but OLAP alone is not good enough for e-commerce applications. Some of the challenging questions cannot be answered by examining the measured values in the cubes. For example, for a question such as "Given a set of page views, will the visitor view another page on the site or will the visitor leave?", it is very difficult if not impossible to find a satisfactory answer based on the OLAP cube data from the webhouse. Many mining algorithms and methods, such as association algorithms, decision trees, neural networks, Bayesian algorithms, and clustering methods, can be applied in web usage mining to derive insightful knowledge rules to understand

the business and customers, build prediction models for classification, and generate campaign scores for product promotion. Below we discuss how these algorithms can help to solve some of the challenging problems in e-commerce.

Association Rules

Association rule algorithms were originally designed to analyze market basket data to find correlations among items purchased together: if a customer buys product A, what is the likelihood that he will buy product B? In web usage mining, association rule algorithms can be used for two purposes. First, analyzing the on-line purchase data to determine which products are sold together by on-line customers (similar to traditional supermarket basket data analysis). On-line shopping databases contain historical data on prior customer choices, where each customer has selected a subset of products. This data can be used to generate a dynamic recommendation of new items to a customer who is in the process of making an item choice. Another use of the association rule algorithm is to analyze the page view hits in a session. Websites also display a dynamically changing set of links to related sites depending on the browsing pattern during a surfing session. An adapted association algorithm can find related pages that are often visited together, even when the pages have no hyperlinks between them. As a result of association rule analysis, it is possible to optimize the web site structure and detect drawbacks that had not been obvious in the past. This information may help web designers redesign their web site (adding direct links between strongly correlated pages); it may also help the web server pre-fetch or pre-compile web pages (presently many web sites have dynamic page generation) to reduce the user waiting time.
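The core support/confidence computation behind this kind of basket analysis can be sketched as follows; the baskets and the minimum-support threshold are made-up illustrations:

```python
from itertools import combinations
from collections import Counter

def pair_rules(baskets, min_support=0.2):
    # Count item and pair co-occurrence, then emit rules a ==> c with
    # support = P(a, c) and confidence = P(c | a).
    n = len(baskets)
    item_counts = Counter(i for b in baskets for i in set(b))
    pair_counts = Counter()
    for b in baskets:
        for a, c in combinations(sorted(set(b)), 2):
            pair_counts[(a, c)] += 1
    rules = []
    for (a, c), cnt in pair_counts.items():
        support = cnt / n
        if support < min_support:
            continue
        rules.append((a, c, support, cnt / item_counts[a]))  # a ==> c
        rules.append((c, a, support, cnt / item_counts[c]))  # c ==> a
    return rules

baskets = [["Bloom", "Dirty_Girl"], ["Bloom", "Philosophy"],
           ["Bloom", "Dirty_Girl", "Blue_Q"], ["Philosophy"]]
for a, c, sup, conf in pair_rules(baskets):
    print(f"{a} ==> {c}  support={sup:.2f}  confidence={conf:.2f}")
```

Production algorithms such as Apriori prune the candidate space instead of enumerating all pairs, but the support/confidence definitions are the same.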
However, we feel that recommendation is inherently a different problem; the main reason is that preferences are due largely to taste and interest. When a customer surfs the webstore, whether purchasing or just visiting, not all actions (putting an item into the shopping cart, or clicking through different web pages) are selected because of their association with previous actions (other items already in the cart, or pages already visited) [8]. We believe there are two behaviors: renewal choice and association choice. Starting from scratch, some need drives the customer to click the first page or select the first item; this independent need is what we call the renewal choice. After the first move, a customer may stop, or click another page/select another item by association or by another renewal choice, iteratively. We propose a hybrid approach (a statistical association rule approach) to compute the probability of a new move becoming the next choice given the current status, and make a recommendation list based on a ranking of this probability. What makes this approach different from the usual association rule approaches is that it accounts not only for the choice making, or buying, associated with the items present in the shopping cart (associative buying), but also for the fact that a customer exercises an independent choice unrelated to the existing items in the shopping cart (renewal buying). We compute the probabilities of both renewal choice and associative choice given the items in the shopping cart, and obtain the probabilities for each item given the partial basket content and given each of these two buying modes. The results from this analysis are very useful for promoting cross-selling and up-selling in the online web store. Based on this consideration, we tested this approach on one client's

site, and the association rules in Table 5 reveal that this approach generates more meaningful and actionable associations.

Rule
Bloom ==> Dirty_Girl
Dirty_Girl ==> Bloom
Philosophy ==> Bloom
Bloom ==> Philosophy
Dirty_Girl ==> Blue_Q
Blue_Q ==> Dirty_Girl
Tony_And_Tina ==> Girl
Philosophy ==> Tony_And_Tina
Tony_And_Tina ==> Philosophy
Demeter_Fragrances ==> Smell_This
Girl ==> Tony_And_Tina
Smell_This ==> Demeter_Fragrances

Table 5: Associations in a beauty-supply web store

Classification/Prediction

Classification/prediction is a very popular data mining technique: build a model based on training data, then apply the model to assign a new item to a certain class. There are many algorithms for classification, such as decision trees, neural networks, Bayesian networks, and probability theory. For example, to understand the customers who spend more than $12 in the web site, you can use a decision tree algorithm to build a model, which may reveal a pattern such as: the customers who spend more than $12 are single females in a certain age range who earn more than a certain amount a year. Another application for classification/prediction is target-oriented campaigns. A mass campaign has a very low response rate, typically 2-3%. In a target-oriented campaign, the company only sends the campaign message to the small portion of customers who are most likely to respond. Even though sending e-mails to all on-line customers is very cheap, it is still important to target effectively, as suggested by Berry and Linoff, because customers might read one targeted e-mail but are less likely to read a lot of junk messages [4,6]. Another important reason is that if the customers are fed up with off-target messages, they can revoke their permission to be contacted by the web store.
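As an illustration of such response scoring — here with a toy Naive Bayes model and made-up features, not the actual models used in any production system:

```python
from collections import defaultdict

def train(history):
    # history: list of (features: dict, responded: bool) from past campaigns
    counts = {True: defaultdict(lambda: defaultdict(int)),
              False: defaultdict(lambda: defaultdict(int))}
    class_counts = {True: 0, False: 0}
    for feats, y in history:
        class_counts[y] += 1
        for k, v in feats.items():
            counts[y][k][v] += 1
    return counts, class_counts

def score(model, feats):
    # P(respond | features) via Naive Bayes with Laplace smoothing
    counts, class_counts = model
    total = sum(class_counts.values())
    probs = {}
    for y in (True, False):
        p = class_counts[y] / total
        for k, v in feats.items():
            p *= (counts[y][k][v] + 1) / (class_counts[y] + 2)
        probs[y] = p
    return probs[True] / (probs[True] + probs[False])

history = [({"repeat_visitor": True}, True), ({"repeat_visitor": True}, True),
           ({"repeat_visitor": False}, False), ({"repeat_visitor": False}, True)]
model = train(history)
customers = {"c1": {"repeat_visitor": True}, "c2": {"repeat_visitor": False}}
ranked = sorted(customers, key=lambda c: score(model, customers[c]), reverse=True)
print(ranked)  # most likely responders first
```

Only the top of the ranked list then receives the campaign e-mail, which is the targeting strategy described in the text.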
To identify who is most likely to respond to a campaign, avoid generating too many off-target e-mails, and improve service quality, we can build a prediction model based on historical data (which contains the responders vs. non-responders of past campaigns), then apply the prediction model to the current customers and sort the customer list by probability score; the top of the sorted list contains those customers who are most likely to respond to the campaign.

Clustering

Clustering techniques are useful when there are no classes to be classified or predicted. Clustering algorithms group a set of objects into different groups based on similarity measures, so that objects in the same group are similar to each other and objects in different groups are different. In web usage mining, clustering algorithms can be used in several ways: (1) Profiling customers based on features such as purchase amount, region, and purchased products. For example, we can group customers into groups such as heavy spenders, light spenders, or browsers based on the amount spent. We can extract

similar features from a cluster and find out, for example, that heavy spenders are mostly young, single male technical professionals. The results of clustering web data can help the on-line store identify customer segments with common characteristics, target these segments for campaigns or product promotions, and make special offers tailored to their needs and requirements. (2) Clustering navigational paths of web hits. As shown in [7,16], clustering navigational paths is very important for user segmentation; the results can help web designers understand or predict visitors' navigation patterns to make the web site more efficient or closer to the visitors' preferences. For example, if the clustering results show that pages P1, P2, and P3 are in the same cluster, then the web server can pre-fetch pages P2 and P3, or pre-compile them, while the user is still viewing page P1, reducing the loading or compile time and thus the user waiting latency. Another potential use is to find subsets of users that would benefit from sharing a single web cache rather than using individual ones.

5. Pattern Evaluation and Deployment

In the Data Warehouse/OLAP framework, the last step is to evaluate the mining results and then adopt the actionable ones. After the mining algorithms are applied, many patterns may be identified, but not all of them are interesting or actionable. Unlike most pattern evaluation approaches, which rely on SQL statements to query the database and evaluate the results, in our Data Warehouse/OLAP framework the data cube is an essential component of the mining procedure, and we can dice and roll up the data cube to easily verify the mining results. After the mined patterns are verified to be golden nuggets, data miners and data analysts can take proper actions based on the useful, actionable mining results.
In traditional data mining applications, it is always challenging or time-consuming to convince the organization to take actions based on the mining results to improve the business. For example, in a brick-and-mortar store, if the data mining results reveal that customers who buy product A tend to buy product B, then in order to create a cross-sell opportunity based on this finding, a possible action is to put products A and B together on the same shelf, which requires physically moving the products between shelves. In a web store, such a discovery can be acted on easily without much cost and hassle: it is very flexible to change the web site design and layout and put relevant product information together to create cross-sell and up-sell opportunities. Another example is the customer campaign. A lot of companies send marketing campaign information such as catalogs and coupons via snail mail based on data mining prediction models; the whole procedure normally takes a few months and costs millions of dollars. But in a web store, sending campaign e-mails to a massive number of customers is already a key component of the e-commerce system; targeting customers based on data mining findings via e-mail to promote products is easy to implement, and these findings can quickly bring more revenue to the web store.

6. Conclusion

An e-commerce webstore provides a killer domain for data mining applications. In this paper we have proposed a framework for web usage mining and business intelligence reporting. We address some of the key problems and issues in web usage mining applications. We use the web application server to collect all the relevant data (clickstream, transaction, customer information) for analysis purposes and provide a unified database schema for different data sources. The construction of the webhouse is an integral part of our framework, which provides an integrated environment for data collection and data

transformation. In the framework, we integrate the data warehouse construction, data mining, business intelligence reporting, and pattern deployment into the e-commerce system smoothly. This tight integration significantly reduces the total time and effort to build a data mining system for web usage mining. We provide a general approach and guidelines for on-line web stores to mine their web data and generate business intelligence reports. We identify some of the challenging problems and pitfalls in each phase and provide possible solutions. Our framework focuses on on-line web stores, and we believe it can be adapted to other domains such as Business-to-Business. The framework and ideas presented in the paper have been implemented in some commercial web usage mining systems through the first author's consulting engagements with industry vendors. There are other challenging problems in web usage mining, such as how to scale web mining algorithms to handle large amounts of data in the 100 GB or even terabyte range (some large e-commerce sites like Yahoo handle 1 billion page views a day). Scalability is crucial for a successful e-commerce system. We hope to report our findings on this research topic in the near future.

7. References

[1] Accrue Software Inc., Driving Business Decisions in Web Time, Web Mining Whitepaper
[2] Anahory, S. and Murray, D., Data Warehousing in the Real World, Addison Wesley, 1997
[3] Suhail Ansari, Ron Kohavi, Llew Mason and Zijian Zheng, Integrating E-Commerce and Data Mining: Architecture and Challenges, WebKDD 2000 Workshop
[4] Jonathan Becher, Ronny Kohavi, Tutorial on E-commerce and Clickstream Mining, First SIAM International Conference on Data Mining
[5] Michael Berry, Gordon Linoff, Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley & Sons
[6] Catledge L.
and Pitkow J., Characterizing browsing behaviors on the World Wide Web, Computer Networks and ISDN Systems, 27(6), 1995
[7] Domingos P., Hulten G., A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering, Proc. of ICML-2001
[8] Hong S.J., Natarajan R., Belitskaya I., A New Approach for Item Choice Recommendation
[9] Hu X., Cercone N., An OLAM Approach for Web Usage Mining, Proc. of the 2002 IEEE International Conference on Fuzzy Systems
[10] Kdnuggets.com
[11] Ralph Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996
[12] Ralph Kimball, Clicking with Your Customer, Intelligent Enterprise, Jan 05, 1999, Vol 2, No. 1
[13] Ralph Kimball, Richard Merz, The Data Webhouse Toolkit: Building the Web-Enabled Data Warehouse, John Wiley and Sons, 2002
[14] Ronny Kohavi, Mining E-Commerce Data: The Good, the Bad and the Ugly, invited paper at SIGKDD 2001 Industry Track
[15] Ronny Kohavi and Foster Provost, Applications of Data Mining to Electronic Commerce, Data Mining and Knowledge Discovery, 5(1), 2001
[16] Raymond Kosala, Hendrik Blockeel, Web Mining Research: A Survey, ACM SIGKDD Explorations, Vol 2, Issue 1
[17] Jesus Mena, Data Mining Your Website, Digital Press
[18] Jesus Mena, Beyond the Shopping Cart, Intelligent Enterprise, Jan 05, 1999, Vol 2, No. 1
[19] Sane Solutions, Analyzing Web Site Traffic, 2002


OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional

More information

Designing a Dimensional Model

Designing a Dimensional Model Designing a Dimensional Model Erik Veerman Atlanta MDF member SQL Server MVP, Microsoft MCT Mentor, Solid Quality Learning Definitions Data Warehousing A subject-oriented, integrated, time-variant, and

More information

The Data Webhouse. Toolkit. Building the Web-Enabled Data Warehouse WILEY COMPUTER PUBLISHING

The Data Webhouse. Toolkit. Building the Web-Enabled Data Warehouse WILEY COMPUTER PUBLISHING The Data Webhouse Toolkit Building the Web-Enabled Data Warehouse Ralph Kimball Richard Merz WILEY COMPUTER PUBLISHING John Wiley & Sons, Inc. New York Chichester Weinheim Brisbane Singapore Toronto Contents

More information

HOW DOES GOOGLE ANALYTICS HELP ME?

HOW DOES GOOGLE ANALYTICS HELP ME? Google Analytics HOW DOES GOOGLE ANALYTICS HELP ME? Google Analytics tells you how visitors found your site and how they interact with it. You'll be able to compare the behavior and profitability of visitors

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Chapter 6 - Enhancing Business Intelligence Using Information Systems Chapter 6 - Enhancing Business Intelligence Using Information Systems Managers need high-quality and timely information to support decision making Copyright 2014 Pearson Education, Inc. 1 Chapter 6 Learning

More information

Web Traffic Capture. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com

Web Traffic Capture. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com Web Traffic Capture Capture your web traffic, filtered and transformed, ready for your applications without web logs or page tags and keep all your data inside your firewall. 5401 Butler Street, Suite

More information

Data Warehouse design

Data Warehouse design Data Warehouse design Design of Enterprise Systems University of Pavia 21/11/2013-1- Data Warehouse design DATA PRESENTATION - 2- BI Reporting Success Factors BI platform success factors include: Performance

More information

CHAPTER 5: BUSINESS ANALYTICS

CHAPTER 5: BUSINESS ANALYTICS Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse

More information

1 Which of the following questions can be answered using the goal flow report?

1 Which of the following questions can be answered using the goal flow report? 1 Which of the following questions can be answered using the goal flow report? [A] Are there a lot of unexpected exits from a step in the middle of my conversion funnel? [B] Do visitors usually start my

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 Cleveland State University Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 SS Chung 14 Build a Data Mining Model using Data

More information

Analyzing the footsteps of your customers

Analyzing the footsteps of your customers Analyzing the footsteps of your customers - A case study by ASK net and SAS Institute GmbH - Christiane Theusinger 1 Klaus-Peter Huber 2 Abstract As on-line presence becomes very important in today s e-commerce

More information

web analytics ...and beyond Not just for beginners, We are interested in your thoughts:

web analytics ...and beyond Not just for beginners, We are interested in your thoughts: web analytics 201 Not just for beginners, This primer is designed to help clarify some of the major challenges faced by marketers today, such as:...and beyond -defining KPIs in a complex environment -organizing

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

University of Gaziantep, Department of Business Administration

University of Gaziantep, Department of Business Administration University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.

More information

Demystifying Digital Introduction to Google Analytics. Mal Chia Digital Account Director

Demystifying Digital Introduction to Google Analytics. Mal Chia Digital Account Director Demystifying Digital Introduction to Google Analytics Mal Chia Digital Account Director @malchia @communikateetal Slides will be emailed after the session 2 Workshop Overview 1. Introduction 2. Getting

More information

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker

More information

Google Analytics Guide. for BUSINESS OWNERS. By David Weichel & Chris Pezzoli. Presented By

Google Analytics Guide. for BUSINESS OWNERS. By David Weichel & Chris Pezzoli. Presented By Google Analytics Guide for BUSINESS OWNERS By David Weichel & Chris Pezzoli Presented By Google Analytics Guide for Ecommerce Business Owners Contents Introduction... 3 Overview of Google Analytics...

More information

Alexander Nikov. 7. ecommerce Marketing Concepts. Consumers Online: The Internet Audience and Consumer Behavior. Outline

Alexander Nikov. 7. ecommerce Marketing Concepts. Consumers Online: The Internet Audience and Consumer Behavior. Outline INFO 3435 E-Commerce Teaching Objectives 7. ecommerce Marketing Concepts Alexander Nikov Identify the key features of the Internet audience. Discuss the basic concepts of consumer behavior and purchasing

More information

Arti Tyagi Sunita Choudhary

Arti Tyagi Sunita Choudhary Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Web Usage Mining

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Sheetal A. Raiyani 1, Shailendra Jain 2 Dept. of CSE(SS),TIT,Bhopal 1, Dept. of CSE,TIT,Bhopal 2 sheetal.raiyani@gmail.com

More information

CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS

CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS CHAPTER 3 PREPROCESSING USING CONNOISSEUR ALGORITHMS 3.1 Introduction In this thesis work, a model is developed in a structured way to mine the frequent patterns in e-commerce domain. Designing and implementing

More information

WEBSITE ANALYSIS OVERVIEW

WEBSITE ANALYSIS OVERVIEW WEBSITE ANALSIS OVERVIEW Key Analysis Areas Web Traffic Web Visitors Web Navigation ecommerce Customers in this Area Include: BBC Worldwide Caja Duero La Caixa LendingTree.com Lexmark International Nygård

More information

Customer Relationship Management

Customer Relationship Management Customer Relationship Management CRM is Any application or initiative designed to help an organization optimize interactions with customers, suppliers, or prospects via one or more touch points for the

More information

Dimensional Data Modeling for the Data Warehouse

Dimensional Data Modeling for the Data Warehouse Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Dimensional Data Modeling for the Data Warehouse Prerequisites Students should

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

GOOGLE ANALYTICS 101

GOOGLE ANALYTICS 101 GOOGLE ANALYTICS 101 Presented By Adrienne C. Dupree Please feel free to share this report with anyone who is interested in the topic of building a profitable online business. Simply forward it to them

More information

Sterling Business Intelligence

Sterling Business Intelligence Sterling Business Intelligence Release Note Release 9.0 March 2010 Copyright 2010 Sterling Commerce, Inc. All rights reserved. Additional copyright information is located on the documentation library:

More information

Privacy Policy - LuxTNT.com

Privacy Policy - LuxTNT.com Privacy Policy - LuxTNT.com Overview TNT Luxury Group Limited (the owner of LuxTNT.com). knows that you care how information about you is used and shared, and we appreciate your trust that we will do so

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong gaocong@cs.aau.dk Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 5 Foundations of Business Intelligence: Databases and Information Management 5.1 Copyright 2011 Pearson Education, Inc. Student Learning Objectives How does a relational database organize data,

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

EVALUATION OF E-COMMERCE WEB SITES ON THE BASIS OF USABILITY DATA

EVALUATION OF E-COMMERCE WEB SITES ON THE BASIS OF USABILITY DATA Articles 37 Econ Lit C8 EVALUATION OF E-COMMERCE WEB SITES ON THE BASIS OF USABILITY DATA Assoc. prof. Snezhana Sulova, PhD Introduction Today increasing numbers of commercial companies are using the electronic

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Mario Guarracino. Data warehousing

Mario Guarracino. Data warehousing Data warehousing Introduction Since the mid-nineties, it became clear that the databases for analysis and business intelligence need to be separate from operational. In this lecture we will review the

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

CHAPTER 4: BUSINESS ANALYTICS

CHAPTER 4: BUSINESS ANALYTICS Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the

More information

Database Design Patterns. Winter 2006-2007 Lecture 24

Database Design Patterns. Winter 2006-2007 Lecture 24 Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Web Usage Mining. from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher

Web Usage Mining. from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Web Usage Mining from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher,

More information

Analytics case study

Analytics case study Analytics case study Carer s Allowance service (DWP) Ashraf Chohan Performance Analyst Government Digital Service (GDS) Contents Introduction... 3 The Carer s Allowance exemplar... 3 Meeting the digital

More information

Google Analytics Health Check Laying the foundations for successful analytics and optimisation

Google Analytics Health Check Laying the foundations for successful analytics and optimisation Google Analytics Health Check Laying the foundations for successful analytics and optimisation Google Analytics Property [UA-1234567-1] Domain [Client URL] Date of Review MMM YYYY Consultant [Consultant

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Why Business Intelligence

Why Business Intelligence Why Business Intelligence Ferruccio Ferrando z IT Specialist Techline Italy March 2011 page 1 di 11 1.1 The origins In the '50s economic boom, when demand and production were very high, the only concern

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Setting Up Solar Web Commerce. Release 8.6.9

Setting Up Solar Web Commerce. Release 8.6.9 Setting Up Solar Web Commerce Release 8.6.9 Legal Notices 2011 Epicor Software Corporation. All rights reserved. Unauthorized reproduction is a violation of applicable laws. Epicor and the Epicor logo

More information

SKoolAide Privacy Policy

SKoolAide Privacy Policy SKoolAide Privacy Policy Welcome to SKoolAide. SKoolAide, LLC offers online education related services and applications that allow users to share content on the Web more easily. In addition to the sharing

More information

Google Analytics Basics

Google Analytics Basics Google Analytics Basics Contents Google Analytics: An Introduction...3 Google Analytics Features... 3 Google Analytics Interface... Changing the Date Range... 8 Graphs... 9 Put Stats into Context... 10

More information

Regain Your Privacy on the Internet

Regain Your Privacy on the Internet Regain Your Privacy on the Internet by Boris Loza, PhD, CISSP from SafePatrol Solutions Inc. You'd probably be surprised if you knew what information about yourself is available on the Internet! Do you

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

MINING CLICKSTREAM-BASED DATA CUBES

MINING CLICKSTREAM-BASED DATA CUBES MINING CLICKSTREAM-BASED DATA CUBES Ronnie Alves and Orlando Belo Departament of Informatics,School of Engineering, University of Minho Campus de Gualtar, 4710-057 Braga, Portugal Email: {alvesrco,obelo}@di.uminho.pt

More information

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information

More information

Sterling Business Intelligence

Sterling Business Intelligence Sterling Business Intelligence Concepts Guide Release 9.0 March 2010 Copyright 2009 Sterling Commerce, Inc. All rights reserved. Additional copyright information is located on the documentation library:

More information

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on

More information

Multidimensional Modeling - Stocks

Multidimensional Modeling - Stocks Bases de Dados e Data Warehouse 06 BDDW 2006/2007 Notice! Author " João Moura Pires (jmp@di.fct.unl.pt)! This material can be freely used for personal or academic purposes without any previous authorization

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

graphical Systems for Website Design

graphical Systems for Website Design 2005 Linux Web Host. All rights reserved. The content of this manual is furnished under license and may be used or copied only in accordance with this license. No part of this publication may be reproduced,

More information

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS

More information

Friends Asking Friends 2.94. New Features Guide

Friends Asking Friends 2.94. New Features Guide Friends Asking Friends 2.94 New Features Guide 8/10/2012 Friends Asking Friends 2.94 Friends Asking Friends US 2012 Blackbaud, Inc. This publication, or any part thereof, may not be reproduced or transmitted

More information

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many

More information

Data W a Ware r house house and and OLAP II Week 6 1

Data W a Ware r house house and and OLAP II Week 6 1 Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8 Using a data warehousing tool and a data set, play four OLAP operations (Roll up (drill up), Drill down (roll down), Slice and dice, Pivot

More information

An Ideal E-Commerce Architecture for Building Web Sites Supporting Analysis and Personalization

An Ideal E-Commerce Architecture for Building Web Sites Supporting Analysis and Personalization 1 Information Organization and Retrieval Class, Berkeley Oct 19, 2000 An Ideal E-Commerce Architecture for Building Web Sites Supporting Analysis and Personalization Ronny Kohavi, Ph.D. Director of Data

More information

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators

More information

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey 1 Google Analytics for Robust Website Analytics Deepika Verma, Depanwita Seal, Atul Pandey 2 Table of Contents I. INTRODUCTION...3 II. Method for obtaining data for web analysis...3 III. Types of metrics

More information

Our Data & Methodology. Understanding the Digital World by Turning Data into Insights

Our Data & Methodology. Understanding the Digital World by Turning Data into Insights Our Data & Methodology Understanding the Digital World by Turning Data into Insights Understanding Today s Digital World SimilarWeb provides data and insights to help businesses make better decisions,

More information

Internet Advertising Glossary Internet Advertising Glossary

Internet Advertising Glossary Internet Advertising Glossary Internet Advertising Glossary Internet Advertising Glossary The Council Advertising Network bring the benefits of national web advertising to your local community. With more and more members joining the

More information

Business white paper. The road to strategic website design The Optimost Web Optimization Maturity Model

Business white paper. The road to strategic website design The Optimost Web Optimization Maturity Model Business white paper The road to strategic website design The Optimost Web Optimization Maturity Model Digital marketers spend large sums attracting website traffic. Much of that investment is wasted.

More information

The Fundamentals of B2C Marketing Automation for Effective Marketing Communications

The Fundamentals of B2C Marketing Automation for Effective Marketing Communications The Fundamentals of B2C Marketing Automation for Effective Marketing Communications Mark Patron February 2013 Email and Website Optimisation Introduction Marketing automation is a process that uses insight

More information

Index. AdWords, 182 AJAX Cart, 129 Attribution, 174

Index. AdWords, 182 AJAX Cart, 129 Attribution, 174 Index A AdWords, 182 AJAX Cart, 129 Attribution, 174 B BigQuery, Big Data Analysis create reports, 238 GA-BigQuery integration, 238 GA data, 241 hierarchy structure, 238 query language (see also Data selection,

More information

Business Intelligence Solutions for Gaming and Hospitality

Business Intelligence Solutions for Gaming and Hospitality Business Intelligence Solutions for Gaming and Hospitality Prepared by: Mario Perkins Qualex Consulting Services, Inc. Suzanne Fiero SAS Objective Summary 2 Objective Summary The rise in popularity and

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management

More information

5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK

5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK 5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK CUSTOMER JOURNEY Technology is radically transforming the customer journey. Today s customers are more empowered and connected

More information

Web Analytics and the Importance of a Multi-Modal Approach to Metrics

Web Analytics and the Importance of a Multi-Modal Approach to Metrics Web Analytics Strategy Prepared By: Title: Prepared By: Web Analytics Strategy Unilytics Corporation Date Created: March 22, 2010 Last Updated: May 3, 2010 P a g e i Table of Contents Web Analytics Strategy...

More information

SAS BI Dashboard 3.1. User s Guide

SAS BI Dashboard 3.1. User s Guide SAS BI Dashboard 3.1 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2007. SAS BI Dashboard 3.1: User s Guide. Cary, NC: SAS Institute Inc. SAS BI Dashboard

More information

INFO 1400. Koffka Khan. Tutorial 6

INFO 1400. Koffka Khan. Tutorial 6 INFO 1400 Koffka Khan Tutorial 6 Running Case Assignment: Improving Decision Making: Redesigning the Customer Database Dirt Bikes U.S.A. sells primarily through its distributors. It maintains a small customer

More information

BUSINESS IMPACT OF POOR WEB PERFORMANCE
