Higher Education Web Information System Usage Analysis with a Data Webhouse
Carla Teixeira Lopes (1) and Gabriel David (2)
(1) ESTSP/FEUP, Portugal, [email protected]
(2) INESC-Porto/FEUP, Portugal, [email protected]

Abstract. Usage analysis of a Web Information System is a valuable help to predict user needs, to assess the system's impact and to guide its improvement. This is usually done by analysing clickstreams, a low-level approach involving huge amounts of data that calls for data warehouse techniques. This paper presents a dimensional model to monitor user behaviour in Higher Education Web Information Systems and an architecture for the extraction, transformation and load process. These have been applied in the development of a data warehouse to monitor the use of SIGARRA, the University of Porto's Higher Education Web Information System. The efficiency and effectiveness of this monitoring method were confirmed by the knowledge extracted from the analysis of a 3-month period. The main results and recommendations are also briefly described.

1 Introduction

The Web is growing in number of users [12], usage rate [12] and complexity of its sites [5]. The use of this medium as an access interface to organizational Information Systems (IS) and their applications is also frequent. As the experience and expectations of users increase, the need to know and meet user demands becomes more pertinent. Monitoring users' behaviour helps to know their needs and allows system adaptation based on their previous behaviours [15]. Besides system adaptation, it also supports the evaluation of the system against its initial specifications and goals, enables the development of personalization strategies [1, 4, 6], helps increase the system's performance [6, 7], supports marketing decisions [3], helps detect business opportunities that could otherwise remain unnoticed [10] and may contribute to increasing the system's security [4, 14].
Monitoring the use of Web Information Systems (WIS) involves analysing clickstreams, a data source that aggregates information about all user actions in a website. Log file analyzers, applications that extract data directly from log files and generate several kinds of statistics, are one of the most adopted solutions to monitor WIS usage [11]. However, with this technique it is hard, if not impossible, to obtain the level of analysis that other techniques allow. Log file analyzers lack the ability to integrate and correlate information from different sources. They cannot, for example, correlate the number of accesses a student makes to the web site with the program he or she is enrolled in. An alternative with more analytic potential, suitable for processing large quantities of data (as happens with clickstream data), involves using a data webhouse, that is, a data warehouse that stores clickstreams and other contextual data in order to understand user behaviour [8].

(M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3983. © Springer-Verlag Berlin Heidelberg 2006)

Section 2 presents a dimensional model suitable for monitoring a Higher Education Web Information System (HEWIS). This is the model used in the data webhouse that monitors the usage of the University of Porto's (UPorto) HEWIS. The architecture and a description of the processes involved in the extraction, transformation and load (ETL) are presented in Section 3.1. Section 3.2 presents some of the main results and Section 3.3 some recommendations. Conclusions and lines of future work are presented in the last section.

2 Dimensional Model

Considering the HEWIS scenario, a dimensional model to monitor this specific type of WIS usage has been defined. The process began with context analysis, followed by the establishment of the granularity, the definition of the relevant dimensions and the identification of the facts.

2.1 Granularity

Although dimensional models should be developed with the most atomic information [9], when the business process is associated with very large quantities of information it is crucial to choose a granularity that is meaningful to the user and that, simultaneously, adds value to the organization's knowledge. Since the main goal of the present data warehouse is the analysis of user behaviour, it has been decided to implement two grains: web pages (see Figure 1) and web sessions (see Figure 2). The web page grain allows answering questions related to user actions inside sessions, which is not possible with just a session fact table.
The web session grain allows greater performance on questions related to WIS sessions.

Fig. 1. Web Page Fact Table
Fig. 2. Web Session Fact Table

2.2 Dimensions

The model has 12 dimensions, described next. The Academic Date, User, Page, Session Type and Institution dimensions are specific to the higher education context.

Access Date. The Access Date dimension stores information about the civil calendar day on which the request was made. It has a single hierarchy with four levels: Year, Quarter, Month and Day.

Time of Day. To avoid the size of a dimension that stores the time of day for each day of the civil calendar, it has been decided to split time into a separate dimension. This dimension has one hierarchy with three levels: Hour, Minute and Second. It has a record for each second of a day.

Academic Date. An academic calendar is usually associated with structures that differ in the number of modules (semesters, four-month periods and trimesters). Each of these structures is a different hierarchy, each with five levels: Year, Module (6, 4 or 3 months), Period (classes or examination period), Week (variable length, defined internally by each institution) and Day. There is another hierarchy related to academic sessions, which are specific periods of variable length in an academic calendar. For example, the University Day (UPorto's anniversary day) is a one-day academic session. All vacations are academic sessions. This hierarchy has three levels: Year, Session and Day. The Session level has information about the start and end date of the session, the session type (with classes, without classes, vacations) and the number of days in the session.

User. This dimension is crucial to the segmentation of users and to behaviour analysis. Accesses can be made by human users (identified or anonymous) or web crawlers. Identified users are students or workers (faculty or staff). Anonymous users are those who access the HEWIS without signing in. Compared with WIS that gather information from online registration forms, HEWIS have the advantage of more trustworthy information about identified users, as they usually obtain it from the student's school registration or from the worker's contract. This dimension stores information about the user's academic degree, age group, gender, marital status, activity status, birthplace, role and department/service.

User Machine.
The User Machine dimension gathers information about the physical geography (country) and web geography (top level domain, domain) of the machine that generates the web request. It also has information about the machine's location relative to the institution and the university, and about the nature of the access (for example: structured network, wireless network).

Agent. This dimension keeps information about the agent that made the request, either a browser used by humans or a crawler.
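The location and access-nature attributes of the User Machine dimension can be derived from the IP address of the request. The following is a minimal sketch, assuming a lookup table of IP ranges per access type; the ranges (taken from the RFC 5737 documentation networks), the labels and the function name `classify_machine` are hypothetical and are not SIGARRA's actual configuration.

```python
import ipaddress

# Hypothetical IP ranges; a real deployment would load the institution's own
# ranges for each access type (structured network, wireless network, etc.).
ACCESS_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "inside institution", "structured network"),
    (ipaddress.ip_network("198.51.100.0/24"), "inside institution", "wireless network"),
    (ipaddress.ip_network("203.0.113.0/24"), "inside university", "structured network"),
]


def classify_machine(ip: str) -> dict:
    """Return the location and access nature for the machine with this IP."""
    addr = ipaddress.ip_address(ip)
    for network, location, access_nature in ACCESS_RANGES:
        if addr in network:
            return {"location": location, "access_nature": access_nature}
    # Anything not covered by a known range is treated as an outside access.
    return {"location": "outside university", "access_nature": "external"}


inside = classify_machine("198.51.100.7")
outside = classify_machine("8.8.8.8")
```

A linear scan is enough for a handful of ranges; with many ranges a sorted interval structure would be a more reasonable choice.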
Page. This dimension is an obvious one in the context of WIS monitoring. Although it has been modelled with SIGARRA in mind, it can be easily adapted to other types of HEWIS. It has one hierarchy with four levels: Application, Module, Procedure and Page. An application is an autonomous software artefact with one or more modules. Modules are logical units of the main functionalities and can be seen as sets of related procedures. A procedure generates pages and is the conceptual unit of interaction with the user. The same procedure generates different pages if the received arguments are distinct. For instance, the official pages of department A and department B are both generated by the same procedure.

Referrer. This dimension describes the page that preceded the current access. This information is gathered from log files and covers the domain of the referrer and the referrer itself: port, procedure (if it belongs to the HEWIS), query (everything that follows the ? in a URL), the identification and description of the search engine (if applicable) and the complete URL.

HTTP Status Code. This dimension has the category of the HTTP status code (Informational, Success, Redirection, Client Error, Server Error) and the description of the HTTP status code returned for the request.

Session Type. Here, web sessions are aggregated into predefined types of sessions. It has one hierarchy with several levels: session context (for example: enrolment in a course), local context (for example: consulting information about a course) and the final state of the session (whether its main goal has been achieved).

Event Type. This dimension has just one hierarchy with one level and describes what happened in a page at a specific time (for example: open a page, refresh a page, click a hyperlink, enter data in a form).

Institution.
Information about the academic institution associated with the web request is stored in this dimension.

2.3 Fact Tables

Each line in the Page Fact Table (see Figure 1) corresponds to a page served by the HEWIS. The session id degenerate dimension is used to group pages into sessions. The double connection to the User dimension is explained by a SIGARRA functionality that allows a user to act on behalf of another user (for example, course grades may be inserted by the faculty's secretary). The fact table has 6 measures, including: page time to serve (number of seconds taken by the web server to process all requests related to this page), page dwell (number of seconds the complete page is visible in the user's browser), page hits loaded (number of resources loaded for the presentation of the page), page bytes transferred (sum of the bytes loaded in all the resources related to this page) and page sequence number (the sequence number of this page in the overall session).

A line in the Session Fact Table (see Figure 2) records the occurrence of a session in the HEWIS. A session is a set of page accesses, in a single browser session, by the same user, with less than 30 minutes between consecutive requests. The double connection to the Page dimension allows the identification of the entry and exit pages of a session. The time related dimensions are associated with the session's first request. The Referrer dimension records the session's first referrer. The measures of this fact table are: session span (number of seconds between the first request and the complete load of the last request), session time to serve (number of seconds taken to serve all the requests in the session), session dwell (number of seconds of visibility of all the pages in the session), session pages loaded (number of pages in the session), session procedures loaded (number of distinct procedures in the session), session pages to authentication (number of pages until authentication; if there is no authentication, this measure equals session pages loaded) and session bytes transferred (number of bytes transferred in this session).

3 SIGARRA Case Study

Although SIGARRA is defined at the institution level and is supported by several database and web servers, the similarities between the HEWIS structure in the several institutions and the nature of a data warehouse suggest the adoption of a centralized architecture at the university level for the data webhouse. A prototype of a data webhouse has been built to monitor SIGARRA's usage in UPorto's Engineering Faculty, the institution where it is most used. As SIGARRA uses Oracle as its database management system (DBMS), this was also the underlying DBMS used in the staging area and in the data webhouse. Both co-exist in a single machine, independent of SIGARRA's machines. A three-month period of clickstream data has been loaded into a data webhouse with the dimensional model described before. As expected, after the webhouse load, the fact tables are the largest tables (Page fact table has records and Session fact table has records), followed by the Page ( records), Referrer ( records) and User Machine ( records) dimensions.
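The session definition and some of the session measures of Section 2.3 can be illustrated from page-grain rows. This is a minimal sketch under stated assumptions, not the authors' PL/SQL: the `PageRequest` fields, the `authenticated` flag and both function names are hypothetical, the input is assumed to be the requests of a single browser, the split on a change of user is left out, and session span is approximated by the time between the first and last request.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity limit from Section 2.3


@dataclass
class PageRequest:
    time: datetime            # moment of the page request
    procedure: str            # procedure that generated the page
    bytes_transferred: int    # bytes loaded for the page and its resources
    authenticated: bool       # hypothetical flag: was the user signed in?


def split_sessions(requests):
    """Split one browser's requests into sessions on 30 minutes of inactivity."""
    sessions, current = [], []
    for req in sorted(requests, key=lambda r: r.time):
        if current and req.time - current[-1].time >= SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(req)
    if current:
        sessions.append(current)
    return sessions


def session_measures(session):
    """Compute a subset of the Session Fact Table measures for one session."""
    pages_loaded = len(session)
    # Pages before the first authenticated page; equals pages_loaded if none.
    pages_to_auth = next(
        (i for i, r in enumerate(session) if r.authenticated), pages_loaded
    )
    return {
        "session_span": int((session[-1].time - session[0].time).total_seconds()),
        "session_pages_loaded": pages_loaded,
        "session_procedures_loaded": len({r.procedure for r in session}),
        "session_pages_to_authentication": pages_to_auth,
        "session_bytes_transferred": sum(r.bytes_transferred for r in session),
    }


day = datetime(2006, 1, 9)  # arbitrary example date
requests = [
    PageRequest(day.replace(hour=10, minute=0), "home", 2000, False),
    PageRequest(day.replace(hour=10, minute=5), "course_page", 3500, True),
    PageRequest(day.replace(hour=10, minute=50), "home", 2000, False),
]
sessions = split_sessions(requests)
measures = [session_measures(s) for s in sessions]
```

In this example the 45-minute gap before the third request starts a second session, so the first session has two pages and the second has one.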
While the log files from the 3-month period needed almost sixteen gigabytes of space (15.68 GB), the data webhouse with usage data from the same period needs less than three gigabytes (2.57 GB), a meaningful reduction of 83.6%.

3.1 Extraction, Transformation and Loading

The ETL involves getting the data from where it is created and putting it into the data warehouse, where it will be used. The architecture defined for the ETL process has three types of data sources: clickstreams, SIGARRA's database and other sources. The first come from web server logs. SIGARRA's database is essential to gather information about the institutions, their internal organization (departments, sections, etc.), academic data (academic calendar, academic events, evaluation periods), the HEWIS application structure, users and other kinds of data (countries, councils, parishes, postal codes, etc.). The last data source includes data such as the IP ranges of each type of access (wireless, structured network, etc.), data relating IP addresses with geographical areas, domain names, HTTP status codes, and information on search engines, browsers, crawlers, platforms and operating systems.

The extraction phase occurs first. In this phase, all data is extracted from its source and transferred to the staging area with a simple file transfer.
Then, web server logs must be joined, parsed and transferred to the staging area. Parsing is done by a Perl script that takes a web log file as input and generates a tab-delimited file with several fields. It includes host IP address resolution, URL and referrer parsing, identification of the search engine, browser, crawler and operating system, and cookie parsing.

After the data is loaded into the staging area, the clickstream is processed with PL/SQL, using a relational database. This process involves IP address/country resolution and session, page and user processing. Session and user tracking is based on session cookies, so it is necessary to overcome the absence of cookies in first requests. A period of 30 minutes of inactivity leads to a new session, as proposed by several authors [2, 5, 13]. A change in the user associated with the session leads to the same result. User tracking must also deal with authentications that occur in the middle of a session.

Dimensions have been built with information from the tab-delimited file generated by the clickstream parsing and from SIGARRA. Fact tables have been built after the dimensions, due to the dependencies between them. The webhouse loading is done by copying the data from the staging area to the posting schema. At the end, all records that belong to a closed session are deleted from the staging area. A session is closed if it does not have requests in the last 30 minutes of a day (sessions still going on near the end of the day may continue on the following day and must be processed by the next ETL iteration).

3.2 Data Analysis

The data analysis has been carried out using Structured Query Language (SQL) and On-Line Analytic Processing (OLAP). Due to the star structure of the dimensional model, the queries were simple and had good performance.
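To illustrate why queries on a star schema stay simple, the sketch below joins a session fact table to a user dimension and averages session span by user type. It is only an illustration under stated assumptions: it uses SQLite through Python's standard `sqlite3` module rather than Oracle, and the table names, column names and rows are hypothetical, not data from the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily reduced star schema: one dimension and one fact table.
conn.executescript("""
CREATE TABLE user_dim (
    user_key  INTEGER PRIMARY KEY,
    user_type TEXT NOT NULL
);
CREATE TABLE session_fact (
    session_id           INTEGER PRIMARY KEY,
    user_key             INTEGER REFERENCES user_dim(user_key),
    session_span         INTEGER,  -- seconds
    session_pages_loaded INTEGER
);
INSERT INTO user_dim VALUES (1, 'student'), (2, 'staff'), (3, 'crawler');
INSERT INTO session_fact VALUES
    (1, 1, 300, 5), (2, 1, 500, 7), (3, 2, 900, 12), (4, 3, 60, 40);
""")

# One join per dimension, a filter and a GROUP BY: the typical star query shape.
QUERY = """
SELECT u.user_type,
       COUNT(*)                     AS sessions,
       AVG(f.session_span) / 60.0   AS avg_span_minutes
FROM session_fact AS f
JOIN user_dim AS u ON u.user_key = f.user_key
WHERE u.user_type <> 'crawler'
GROUP BY u.user_type
ORDER BY u.user_type
"""
rows = conn.execute(QUERY).fetchall()
for user_type, sessions, avg_span_minutes in rows:
    print(f"{user_type}: {sessions} sessions, {avg_span_minutes:.2f} min on average")
```

Adding another perspective, such as access date or access nature, would only add one more join and one more grouping column.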
Data analysis led to a detailed characterisation of SIGARRA's usage in several categories, according to the main user types (students, faculty, staff, anonymous users, crawlers). Some of the results are described next.

Time Related Analysis of Sessions and Pages. The average number of sessions by day is and its distribution by user type is presented in Figure 3. Excluding crawlers, the average session span is 10.89 minutes, with staff sessions being the longest (Figure 4). The average number of pages accessed by day is and its distribution by user type is presented in Figure 5. Excluding crawler sessions, the average number of pages per session is 7.7.

Session Referrers. Of all sessions, 79.23% were direct entries and 15.50% originated in search engines, Google being the most used (99.8% of all search engine sessions).

User Machines. There were 155 distinct access countries. Staff and faculty users access mainly from inside the institution and anonymous users from outside (Figure 6). Inside the institution, most accesses come from the structured network.
Fig. 3. Distribution of sessions by user type
Fig. 4. Session span in minutes by user type
Fig. 5. Distribution of pages by user type
Fig. 6. Access type by user type
Fig. 7. Platforms used by user type
Fig. 8. Browsers used by user type

Access Agent. Windows is the most used platform. As can be seen in Figure 7, faculty also use Unix and Macintosh platforms, although on a much smaller scale. As can be seen in Figure 8, the most used browsers are MSIE (88.10%) and Firefox (7.16%). Firefox use is growing, while MSIE use is decreasing. The crawler with the most sessions and pages requested is Googlebot (47.86% of all crawler sessions and 86.89% of all crawler page requests).

Number of Sessions by User Profile. There were distinct users in the analysed period, were students, 416 faculty and 252 staff users. The number of sessions is higher for users under 20 years old, for students of undergraduate programmes and particularly for those in the first curricular years of undergraduate programmes.
HEWIS Navigation. The Student, Programme and Institution modules are the most used. Students and anonymous users have similar preferences in pages viewed. Two of the main entry pages are Dynamic Mail Files (due to users following hyperlinks to files in received dynamic mail) and the Computer Labs first page (because it is loaded in the background of every lab computer). Authentication is mainly done in the home page's authentication area (64.59% of all authentications) and the main underlying motivations are access to Dynamic Mail Files, Legislation, Summaries and Courses. The Help module is mainly used by anonymous users.

Specific Pages Usage. The most used home page links are authentication, search and programmes. The two undergraduate programmes most viewed by anonymous users are Electrical and Computers Engineering and Informatics and Computing Engineering, and the two master programmes most viewed by anonymous users are the MSc in Informatics Engineering and the MSc in Information Management. The main searches are related to students, staff and courses.

3.3 Recommendations

The analysis has allowed the detection of some unusual access patterns: excessively long anonymous user sessions (about pages requested over 3 days) with the name of one of the institution's machines, and an abnormally large processing time on a specific day of the period analysed. It has also allowed the production of some improvement recommendations. A direct link to the study plan of each programme should be created in the programme lists (due to the frequent path programme list / programme page / study plan / course page, which suggests that the course page is distant from the home page). The initial page of the Student module should be used to provide information to and communicate with students (this is the 6th most viewed page, especially by anonymous users and students).
Help module usage by faculty should be stimulated (these users rarely use this module, preferring the phone). The complexity and usability of the pages from which the Help module is most often accessed should be analysed. Marketing strategies to promote the programmes with fewer page views should be developed. To minimise URL changes and 404 (Not Found) errors, URLs independent of the underlying technology should be used (79.12% of 404 errors are direct entries into the HEWIS, which suggests the use of bookmarks with links broken by URL changes). The procedures where the 500 (Internal Server Error) error code has occurred most often should be analysed.

4 Conclusions and Future Work

Data webhouse systems are presented here as a solution to monitor the use of Higher Education Web Information Systems (HEWIS). This paper aims to extend the application and study of webhousing systems to the academic context. Despite the similarities between Web Information Systems, there are differences between HEWIS and e-commerce sites, the latter being frequently used to exemplify and instantiate data webhouses. While a HEWIS aims to archive and register higher education activity and has features adapted to this scope, in e-commerce sites the main intent is to sell products and a well defined set of procedures is available (add to shopping cart, insert payment information, etc.). As they have different goals and scopes, the relevant information is different (for example, the Academic Date dimension does not make sense in an e-commerce site webhouse and is very important in a HEWIS webhouse), which justifies a different dimensional model.

A dimensional model to monitor HEWIS usage has been described. This model has been implemented in a data webhouse prototype to monitor the usage of UPorto's HEWIS. In the development of this prototype, an extraction, transformation and loading architecture has been defined that, with adaptations to specific data sources, can be used in similar contexts. The prototype proved the usefulness of data webhouses for WIS and, more specifically, for HEWIS. It allowed the generation of knowledge on SIGARRA user behaviour, the detection of abnormal situations and the definition of a set of recommendations. It was also possible to verify a significant reduction in the amount of disk space required to store web usage data, which encourages the storage of web usage data in a dimensional model. In addition, the prototype demonstrated the analytic flexibility of data webhouses, an advantage when compared to other monitoring techniques. It also showed that queries executed on a star dimensional model with a meaningful amount of data have good performance.
Future work involves applying data mining techniques that allow user clustering based on navigation paths or preferences, discovery of navigation patterns, detection of sets of pages that are more likely to occur together in the same session and user classification based on predefined parameters.

References

1. Jesper Andersen, Anders Giversen, Allan H. Jensen, Rune S. Larsen, Torben Bach Pedersen, and Janne Skyt. Analyzing Clickstreams Using Subsessions. In Proceedings of the 3rd ACM International Workshop on Data Warehousing and OLAP. ACM Press.
2. Bettina Berendt and Myra Spiliopoulou. Analysis of Navigation Behaviour in Web Sites Integrating Multiple Information Systems. The VLDB Journal, 9(1):56-75.
3. M. S. Chen, J. S. Park, and P. S. Yu. Data Mining for Path Traversal Patterns in a Web Environment. In Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS 96), page 385. IEEE Computer Society.
4. Robert Cooley. The Use of Web Structure and Content to Identify Subjectively Interesting Web Usage Patterns. ACM Trans. Inter. Tech., 3(2):93-116.
5. Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava. Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems, 1(2), 1999.
6. Magdalini Eirinaki and Michalis Vazirgiannis. Web Mining for Web Personalization. ACM Trans. Inter. Tech., 3(1):1-27.
7. Karuna P. Joshi, Anupam Joshi, Yelena Yesha, and Raghu Krishnapuram. Warehousing and Mining Web Logs. In Proceedings of the Second International Workshop on Web Information and Data Management. ACM Press.
8. Ralph Kimball and Richard Merz. The Data Webhouse Toolkit. John Wiley & Sons, Inc.
9. Ralph Kimball, Laura Reeves, Margy Ross, and Warren Thornthwaite. The Data Warehouse Lifecycle Toolkit. John Wiley & Sons, Inc.
10. Ron Kohavi. Mining e-commerce Data: The Good, The Bad, and The Ugly. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
11. Richard Li and Jon Salz. Clickstream Data Warehousing. ArsDigita Systems Journal.
12. Brij M. Masand, Myra Spiliopoulou, Jaideep Srivastava, and Osmar R. Zaiane. WEBKDD 2002: Web Mining for Usage Patterns & Profiles. SIGKDD Explor. Newsl., 4(2).
13. Jaideep Srivastava, Robert Cooley, Mukund Deshpande, and Pang-Ning Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explor. Newsl., 1(2):12-23.
14. Mark Sweiger, Mark R. Madsen, Jimmy Langston, and Howard Lombard. Clickstream Data Warehousing. John Wiley & Sons, Inc.
15. Tak Woon Yan, Matthew Jacobsen, Hector Garcia-Molina, and Umeshwar Dayal. From User Access Patterns to Dynamic Hypertext Linking. Computer Networks and ISDN Systems, 28(7-11), 1996.
Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data
Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Sheetal A. Raiyani 1, Shailendra Jain 2 Dept. of CSE(SS),TIT,Bhopal 1, Dept. of CSE,TIT,Bhopal 2 [email protected]
University Data Warehouse Design Issues: A Case Study
Session 2358 University Data Warehouse Design Issues: A Case Study Melissa C. Lin Chief Information Office, University of Florida Abstract A discussion of the design and modeling issues associated with
Data Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second
Upon successful completion of this course, a student will meet the following outcomes:
College of San Mateo Official Course Outline 1. COURSE ID: CIS 364 TITLE: Enterprise Data Warehousing Semester Units/Hours: 4.0 units; a minimum of 48.0 lecture hours/semester; a minimum of 48.0 lab hours/semester
University of Gaziantep, Department of Business Administration
University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.
Data Warehousing Systems: Foundations and Architectures
Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository
Web Log Data Sparsity Analysis and Performance Evaluation for OLAP
Web Log Data Sparsity Analysis and Performance Evaluation for OLAP Ji-Hyun Kim, Hwan-Seung Yong Department of Computer Science and Engineering Ewha Womans University 11-1 Daehyun-dong, Seodaemun-gu, Seoul,
KOINOTITES: A Web Usage Mining Tool for Personalization
KOINOTITES: A Web Usage Mining Tool for Personalization Dimitrios Pierrakos Inst. of Informatics and Telecommunications, [email protected] Georgios Paliouras Inst. of Informatics and Telecommunications,
BW-EML SAP Standard Application Benchmark
BW-EML SAP Standard Application Benchmark Heiko Gerwens and Tobias Kutning (&) SAP SE, Walldorf, Germany [email protected] Abstract. The focus of this presentation is on the latest addition to the
Microsoft Data Warehouse in Depth
Microsoft Data Warehouse in Depth 1 P a g e Duration What s new Why attend Who should attend Course format and prerequisites 4 days The course materials have been refreshed to align with the second edition
An Enhanced Framework For Performing Pre- Processing On Web Server Logs
An Enhanced Framework For Performing Pre- Processing On Web Server Logs T.Subha Mastan Rao #1, P.Siva Durga Bhavani #2, M.Revathi #3, N.Kiran Kumar #4,V.Sara #5 # Department of information science and
Implementing a Data Warehouse with Microsoft SQL Server
This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse 2014, implement ETL with SQL Server Integration Services, and
Identifying User Behavior by Analyzing Web Server Access Log File
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 327 Identifying User Behavior by Analyzing Web Server Access Log File K. R. Suneetha, Dr. R. Krishnamoorthi,
Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998
1 of 9 5/24/02 3:47 PM Dimensional Modeling and E-R Modeling In The Data Warehouse By Joseph M. Firestone, Ph.D. White Paper No. Eight June 22, 1998 Introduction Dimensional Modeling (DM) is a favorite
A Sun Javafx Based Data Analysis Tool for Real Time Web Usage Mining
A Sun Javafx Based Data Analysis Tool for Real Time Web Usage Mining Kiran Patidar Department of Computer Engineering Padmashree Dr.D.Y. Patil Institute of Engineering And Technology Pimpri,Pune Abstract-
MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012
MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Description: This five-day instructor-led course teaches students how to design and implement a BI infrastructure. The
Importance of Domain Knowledge in Web Recommender Systems
Importance of Domain Knowledge in Web Recommender Systems Saloni Aggarwal Student UIET, Panjab University Chandigarh, India Veenu Mangat Assistant Professor UIET, Panjab University Chandigarh, India ABSTRACT
Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days
Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days Course
Web Analytics: Enhancing Customer Relationship Management
Web Analytics: Enhancing Customer Relationship Management Nabil Alghalith Truman State University The Web is an enormous source of information. However, due to the disparate authorship of web pages, this
Fluency With Information Technology CSE100/IMT100
Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999
Why Google Analytics Cannot Be Used For Educational Web Content
Why Google Analytics Cannot Be Used For Educational Web Content Sanda-Maria Dragoş Chair of Computer Systems, Department of Computer Science Faculty of Mathematics and Computer Science Babes-Bolyai University
Pre-Processing: Procedure on Web Log File for Web Usage Mining
Pre-Processing: Procedure on Web Log File for Web Usage Mining Shaily Langhnoja 1, Mehul Barot 2, Darshak Mehta 3 1 Student M.E.(C.E.), L.D.R.P. ITR, Gandhinagar, India 2 Asst.Professor, C.E. Dept., L.D.R.P.
ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING
ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING L.K. Joshila Grace 1, V.Maheswari 2, Dhinaharan Nagamalai 3, 1 Research Scholar, Department of Computer Science and Engineering [email protected]
Web Mining as a Tool for Understanding Online Learning
Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA [email protected] James Laffey University of Missouri Columbia Columbia, MO USA [email protected]
A Hybrid Approach To Web Usage Mining
A Hybrid Approach To Web Usage Mining Authors: Søren E. Jespersen Jesper Thorhauge Torben Bach Pedersen Technical Report 02-5002 Department of Computer Science Aalborg University Created on July 17, 2002
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Rational Reporting. Module 3: IBM Rational Insight and IBM Cognos Data Manager
Rational Reporting Module 3: IBM Rational Insight and IBM Cognos Data Manager 1 Copyright IBM Corporation 2012 What s next? Module 1: RRDI and IBM Rational Insight Introduction Module 2: IBM Rational Insight
www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28
Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT
Microsoft SQL Server Installation Guide
Microsoft SQL Server Installation Guide Version 3.0 For SQL Server 2014 Developer & 2012 Express October 2014 Copyright 2010 2014 Robert Schudy, Warren Mansur and Jack Polnar Permission granted for any
Implementing a Data Warehouse with Microsoft SQL Server MOC 20463
Implementing a Data Warehouse with Microsoft SQL Server MOC 20463 Course Outline Module 1: Introduction to Data Warehousing This module provides an introduction to the key components of a data warehousing
COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER
COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER MODULE 1: INTRODUCTION TO DATA WAREHOUSING This module provides an introduction to the key components of a data warehousing
Data Warehousing. Jens Teubner, TU Dortmund [email protected]. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1
Jens Teubner Data Warehousing Winter 2015/16 1 Data Warehousing Jens Teubner, TU Dortmund [email protected] Winter 2015/16 Jens Teubner Data Warehousing Winter 2015/16 13 Part II Overview
Microsoft SQL Server Installation Guide
Microsoft SQL Server Installation Guide Version 2.1 For SQL Server 2012 January 2013 Copyright 2010 2013 Robert Schudy, Warren Mansur and Jack Polnar Permission granted for any use of Boston University
Implementing a Data Warehouse with Microsoft SQL Server 2014
Implementing a Data Warehouse with Microsoft SQL Server 2014 MOC 20463 Duración: 25 horas Introducción This course describes how to implement a data warehouse platform to support a BI solution. Students
Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days
Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Designing Business Intelligence Solutions with Microsoft SQL Server 2012
Implementing a Data Warehouse with Microsoft SQL Server 2012
Implementing a Data Warehouse with Microsoft SQL Server 2012 Module 1: Introduction to Data Warehousing Describe data warehouse concepts and architecture considerations Considerations for a Data Warehouse
CHAPTER 4 Data Warehouse Architecture
CHAPTER 4 Data Warehouse Architecture 4.1 Data Warehouse Architecture 4.2 Three-tier data warehouse architecture 4.3 Types of OLAP servers: ROLAP versus MOLAP versus HOLAP 4.4 Further development of Data
Data warehouses. Data Mining. Abraham Otero. Data Mining. Agenda
Data warehouses 1/36 Agenda Why do I need a data warehouse? ETL systems Real-Time Data Warehousing Open problems 2/36 1 Why do I need a data warehouse? Why do I need a data warehouse? Maybe you do not
Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining
Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining Jaswinder Kaur #1, Dr. Kanwal Garg #2 #1 Ph.D. Scholar, Department of Computer Science & Applications Kurukshetra University,
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,
Web Log Mining: A Study of User Sessions
UNIVERSITY OF PADUA Department of Information Engineering PersDL 2007 10th DELOS Thematic Workshop on Personalized Access, Profile Management, and Context Awareness in Digital Libraries Corfu, Greece,
Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University
Data Warehousing and Decision Support Torben Bach Pedersen Department of Computer Science Aalborg University Talk Overview Data warehousing and decision support basics Definition Applications Multidimensional
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
Data Testing on Business Intelligence & Data Warehouse Projects
Data Testing on Business Intelligence & Data Warehouse Projects Karen N. Johnson 1 Construct of a Data Warehouse A brief look at core components of a warehouse. From the left, these three boxes represent
Presented by: Jose Chinchilla, MCITP
Presented by: Jose Chinchilla, MCITP Jose Chinchilla MCITP: Database Administrator, SQL Server 2008 MCITP: Business Intelligence SQL Server 2008 Customers & Partners Current Positions: President, Agile
Web Usage Mining. from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher
Web Usage Mining from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher,
Analysis of Server Log by Web Usage Mining for Website Improvement
IJCSI International Journal of Computer Science Issues, Vol., Issue 4, 8, July 2010 1 Analysis of Server Log by Web Usage Mining for Website Improvement Navin Kumar Tyagi 1, A. K. Solanki 2 and Manoj Wadhwa
Chapter-1 : Introduction 1 CHAPTER - 1. Introduction
Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
Advanced Preprocessing using Distinct User Identification in web log usage data
Advanced Preprocessing using Distinct User Identification in web log usage data Sheetal A. Raiyani 1, Shailendra Jain 2, Ashwin G. Raiyani 3 Department of CSE (Software System), Technocrats Institute of
Web Usage Mining for a Better Web-Based Learning Environment
Web Usage Mining for a Better Web-Based Learning Environment Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, Alberta, Canada email: zaianecs.ualberta.ca ABSTRACT Web-based
