User Profile Refinement using explicit User Interest Modeling



Similar documents
User Profile Derivation and Network Analysis

Personalization of Web Search With Protected Privacy

Facilitating Business Process Discovery using Analysis

A Survey on Web Mining From Web Server Log

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Ontology-Based Filtering Mechanisms for Web Usage Patterns Retrieval

Multiobjective Decision Support for defining Secure Business Processes

A Survey on Preprocessing of Web Log File in Web Usage Mining to Improve the Quality of Data

Monitoring Web Browsing Habits of User Using Web Log Analysis and Role-Based Web Accessing Control. Phudinan Singkhamfu, Parinya Suwanasrikham

Exploitation of Server Log Files of User Behavior in Order to Inform Administrator

Semantically Enhanced Web Personalization Approaches and Techniques

Semantic Search in Portals using Ontologies

Research and Development of Data Preprocessing in Web Usage Mining

EXPLOITING FOLKSONOMIES AND ONTOLOGIES IN AN E-BUSINESS APPLICATION

Web Page Prediction System Based on Web Logs and Web Domain Using Cluster Technique

A Near Real-Time Personalization for ecommerce Platform Amit Rustagi

Ontology for Home Energy Management Domain

A SURVEY ON WEB MINING TOOLS

PREPROCESSING OF WEB LOGS

Lightweight Data Integration using the WebComposition Data Grid Service

Does it fit? KOS evaluation using the ICE-Map Visualization.

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data

A Mind Map Based Framework for Automated Software Log File Analysis

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Managing and Tracing the Traversal of Process Clouds with Templates, Agendas and Artifacts

Extending a Web Browser with Client-Side Mining

EDIminer: A Toolset for Process Mining from EDI Messages

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

María Elena Alvarado gnoss.com* Susana López-Sola gnoss.com*

Data Preprocessing and Easy Access Retrieval of Data through Data Ware House

Bisecting K-Means for Clustering Web Log data

Copyright EPiServer AB

A HYBRID RULE BASED FUZZY-NEURAL EXPERT SYSTEM FOR PASSIVE NETWORK MONITORING

A typology of ontology-based semantic measures

Web Mining as a Tool for Understanding Online Learning

Developing a Distributed Reasoner for the Semantic Web

Master Data Services Training Guide. Modeling Guidelines. Portions developed by Profisee Group, Inc Microsoft

A SEMANTICALLY ENRICHED WEB USAGE BASED RECOMMENDATION MODEL

Spam Detection Using Customized SimHash Function

World Wide Web Personalization

Viewpoint ediscovery Services

A Cube Model for Web Access Sessions and Cluster Analysis

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING

Importance of Domain Knowledge in Web Recommender Systems

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING

A Semantically Enriched Competency Management System to Support the Analysis of a Web-based Research Network

Context Capture in Software Development

Visualizing e-government Portal and Its Performance in WEBVS

KOINOTITES: A Web Usage Mining Tool for Personalization

The Value of Taxonomy Management Research Results

Using Subject- and Object-specific Attributes for Access Control in Web-based Knowledge Management Systems

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION

FRAMEWORK FOR WEB PERSONALIZATION USING WEB MINING

USING SEMANTIC WEB MINING TECHNOLOGIES FOR PERSONALIZED E-LEARNING EXPERIENCES

ViewerPro enables traders to automatically capture the impact of news on their trading portfolios

Web Mining Functions in an Academic Search Application

Advanced Preprocessing using Distinct User Identification in web log usage data

A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING

DESIGNING AND MINING WEB APPLICATIONS: A CONCEPTUAL MODELING APPROACH

Recommendations on Web Page Using Domain Knowledge and Web Usage Mining For Personalization

Improving Traceability of Requirements Through Qualitative Data Analysis

Enterprise 2.0 in Action: Potentials for Improvement of Awareness Support in Enterprises

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM

ISSN: A Review: Image Retrieval Using Web Multimedia Mining

A Road map to More Effective Web Personalization: Integrating Domain Knowledge with Web Usage Mining

Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context

Comparing Instances of the Ontological Concepts

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data

Configuration Management Models in Commercial Environments

Indirect Positive and Negative Association Rules in Web Usage Mining

Chapter 8 The Enhanced Entity- Relationship (EER) Model

SIP Service Providers and The Spam Problem

Administration of Access Control in Information Systems Using URBAC Model

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Chapter 5. Regression Testing of Web-Components

A Comparative Study of Different Log Analyzer Tools to Analyze User Behaviors

Web Log Data Sparsity Analysis and Performance Evaluation for OLAP

An Ontology Framework based on Web Usage Mining

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

DEVELOPMENT OF WEB USAGE MINING TOOLS TO ANALYZE THE WEB SERVER LOGS USING ARTIFICIAL INTELLIGENCE TECHNIQUES

SAC 2015 Tutorial Proposal Software Reuse and Reusability Involving Requirements, Product Lines, and Semantic Service Specifications

Does Artificial Tutoring foster Inquiry Based Learning?

Persona: A Contextualized and Personalized Web Search

Linking BPMN, ArchiMate, and BWW: Perfect Match for Complete and Lawful Business Process Models?

LOG INTELLIGENCE FOR SECURITY AND COMPLIANCE

SPATIAL DATA CLASSIFICATION AND DATA MINING

AWERProcedia Information Technology & Computer Science

Application of ontologies for the integration of network monitoring platforms

An Effective Analysis of Weblog Files to improve Website Performance

2 AIMS: an Agent-based Intelligent Tool for Informational Support

Development/Maintenance/Reuse: Software Evolution in Product Lines

Self Organizing Maps for Visualization of Categories

Semantic Knowledge Management System. Paripati Lohith Kumar. School of Information Technology

Analysis of Data Mining Concepts in Higher Education with Needs to Najran University

Search and Information Retrieval

Data Refinery with Big Data Aspects

Mining Web Access Logs of an On-line Newspaper

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

Transcription:

User Profile Refinement using explicit User Interest Modeling Gerald Stermsek, Mark Strembeck, Gustaf Neumann Institute of Information Systems and New Media Vienna University of Economics and BA Austria {firstname.lastname}@wu-wien.ac.at Abstract: In this paper, we present an approach to refine s that were derived from Web server logs in an automated procedure. In most application scenarios, such automatically derived profiles can only deliver a preliminary result and require human interaction for further refinement. We describe the individual steps to enhance and refine derived s which can be used for personalization purposes (e.g. information filtering). In particular, the user can choose to refine the profile manually or use supporting techniques, such as ontologies, that assist him in the refinement process. In addition to information included in automatically derived profiles, the user thus explicitly provides information to refine his profile. 1 Introduction The constantly growing information supply in Internet-based information systems poses high demands on concepts and technologies to support users in filtering relevant information. Nevertheless, not every user may be willing to define his from scratch as this can be a complex and time consuming task. Therefore, the first phase of our approach derives a preliminary. This first phase is introduced in [SSN07]. In particular, the first phase is based on log-file analysis using descriptive statistics and network analysis methods. Note that we keep users informed about the data we gather via P3P policies (cf. [CLM02]), and we only derive profiles for users who agree with these policies (for details see [SSN07]). An automatically derived profile can then be seen as preliminary profile that covers a user s interests but needs to be further refined and elaborated. The user, thus, has to review the preliminary to make sure that it represents his interests. In this paper, we now focus on the refinement of a derived profile. In particular, we discuss an approach to adapt the preliminary in order to define a more sophisticated user profile which better fits the user s information needs. The remainder of this paper is structured as follows. Section 2 gives on overview of our approach for definition. In Section 2.1 the extension of profiles is discussed and Section 2.2 explains the refinement process. We briefly discuss related work in Section 3. Section 4 concludes the paper. 2 Approach Overview In general, the s that we derive from Web server logs (see [SSN07]) provide the following information: Categories: Categories represent user interests and are derived from meta-data provided along with the Web pages the user visited.

Structural Information: If structural information is available we use to this information to derive relationships between categories. A simple example of a derived is shown in Figure 1. In this example, the automatically derived indicates that the respective user is interested in. A hierarchy structure of interest categories is not mandatory, though. If no structural information is available the results in a simple list of categories. However, depending on the context of the Information Filtering system a hierarchy structure of interest categories may be used for weighting purposes in the information filtering process (see e.g. [SWM02]). liverpool Figure 1: Example of a derived (preliminary) A high-level view of our approach is shown in Figure 2. The first two steps have already been elaborated in [SSN07] and, thus, are printed with dashed borders in Figure 2. In the following Sections we now describe the subsequent steps of our approach. Identify user Derive preliminary Extend Refine Check consistency [consistent] Figure 2: High level view of the user profiling approach 2.1 Extend The process to extend the is depicted in Figure 3. At first the preliminary derived has to be fetched and presented to the user. Afterwards, the user has four possibilities to alter the : Predefined categories: With this option the user is offered a list of predefined categories which he can add to his. This list is typically domain-dependent. Manually: Another option is to allow the user to add arbitrary user-defined categories to his. This may not be suitable for all users and all domains but allows for a freely customizable. External source: Additional user interests can also be imported from an external source. A user can, for example, import filtering keywords of an already configured news aggregator and add them as categories to his.

Remove: The user also has the possibility to remove interests from his if they do not (longer) represent his user interests. Get preliminary [predefined] [remove] [manually] [external] Fetch list of predefined categories Fetch user interests from external source Add user suggested interests to Add external interests to Remove interests from Add interests to Figure 3: Sub-process to extend s The result of this process is then used as input for the next step of user profiling. 2.2 Refine In this step of the proposed user profiling refinement approach, the user can finalize his. This can, again, be done manually or automatically. The corresponding process is depicted in Figure 4. Manual refinement: In this case the user has the possibility to refine the current to fit his needs. To do this he can add or remove explicit relationships between categories. When adding an explicit relationship the user has to indicate the related terms and define them as related. Automatical refinement: If the user chooses not to define structural relationships manually he may use an ontology-assisted approach, for example. In this case, the user then has to select an appropriate domain ontology or, if not available, use a general purpose ontology (e.g. WordNet [Fe98]). This ontology then serves as a basis to derive term relationships. In [BH06] different measures of semantic relatedness are discussed which can be used to structure interest categories as graphs. If the user is not satisfied with the automatically derived term relationships he may further refine them manually, of course. The two steps of adding interest categories and modifying the hierarchy structure can be iterated until the user is satisfied with the. Finally a brief error check of the current user profile is conducted (cf. Figure 2). This includes spell checking and the indication of duplicates. As individual s may be very specific we suggest to just indicate spelling errors and duplicates rather than correcting them automatically. The user then can decide on how to proceed on these issues. Figure 5 depicts our example from Figure 1 after the refinement process. As can be seen the user removed the category liverpool which was derived from his log file entries. In our

Get refined Select related terms Define explicit relationship [manually] [use ontology] Select domain ontology Derive term relationship [manually refine] Figure 4: Sub-process to refine s example, this was just an accidental hit and the user has no long-term interest in Liverpool. Instead, he added a new category mario gomez and defined an explicit relationship between mario gomez and. The user also added another interest category named wc2010 and defined a relationship with, expressing his interest in the forthcoming world cup. liverpool mario gomez wc2010 Figure 5: Example of a refined As mentioned above, defining a hierarchy structure is not mandadory but may revalue the user profile. A possibility to use the hierarchy structure of interest categories is to use categories from different levels to filter different information streams. An information system may, for example, use interest categories near the root category to select an appropriate RSS feed [RS06] for a respective user (e.g. sport news) and categories from the leaf nodes to filter information within this RSS feed (e.g. mario gomez,, wc2010). 3 Related work Web usage mining (WUM) is the application of data mining techniques to large Web data repositories, some algorithms commonly used include association rule mining, sequential pattern generation, and clustering (see, e.g., [CMS99]). WUM produces aggregated results to better understand Web usage and improve the service provided to the customer (cf. [FSS00]). In contrast, our approach concentrates on data mining at the level of individual user data and produces nonaggregated results which can be used for the purpose of personalization, e.g. to form s for information filtering. Ontology-based user profiling [GCP03] uses ontologies to represent user interests via concept hierarchies. Compared to other concepts of representation using ontologies means a semantic revaluation (cf. [GCP03]). However, general-purpose ontologies with a high number of concepts are often not appropriate for profiling a single (cf. [GA05]). Ontologies often represent the shared knowledge of either a particular community or a group of users and therefore they may fail to capture an individual user s specific understanding of a domain [GA05]. In [HK04] Holland and Kießling present an approach for mining user preferences from user log

data using strict partial order preferences. Eliciting complex user preferences from simple Web server logs may be diffucult. Holland and Kießling therefore suggest to use application server logs as they are a better source for user preferences. The refinement process presented in this paper can be applied to the approach of [HK04] as well. 4 Conclusion and Future Work In this paper, we presented an approach to extend and refine automatically derived s. Our approach benefits from the combination of automatic and manual user profiling. Automatically deriving a first version of a relieves the user from the complex and time consuming task to define his from scratch. This enables the user to better concentrate on the refinement process. We have already elaborated scripts to preprocess Web server log files and to automatically derive preliminary s using the R software environment (see [SSN07]). The approach presented in this paper results in more elaborated s which better fit the user s interests. The user can refine the profile manually or use supporting techniques, such as ontologies, that assist him in the refinement process. We are currently building a graphical tool that supports the presented refinement approach. References [BH06] Budanitsky, A. and Hirst, G.: Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics. 32(1):13 47. 2006. [CLM02] Cranor, L., Langheinrich, M., and Marchiori, M. The Platform for Privacy Preferences 1.0 (P3P1.0) Specification, W3C Recommendation. April 2002. [CMS99] Cooley, R., Mobasher, B., and Srivastava, J.: Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems. 1(1):5 32. 1999. [Fe98] [FSS00] [GA05] [GCP03] [HK04] [RS06] [SSN07] Fellbaum, C. (Hrsg.): WordNet: An Electronic Lexical Database. The MIT Press. Cambridge, MA, USA. 1998. Fu, Y., Sandhu, K., and Shih, M.-Y.: A Generalization-Based Approach to Clustering of Web Usage Sessions. In: WEBKDD 99: Revised Papers from the International Workshop on Web Usage Analysis and User Profiling. S. 21 38. London, UK. 2000. Springer-Verlag. Godoy, D. and Amandi, A.: User profiling for web page filtering. IEEE Internet Computing. 9(4):56 64. 2005. Gauch, S., Chaffee, J., and Pretschner, A.: Ontology-based personalized search and browsing. Web Intelligence and Agent System. 1(3-4):219 234. 2003. Holland, S. and Kießling, W.: User Preference Mining Techniques for Personalized Applications. Wirtschaftsinformatik. 46(6):439 445. 2004. RSS Advisory Board. RSS 2.0 Specification (2.0.8). August 2006. http://www.rssboard.org/rssspecification. Stermsek, G., Strembeck, M., and Neumann, G.: A User Profile Derivation Approach based on Log-File Analysis. In: Proc. of the International Conference on Information and Knowledge Engineering. June 2007. [SWM02] Shepherd, M., Watters, C., and Marath, A.: Adaptive user modeling for filtering electronic news. In: Proc. of the 35th Annual Hawaii International Conference on System Sciences (HICSS). S. 1180 1188. 2002.