Grids, Logs, and the Resource Description Framework




Grids, Logs, and the Resource Description Framework

Mark A. Holliday
Department of Mathematics and Computer Science, Western Carolina University, Cullowhee, NC 28723, USA
holliday@cs.wcu.edu

Mark A. Baker, Richard J. Boakes
Distributed Systems Group, University of Portsmouth, Mercantile House, Hampshire Terrace, Portsmouth, Hampshire PO1 2EG, UK
mark.baker@computer.org, rjboakes@boakes.org

Abstract

Logs are an important tool in evaluating grid performance. Unfortunately, the many different log formats complicate the development of log analysis tools. We propose using the Resource Description Framework (RDF) to provide a common log format for grid environments. We develop an RDF vocabulary and then use this vocabulary to define the new log format. We then illustrate by an example how a log in the new format can be created from a log in one of the formats that is widely used.

1 Introduction

In order to evaluate the correctness of a grid and subsequently improve its performance and dependability (performability), its current behaviour must be known. This often requires that events and other operational parameters are recorded in logs, which can be analyzed for metrics of interest. There are many different types of events that can be recorded, such as message exchange, resource utilization, latency, or simple failure and success. Moreover, even when different systems are recording the same events, those systems may be using different record formats. As a result, a large number of different log analysis tools with similar functionality have had to be developed. The different log formats and event types lead to problems beyond the mere redundancy of analysis tools; they can mask patterns. If those patterns were recognizable, incorrect operation of the grid could be detected and the performance and dependability of the grid could be improved.
We argue that it is desirable to use a single common log format that is written in a language that is both expressive and already used in other environments. Such homogeneity would reduce the number of log analysis tools and allow patterns across event generation sources to be detected. The requirement for expressiveness and use in other environments implies that existing work can be leveraged and is likely to aid in analysis of the logs. Mandating that all event generation sources use the same log format is not practical. Instead, an attractive single format should be identified, and translation to that format from heterogeneous log formats needs to be supported. We contend that the Resource Description Framework (RDF) [1] is a good candidate for the specification of such a common format.

This paper is organized with the next section identifying some important example log formats that may be generated in a grid. The Resource Description Framework has two key components that we will need to use: the RDF Model and the RDF Schema [2]. The RDF Model is introduced in the third section and the RDF Schema is introduced in the fourth section. Our proposed procedure for generating RDF-formatted logs is introduced in the next two sections. The fifth section presents the RDF vocabulary we use. The sixth section shows how this vocabulary can be used to construct an RDF-formatted log by showing an example translation. The seventh section briefly discusses log analysis. In the last section we discuss related work, future work, and conclude.

2 Grid-Related Logging

Grids, as reflected in the documents produced by the Global Grid Forum [3], are increasingly based on Web Services' technologies. The Apache Jakarta Tomcat Servlet Container [4] and the Apache Axis SOAP engine [5] are widely used in grids and in Web Services implementations. Thus, we will focus on these technologies and applications using them as examples of software to be logged.
Some logs record arbitrary events and are created using general purpose log statements inserted into the code being logged (see section 2.1). Conversely, specialized event logging may be required in the recording of message exchange or to log utilizations of particular resources (e.g. CPU utilization); see section 2.2.

2.1 General Purpose Logging

For general purpose logging both Tomcat and Axis have adopted the Apache Jakarta Commons Logging (JCL) package [6], which defines a log Java interface. JCL provides an implementation of this interface, which is a thin wrapper that can be used in conjunction with existing logging tools. Of these other logging tools, one often used is Apache Log4J [7]. The combination of JCL and Log4J can be used for logging of web applications as well as logging events internal to Tomcat and Axis. As illustrated by Figure 1, a Log4J logging statement typically just specifies the logging level (such as 'warn') and the log message (such as 'low fuel level').

// get a logger instance
Logger logger = Logger.getLogger("org.example");
// set its level
logger.setLevel(Level.INFO);
// make a log entry
logger.warn("low fuel level.");

Figure 1 - Example use of Log4J

Log4J records extra information about each log event in addition to that provided by the author of the logging statement. The PatternLayout class of the Log4J package can be used to specify the format and content of the output describing a particular log event. Conversion Characters specified in that class identify the aspects of the log event that can be displayed. Fourteen aspects are specified, such as the priority, the time of the event (in milliseconds since the start of the application), the method name where the logging event was issued, and the message associated with the logging event. Given that Tomcat and Axis are written in Java, another approach is to use the java.util.logging package [8], available since the release of Version 1.4 of the Java Development Kit. A combination, such as JCL and Log4J, can provide logging of general events at a single site. However, web applications are typically distributed. NetLogger [8][9] can be used with Log4J and supports logging for distributed applications.
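As a sketch of the java.util.logging alternative mentioned above, the following self-contained example shows the same level-based filtering that Figure 1 performs with Log4J. The logger name, the in-memory handler, and the messages are ours for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulExample {
    // Collects the messages seen by the logger so the behaviour is observable.
    static final List<String> captured = new ArrayList<>();

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("org.example.jul");
        logger.setUseParentHandlers(false);   // keep output off the console
        logger.addHandler(new Handler() {     // minimal in-memory handler
            @Override public void publish(LogRecord r) {
                captured.add(r.getLevel() + ": " + r.getMessage());
            }
            @Override public void flush() {}
            @Override public void close() {}
        });
        logger.setLevel(Level.INFO);
        logger.warning("low fuel level");     // WARNING >= INFO, so recorded
        logger.fine("ignored detail");        // FINE < INFO, so filtered out
        System.out.println(captured);
    }
}
```

As with Log4J, the logger's level acts as a threshold: only records at or above it reach the handlers.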
t DATE: 2004-04-15T21:30:01.425059
s LVL: Information
s HOST: 131.243.2.143
s TGT: appended
s EVNT: program.end

Figure 2 - Example NetLogger record (from [9])

The NetLogger event or log record is divided into fields, with each field having three parts: the typecode, the key, and the value. The typecode indicates how to interpret the value. Figure 2 shows an example NetLogger record where 's' denotes a string and 't' a timestamp. The key is a string describing the meaning of the value. The NetLogger authors have identified the need for a means of clock synchronization across the distributed sites in order to correctly merge the logs from the different sites. The Network Time Protocol (NTP) [9] [10] is the commonly used means of providing that clock synchronization.

2.2 Specialized Event Logging

The stratified nature of computer systems results in log recorders of differing specializations and granularities; for example, the sending or receiving of a message between two systems could be considered a specialized event. Axis provides TCPMon for displaying the TCP packets exchanged and SOAPMon for displaying the SOAP messages exchanged [11]. Both record the same events, at differing granularities, external to any programs that may generate, transmit, receive and use the messages. Complementarily, the Web Services Interoperability Organization (WS-I) has developed two testing tools called WS-I Monitor [12] and WS-I Analyzer [13]. These tools will help ensure that Web Services implementations operate correctly, and thus improve inter-operation between vendors. WS-I Monitor logs SOAP messages. WS-I Analyzer analyzes the logs generated by the WS-I Monitor to see if the message exchange patterns conform to the SOAP specification.
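The three-part field structure of Figure 2 can be illustrated with a small parser. This is a sketch only, assuming one field per line with the typecode, key (terminated by a colon), and value laid out as in the figure; the class and method names are ours:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NetLoggerParser {
    // Parses one NetLogger record into a key -> value map. Each line looks
    // like "s HOST: 131.243.2.143": a one-letter typecode, a key ending in
    // ':', and the value. The typecode is dropped for brevity.
    public static Map<String, String> parse(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : record.split("\n")) {
            String[] parts = line.trim().split("\\s+", 3);
            if (parts.length == 3 && parts[1].endsWith(":")) {
                String key = parts[1].substring(0, parts[1].length() - 1);
                fields.put(key, parts[2]);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "t DATE: 2004-04-15T21:30:01.425059\n"
                      + "s LVL: Information\n"
                      + "s HOST: 131.243.2.143\n"
                      + "s TGT: appended\n"
                      + "s EVNT: program.end";
        System.out.println(parse(record).get("EVNT"));
    }
}
```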
Logging of specialized events extends beyond the recording of message transmission and receipt: the Discovery and Monitoring Event Description Working Group of the Global Grid Forum (GGF DAMED-WG) has defined a basic set of monitoring event descriptions called the Top 'N' Events [14]. The DAMED-WG follows the Grid Monitoring Architecture [15] in defining an event to be the smallest stand-alone unit of measurement information about grid components (usually hardware). A log record, called a sample event instance, has four parts: a timestamp, the event type, the target, and the value of the event type. Table 1 shows a subset of an example from the DAMED-WG specification [14]. Example event types are CPU load, system uptime, disk size, and ping RTT (Round Trip Time). There are six target types: host, process, disk partition, network link, software, and scheduler. Each target type has an appropriate identifier; for example, the identifier for the host target type is an IP address.

Table 1 - A sample event instance for several of the DAMED-WG event types (a subset from [14]).

Event Name       Target                       Value
delay.roundtrip  129.42.17.99, 140.221.9.95   23.5
processor.load   129.42.17.99                 12
system.os.name   129.42.17.99                 Linux 2.4.7-10

3 RDF Model

Now that we have reviewed the typically used types of logs that we want to store in a common format, we turn to the representation we recommend for that common format. As described in the RDF Primer [16], the Resource Description Framework is a language for representing information about resources in the World Wide Web. In this section we introduce the part of the Resource Description Framework called the RDF Model.

The RDF Model consists of a series of statements, with each statement asserting a fact about a resource. Much like sentences in many natural languages, each statement is organized into three parts: a subject, a predicate, and an object. The subject is the resource; the predicate is the property of the resource; the object is the value of that property of that resource (see Figure 3).

Figure 3 - A sentence modeled as simplified RDF.

An RDF statement is represented graphically by an ellipse for a Resource. An arrow that originates at the resource represents one of its Properties. This arrow terminates at an Object, which may be another resource (again represented as an ellipse) or a literal value, represented as a rectangle. Resources and Properties are therefore generally specified by Uniform Resource Identifiers (URIs) [17], while the property value can be specified by a URI or by a literal, hence the distinction between ellipses and boxes for the Object.

The simplicity of the RDF model gives it adaptability; any concept can be modeled. For example, an RDF Model might be used to describe a resource that is identified by the URI http://dsg.port.ac.uk/~rjb/foaf#me that is a Person whose name is Rich Boakes. This description is translated into two RDF statements. The first statement is that a resource that is identified by http://dsg.port.ac.uk/~rjb/foaf#me is a Person. The second statement is that a resource that is identified by http://dsg.port.ac.uk/~rjb/foaf#me has the name Rich Boakes.
Of course, an English description of an RDF statement is not sufficient. RDF supports three representations of an RDF statement: graph-based, triples, and XML. Figure 4 shows the RDF graph representation for the example we are considering. In Figure 4 the top ellipse represents a resource that is identified by the URI http://dsg.port.ac.uk/~rjb/foaf#me. This resource is the subject for two statements.

Figure 4 - The RDF graph representation for the example describing the Person named Rich Boakes (similar to an example in the RDF Primer [16]).

An RDF statement can also be represented as a triple. As the name suggests, a triple consists of the three parts of an RDF statement (resource, property, and property value), which are shown as text separated by white-space with a full stop terminating the line. Each field contains the identifier (URI or literal) for one of the three parts of the statement (resource, property, or property value) surrounded by angle brackets. Each arc in the graph representation describes a single RDF statement and thus a single triple. For example, the left hand arc in Figure 4 describes the triple:

<http://dsg.port.ac.uk/~rjb/foaf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/#Person>.

As this example illustrates, URI references can be lengthy. They can be replaced by a corresponding XML qualified name (or QName), which consists of a prefix that has been assigned to a namespace URI, followed by a colon, and then a local name. There are some well-known URI prefixes, such as the prefix rdf for the namespace URI http://www.w3.org/1999/02/22-rdf-syntax-ns# and foaf for http://xmlns.com/foaf/0.1/#. Thus, the above triple could have been written as:

<http://dsg.port.ac.uk/~rjb/foaf#me> <rdf:type> <foaf:Person>.

It is also possible to serialize an RDF model as XML. RDF/XML is part of the RDF specification and is the most commonly used form for storing and communicating RDF models on the Web.
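The QName abbreviation described above is purely mechanical, so it can be illustrated with a few lines of code. This is a sketch; the class name and the prefix map are ours, using the two well-known prefixes mentioned in the text:

```java
import java.util.Map;

public class QNameExpander {
    // Well-known prefixes from the text; any real application would load these
    // from its namespace declarations.
    static final Map<String, String> PREFIXES = Map.of(
        "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "foaf", "http://xmlns.com/foaf/0.1/#");

    // Expands a QName such as "rdf:type" into its full URI reference.
    public static String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) return qname;                 // not a QName
        String ns = PREFIXES.get(qname.substring(0, colon));
        if (ns == null) return qname;                // unknown prefix
        return ns + qname.substring(colon + 1);      // namespace + local name
    }

    public static void main(String[] args) {
        System.out.println(expand("rdf:type"));
    }
}
```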
The above description is only intended to provide an introduction to the RDF Model; a more comprehensive and precise description can be found in the RDF Specification [1] and the RDF Primer [16].

4 RDF Schema

As described in the RDF Primer, RDF provides a way to express simple statements about resources, using named properties and values. However, RDF user communities also need the ability to define the vocabularies (terms) they intend to use in those statements, "specifically, to indicate that they are describing specific kinds or classes of resources, and will use specific properties in describing those resources" [16]. In RDF, basic vocabularies can be defined using RDF Schema (RDF-S). Whereas vocabularies define term meanings, ontologies define how vocabulary terms are structured and related. The Web Ontology Language (OWL) builds upon the rudimentary structures provided by RDF and RDF-S and has more facilities for expressing meaning and semantics than XML, RDF, and RDF-S [18]. Using RDF-S and/or OWL, a number of ontologies have already been developed. For example, the FOAF (Friend of a Friend) schema is an RDF vocabulary that facilitates the creation of a Web of machine-readable homepages describing people, the links between them and the things they create and do [19].

The example in Figure 5 uses the Jena [20] API, which enables RDF Models to be created and manipulated in the Java language. The program creates the same RDF model that is represented graphically in Figure 4, and illustrates how an RDF schema (FOAF) is used. Each addProperty method call results in the creation of a statement within the model. The call uses its first argument to define the property (the arrow) and the second argument to define that property's value (which is either a literal (box) or another resource (ellipse)). The serialization to RDF/XML is automatically done by the model.write method call at the end of the program.

5 A Logging Vocabulary and Ontology

In sections 2, 3 and 4 we reviewed some of the widely used logging tools for grid and Web Services' environments, and introduced the Resource Description Framework. We now propose a specific procedure for creating RDF-formatted log files from logs that have been generated by logging software that is widely used in these environments. We define a vocabulary using RDF-S to describe the elements in this Generic Unified Log Format (GULF). The vocabulary is defined in the schema file whose address is http://dsg.port.ac.uk/schemas/log - for brevity we assign the QName gulf.
We will consider three types of originating logs: Log4J records (gulf:l4j-), NetLogger records (gulf:nl-), and the event descriptions (the Top 'N' Event descriptions) specified by the GGF DAMED-WG (gulf:damed-). We then present an example RDF model for this common log format, including a Jena program fragment for creating the RDF/XML for that model. We then illustrate how several log formats used in Web Services and the grid can be mapped to this common log format. Finally, we briefly discuss some possible opportunities for analysis of logs in this new format.

This section describes the first step in our procedure, which is to define the gulf vocabulary. This vocabulary describes the components of logs that have been translated into RDF format. The gulf vocabulary consists of one class definition and several property definitions. The class describes a single log record in RDF format. Each property represents a possible field in such a log record. One class is defined, gulf:record, using the following triple:

<gulf:record> <rdf:type> <rdfs:class>.

Recall that in triple notation the first value is the resource, the second is the property, and the third is the property value.
import com.hp.hpl.jena.rdf.model.*;

public class Example1 {
    public static void main(String[] args) {
        // create an empty Model
        Model model = ModelFactory.createDefaultModel();
        // create the resources
        Resource rich = model.createResource("http://dsg.port.ac.uk/~rjb/contact#me");
        Resource foafPerson = model.createResource("http://xmlns.com/foaf/0.1/#Person");
        // create the properties
        Property rdfType = model.createProperty("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
        Property foafName = model.createProperty("http://xmlns.com/foaf/0.1/#name");
        // add the properties
        rich.addProperty(rdfType, foafPerson);
        rich.addProperty(foafName, "Rich Boakes");
        // now write the RDF model in XML format to System.out
        model.write(System.out);
    }
}

Figure 5 - A Java program using the Jena API [20] to create the example model.

A class in RDF Schema is any resource, in this case named gulf:record, that has an rdf:type property whose value is rdfs:class. Thus, in this triple we are defining the class gulf:record.

We next want to specify the properties of the gulf:record class. Since this class represents a single record in a log file, and a record could have come from any of the original types of logs, we use all the fields of the original logs as properties for the gulf:record class. We consider, in turn, properties arising from each of the three log sources: Log4J, NetLogger and DAMED-WG.

For supporting logs generated by Log4J the gulf:record class must represent each conversion character in the PatternLayout class of Log4J (see section 2). For example, the conversion character p causes the display of the priority of the logging event. The corresponding property is defined by the triple:

<gulf:l4j-eventpriority> <rdf:type> <rdf:property>.

This triple illustrates how a property is defined in the RDF Schema. A property in RDF-S is any resource that has a property rdf:type whose value is rdf:property. Thus, in this triple we are defining a property called gulf:l4j-eventpriority. Note that on its own this triple only defines that there is a property with a name; nothing is stated about the possible values it can hold, or the classes to which it relates.

We were unable to find a list of the standard values for the part named key in a NetLogger field. It may be left to the person creating a particular log to decide. One that is clearly important and that is not part of the Log4J properties is HOST. This is because NetLogger was designed for distributed applications where the merged logs would contain entries from different hosts. Thus we added the property gulf:nl-host to the gulf vocabulary.

As discussed in Section 2, for the DAMED-WG specification of event descriptions, the Top 'N' Events, there are four properties per event (i.e. log record): timestamp, event name, target, and value. We added a triple statement for each of these properties as shown below.
<gulf:damed-timestamp> <rdf:type> <rdf:property>.
<gulf:damed-eventname> <rdf:type> <rdf:property>.
<gulf:damed-target> <rdf:type> <rdf:property>.
<gulf:damed-value> <rdf:type> <rdf:property>.

So far we have used triples to describe the gulf vocabulary, which consists of one class and a number of properties. We often want to restrict the possible values that a property might have. Requiring that a value of a particular property be an instance of a designated class or of a particular type of literal is done using the rdfs:range property. The example below causes gulf:l4j-timestamp to be restricted to integer values.

<gulf:l4j-timestamp> <rdfs:range> <xsd:integer>.

Besides having properties to represent the fields in a log record, it is possible that we may want a property to indicate an ordering among the log records. The property gulf:l4j-next can be used in this way if it is declared using the following two triples:

<gulf:l4j-next> <rdf:type> <rdf:property>.
<gulf:l4j-next> <rdfs:range> <gulf:record>.

These two triples declare a property and restrict it to only take as values instances of the class gulf:record. This is an unusual property compared to the other properties that we have defined, since its value is not a literal.

There is still much left to do. Describing the vocabulary so far does not relate the introduced properties to the introduced class. We can do that through RDF-S by using the rdfs:domain property. The rdfs:domain property is used to indicate that a particular property applies to a particular class (and only that class). For example:

<gulf:l4j-timestamp> <rdfs:domain> <gulf:record>.

states that the property gulf:l4j-timestamp can only occur as a property of an instance of the class gulf:record. Since all the properties that we have declared are meant to be possible fields of a log record, we want to use rdfs:domain to restrict the domain of all these properties.
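Because every property gets the same rdfs:domain restriction, the triples can be generated mechanically. A sketch in plain Java, emitting the same QName-style triple notation used in the text (the property list and helper name are ours):

```java
public class DomainTriples {
    // Emits one rdfs:domain triple per gulf property name, restricting each
    // property to instances of the class gulf:record.
    public static String domainTriples(String[] properties) {
        StringBuilder sb = new StringBuilder();
        for (String p : properties) {
            sb.append("<gulf:").append(p)
              .append("> <rdfs:domain> <gulf:record>.\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(domainTriples(new String[] {
            "l4j-timestamp", "nl-host",
            "damed-timestamp", "damed-eventname", "damed-target", "damed-value" }));
    }
}
```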
Thus, for each property that we have defined in the gulf schema we add a triple statement similar to the one above. So far we have been describing the gulf vocabulary using triples notation. In Figure 7 we use Jena to make this conversion. In particular, we use Jena to write a Java program that has statements similar to the statements in Figure 5 to specify each RDF statement. The program then includes a model.write statement to create the RDF/XML file for gulf.

6 The Proposed Procedure

At this point we have an RDF/XML file for an ontology that we can use when creating an RDF/XML file to represent the records in a particular log from one of the originating log formats (Log4J, NetLogger, or DAMED-WG). Section 6.1 shows the ontology in use, and section 6.2 describes a conversion program.

6.1 Using the Ontology

The next step is to show how an RDF model containing log statements can be created. In Figure 5, we illustrated how a Java program using the Jena framework can be used to create RDF. We now extend this example to create a model containing a single log record that uses our ontology. The program is a series of statements which specify the RDF triples. For each log record there is one RDF triple to specify the log record itself and then an RDF triple to specify the value of each property that is a field of that log record in the original log.
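The per-record structure just described (one rdf:type triple plus one triple per field) can also be sketched without Jena by emitting the triple notation directly. The class, method, and literal formatting below are ours, following the DAMED-WG field names defined earlier:

```java
public class DamedToTriples {
    // Converts one DAMED-WG sample event instance into gulf triples:
    // one rdf:type statement plus one statement per field.
    public static String toTriples(int recordNumber, long timestamp,
                                   String eventName, String target, String value) {
        String subject = "<gulf:rec:" + recordNumber + ">";
        return subject + " <rdf:type> <gulf:record>.\n"
             + subject + " <gulf:damed-timestamp> " + timestamp + ".\n"
             + subject + " <gulf:damed-eventname> " + eventName + ".\n"
             + subject + " <gulf:damed-target> " + target + ".\n"
             + subject + " <gulf:damed-value> " + value + ".\n";
    }

    public static void main(String[] args) {
        System.out.print(toTriples(1, 0, "system.architecture",
                                   "129.42.17.99", "Intel Pentium III"));
    }
}
```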

Table 2 describes a hypothetical series of DAMED-WG records that we have constructed following the description of the DAMED-WG specification [14]. It is similar to that described in Table 1, except that the timestamp field has been added. It contains three log records, so three gulf:record instances are necessary, each with four properties: timestamp, event name, target, and value.

Table 2 - An example hypothetical log of records generated using the DAMED-WG specification.

Timestamp  Event Name           Target        Value
0          system.architecture  129.42.17.99  Intel Pentium III
28         processor.load       129.42.17.99  12
72         processor.load       129.42.17.99  8

In Figure 6 we show the triple statements needed for creating and initializing the first of these three RDF log records. The first triple indicates that the resource gulf:rec:1 is an instance of the class gulf:record. The second triple indicates that one property of this log record is the DAMED-WG timestamp field and that the value of that property for this record is the number 0. The other triples are similar.

<gulf:rec:1> <rdf:type> <gulf:record>.
<gulf:rec:1> <gulf:damed-timestamp> 0.
<gulf:rec:1> <gulf:damed-eventname> system.architecture.
<gulf:rec:1> <gulf:damed-target> 129.42.17.99.
<gulf:rec:1> <gulf:damed-value> Intel Pentium III.

Figure 6 - The first record of Table 2 represented as triples.

The Java program in Figure 7 uses Jena to create an RDF model containing the first record from Table 2.

6.2 A Conversion Program

The job of a conversion program is to translate a log file from its initial format into an RDF model, storing the resulting model in a format that can be interchanged freely. We therefore recommend the RDF/XML serialization format. A conversion program must translate all records, or a subset of records, from their original format into RDF using the gulf ontology.
This involves selecting each record in turn, extracting the portions of those records and assigning them to properties, and then writing them into the model as illustrated in Figure 7. A key issue is the generation of a unique URI for each the resource. Within a single log file, each resource can be uniquely identified using an incremental URI, however, if import com.hp.hpl.jena.rdf.model.*; public class Example2 { public static void main(string[] args) { // create an empty Model Model model = ModelFactory.createDefaultModel(); // specify the GULF namespace String gulf = "http://dsg.port.ac.uk/schema/gulf"; // create the properties Property gulfdamedtimestamp = model.createproperty(gulf, "damed-timestamp"); Property gulfdamedeventname = model.createproperty(gulf, "damed-eventname"); Property gulfdamedtarget = model.createproperty(gulf, "damed-target"); Property gulfdamedvalue = model.createproperty(gulf, "damed-value"); Property gulfrecord = model.createproperty(gulf, "Record"); Property rdftype = model.createproperty("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"); // create the resource Resource record0001 = model.createresource( "gulf:rec:1" ); // add the properties record0001.addproperty(rdftype, gulfrecord); record0001.addproperty(gulfdamedtimestamp, 0); record0001.addproperty(gulfdamedeventname, "system.architecture" ); record0001.addproperty(gulfdamedtarget, "127.42.17.99" ); record0001.addproperty(gulfdamedvalue, "Intel Pentium III" ); //... statements for the other two log records } } // now write the RDF model in XML format to System.out model.write(system.out); Figure 7 - An example Java program using the Jena Java API [20] for creating a DAMED-WG gulf:record

the same base name is used for each file, then there will be a clash when records are combined. To avoid this, we recommend a naming strategy that encodes the log's original host, filename, and start date into the base URI:

gulf:host/path/filename:datestamp#recordnumber
e.g. gulf:example.org/logs/access.log:20041208#1

Further, if any of the logs already incorporates a unique field, we highly recommend that it also be incorporated into the URI.

7 Log Analysis

The procedure described above generates a file in RDF/XML that describes the records in a log created by Log4J or NetLogger, or one that follows the DAMED-WG specification. Having such a common log format has a number of advantages, as described in Section 1. The next step is to develop log analysis tools designed for processing RDF, and in particular files that use the gulf vocabulary. We conjecture that the work undertaken elsewhere on RDF-based tools can be leveraged to aid the development of quality log analysis tools; below we describe some of the issues that such tools must overcome.

RDF can be an inefficient mechanism for data storage when compared to normalized databases, which remove redundancy. Conversion tools, when built as stand-alone entities, have little or no access to data that already exists in other RDF/XML files, only to what exists in their temporary model, so duplicate Resources and Literals are common. Consequently, significant filtering and inference capabilities are necessary within analysis programs.

RDF is a machine-oriented model, designed to be read and interpreted by computers, not humans; consequently, an analysis tool must provide adequate presentation, navigation, and query mechanisms so that logs can be properly investigated.

The benefit of a common log format is that logs from disparate sources can be combined and analyzed as if they were a single recording of events.
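To make the duplication issue concrete, the following minimal sketch models an RDF graph as a set of subject-predicate-object triples. It deliberately avoids the Jena API so that it stands alone; the class name, helper methods, and the statement values are our own hypothetical constructions. A naïve graph merge is then simply a set union: identical statements from two logs collapse into one, but distinct URIs that actually describe the same underlying event remain separate, which is why analysis tools need the filtering and inference capabilities discussed above.

```java
import java.util.*;

public class NaiveMerge {
    // Represent one RDF statement as a subject/predicate/object list of strings.
    static List<String> stmt(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // A naive graph merge is the set union of two statement sets:
    // exact duplicates collapse, but nothing smarter happens.
    static Set<List<String>> merge(Set<List<String>> a, Set<List<String>> b) {
        Set<List<String>> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    public static void main(String[] args) {
        Set<List<String>> log1 = new HashSet<>(Arrays.asList(
                stmt("gulf:rec:1", "rdf:type", "gulf:Record"),
                stmt("gulf:rec:1", "gulf:damed-target", "129.42.17.99")));
        Set<List<String>> log2 = new HashSet<>(Arrays.asList(
                stmt("gulf:rec:1", "rdf:type", "gulf:Record"),      // duplicate of log1
                stmt("gulf:rec:2", "gulf:damed-target", "129.42.17.99")));

        // The duplicate rdf:type statement appears only once after the merge.
        System.out.println(merge(log1, log2).size()); // prints 3
    }
}
```

Anything beyond this, such as recognizing that gulf:rec:1 and gulf:rec:2 report on the same target host, requires the more intelligent merging discussed next.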
Converting data from different logs into a common format, and then combining the results, is a positive first step; however, to fully realize the combination it is necessary to do more than a naïve graph merge. The masking of information described in Section 1 can begin to be overcome when more intelligent methods of graph merging are utilized, perhaps methods that look for matches in string literals and combine them, or that enable plug-in annotation engines to analyze every new statement with reference to the existing model as it is imported. To promote more intelligent merging, it has become apparent to us that a superset of log concepts should be developed or adopted. A possible avenue of research here is the adoption of the Simple Knowledge Organization System (SKOS) [21] mapping vocabulary [22], so that each property specific to a certain log type would also relate to a concept in a more generic logging thesaurus. This would enable inference engines to query log graphs more effectively.

8 Conclusions

This paper proposes using the Resource Description Framework to provide a common log format for logs generated by web applications and by the key components of Web Services and grid environments. As shown in Section 2, there has been substantial work in developing logging for web applications and web service components. As shown in Sections 3 and 4, there has been substantial work in developing RDF for representing metadata. However, we are unaware of any work studying the combination of these two areas in order to use RDF for log storage and log analysis. This paper makes two contributions. First, it proposes the idea of creating logs in a common RDF format that are translations of logs from the heterogeneous sources that arise in grid and Web Services environments.
We identified some of the widely used ways of creating logs in these environments, including Log4J, NetLogger, and the DAMED-WG specification. We identified several advantages of having homogeneous logs from all these sources, of having that common file format be RDF-based, and of having the ability to develop log analysis tools for such logs. The second contribution is a first attempt at developing a procedure for creating such common logs using RDF. As background, we explained the key concepts of the RDF model, including its three representations: graph, triples, and XML. We then explained the need for, and operation of, RDF Schema for creating a vocabulary. The other part of the background illustrated how a Java program can be written with Jena to simplify the generation of an RDF/XML file. After this background, we began the development of our proposed procedure by introducing a vocabulary, gulf. The vocabulary specification is illustrated as a series of RDF statements (shown as triples), with those statements using parts of the RDF Schema predefined vocabulary (e.g. rdfs:Class) and the property rdf:type. The gulf vocabulary contains one class, Record, and a property for each field that might arise in a log record generated by Log4J, NetLogger, or by following the DAMED-WG specification. The possible values of those properties are restricted using rdfs:range and rdfs:domain.
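As an illustration of this style of vocabulary definition, the following fragment sketches, in the same triple notation as Figure 6, how the Record class and one of its properties might be stated. The property name follows Table 2's timestamp field; the choice of xsd:integer as its range is our assumption for illustration, not part of a published gulf specification.

<gulf:Record> <rdf:type> <rdfs:Class>.
<gulf:damed-timestamp> <rdf:type> <rdf:Property>.
<gulf:damed-timestamp> <rdfs:domain> <gulf:Record>.
<gulf:damed-timestamp> <rdfs:range> <xsd:integer>.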

The code for creating an RDF-based log using the gulf vocabulary is specific to the particular log being translated. We illustrated this using a hypothetical (and simple) DAMED-WG log. We showed the RDF statements (as triples) necessary to translate the first of the example log records, and then the Java code to generate those RDF statements and the resulting RDF/XML file. This paper focuses primarily on RDF log format issues; detailed discussion of RDF log analysis is left to future work. This paper starts the exploration of the issues involved in using RDF for storing and analyzing logs generated in grid or Web Services environments. Clearly the next step is to implement the proposed procedure. In particular, the gulf vocabulary needs to be completely stated. Then the simple example of translating DAMED-WG log records into gulf RDF log records needs to be generalized into a Java Jena program that can handle an arbitrary DAMED-WG log. That translation program then needs to be extended to handle Log4J and NetLogger logs. At that point there are two possible next steps that could be explored concurrently. One is to evaluate the usefulness of RDF-formatted log records by developing and using analysis tools that work with logs in this format. The other is to revisit the gulf vocabulary, either with reference to the RDF Schema class structure capabilities, or by adopting the more descriptive Web Ontology Language (OWL) in order to describe more expressively the structural relationships between different log formats. A final extension of the gulf vocabulary would be to include other originating log formats besides Log4J, NetLogger, and DAMED-WG.

9 Acknowledgements

Our thanks to the National Science Foundation (DUE 0410667) and the Office of the President of the University of North Carolina System for their financial support of this project.
Our thanks also to Mark Baker and the Distributed Systems Group at the University of Portsmouth in England for the time Mark Holliday spent there while working on this paper.

10 References

[1] W3C Resource Description Framework, http://www.w3.org/RDF/.
[2] W3C RDF Schema Specification, http://www.w3.org/TR/2000/CR-rdf-schema-20000327/.
[3] Global Grid Forum Documents, http://www.ggf.org/documents/final.htm.
[4] Apache Jakarta Tomcat Project, http://jakarta.apache.org/tomcat/index.htm.
[5] Apache Axis Project, http://ws.apache.org/axis/.
[6] Short Introduction to Log4J, 2002, http://logging.apache.org/log4j/docs/manual.html.
[7] Java Logging APIs (java.util.logging), http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/.
[8] B. Tierney and D. Gunter, NetLogger: A Toolkit for Distributed System Performance Tuning and Debugging, LBNL Tech Report LBNL-51276.
[9] NetLogger Toolkit, http://www-didc.lbl.gov/NetLogger/.
[10] Network Time Protocol (NTP), http://www.ntp.org/.
[11] Apache Axis TCP Monitor and SOAP Monitor, http://ws.apache.org/axis/java/user-guide.html, http://ws.apache.org/axis/java/soapmonitor-userguide.html.
[12] Web Services Interoperability Organization, Monitor Tool Functional Specification, http://www.ws-i.org/documents.aspx.
[13] Web Services Interoperability Organization, Analyzer Tool Functional Specification, http://www.ws-i.org/documents.aspx.
[14] Global Grid Forum: Discovery and Monitoring Event Description Working Group (DAMED-WG), http://www-didc.lbl.gov/damed/.
[15] Global Grid Forum Grid Monitoring Architecture Working Group, A Grid Monitoring Architecture, revised January 2002, http://www-didc.lbl.gov/ggf-perf/gma-wg/.
[16] W3C RDF Primer, http://www.w3.org/TR/rdf-primer/.
[17] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, August 1998, http://www.isi.edu/in-notes/rfc2396.html.
[18] Web Ontology Language (OWL), http://www.w3.org/2004/OWL/.
[19] Friend of a Friend Project, http://www.foaf-project.org/.
[20] Jena: A Semantic Web Framework for Java, http://jena.sourceforge.net/.
[21] W3C Simple Knowledge Organisation System (SKOS), http://www.w3.org/2004/02/skos/.
[22] W3C SKOS Mapping Specification, http://www.w3.org/2004/02/skos/mapping/spec/.