Novel Data Extraction Language for Structured Log Analysis

Size: px
Start display at page:

Download "Novel Data Extraction Language for Structured Log Analysis"

Transcription

1 Novel Data Extraction Language for Structured Log Analysis P.W.D.C. Jayathilake 99X Technology, Sri Lanka. ABSTRACT This paper presents the implementation of a new log data extraction language. Theoretical formation of the language schema was presented in a previous work of ours (Jayathilake, 2011). In the design of new language we focus on specific problems encountered in automating log analysis. Emphasis is put on the structured nature of log files. A brief review on existing data format description mechanisms is also provided. After describing the implementation of the new language, we compare it with another popular data description language to highlight the unique capabilities of it. KEYWORDS Log data extraction, Data format description language, Log analysis, Declarative language INTRODUCTION Software log files contain information pertaining to most user and system actions within an organization. Regulations such as PCI DSS, FISMA, HIPAA and frameworks like ISO and COBIT emphasize standards on logging. If utilized properly, log data can generate a huge value in various facets of a business. Log analysis has proven its potential in intrusion detection, unintended user activity identification, system compliance testing, software troubleshooting, software monitoring, performance benchmarking and functional testing. Despite its benefits, log analysis is a process that incurs a huge cost if the entire range of its phases is performed manually. Reasons are two fold; log analysis requires expertise and a significant time is consumed for digging deep into loads of data and making inferences. Commercial tools that exist in the market deliver a range of functionalities for automating certain stages of the analysis process. These include log data collection from different sources, data indexing, searching, automatic identification of common log file constructs such as timestamps and IP addresses, customizable dashboards for data visualization, highlighting anomalies, automatic compliance checks, etc. All existing tools treat log data as unstructured information. Though high entropy of log information justifies this practice, it imposes numerous limitations in automating log analysis. Lack of contextual correctness, for example, poses many challenges in creating semantics for inferring results automatically. Jayathilake (2011) published an initial version of a framework that creates a platform for structured log analysis. Its core constituent was a new procedural language, which was designed to be used in every phase of automated log analysis. Though the language proved to be powerful in processing log data, we soon realized its inappropriateness in log data extraction. Jayathilake (2011) later published a specification for a declarative language for describing the format of any log file. The intention was to make the log file format declaration more readable and to pick information of interest from log files more easily. We proved the flexibility of the specification in expressing formats of different log file types such as line logs, highly structured logs and tabular logs. Furthermore, we verified that the specification is resilient for log file corruptions, which is a prominent problem in the domain. 1

2 This paper presents an implementation of that specification based on Simple Declarative Language. Simple Declarative Language (SDL) is an easy representation mechanism for data structures (Leuck, 2012). Java and.net implementations for the language already exist so that a syntax expressed in a compliant format can be parsed easily. We formulate a syntax that facilitates easy expression of all language constructs. Since the syntax is compliant with SDL, we could use an existing SDL parser for lexical analysis. Interpretation stage is implemented according to a new algorithm, which uses recursion extensively. Inconsistent log data are handled through a hierarchical fault tolerance mechanism that provides users to select the level of recovery after detecting a log file corruption. Selective data extraction is supported to enable users to cherry-pick data from huge log files for further analysis. Supportive routines are added to reduce effort in dealing with common log file constructs such as timestamps, IP addresses, port numbers and error codes. Output of the data extraction process is a tree that incarnates semantic relationships between log entries. High expressiveness, simplicity, short learning curve, readability and immunity for log file corruptions are the strengths that we identify in the language. EXISTING DATA DESCRIPTION LANGUAGES This section provides an overview on existing data description languages. EAST - East is an ISO standard data description language, which is developed by the Consultative Committee for Space Data Systems (CCSDS, 2007). It provides a rich mechanism to express data format completely and non-ambiguously. Data are regarded as a collection of data entities and the EAST description is used to interpret and gain access to those entities. Its main design goals are strong data description capabilities, human readability, and computer interpretability. One prominent problem with EAST is the lack of support for describing file structures where position of one data entity needs to be determined at run time by examining fields in other entities. DRB Data Request Broker is an open source Java application programming interface (GAEL, 2009). It is an expansion on EAST. It can be used for reading, writing and processing heterogeneous data. DRB is a software abstraction layer, which can be utilized by developers for programming applications independently from the way data are encoded within files. It is also possible to perform calculations using XQuery from within the data description allowing full description of files where the locations of data fields within a file must be calculated from other data fields. However calculations must be described in XQuery and can lead to increased complexity hence reduced human readability. PADS/ML - This is a domain-specific language designed to improve the productivity of data analysts. It is a functional computer language to formally specify the logical and physical structure in data (Mandelbaum et al, 2007). In contrast to other data description languages, PADS/ML provides a platform where the description can stand as a sound documentation on the data too. However, it does not offer a satisfactory level of support for describing semantic information. DFDL Data Format Description Language is an open standard that came up due to the need for representing text and binary data with various formats in a common XML paradigm (OGF, 2010). It also allows data to be taken from an XML model and written out to its native format. By having a data format described with a DFDL description, which is accessible to 2

3 multiple applications, one can provide a common interface to the data, therefore facilitating data interchange. DFDL does not inherently support semantic information but can be used in conjunction with ontologies for this purpose. One drawback is the verbose nature of DFDL because of XML metadata that affects human readability. HAWK This is a powerful, flexible language for log file analysis, which uses simple methods to analyse. Its basis in pattern-action pairs allow for flexible combination of programs. It provides support for a range of log file analysis functionality such as filtering, recoding, and counting. The language provides the processing power for analysing the log files (HAWK, 2009). BFD - The Binary Format Description language is an XML-based language for expressing binary data formats (National Collaboratories, 2003). It is an extension to extensible Scientific Interchange Language(XSIL). A BFD template can be used to extract data from a set of files and put them into an XML for further processing. REQUIREMENT FOR A NEW LOG DATA EXTRACTION LANGUAGE Above-mentioned languages are mostly generic data format description languages that consider a wide range of applications. Log analysis is one niche where those tools can be utilized. However, log analysis, as a separate domain, exhibits unique characteristics and poses specific problems. For example, corrupted data is a prominent challenge facing any attempt to automate the analysis process. Huge amount of data, inconsistent formats and frequent format changes further add to this. Even so it is vital to have a data description scheme that results in more human readable templates compared to highly verbose XML solutions. THE NEW LANGUAGE In order to address these unique needs we designed a new log data extraction language based on a simple schema. Jayathilake (2011) discussed the theoretical formulation of this language along with case studies on its applications. In summary, it is based on interpreting a log file as a hierarchy of units termed log entities. Three types of log entities are identified. 1. Type A This type is defined as a sequence of other log entries defined by the pair ([LE 1, LE 2,, LE N ], ERROR_RECOVERY) where LE i are log entries. The sequence should be built with the same order of log entries as specified inside the square brackets in the first element of the pair. ERROR_RECOVERY is a flag that indicates whether the system should try to recover from parse errors for this type of log entries. 2. Type B This is a sequence of other log entries defined by the 4-tuple ({LE 1, LE 2,, LE N }, MAX, MIN, ERROR_RECOVERY) where LE i are log entries. The sequence can be built with those log entries by putting them in any order. Each LEi can appear in the sequence zero or more times. The list containing LEis is termed the candidate list for the sequence. MAX is the maximum number of log entries permitted in the sequence. If its value is -1, there is no limit for the length of the sequence. Similarly MIN is the minimum number of log entries that should 3

4 be present in the sequence. -1 indicates that there is no lower bound for the length of the sequence. ERROR_RECOVERY is a flag having same semantics as in definition for Type A. 3. Type C A singleton (k) where k is a fixed sequence of bytes. The language also provides a mechanism to recover from corruptions in a log file. When a part of text that does not follow the format in the description is detected, the interpreter has the ability to fall-back to the next log entry and to continue execution without premature termination. IMPLEMENTATION We implemented the language syntax in Simple Declarative Language (SDL), which provides infrastructure for describing arbitrary data formats. Below we explain the syntax for each of the three log entry types through examples. 1. Type A Line typea Timestamp Process TID Area Category ER=true This syntax defines a log entry named Line, which is built by a sequence of other log entries Timestamp, Process, TID, Area, and Category. Error recovery (ER) is set to true. 2. Type B Gap typeb Space Tab Max=-1 Min=2 ER=false This is a definition of a log entry named Gap, which stands for an empty space created by two or more spaces and tabs. Spaces and tabs can occur in any order and quantity. Error recovery (ER) is set to false. Char typec a 3. Type C The Type C log entry Char defined here stands for the character a. Implementation of the language is shown in Fig. 1. It constitutes two main components; the lexical analysis module (parser) and an interpreter module. The parser module processes the given format specification using SDL. This is possible since the new language syntax is compliant with SDL. Log file content is lexically analysed with respect to the pre-processed format specification. After that, the interpreter extracts log file content and converts it to a proprietary binary format. This data format is ready to be processed by the log data analysis framework presented in Jayathilake (2011). A recursive algorithm is used to implement the interpreter module. 4

5 Figure 1: Implementation of the new language COMPARISON WITH DFDL In this section we provide a comparison of the new language with DFDL, which is another promising technology to express file formats. Similar to any other XML based schemas DFDL incurs significant metadata overhead. Log entry formats expressed in DFDL are much verbose than their expressions in our schema. Less verbose Resilient for log corruptions Optimized for log file formats Lot of metadata Unable to handle log corruptions Offers a powerful type system Line typea Timestamp Process TID Area Category ER=true Figure 2: Comparison between our schema and DFDL 5

6 Fig. 2 provides a comparison between the expressions of one log entry in our schema and in DFDL. The new schema results in more compact and readable format expressions. Since the new language is specifically designed for log file formats, in contrast to DFDL, which is a generic format expression mechanism, the new language offers few other benefits too. One prominent advantage of it is the ability to deal with log corruptions. On the other hand, DFDL provides a rich type system so that most common data types are natively identified. CONCLUSION The new log data extraction language has the capability to express a wide range of log file formats while offering a simple, human-readable syntax. Its hierarchical interpretation of log entries enables it to capture difficult log formats containing lot of peculiarities that many other existing data format description languages fail on. The schema is proven to work with many industrial log file types such as line logs, message logs, XML logs and tabular logs. A prominent feature in the new language is its ability to deal with inconsistencies and corruptions in log files. It strengthens the automated log analysis mechanism with the ability to use as much correct data as possible. Simple Declarative Language provided a useful platform when implementing the language syntax. The current implementation of the language supports only text log files, which constitutes a limitation. It can be enhanced to handle binary logs too. A further improvement can be adding the capability to handle log file formats where the location of one log entry should be dynamically read from another log entry. REFERENCES Jayathilake, D. (2011) A mind map based framework for automated software log file analysis, Proceedings of the International Conference on Software and Computer Applications (ICSCA 2011), pp Jayathilake, D. (2011) A novel mind map based approach for log data extraction, Proceedings of the 6th IEEE International Conference on Industrial and Information Systems (ICIIS 2011), pp Andrews, J. H. (1988) Testing using log file analysis: tools, methods and issues, Proceedings of the 13th IEEE International Conference on Automated Software Engineering, pp Valdman, J. (2001) Log file analysis, Department of Computer Science and Engineering (FAV UWB), Tech. Rep. DCSE/TR Consultative Committee for Space Data Systems, The Data Description language EAST Specification. [pdf]. Available at: < p-2.1/ccsds p-2.1.pdf> [Accessed 05 May 2012]. GAEL Consultant, Data Request Broker. [online] Available at: < [Accessed 05 May 2012]. Mandelbaum, Y., Fisher, K., Walker, D., Fernandez, M. and Gleyzer, A. (2007) PADS/ML: A Functional Data Description Language, Proceedings of the 34th annual ACM SIGPLAN- SIGACT symposium on Principles of programming languages (POPL 07), pp

7 OGF Data Format Description Language Working Group, Data Format Description Language (DFDL) v1.0 Core Specification. [pdf]. Available at: < [Accessed 05 May 2012]. HAWK Network Defense, The Future: Dynamic Log Analysis. [pdf]. Available at: < Whitepaper4.pdf> [Accessed 05 May 2012]. National Collaboratories, Binary Format Description (BFD) Language. [online] Available at: < [Accessed 05 May 2012]. Daniel Leuck, Simple Declarative Language. [online] Available at: < [Accessed 05 May 2012]. 7

A Mind Map Based Framework for Automated Software Log File Analysis

A Mind Map Based Framework for Automated Software Log File Analysis 2011 International Conference on Software and Computer Applications IPCSIT vol.9 (2011) (2011) IACSIT Press, Singapore A Mind Map Based Framework for Automated Software Log File Analysis Dileepa Jayathilake

More information

FiskP, DLLP and XML

FiskP, DLLP and XML XML-based Data Integration for Semantic Information Portals Patrick Lehti, Peter Fankhauser, Silvia von Stackelberg, Nitesh Shrestha Fraunhofer IPSI, Darmstadt, Germany lehti,fankhaus,sstackel@ipsi.fraunhofer.de

More information

Rotorcraft Health Management System (RHMS)

Rotorcraft Health Management System (RHMS) AIAC-11 Eleventh Australian International Aerospace Congress Rotorcraft Health Management System (RHMS) Robab Safa-Bakhsh 1, Dmitry Cherkassky 2 1 The Boeing Company, Phantom Works Philadelphia Center

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

2. Distributed Handwriting Recognition. Abstract. 1. Introduction

2. Distributed Handwriting Recognition. Abstract. 1. Introduction XPEN: An XML Based Format for Distributed Online Handwriting Recognition A.P.Lenaghan, R.R.Malyan, School of Computing and Information Systems, Kingston University, UK {a.lenaghan,r.malyan}@kingston.ac.uk

More information

Databases in Organizations

Databases in Organizations The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron

More information

MULTI AGENT-BASED DISTRIBUTED DATA MINING

MULTI AGENT-BASED DISTRIBUTED DATA MINING MULTI AGENT-BASED DISTRIBUTED DATA MINING REECHA B. PRAJAPATI 1, SUMITRA MENARIA 2 Department of Computer Science and Engineering, Parul Institute of Technology, Gujarat Technology University Abstract:

More information

Automatic Timeline Construction For Computer Forensics Purposes

Automatic Timeline Construction For Computer Forensics Purposes Automatic Timeline Construction For Computer Forensics Purposes Yoan Chabot, Aurélie Bertaux, Christophe Nicolle and Tahar Kechadi CheckSem Team, Laboratoire Le2i, UMR CNRS 6306 Faculté des sciences Mirande,

More information

XML DATA INTEGRATION SYSTEM

XML DATA INTEGRATION SYSTEM XML DATA INTEGRATION SYSTEM Abdelsalam Almarimi The Higher Institute of Electronics Engineering Baniwalid, Libya Belgasem_2000@Yahoo.com ABSRACT This paper describes a proposal for a system for XML data

More information

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Index Terms Domain name, Firewall, Packet, Phishing, URL. BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet

More information

Secure Semantic Web Service Using SAML

Secure Semantic Web Service Using SAML Secure Semantic Web Service Using SAML JOO-YOUNG LEE and KI-YOUNG MOON Information Security Department Electronics and Telecommunications Research Institute 161 Gajeong-dong, Yuseong-gu, Daejeon KOREA

More information

Introduction to Web Services

Introduction to Web Services Department of Computer Science Imperial College London CERN School of Computing (icsc), 2005 Geneva, Switzerland 1 Fundamental Concepts Architectures & escience example 2 Distributed Computing Technologies

More information

Technical. Overview. ~ a ~ irods version 4.x

Technical. Overview. ~ a ~ irods version 4.x Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number

More information

A Recommendation Framework Based on the Analytic Network Process and its Application in the Semantic Technology Domain

A Recommendation Framework Based on the Analytic Network Process and its Application in the Semantic Technology Domain A Recommendation Framework Based on the Analytic Network Process and its Application in the Semantic Technology Domain Student: Filip Radulovic - fradulovic@fi.upm.es Supervisors: Raúl García-Castro, Asunción

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

A Framework for Ontology-Based Knowledge Management System

A Framework for Ontology-Based Knowledge Management System A Framework for Ontology-Based Knowledge Management System Jiangning WU Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: jnwu@dlut.edu.cn Abstract Knowledge

More information

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

BUSINESS VALUE OF SEMANTIC TECHNOLOGY BUSINESS VALUE OF SEMANTIC TECHNOLOGY Preliminary Findings Industry Advisory Council Emerging Technology (ET) SIG Information Sharing & Collaboration Committee July 15, 2005 Mills Davis Managing Director

More information

Distributed Database for Environmental Data Integration

Distributed Database for Environmental Data Integration Distributed Database for Environmental Data Integration A. Amato', V. Di Lecce2, and V. Piuri 3 II Engineering Faculty of Politecnico di Bari - Italy 2 DIASS, Politecnico di Bari, Italy 3Dept Information

More information

XpoLog Competitive Comparison Sheet

XpoLog Competitive Comparison Sheet XpoLog Competitive Comparison Sheet New frontier in big log data analysis and application intelligence Technical white paper May 2015 XpoLog, a data analysis and management platform for applications' IT

More information

Some Research Challenges for Big Data Analytics of Intelligent Security

Some Research Challenges for Big Data Analytics of Intelligent Security Some Research Challenges for Big Data Analytics of Intelligent Security Yuh-Jong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,

More information

II. PREVIOUS RELATED WORK

II. PREVIOUS RELATED WORK An extended rule framework for web forms: adding to metadata with custom rules to control appearance Atia M. Albhbah and Mick J. Ridley Abstract This paper proposes the use of rules that involve code to

More information

Multiple electronic signatures on multiple documents

Multiple electronic signatures on multiple documents Multiple electronic signatures on multiple documents Antonio Lioy and Gianluca Ramunno Politecnico di Torino Dip. di Automatica e Informatica Torino (Italy) e-mail: lioy@polito.it, ramunno@polito.it web

More information

A Semantic Approach for Access Control in Web Services

A Semantic Approach for Access Control in Web Services A Semantic Approach for Access Control in Web Services M. I. Yagüe, J. Mª Troya Computer Science Department, University of Málaga, Málaga, Spain {yague, troya}@lcc.uma.es Abstract One of the most important

More information

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,

More information

The Ontological Approach for SIEM Data Repository

The Ontological Approach for SIEM Data Repository The Ontological Approach for SIEM Data Repository Igor Kotenko, Olga Polubelova, and Igor Saenko Laboratory of Computer Science Problems, Saint-Petersburg Institute for Information and Automation of Russian

More information

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Dimitrios Kourtesis, Iraklis Paraskakis SEERC South East European Research Centre, Greece Research centre of the University

More information

Semantically Enhanced Web Personalization Approaches and Techniques

Semantically Enhanced Web Personalization Approaches and Techniques Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljani, Lidia Rovan, Mirta Baranovi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,

More information

DataDirect XQuery Technical Overview

DataDirect XQuery Technical Overview DataDirect XQuery Technical Overview Table of Contents 1. Feature Overview... 2 2. Relational Database Support... 3 3. Performance and Scalability for Relational Data... 3 4. XML Input and Output... 4

More information

A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS

A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS Abstract T.VENGATTARAMAN * Department of Computer Science, Pondicherry University, Puducherry, India. A.RAMALINGAM Department of MCA, Sri

More information

Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks

Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks Ramaswamy Chandramouli National Institute of Standards and Technology Gaithersburg, MD 20899,USA 001-301-975-5013 chandramouli@nist.gov

More information

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India Integrity Preservation and Privacy Protection for Digital Medical Images M.Krishna Rani Dr.S.Bhargavi IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India Abstract- In medical treatments, the integrity

More information

A Scalability Model for Managing Distributed-organized Internet Services

A Scalability Model for Managing Distributed-organized Internet Services A Scalability Model for Managing Distributed-organized Internet Services TSUN-YU HSIAO, KO-HSU SU, SHYAN-MING YUAN Department of Computer Science, National Chiao-Tung University. No. 1001, Ta Hsueh Road,

More information

FIPA agent based network distributed control system

FIPA agent based network distributed control system FIPA agent based network distributed control system V.Gyurjyan, D. Abbott, G. Heyes, E. Jastrzembski, C. Timmer, E. Wolin TJNAF, Newport News, VA 23606, USA A control system with the capabilities to combine

More information

Advantages of XML as a data model for a CRIS

Advantages of XML as a data model for a CRIS Advantages of XML as a data model for a CRIS Patrick Lay, Stefan Bärisch GESIS-IZ, Bonn, Germany Summary In this paper, we present advantages of using a hierarchical, XML 1 -based data model as the basis

More information

Syntax Check of Embedded SQL in C++ with Proto

Syntax Check of Embedded SQL in C++ with Proto Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 383 390. Syntax Check of Embedded SQL in C++ with Proto Zalán Szűgyi, Zoltán Porkoláb

More information

Efficient Information Retrieval in Network Management Using Web Services

Efficient Information Retrieval in Network Management Using Web Services Efficient Information Retrieval in Network Management Using Web Services Aimilios Chourmouziadis 1, George Pavlou 1 1 Center of Communications and Systems Research, Department of Electronic and Physical

More information

Enterprise Data Quality

Enterprise Data Quality Enterprise Data Quality An Approach to Improve the Trust Factor of Operational Data Sivaprakasam S.R. Given the poor quality of data, Communication Service Providers (CSPs) face challenges of order fallout,

More information

Chapter 11 Mining Databases on the Web

Chapter 11 Mining Databases on the Web Chapter 11 Mining bases on the Web INTRODUCTION While Chapters 9 and 10 provided an overview of Web data mining, this chapter discusses aspects of mining the databases on the Web. Essentially, we use the

More information

VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR

VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR Andrey V.Lyamin, State University of IT, Mechanics and Optics St. Petersburg, Russia Oleg E.Vashenkov, State University of IT, Mechanics and Optics, St.Petersburg,

More information

COCOVILA Compiler-Compiler for Visual Languages

COCOVILA Compiler-Compiler for Visual Languages LDTA 2005 Preliminary Version COCOVILA Compiler-Compiler for Visual Languages Pavel Grigorenko, Ando Saabas and Enn Tyugu 1 Institute of Cybernetics, Tallinn University of Technology Akadeemia tee 21 12618

More information

Ontology and automatic code generation on modeling and simulation

Ontology and automatic code generation on modeling and simulation Ontology and automatic code generation on modeling and simulation Youcef Gheraibia Computing Department University Md Messadia Souk Ahras, 41000, Algeria youcef.gheraibia@gmail.com Abdelhabib Bourouis

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu

More information

Modeling Turnpike: a Model-Driven Framework for Domain-Specific Software Development *

Modeling Turnpike: a Model-Driven Framework for Domain-Specific Software Development * for Domain-Specific Software Development * Hiroshi Wada Advisor: Junichi Suzuki Department of Computer Science University of Massachusetts, Boston hiroshi_wada@otij.org and jxs@cs.umb.edu Abstract. This

More information

Lightweight Data Integration using the WebComposition Data Grid Service

Lightweight Data Integration using the WebComposition Data Grid Service Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed

More information

Database Migration- How hard can it be?

Database Migration- How hard can it be? 2012 2 nd International Conference on Information Communication and Management (ICICM 2012) IPCSIT vol. 55 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V55.12 Database Migration- How

More information

A Mechanism for VHDL Source Protection

A Mechanism for VHDL Source Protection A Mechanism for VHDL Source Protection 1 Overview The intent of this specification is to define the VHDL source protection mechanism. It defines the rules to encrypt the VHDL source. It also defines the

More information

Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web

Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web Corey A Harper University of Oregon Libraries Tel: +1 541 346 1854 Fax:+1 541 346 3485 charper@uoregon.edu

More information

Augmented Search for IT Data Analytics. New frontier in big log data analysis and application intelligence

Augmented Search for IT Data Analytics. New frontier in big log data analysis and application intelligence Augmented Search for IT Data Analytics New frontier in big log data analysis and application intelligence Business white paper May 2015 IT data is a general name to log data, IT metrics, application data,

More information

The International Journal of Digital Curation Volume 7, Issue 1 2012

The International Journal of Digital Curation Volume 7, Issue 1 2012 doi:10.2218/ijdc.v7i1.217 Grammar-Based Specification 95 Grammar-Based Specification and Parsing of Binary File Formats William Underwood, Principal Research Scientist, Georgia Tech Research Institute

More information

Programming Languages

Programming Languages Programming Languages Programming languages bridge the gap between people and machines; for that matter, they also bridge the gap among people who would like to share algorithms in a way that immediately

More information

LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,

More information

Middleware support for the Internet of Things

Middleware support for the Internet of Things Middleware support for the Internet of Things Karl Aberer, Manfred Hauswirth, Ali Salehi School of Computer and Communication Sciences Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne,

More information

UNIVERSITY OF MALTA THE MATRICULATION CERTIFICATE EXAMINATION ADVANCED LEVEL COMPUTING. May 2011

UNIVERSITY OF MALTA THE MATRICULATION CERTIFICATE EXAMINATION ADVANCED LEVEL COMPUTING. May 2011 UNIVERSITY OF MALTA THE MATRICULATION CERTIFICATE EXAMINATION ADVANCED LEVEL COMPUTING May 2011 EXAMINERS REPORT MATRICULATION AND SECONDARY EDUCATION CERTIFICATE EXAMINATIONS BOARD AM Computing May 2011

More information

INTRUSION PROTECTION AGAINST SQL INJECTION ATTACKS USING REVERSE PROXY

INTRUSION PROTECTION AGAINST SQL INJECTION ATTACKS USING REVERSE PROXY INTRUSION PROTECTION AGAINST SQL INJECTION ATTACKS USING REVERSE PROXY Asst.Prof. S.N.Wandre Computer Engg. Dept. SIT,Lonavala University of Pune, snw.sit@sinhgad.edu Gitanjali Dabhade Monika Ghodake Gayatri

More information

A Standards-Based Approach to Extracting Business Rules

A Standards-Based Approach to Extracting Business Rules A Standards-Based Approach to Extracting Business Rules Ira Baxter Semantic Designs, Inc. Stan Hendryx Hendryx & Associates 1 Who are the presenters? Semantic Designs Automated Analysis and Enhancement

More information

From Business World to Software World: Deriving Class Diagrams from Business Process Models

From Business World to Software World: Deriving Class Diagrams from Business Process Models From Business World to Software World: Deriving Class Diagrams from Business Process Models WARARAT RUNGWORAWUT 1 AND TWITTIE SENIVONGSE 2 Department of Computer Engineering, Chulalongkorn University 254

More information

TZWorks Windows Event Log Viewer (evtx_view) Users Guide

TZWorks Windows Event Log Viewer (evtx_view) Users Guide TZWorks Windows Event Log Viewer (evtx_view) Users Guide Abstract evtx_view is a standalone, GUI tool used to extract and parse Event Logs and display their internals. The tool allows one to export all

More information

MD Link Integration. 2013 2015 MDI Solutions Limited

MD Link Integration. 2013 2015 MDI Solutions Limited MD Link Integration 2013 2015 MDI Solutions Limited Table of Contents THE MD LINK INTEGRATION STRATEGY...3 JAVA TECHNOLOGY FOR PORTABILITY, COMPATIBILITY AND SECURITY...3 LEVERAGE XML TECHNOLOGY FOR INDUSTRY

More information

An Approach to Eliminate Semantic Heterogenity Using Ontologies in Enterprise Data Integeration

An Approach to Eliminate Semantic Heterogenity Using Ontologies in Enterprise Data Integeration Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 3 rd, 2013 An Approach to Eliminate Semantic Heterogenity Using Ontologies in Enterprise Data Integeration Srinivasan Shanmugam and

More information

A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems

A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems Proceedings of the Postgraduate Annual Research Seminar 2005 68 A Model-based Software Architecture for XML and Metadata Integration in Warehouse Systems Abstract Wan Mohd Haffiz Mohd Nasir, Shamsul Sahibuddin

More information

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information

Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

More information

Application of Data Mining Techniques in Intrusion Detection

Application of Data Mining Techniques in Intrusion Detection Application of Data Mining Techniques in Intrusion Detection LI Min An Yang Institute of Technology leiminxuan@sohu.com Abstract: The article introduced the importance of intrusion detection, as well as

More information

How To Create A Data Transformation And Data Visualization Tool In Java (Xslt) (Programming) (Data Visualization) (Business Process) (Code) (Powerpoint) (Scripting) (Xsv) (Mapper) (

How To Create A Data Transformation And Data Visualization Tool In Java (Xslt) (Programming) (Data Visualization) (Business Process) (Code) (Powerpoint) (Scripting) (Xsv) (Mapper) ( A Generic, Light Weight, Pluggable Data Transformation and Visualization Tool for XML to XML Transformation Rahil A. Khera 1, P. S. Game 2 1,2 Pune Institute of Computer Technology, Affiliated to SPPU,

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.1 Background The command over cloud computing infrastructure is increasing with the growing demands of IT infrastructure during the changed business scenario of the 21 st Century.

More information

Test Data Management Concepts

Test Data Management Concepts Test Data Management Concepts BIZDATAX IS AN EKOBIT BRAND Executive Summary Test Data Management (TDM), as a part of the quality assurance (QA) process is more than ever in the focus among IT organizations

More information

Protecting Business Information With A SharePoint Data Governance Model. TITUS White Paper

Protecting Business Information With A SharePoint Data Governance Model. TITUS White Paper Protecting Business Information With A SharePoint Data Governance Model TITUS White Paper Information in this document is subject to change without notice. Complying with all applicable copyright laws

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Improving the visualisation of statistics: The use of SDMX as input for dynamic charts on the ECB website

Improving the visualisation of statistics: The use of SDMX as input for dynamic charts on the ECB website Improving the visualisation of statistics: The use of SDMX as input for dynamic charts on the ECB website Xavier Sosnovsky, Gérard Salou European Central Bank Abstract The ECB has introduced a number of

More information

INTRUSION DETECTION ALARM CORRELATION: A SURVEY

INTRUSION DETECTION ALARM CORRELATION: A SURVEY INTRUSION DETECTION ALARM CORRELATION: A SURVEY Urko Zurutuza, Roberto Uribeetxeberria Computer Science Department, Mondragon University Mondragon, Gipuzkoa, (Spain) {uzurutuza,ruribeetxeberria}@eps.mondragon.edu

More information

Lumousoft Visual Programming Language and its IDE

Lumousoft Visual Programming Language and its IDE Lumousoft Visual Programming Language and its IDE Xianliang Lu Lumousoft Inc. Waterloo Ontario Canada Abstract - This paper presents a new high-level graphical programming language and its IDE (Integration

More information

Transparency and Efficiency in Grid Computing for Big Data

Transparency and Efficiency in Grid Computing for Big Data Transparency and Efficiency in Grid Computing for Big Data Paul L. Bergstein Dept. of Computer and Information Science University of Massachusetts Dartmouth Dartmouth, MA pbergstein@umassd.edu Abstract

More information

Scalable Extraction, Aggregation, and Response to Network Intelligence

Scalable Extraction, Aggregation, and Response to Network Intelligence Scalable Extraction, Aggregation, and Response to Network Intelligence Agenda Explain the two major limitations of using Netflow for Network Monitoring Scalability and Visibility How to resolve these issues

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

On the general structure of ontologies of instructional models

On the general structure of ontologies of instructional models On the general structure of ontologies of instructional models Miguel-Angel Sicilia Information Engineering Research Unit Computer Science Dept., University of Alcalá Ctra. Barcelona km. 33.6 28871 Alcalá

More information

Data Mining & Data Stream Mining Open Source Tools

Data Mining & Data Stream Mining Open Source Tools Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.

More information

Automated Medical Citation Records Creation for Web-Based On-Line Journals

Automated Medical Citation Records Creation for Web-Based On-Line Journals Automated Medical Citation Records Creation for Web-Based On-Line Journals Daniel X. Le, Loc Q. Tran, Joseph Chow Jongwoo Kim, Susan E. Hauser, Chan W. Moon, George R. Thoma National Library of Medicine,

More information

Connections to External File Sources

Connections to External File Sources Connections to External File Sources By using connections to external sources you can significantly speed up the process of getting up and running with M-Files and importing existing data. For instance,

More information

Web Forensic Evidence of SQL Injection Analysis

Web Forensic Evidence of SQL Injection Analysis International Journal of Science and Engineering Vol.5 No.1(2015):157-162 157 Web Forensic Evidence of SQL Injection Analysis 針 對 SQL Injection 攻 擊 鑑 識 之 分 析 Chinyang Henry Tseng 1 National Taipei University

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

Hadoop Technology for Flow Analysis of the Internet Traffic

Hadoop Technology for Flow Analysis of the Internet Traffic Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet

More information

Recovering Business Rules from Legacy Source Code for System Modernization

Recovering Business Rules from Legacy Source Code for System Modernization Recovering Business Rules from Legacy Source Code for System Modernization Erik Putrycz, Ph.D. Anatol W. Kark Software Engineering Group National Research Council, Canada Introduction Legacy software 000009*

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

More information

BASI DI DATI II 2 modulo Parte II: XML e namespaces. Prof. Riccardo Torlone Università Roma Tre

BASI DI DATI II 2 modulo Parte II: XML e namespaces. Prof. Riccardo Torlone Università Roma Tre BASI DI DATI II 2 modulo Parte II: XML e namespaces Prof. Riccardo Torlone Università Roma Tre Outline What is XML, in particular in relation to HTML The XML data model and its textual representation The

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

ML for the Working Programmer

ML for the Working Programmer ML for the Working Programmer 2nd edition Lawrence C. Paulson University of Cambridge CAMBRIDGE UNIVERSITY PRESS CONTENTS Preface to the Second Edition Preface xiii xv 1 Standard ML 1 Functional Programming

More information

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks

More information

DLDB: Extending Relational Databases to Support Semantic Web Queries

DLDB: Extending Relational Databases to Support Semantic Web Queries DLDB: Extending Relational Databases to Support Semantic Web Queries Zhengxiang Pan (Lehigh University, USA zhp2@cse.lehigh.edu) Jeff Heflin (Lehigh University, USA heflin@cse.lehigh.edu) Abstract: We

More information

Preservation Handbook

Preservation Handbook Preservation Handbook [Binary Text / Word Processor Documents] Author Rowan Wilson and Martin Wynne Version Draft V3 Date 22 / 08 / 05 Change History Revised by MW 22.8.05; 2.12.05; 7.3.06 Page 1 of 7

More information

PTK Forensics. Dario Forte, Founder and Ceo DFLabs. The Sleuth Kit and Open Source Digital Forensics Conference

PTK Forensics. Dario Forte, Founder and Ceo DFLabs. The Sleuth Kit and Open Source Digital Forensics Conference PTK Forensics Dario Forte, Founder and Ceo DFLabs The Sleuth Kit and Open Source Digital Forensics Conference What PTK is about PTK forensics is a computer forensic framework based on command line tools

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Tool Support for Model Checking of Web application designs *

Tool Support for Model Checking of Web application designs * Tool Support for Model Checking of Web application designs * Marco Brambilla 1, Jordi Cabot 2 and Nathalie Moreno 3 1 Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza L. Da Vinci,

More information

Data Integration Hub for a Hybrid Paper Search

Data Integration Hub for a Hybrid Paper Search Data Integration Hub for a Hybrid Paper Search Jungkee Kim 1,2, Geoffrey Fox 2, and Seong-Joon Yoo 3 1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., jungkkim@cs.fsu.edu,

More information

Information Visualization of Attributed Relational Data

Information Visualization of Attributed Relational Data Information Visualization of Attributed Relational Data Mao Lin Huang Department of Computer Systems Faculty of Information Technology University of Technology, Sydney PO Box 123 Broadway, NSW 2007 Australia

More information

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE Anjali P P 1 and Binu A 2 1 Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi. M G University, Kerala

More information

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment DOI: 10.15415/jotitt.2014.22021 A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment Rupali Gill 1, Jaiteg Singh 2 1 Assistant Professor, School of Computer Sciences, 2 Associate

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

BPCMont: Business Process Change Management Ontology

BPCMont: Business Process Change Management Ontology BPCMont: Business Process Change Management Ontology Muhammad Fahad DISP Lab (http://www.disp-lab.fr/), Université Lumiere Lyon 2, France muhammad.fahad@univ-lyon2.fr Abstract Change management for evolving

More information