
Cognescent SBI Semantic Business Intelligence

Contents

- Scope
- Conceptual Diagram
- Datasources
- Core Concepts
  - Resources
  - Occurrence (SPO)
  - Links
  - Statements
  - Rules
  - Types
  - Mappings
  - Templates
  - Transforms
- Architecture
  - Registry (Resources)
  - Index (Mappings)
  - Naming (Templates)
  - Client
  - Service
  - Application
- Inference
  - Formal Concept Analysis (FCA)
  - Semiotic Inference
  - Machine Learning
  - Basic syllogistic inference algorithm
- Application Runtime
  - API / REST Services
  - Request lifecycle
  - Queries, queues, topics and names
  - Request output
- Application Implementation
  - Use case interaction
- References

Scope

The scope of the project is to provide a product that offers diverse facilities to analysts and developers building knowledge-oriented applications. By 'knowledge oriented' we mean any application that could benefit from the integration and normalization of loosely coupled data and from the merging of diverse data sources. Our goal is to achieve this without any prior knowledge of the sources of the data, information or knowledge being captured: the system should take care of discovering the identity of, and the relationships between, the entities provided. Also, for such a goal to be met, application designers should not need to be aware of any specific APIs beyond those provided by the system. To accomplish this, an arrangement of concepts / classes is made and is described below. Algorithmic methods and reasoning are used to perform the inferences needed to describe an application domain or a set of use cases as a unified set of services. Knowledge Description and Discovery (KDD) is used in such a way that analytical, dimensional or rule-set evaluation is possible.

The approach taken is to use the service (the Application described below) as a request/response box. Requests may be queries or data. Responses may be query results or posted data augmented with the service's knowledge. Queries are themselves indexed as knowledge. On top of this scheme are built the patterns needed for building stateful web applications.

This document gives only brief descriptions of the main components of the services. Further information can be found in the project repositories.

Conceptual Diagram

A conceptual diagram in a pseudo-UML notation is depicted below. It contains the main concepts discussed in this document and the relationships between them.

Datasources

- RDF / Semantic Web: no need for schema information.
- LOD (Linked Open Data) via a NER (Named Entity Resolution) engine.
- Relational (via d2rq).
- OLAP (via an olap4j connector).
- File formats (XLS, CSV, via Apache MetaModel).
- XML (via Apache MetaModel).

Activation: any connector / content type that can be built that outputs RDF. Plain RDF, relational, OLAP, CSV, spreadsheets and other possible connectors (JSON, for example) are candidates as sources of structured data, and some of them have been tested using an intermediate layer producing RDF/XML.

Core Concepts

These are the main concepts or 'classes' in the system.

Resources

A Resource is an instance of a URI (sourceuri) coming from input RDF. Resources are singletons in the sense that there is only one Resource for a given sourceuri coming from the input RDF. Any Subject, Predicate or Object of a Link is an instance of an Occurrence, meaning an 'occurrence' of a Resource in that context. A Resource can have many Occurrences. Beyond the source URI of the RDF statement which gave birth to the Resource (sourceuri), the Resource has the following metadata:

- sourceuri: the URI of the Resource as it appears in the source RDF fed to the application, without merging or normalizing.
- classid: a numeric ID assigned to every Resource at creation time, drawn from a prime number sequence unique across all Resources. A kind of primary key.
- instanceid: a numeric identifier which gets 'augmented' with each Resource Occurrence, as explained below. Useful for algorithmic inference.
- normalizeduri: once the Resource is merged/aligned, it gets a new URI which is network-retrievable (by a service endpoint).
- mergedclassids, mergeduris: upon merging/alignment, this is where the history of previously merged Resources is held.
- predicatesfrom: Predicates in Links where this Resource is the Subject.
- predicatesto: Predicates in Links where this Resource is the Object.
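As an illustration of the metadata above, a minimal sketch of a Resource record in Java follows. The field names mirror the metadata listed above; the class itself and the idea of passing in the next prime are assumptions for illustration, not the product's published API.

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the Resource metadata described above.
// classid comes from a shared prime sequence; instanceid starts
// equal to classid and is later multiplied on each Occurrence.
public class Resource {
    final String sourceuri;          // URI as seen in the input RDF
    final BigInteger classid;        // unique prime, a kind of primary key
    BigInteger instanceid;           // augmented on every Occurrence
    String normalizeduri;            // assigned after merging/alignment
    final List<BigInteger> mergedclassids = new ArrayList<>();
    final List<String> mergeduris = new ArrayList<>();
    final List<Resource> predicatesfrom = new ArrayList<>(); // this Resource as Subject
    final List<Resource> predicatesto = new ArrayList<>();   // this Resource as Object

    Resource(String sourceuri, BigInteger primeClassId) {
        this.sourceuri = sourceuri;
        this.classid = primeClassId;
        this.instanceid = primeClassId; // first appearance: instanceid == classid
    }
}

BigInteger is used because products of primes outgrow fixed-width integers quickly.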

Occurrence (SPO)

When a Resource occurs in a Link, its instanceid gets augmented by the following rules. A Resource can play an Occurrence as a:

- Subject: its instanceid is updated to the product of its instanceid by the instanceid of the Predicate of the Link.
- Predicate: its instanceid is updated to the product of its instanceid by the instanceid of the Subject and the instanceid of the Object of the Link.
- Object: its instanceid is updated to the product of its instanceid by the instanceid of the Predicate of the Link.

This instanceid makes a Resource 'remember' where it has occurred and is useful for merging / algebraic inference. The first time a Resource appears, its instanceid is its classid. (A sketch of this augmentation appears after the Types section below.)

Links

A Link is the representation of a statement triple coming from an RDF source. Link instantiation performs the Occurrence (SPO) ID 'augmentation' and generates metadata, for example to update Occurrence Types and the Resource type registry. It also updates knowledge of the predicates coming in and out of a Resource (that is, of the Resource being the Subject or the Object of the Link).

Statements

A set of (three) Links forms a Statement in the propositional sense. Statements can be meaningful, and they can take part in a (syllogistic) inference chain where any one of the three Links can be inferred from the other two. The section on inference explains this method in more detail.

Rules

The conclusions of two Statements may lead to the inference of a Rule. Using the Links of the Statements and the Statement metadata, relationships can be drawn as to one being the conclusion or the cause of another. The machine learning part of the inference section is the most appropriate place to take this subject further.

Types

Roughly, Type inference can be performed by looking at which Predicates come out of a set of Subjects. If a set of Subjects share the same set of Predicates, the resulting set can be regarded as an 'inferred' common type. As an example, from a set of entities having the properties 'price', 'brand' and 'discount', I could aggregate the type 'Product'.
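The Occurrence augmentation rules above lend themselves to a direct implementation. The following sketch builds on the hypothetical Resource class shown earlier; since the document does not fix an update order for the three participants, the sketch reads all three instanceids before augmenting any of them, which is an assumption.

import java.math.BigInteger;

// Illustrative sketch of Occurrence (SPO) augmentation at Link creation,
// following the rules described above.
public class Link {
    final Resource subject, predicate, object;

    Link(Resource s, Resource p, Resource o) {
        subject = s; predicate = p; object = o;
        // Read all three instanceids before updating any of them
        // (update order is not specified in the document).
        BigInteger si = s.instanceid, pi = p.instanceid, oi = o.instanceid;
        s.instanceid = si.multiply(pi);              // Subject x Predicate
        p.instanceid = pi.multiply(si).multiply(oi); // Predicate x Subject x Object
        o.instanceid = oi.multiply(pi);              // Object x Predicate
        // Track predicates going out of the Subject and into the Object.
        s.predicatesfrom.add(p);
        o.predicatesto.add(p);
    }
}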

Mappings

Mappings are wrappers around Resources designed to extract as much metadata as possible from them. There is a correspondence between a Resource and a Mapping, but there may be many Resources matching a given Mapping. This is used in the indexing facility mentioned below, where it serves for similarity matching.

The first things a Mapping retrieves from a Resource are its 'roles' (model or schema; predicatesto). The Resource for a given person can have many roles: 'friend', 'employee'. Each of these roles comprises a set of predicates, for which there exists a 'metaclass': the instance of the Resource in a given role together with its predicates (salary for employee). The Mapping takes care of this arrangement given its 'state' (context). All this is performed using the Resource metadata, its predicates (from and to) and its instanceids. For example, for an 'employee' role to exist there must be a statement like: 'employer' 'employs' 'resource'. And, for this to work in every case, inverse relationships must be created and taken into account so that every Resource has at least one role. Then the Resource's predicatesfrom properties are used to determine the metaclass predicates according to the role being evaluated.

Templates

Templates are sets of statements (instances) for a given set of classes which have a given set of properties. They are built mainly from Mappings metadata, and for presentation purposes. Imagine a set of results from a query (tabular), or a navigable graph of entities (Template properties are navigable), which must be presented to the user. Content type handling (serialization/de-serialization) must be taken into account when building applications with Templates.

Transforms

A Transform is, basically, the Mapping of another Mapping; in other words, nested Mappings. The result is what both Mappings have in common. For example, the source Mapping for a 'Theater', with respect to a Mapping for 'Buildings', should contain the roles and metaclasses (see Mappings) shared by both of them (building.address = theater.address, building.cantfloors = theater.cantfloors).

Architecture

The application design is a REST-based request/response set of services which, beyond returning a response, publish invocation results to 'topics' and 'queues' for further processing.

Registry (Resources)

The Registry Service takes care of Resource mappings and resolution. It also takes care of consuming / fetching RDF statements to process, normalizing statement Subjects, Predicates and Objects (adding new source-URI Resources to the Registry), and creating Links and Predicate metadata.
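A minimal sketch of the Transform idea, under the simplifying assumption that a Mapping is reduced to a set of predicate names (the real structures carry full role and metaclass metadata): the Transform of one Mapping over another is their shared part, as in the Theater/Buildings example above.

import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: a Transform as the intersection of two Mappings,
// each reduced here to a plain set of predicate names.
public class Transforms {
    static Set<String> transform(Set<String> source, Set<String> target) {
        Set<String> shared = new HashSet<>(source);
        shared.retainAll(target); // keep only what both Mappings have in common
        return shared;
    }

    public static void main(String[] args) {
        Set<String> theater = Set.of("address", "cantfloors", "seats");
        Set<String> building = Set.of("address", "cantfloors", "owner");
        System.out.println(transform(theater, building)); // [address, cantfloors]
    }
}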

Index (Mappings)

Apache Lucene backend index. Indexes Mappings metadata. Performs discovery and ID resolution. Performs NER (Named Entity Resolution). Resolves query results. Resolves Mapping keys: inference can be performed in which two predicates (properties) are asserted to have the same meaning, learning from the example Mappings at hand, and are thereby inferred to be the same keys (building.address = edificio.direccion).

Naming (Templates)

URI resolution services (for normalized Resources). Topics and queues. Publishes query results.

Client

Entry point. Main application. Node environment.

Service

The main application REST services follow. Where [topicname] is specified, the request resolves to that topic; '*' stands for all topics. Queues and topics are described below in the section 'Application Runtime'. A request sketch follows this list.

- Main RDF: /topic/[topicname] or /topic/*. Indexes the request body (RDF). The response dumps the (topicname) RDF, indexed / augmented, and the results are published to [topicname]. Useful for SPARQL endpoints.
- REST HATEOAS RDF: /dialog/topic/[topicname] or /dialog/topic/*. Stateful: dialog, navigation state, referrer. Previous posts are kept in the query context. Send a modified response: update / prompt.
- DOM, JAXB/JSON output: /dom/topic/[topicname] or /dom/topic/*. A tordf method in the response object allows CRUD. JAXB bindings can be built from the outputs of the other services.
- DOM, OData JS output (Apache ODataJS): /odata/topic/[topicname] or /odata/topic/*. A tordf method in the response object allows CRUD.

DOM stands for Dynamic Object Model, an internal object representation of the statement graph with classes, objects and properties. Application-level API methods are described below.

Application

Application-level API methods: getform, getstate (see below).
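As a usage sketch of the Service endpoints above, posting RDF to the main endpoint and reading back the augmented RDF might look as follows. Only the /topic/[topicname] path comes from this document; the host, port, content type and Turtle payload are assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client interaction with the main RDF endpoint described above.
public class TopicClient {
    public static void main(String[] args) throws Exception {
        String turtle = "<http://example.org/s> <http://example.org/p> <http://example.org/o> .";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/topic/sales")) // host/port assumed
                .header("Content-Type", "text/turtle")                // content type assumed
                .POST(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
        // The response body should be the indexed / augmented RDF,
        // also published to the 'sales' topic.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}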

Inference

These are a couple of inference mechanisms over Resources and over Mappings.

Formal Concept Analysis (FCA)

This section concerns ontology merging and data source alignment over Resources. Given two RDF graphs, two FCA relations can be built, each having on its object side the Subjects of one graph's statements and on its attribute (property) side the Objects of that graph's statements, yielding relation A and relation B, one from each graph. In other words, each relation is built using the graph's properties as an axis, with statement Subjects as FCA objects and statement Objects as FCA attributes. If in relation A a situation exists such that these (object, property) pairs apply:

(a, 1); (a, 2); (a, 3) and (b, 2); (b, 3)

and in relation B there exist:

(x, 4); (x, 5); (x, 6) and (y, 5); (y, 6)

then there is a probability of a being similar to x, and of b being similar to y.

Semiotic Inference

An inference layer over Resources. Regarding semiotic triples (sign, concept, object) and mapping them to (Occurrence, Type, Resource), semiotic mappings may be applied to obtain equivalent resources in the following ways:

- sign(concept) -> objects
- sign(object) -> concepts
- concept(sign) -> objects
- concept(object) -> signs
- object(sign) -> concepts
- object(concept) -> signs

Machine Learning

Proper arrangement of training data together with framework data / metadata enables the inference of facts for identity resolution and link discovery. Links, Statements and Rules are a source for this training data. A proper format for serializing and discovering the results of learning still needs to be defined.

Basic syllogistic inference algorithm

Links represent syllogistic statements of the kind:

(Socrates, kind, Human)
(Human, life, Mortal)
(Socrates, kind x life, Mortal)

where 'kind x life' is the relation product; its ID is calculated from the predicate IDs, as within a Link triple. Labelling the parts of the preceding reasoning:

From = (a, rel1, b)
Rel = (b, rel2, c)
Dest = (a, rel1 x rel2, c)

Then, '1' being the identity relationship (so that (b, 1, b) states that b equals b), any part can be inferred from the other two:

From: (a, rel1, b) = (a, rel1 x rel2, c) x (b, 1, b) / (b, rel2, c)
Rel: (b, rel2, c) = (a, rel1 x rel2, c) x (b, 1, b) / (a, rel1, b)
Dest: (a, rel1 x rel2, c) = (a, rel1, b) x (b, rel2, c) / (b, 1, b)

And the identity resolution formula (a equals c, knowing b):

(a, 1, c) = (a, rel1, b) x (b, rel2, c) / (b, rel1 x rel2, b)
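Since classids are primes and instanceids are products, these formulas can be checked with plain integer arithmetic. A small sketch follows, under the assumption (for illustration only; the document does not spell out this encoding) that a triple's ID is simply the product of its three component IDs:

import java.math.BigInteger;

// Numeric check of the syllogistic formulas above, treating a triple's ID
// as the product of its component IDs (primes).
public class Syllogism {
    static BigInteger id(BigInteger... parts) {
        BigInteger p = BigInteger.ONE;
        for (BigInteger x : parts) p = p.multiply(x);
        return p;
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(2), rel1 = BigInteger.valueOf(3),
                   b = BigInteger.valueOf(5), rel2 = BigInteger.valueOf(7),
                   c = BigInteger.valueOf(11), one = BigInteger.ONE;

        BigInteger from = id(a, rel1, b);                 // (a, rel1, b)
        BigInteger rel  = id(b, rel2, c);                 // (b, rel2, c)
        BigInteger dest = id(a, rel1.multiply(rel2), c);  // (a, rel1 x rel2, c)
        BigInteger idb  = id(b, one, b);                  // (b, 1, b)

        // From = Dest x (b, 1, b) / Rel
        System.out.println(from.equals(dest.multiply(idb).divide(rel)));  // true
        // Dest = From x Rel / (b, 1, b)
        System.out.println(dest.equals(from.multiply(rel).divide(idb)));  // true
        // Identity resolution: (a, 1, c) = From x Rel / (b, rel1 x rel2, b)
        BigInteger ident = from.multiply(rel).divide(id(b, rel1.multiply(rel2), b));
        System.out.println(ident.equals(id(a, one, c)));                  // true
    }
}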

Application Runtime

The application is mainly a REST service listening on the endpoints mentioned above. It can also be distributed in a configuration where queues are bound to nodes listening for incoming statements, for distribution or federation of knowledge.

API / REST Services

Application-level API services are a set of services enabling the MVC/DCI object-oriented design patterns to be used when developing against the model. The main services were already mentioned. Application design patterns are still under development.

Request lifecycle

Typically a client wants either to index a set of statements or to retrieve another set of statements according to a given query. These two cases are basically the same in this model, given that indexing is performed on every input in each request. If the client submitted a set of statements to be indexed, the resulting output may be an 'augmented' version of the statements posted. Where a client submits statements to perform a query, those statements 'augment' the index knowledge, and similarity is used to return a meaningful response.

Queries, queues, topics and names

Publishing to a topic is optional. A default request returns its output (see below) in the same form in which it is made available to the topic. Topics and queues are a facility provided for inter-process interaction and application services integration.

Request output

Queues and topics are basically sets of statements. When publishing to a queue or topic, a Template and its metadata are used so as to attach the most knowledge available to the given index results (Mappings). Joins between Mappings are made (see Mapping keys) and the Template is augmented.
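To make the topic/queue facility concrete, here is a minimal in-process sketch in which a topic is just a named set of statements to which request outputs are published. The class and method names are invented for illustration and do not come from the product; a real deployment would use the distributed queue binding described above.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.function.Consumer;

// Illustrative sketch: topics as named sets of statements, with optional
// subscribers standing in for inter-process integration.
public class Topics {
    record Statement(String subject, String predicate, String object) {}

    private final Map<String, Set<Statement>> topics = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<Statement>>> subscribers = new ConcurrentHashMap<>();

    // Publish a statement to a topic and notify any subscribers.
    void publish(String topic, Statement st) {
        topics.computeIfAbsent(topic, t -> new CopyOnWriteArraySet<>()).add(st);
        subscribers.getOrDefault(topic, List.of()).forEach(c -> c.accept(st));
    }

    // Register a consumer for a topic (e.g., another application service).
    void subscribe(String topic, Consumer<Statement> consumer) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(consumer);
    }

    Set<Statement> contents(String topic) {
        return topics.getOrDefault(topic, Set.of());
    }
}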

Application Implementation

Applications built upon this model are intended to be clean and concise. In a few words: a client requests a form from the application, and the application returns the form corresponding to the state the client is in. The client performs its operations on this form and asks the server for the next state given the form manipulation it has performed. This next state is further manipulated by the client, which uses it to ask for the next form.

Use case interaction

The development of any application is use-case driven, following the well-known MVC and DCI design patterns. A protocol of view/state exchange is followed, enabling developers to build applications in a conversational-state manner (the HATEOAS REST design pattern). Utility methods and view/state serialization formats are provided, so that building frontend frameworks able to use the knowledge facilities is straightforward:

getform(state : State) : Form
getstate(form : Form) : State

Libraries in any web / scripting language could be built upon these concepts for seamless user-driven interaction. A library in a functional programming style could be the best bet for working with resources in this way. It is also possible to use this API and its utility methods (not mentioned here) for building services and service-enabled interactions with knowledge discovery and service integration facilities.

References:

MVC: https://en.wikipedia.org/wiki/Model-view-controller
DCI: https://en.wikipedia.org/wiki/Data,_context_and_interaction