TopBraid Insight for Life Sciences



Similar documents
TopBraid Life Sciences Insight

An industry perspective on deployed semantic interoperability solutions

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Business Intelligence for Everyone

How to Enhance Traditional BI Architecture to Leverage Big Data

Data warehouse and Business Intelligence Collateral

SimCorp Solution Guide

JOURNAL OF OBJECT TECHNOLOGY

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Analance Data Integration Technical Whitepaper

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

Make the right decisions with Distribution Intelligence

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining

Simplified Management With Hitachi Command Suite. By Hitachi Data Systems

Data virtualization: Delivering on-demand access to information throughout the enterprise

MDM and Data Warehousing Complement Each Other

Implementing Oracle BI Applications during an ERP Upgrade

Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff

Data Virtualization: Achieve Better Business Outcomes, Faster

A business intelligence agenda for midsize organizations: Six strategies for success

Introduction to Oracle Business Intelligence Standard Edition One. Mike Donohue Senior Manager, Product Management Oracle Business Intelligence

Analance Data Integration Technical Whitepaper

Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Implementing Oracle BI Applications during an ERP Upgrade

Data Integration Checklist

Extend the value of your service desk and integrate ITIL processes with IBM Tivoli Change and Configuration Management Database.

High-Volume Data Warehousing in Centerprise. Product Datasheet

Data Modeling for Big Data

Creating a Business Intelligence Competency Center to Accelerate Healthcare Performance Improvement

The IBM Cognos Platform

Gradient An EII Solution From Infosys

Data Virtualization A Potential Antidote for Big Data Growing Pains

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

FEDERATED DATA SYSTEMS WITH EIQ SUPERADAPTERS VS. CONVENTIONAL ADAPTERS WHITE PAPER REVISION 2.7

SQL Server 2012 Business Intelligence Boot Camp

Data Warehousing Systems: Foundations and Architectures

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

QlikView Business Discovery Platform. Algol Consulting Srl

Move Data from Oracle to Hadoop and Gain New Business Insights

Integrating SAP and non-sap data for comprehensive Business Intelligence

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Databases in Organizations

Sterling Business Intelligence

MarkLogic Enterprise Data Layer

IAF Business Intelligence Solutions Make the Most of Your Business Intelligence. White Paper November 2002

Get More Value from Your Reference Data Make it Meaningful with TopBraid RDM

A Service-oriented Architecture for Business Intelligence

WHITEPAPER. Why Dependency Mapping is Critical for the Modern Data Center

Data Warehousing and OLAP Technology for Knowledge Discovery

Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

POLAR IT SERVICES. Business Intelligence Project Methodology

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

CHAPTER-6 DATA WAREHOUSE

IST722 Data Warehousing

Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software

Architected Blended Big Data with Pentaho

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Business Intelligence Solutions: Data Warehouse versus Live Data Reporting

Preferred Strategies: Business Intelligence for JD Edwards

Establishing a business performance management ecosystem.

Reverse Engineering in Data Integration Software

Management Accountants and IT Professionals providing Better Information = BI = Business Intelligence. Peter Simons peter.simons@cimaglobal.

Database Marketing, Business Intelligence and Knowledge Discovery

Business Intelligence

Week 3 lecture slides

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

Deriving Business Intelligence from Unstructured Data

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

White Paper. Software Development Best Practices: Enterprise Code Portal

Key Benefits of Microsoft Visual Studio Team System

Large Telecommunications Company Gains Full Customer View, Boosts Monthly Revenue, Cuts IT Costs by $3 Million

A technical paper for Microsoft Dynamics AX users

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

Create Mobile, Compelling Dashboards with Trusted Business Warehouse Data

Oracle Warehouse Builder 10g

Microsoft Dynamics AX 2012 A New Generation in ERP

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Data Virtualization and ETL. Denodo Technologies Architecture Brief

Transcription:

TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information. In accessing, searching and using information, you may have needs or goals such as the following: increasing the efficiency of the target and lead drug discovery processes providing knowledgeable researchers, physicians and payers with credible information correlating data from clinical trials across multiple studies ensuring the right treatment to the right person at the right price The quality of decisions that you and your customers will be making directly relates to the accuracy, completeness, timeliness and correctness of information. Sourcing, aligning and generating insight from aggregated information, at a given time and in a given context, are the key challenges. Figure 1: TopBraid Insight for Life Sciences presents data from multiple datasets as if it were within a single data warehouse. TopBraid Insight for Life Sciences (TopBraid Insight-LS) provides an out of the box Logical Data Warehouse that enables an agile, extensible approach to querying data from diverse data sources using a high performance open architecture that has proven enterprise scalability. TopBraid LSI decreases the time and cost for clinical trials and drug discovery by enabling diverse databases to appear as a single logical data space. TopBraid Insight-LS includes an upper ontology specific to Life Sciences and configuration of popular publicly available data sources such as CHEBI, CHEMBL SIDER, LinkedCT, DRUGBank and more. Customers internal data sources and public data sources not within the pre-configured set are easily configured into the solution. TopBraid Insight-LS uses strategic technology standards to future-proof customers investments while complementing existing infrastructure through an open architecture.

The Data Integration Challenge Diverse data needs federated queries The need to integrate data from multiple silos of information has been a problem plaguing businesses ever since databases first appeared. Commonly, a query that needs to access multiple sources must be mapped to a series of requests to distributed data sources. Brokering these queries, optimizing their execution, and aggregating their results are time-consuming and expensive tasks. Many different approaches to integrating access to diverse data sources have been attempted. All are complex. Often, integration is provided by applications that can talk to one of several data sources, depending on the user s request. Hard-wiring of data sources and data alignment results in poor maintainability In these systems, the data sources are typically hardwired ; replacing one data source with another means rewriting a portion of the application. In addition, data from different sources cannot be compared in response to a single request unless the comparison is likewise wired into the application. Moving all relevant data to a warehouse allows greater flexibility in retrieving and comparing data, but at the cost of re-implementing or losing the specialized functions of the original source, as well as the cost of maintenance. Furthermore, each data warehouse is designed to answer predefined queries. Modifying or extending it to answer new questions is expensive and time-consuming. Connecting data from multiple sources is a hard problem Yet another approach to accessing data from multiple sources is to create a homogeneous object layer to encapsulate diverse sources. This encapsulation makes applications easier to write, and more extensible, but does not solve the problem of meaningfully connecting data from multiple sources. Pros and Cons of Current Approaches 1: Build a Megaapplication Database Avoid the data-silo problem entirely by adopting a single vendor for all application systems. One vendor is responsible for merging all data. Pros: In theory the integration problems go away. Cons: It is unlikely that one vendor can provide a single application with sufficient scope to incorporate all business information. Data silos still appear. 2: Use data warehouses to extract data from silos Extract data from data silos, with transformation to a composite schema, then load into a data mart. Pros: Query performance of the data within each data mart is good. OLAP tools are good at querying the data marts and performing data analysis. Cons: Inadequate composite schema. New concepts require their own data mart. New data marts require a specialist to create ETL for the source datasets. Loaded data is often aggregated, and access to original data is lost. Provenance of the data is also usually lost. 2

This encourages and often drives the creation of multiple copies of data. The advent of big data, with its large and growing data volumes, argues strongly against duplication of data. Since data is replicated it is never as up to date as the actual data. 3: Perform SQL distributed querying RDBMS vendors, and others, offer the concept of distributed querying. A common query engine resolves the query by visiting all connected databases. Pros: The federated query engine uses SQL, which is well supported. Since data is federated the overhead of replication is avoided. Cons: Federated query performance is often very poor as large quantities of data are fetched. The distributed query has to resolve the mismatch of schemas, resulting in complex federated queries. As more datasets are added, the distributed query performance is likely to degrade geometrically. A universal enterprise schema must be created often a lengthy process prior to the use of distributed queries. Once this is created, it is difficult to adapt to changing requirements or to add additional concepts introduced from new data sources. Advantages of a Logical Data Warehouse Approach TopBraid Insight-LS is an Integration Framework in the form of a Logical or Virtual Data Warehouse, an approach that is emerging in response to problems with warehousing and distributed querying. Instead of creating an actual data warehouse, TopBraid Insight-LS presents data from multiple datasets as if it were within a single data warehouse. Different virtual data Master Queries User Application Results The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. Each worker node processes the smaller problem and passes the answer back to its master node. warehouse models can be tailored to different users without duplication of data. The approach adopted by the TopBraid Insight-LS integration framework is similar to that of Map- Reduce. To avoid the distributed query problem, each request for data is divided into tasks (queries) that can be resolved by a single data source. The worker nodes may do this division of tasks again based on data that is returned leading to a multi-level tree structure. The data is then reassembled into the result set that was required by the original query. Since the data is federated, the overhead of replication is avoided. It is simple to incrementally add new datasets. Unlike a data mart, the consolidated schema can be easily changed or updated to reflect new concepts without the need to change any underlying datasets. Worker Nodes Map: A worker node executes an individual query against a single dataset. Reduce: The master node then collects the answers to all the sub-problems and combines them to form the result for output. According to Gartner 1 the Logical Data Warehouse (LDW) is a new data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategy. Dataset 1 Dataset 2 Dataset 3... Figure 2: Illustration of query execution using Map-Reduce type strategy over federated data Dataset n Query performance is excellent as the Map-Reduce approach only fetches the data from sources that are required. Expensive cross-database joins are 1 Gartner Hype Cycle for Information Infrastructure, 2012 - http://www.gartner.com/id=2101415 3

How Gartner re-defines the Enterprise Data Warehouse for the Era of Big Data 2 The new type of warehouse the Logical Data Warehouse (LDW) is an information management and access engine that takes an architectural approach which de-emphasizes repositories in favor of new guidelines: The LDW follows a semantic directive to orchestrate the consolidation and sharing of information assets, as opposed to one that focuses exclusively on storing integrated datasets. The semantics are described by governance rules from data creation and business use case processes in a data management layer, instead of via a negotiated, static transformation process located within individual tools or platforms. Integration leverages both steady-state data assets in repositories and services in a flexible, audited model via the best available optimization and comprehension solution available. avoided. Provenance of individual statements is retained. Since data need not be replicated there are no concurrency problems nor large data storage problems. The TopBraid Insight-LS solution, based on Semantic Web technologies, uses the power of RDF and SPARQL to unify data for querying and aggregation of results. Traditional data warehouses have limited dimensions and measures. In comparison, TopBraid Insight-LS has been proven to support more than 100 dimensions and measures. Using TopBraid Insight- LS allows users to drill down through dimensions and measures into other, more complex, data structures. Moreover, unlike traditional data warehousing, the approach used for TopBraid Insight-LS makes it possible to dynamically refactor the model with no further Extract, Transform and Load (ETL) operations. LINKSET exactmatch InChi-Key:YYSDNLZCJQMZCZ-UHFFFAOYSA-N 5,5-dimethyl-3-methylenedihydrofuran-2-one CHEBI Key: 8760 5,5-dimethyl-3-methylenedihydrofuran-2-one CHEMBL Figure 3: Illustration of a linkset that maps compounds from CHEBI to compounds from CHEMBL, showing just one example mapping within the linkset 2 Does the 21st-Century Big Data Warehouse Mean the End of the Enterprise Data Warehouse? -http://www.gartner.com/id=1775719

TopBraid Insight-LS is Easy to Configure and Use Configuring data sources for use in the logical data warehouse is simple and consists of the following steps: Dataset Ontology (VoID) 1: Configure a new data source A new data source is identified to the system by registering it. Federation is not limited to RDBMS datasets, but includes other formats such as XML, spreadsheets, Linked Open Data and web services. TopBraid Insight-LS examines the data source and auto-populates its schema. 2: Configure concepts within a data source and define how the data within it links to other sources Relevant concepts and their properties from the data source are mapped to existing concepts and properties in the upper or canonical ontology (socalled universal concepts). If they are new concepts or properties, you can easily add them to the upper ontology using a simple user interface. Additionally, properties to be used for keyword searching are identified. Since this mapping is virtual, it is easy to change it as requirements evolve, or even change the upper ontology universal concepts as the scope changes. These modifications do not require reextracting data and re-populating warehouses. This step also defines how to match data found in one data source with data in other data sources in the presence of different identifier keys. TBI accomplishes this using two complimentary approaches: LinkMaps and LinkSets. LinkMaps identify which data entities are the same by using functional schema-level mappings. LinkSets, on the other hand, contain explicit links between identifier keys in two data sources. This is useful in cases where a functional mapping is not possible or becomes too complex. It is also useful to hold exceptions to functional mappings described in LinkMaps. Since multiple datasets might share the same data items, mappings can grow geometrically. To avoid the geometric growth, TopBraid Insight dynamically creates a chain of links. As a result, if data source Figure 4: Linksets for multiple data sources A is linked to data source B and data source B is linked to data source C, data sources A and C become automatically linked without needing any additional configuration steps. This way, the number of either LinkMaps or LinkSets only grows linearly as new datasets and concepts are added. See Figure 4, illustrating that the brown dataset (lower left), though it is linked directly only to the light green dataset, is also connected to the dark green dataset at top (and other datasets) through a chain of links. 3: Create new or update existing ConnectSets A ConnectSet consists of a package of datasets and selected concepts and properties with their mappings. This way, the same artifacts can be used for various search and analysis purposes in different ConnectSets and ExploreSpaces. An ExploreSpace uses a ConnectSet to provide users with the ability to explore data of interest. ConnectSets can be thought of as templates used to create ExploreSpaces shared by one or more users to perform research and data analysis. As users interact with the ExploreSpace to search and browse data from the diverse sources, the space is dynamically populated with the retrieved results. At 3 VoID (from Vocabulary of Interlinked Datasets ) is an RDF based schema to describe linked datasets. For more information, see http://semanticweb.org/wiki/void 5

any given time, there may be multiple ExploreSpaces based on the same ConnectSet; each created to answer different questions and containing different slices of data. This way, TBI provides the ability to dynamically create datamarts focused on particular subsets of data. Retrieved data can be cleared after each exploration or stored in ExploreSpaces for as long as desired and refreshed on demand to reflect the latest changes in the data sources. Figure 5 illustrates how a user can create and access ExploreSpaces to retrieve, view, search and make use of diverse data. Users can interact with ExploreSpaces using a pre-built TBI search and browse application. Alternatively, any other data exploration and visualization front-ends can be seamlessly integrated through TopBraid Insight s data federation APIs. One of the available APIs is SPARQL. Figure 5: An illustration of the use of an ExploreSpace (Pain Relief Opiate Drugs) based on the ConnectSet (Pharma Sample). The data screens (Morphine) show some of the integrated data retrieved from the several data sources (Chebi, Chembl, Drugbank).

In Summary This white paper provides a high level overview of how TopBraid Life Sciences Insight delivers a Logical Data Warehouse: an agile, extensible approach to querying data from diverse Life Sciences sources in a high performance, open architecture integration framework with proven enterprise scalability. TopQuadrant has plans to offer additional domain-specific logical data warehouse solutions such as TopBraid Insight for Oil & Gas (TopBraid Insight-O&G). To find out more, contact us at sales@topquadrant.com. About TopQuadrant Create a semantic ecosystem, the next step in data evolution. TopQuadrant s standards-based solutions enable a semantic ecosystem among people, applications and data lowering the cost of ownership and enabling intelligent, data-driven action. Our TopBraid TM platform connects data by accessing, linking and combining internal and external data sources faster, better, more broadly and in a more future-proof and flexible way than conventional data integration products.

For more information visit www.topquadrant.com, or contact us at sales@topquadrant.com or by phone at +1 919 300 7945. 2014 TopQuadrant, Inc. All rights reserved. TopBraid Insight for Life Sciences and the TopQuadrant logo are trademarks of TopQuadrant Inc. in the U.S. All other trademarks are the property of their respective owners. Specifications subject to change without notice.