Information Architecture


The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER

Originally founded in 2005 to acquire the Ingres database business from CA, Actian has gradually built a portfolio of products, most of which squarely target the Big Data market. It includes several types of database products and a whole suite of data integration and analytics software, all of which culminate in the Actian Analytics Platform.

The Actian Analytics Platform

The Actian Analytics Platform is rich in database capability that can and surely will be used in Big Data projects and to build Big Data architectures. Its core functionalities can be briefly described as follows:

- It includes a very high performance column-store database that was built to deliver extreme scale-up performance on a single server. It is one of the few technologies that has been engineered to take maximum advantage of the on-chip vector instructions of x86 chips. It has been proven in database implementations up to tens of terabytes. It currently benchmarks as the world's fastest query-oriented database by a wide margin.
- Rather than choosing to build a more conventional MPP version of its column-store database, Actian preferred to implement it over Hadoop's HDFS and name the product Hadoop SQL Edition. While this product is in its first release at the time of writing, it has, nevertheless, benchmarked as considerably faster (by multiples) on the TPC-DS benchmark. Currently it can scale to 30 Hadoop nodes, but this will likely increase in future releases.
- It offers a scale-out analytical database which can be deployed across hundreds of server nodes to process extremely large collections of data at and beyond the petabyte level. It has many built-in analytical functions in the engine and thus parallelizes both queries and analytical calculations.
- It offers an object database which is often deployed as a graph database for traversing data networks rather than tables. That type of workload would be its most likely role within a Big Data environment, although it could also be used as a document database.

Data in Motion

The Actian Analytics Platform includes a software building and execution capability that processes data in flight. It can be used to build data workflows where data is processed as it is piped from one source to another. In terms of Big Data architecture, it is a key feature for Actian as it complements Actian's variety of database products. The important aspects of this capability are:

- It processes data in parallel using both pipeline parallelism and data segmentation parallelism. As such, it is extremely fast, and when used with HDFS, it is far faster than Hadoop's native MapReduce framework.
- The underlying parallelization engine auto-configures to make optimal use of the available computer resources on which it is deployed.
- It is the fundamental technology that was used to build Actian's ETL and data cleansing products and thus is responsible for their speed.
- It comes with a series of connectors to databases and data stores.
- For users and software developers, it provides a codeless drag-and-drop prototyping environment for building data workflows.
- It scales out across multiple server nodes, and it can span Hadoop and non-Hadoop environments. It can also interface to data streams.
- In respect to analytics, it is directly integrated with the open source KNIME suite of machine learning software and can execute routines written in the R language.

If one considers the broad field of business intelligence (BI) and data analytics, which will be the primary application area for Big Data, it is clear that many activities (data access, metadata capture, data cleansing, data transformation and organization prior to ingest into a database) are not database applications. They are, however, suitable applications for the workflow development and data processing features built into Actian's platform. Clearly the Actian Analytics Platform can also be used to carry out analytical processing and to query Hadoop directly (using SQL via Hive or, of course, Actian's own Hadoop SQL Edition). Thus, in many scenarios, the platform is an alternative as well as a complement to an analytical database.

Actian and Big Data Architecture

In our research paper entitled The Big Data Information Architecture (June 2014) we describe an event-driven architecture that we expect to supersede the traditional data warehouse architecture that has dominated the IT industry for almost two decades. The Actian Analytics Platform fits the described architecture very well. We illustrate this in Figure 1 on the following page, which depicts what we refer to in our research paper as a Data Refinery and Processing Hub. This Hub is responsible for both ingesting data into an organization's data layer and providing a processing service that may involve data queries and analytical calculations on collections of data.
The Data Hub is an arrangement of hardware and software that replaces the collection of ETL jobs, data staging areas, data warehouse and operational data stores that constitutes the traditional BI environment. Additionally, it exceeds the capability of the traditional BI environment in being able to handle data streams and unstructured data, as well as large data volumes.

If we consider the Actian Analytics Platform from the database perspective, it is clearly well-equipped to provide a comprehensive database capability for the Data Hub. The platform's support for SQL workloads and its Hadoop SQL Edition for larger data volumes can deliver excellent performance, and its analytical database can handle analytical queries. Its object-based database is equipped to store data in the form of connected graphs or documents and can process the associated workloads.

The Actian Analytics Platform can be deployed to provide a continuous data flow service from Hadoop to any of Actian's data stores, including Hadoop SQL Edition. As the Data Hub gradually expands over time, the ETL capabilities can be maintained and augmented. Ideally, flows of data within the Data Hub will be managed so that a full data lineage is known, recorded and continually monitored. This is an activity in which the Actian Analytics Platform will be a critical component; it does not just flow data to where it is required, but it also keeps track of data location and data lineage. With multiple database engines, it may be desirable for reasons of physical performance to replicate some data within the Hub, as when, for example, it is required both within a traditional query database and a graph database.
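To make the idea of recording data location and data lineage more concrete, the following Python sketch logs each data movement and then answers two questions: where did a data set come from, and where do copies of it now live. It is only an illustration under stated assumptions. The class, the method names and the data set names are hypothetical and do not represent Actian's software or its API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Movement:
    """One recorded flow of data from a source store to a target store."""
    source: str
    target: str
    step: str  # hypothetical label for the processing applied in flight


@dataclass
class LineageLog:
    """Hypothetical in-memory lineage register (not a real product API)."""
    movements: list = field(default_factory=list)

    def record(self, source: str, target: str, step: str) -> None:
        self.movements.append(Movement(source, target, step))

    def upstream(self, dataset: str) -> list:
        """Movements that led to `dataset`, nearest first (its lineage)."""
        found, seen, pending = [], {dataset}, deque([dataset])
        while pending:
            current = pending.popleft()
            for m in self.movements:
                if m.target == current:
                    found.append(m)
                    if m.source not in seen:
                        seen.add(m.source)
                        pending.append(m.source)
        return found

    def copies(self, origin: str) -> set:
        """Every store that holds data derived from `origin` (its locations)."""
        seen, pending = set(), deque([origin])
        while pending:
            current = pending.popleft()
            for m in self.movements:
                if m.source == current and m.target not in seen:
                    seen.add(m.target)
                    pending.append(m.target)
        return seen


# Invented example: raw data is cleansed, then replicated to two databases.
log = LineageLog()
log.record("hdfs:/raw/orders", "hdfs:/clean/orders", "cleanse")
log.record("hdfs:/clean/orders", "sqldb.orders", "load")
log.record("hdfs:/clean/orders", "graphdb.orders", "load")

for movement in log.upstream("graphdb.orders"):
    print(movement.step, ":", movement.source, "->", movement.target)
```

The replication case mentioned above (the same data held in both a query database and a graph database) shows up here as two movements sharing one source, which `copies` reports as two locations.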

Figure 1: Actian Analytics Platform Deployed in a Data Refinery and Processing Hub

Just as the Actian Analytics Platform would be deployed for data flow within the Hub, it would also be used for data pulled from external data sources or received directly as data streams. Similarly, it will be used for data export from the Hub, directly from Hadoop or any database within the Hub to feed data marts and export data to other environments. By employing the Actian Analytics Platform in this manner, all data movements to, from and within the Hub can execute in parallel.

A fundamental idea of the Data Hub is that, as far as possible, all SQL queries that run on corporate data would execute there. There may be pragmatic reasons for exporting data from the Hub to data marts to feed other databases (for example, supplying data to an IBM mainframe environment), but these would be minimized. Because the Data Hub is built to be a fully scalable environment, as workloads grow, more commodity servers are configured into the environment to handle the expanding demand. BI and analytics applications that simply wish to access data would do so directly, connecting to one or another of the databases within the Hub to launch SQL queries, or possibly, directly harvesting the data.

The Actian Analytics Platform can also play the role of an analytics engine, either by employing the KNIME suite of machine learning algorithms or by directly using analytic routines created in a language like R or Python. As such, it can supplement the query capabilities of Hadoop SQL Edition or even Hive and apply parallel analytical processing to the queried data.
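The two forms of parallelism named earlier, pipeline parallelism and data segmentation parallelism, can be illustrated with a small generic Python sketch. In pipeline parallelism each processing stage runs concurrently and hands records to the next stage; in data segmentation parallelism the data is split into segments that are processed side by side. The function names and the sample stages below are assumptions made for illustration only, and the sketch says nothing about how Actian's engine is actually implemented. Python threads are used to show the structure, not to demonstrate a speed-up.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the record stream


def run_pipeline(records, stages):
    """Pipeline parallelism: one thread per stage, linked by queues."""
    links = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the end marker downstream
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, links[i], links[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for record in records:
        links[0].put(record)
    links[0].put(_DONE)

    results = []
    while True:
        item = links[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def run_segmented(records, stages, segments=4):
    """Data segmentation parallelism: split the data, run one pipeline per segment."""
    parts = [records[i::segments] for i in range(segments)]
    with ThreadPoolExecutor(max_workers=segments) as pool:
        outputs = list(pool.map(lambda part: run_pipeline(part, stages), parts))
    # Results are grouped by segment, so the original order is not preserved.
    return [row for part in outputs for row in part]


# Invented example: a two-stage cleansing flow over a few text records.
raw = [" a ", "b ", " c", "d", " e "]
cleaned = run_segmented(raw, [str.strip, str.upper], segments=2)
print(sorted(cleaned))
```

A single pipeline keeps records in order because each stage is one thread reading a first-in, first-out queue; combining it with segmentation trades that ordering for more work in flight at once.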

Because the Actian Analytics Platform offers a development environment, it can also be used both to develop and execute other activities that may take place within the Data Hub, such as data cleansing, metadata discovery and so on. It can also orchestrate the activities of other software tools that might be used within the Hub. The Actian Analytics Platform is an extraordinarily versatile solution, and organizations that select Actian to provide the foundation of their Big Data information architecture will no doubt make extensive use of it.

Actian in Summary

As far as we are aware, Actian is the only vendor that currently provides a broad line of software capabilities that includes both a suite of database products that cater for multiple query types (SQL queries, analytical queries, graph queries, document queries) and also a data flow development environment and engine. As such, it has all the requisite components for building a Data Hub of the type described in our research report, and hence provides the foundation for a Big Data environment to initially supplement and ultimately replace the traditional data warehouse environment and support an extensive analytical capability.

About The Bloor Group

The Bloor Group is a consulting, research and technology analysis firm that focuses on open research and the use of modern media to gather knowledge and disseminate it to IT users. Visit both www.thebloorgroup.com and www.insideanalysis.com for more information. The Bloor Group is the sole copyright holder of this publication.

Austin, TX 78720 512-524 3689