Unified access to all your data points. with Apache MetaModel



Similar documents
Table of contents. Reverse-engineers a database to Grails domain classes.

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Principal MDM Components and Capabilities

Services. Relational. Databases & JDBC. Today. Relational. Databases SQL JDBC. Next Time. Services. Relational. Databases & JDBC. Today.

Create a Database Driven Application

Linas Virbalas Continuent, Inc.

Querying Massive Data Sets in the Cloud with Google BigQuery and Java. Kon Soulianidis JavaOne 2014

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

B.1 Database Design and Definition

Apache Sqoop. A Data Transfer Tool for Hadoop

How, What, and Where of Data Warehouses for MySQL

Data Integration Checklist

SQL 2: GETTING INFORMATION INTO A DATABASE. MIS2502 Data Analytics

Data Grids. Lidan Wang April 5, 2007

Integrating VoltDB with Hadoop

Talend for Data Integration guide

Oracle EBS Interface Connector User Guide V1.4

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

SavvyDox Publishing Augmenting SharePoint and Office 365 Document Content Management Systems

Building a BI Solution in the Cloud

Systems Integration in the Cloud Era with Apache Camel. Kai Wähner, Principal Consultant

Java and RDBMS. Married with issues. Database constraints

Software Architecture Document

Bringing Business Objects into ETL Technology

Spring Data JDBC Extensions Reference Documentation

Unifying the Global Data Space using DDS and SQL

Search and Real-Time Analytics on Big Data

A Scalable Data Transformation Framework using the Hadoop Ecosystem

Serious Threat. Targets for Attack. Characterization of Attack. SQL Injection 4/9/2010 COMP On August 17, 2009, the United States Justice

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Geodatabase Programming with SQL

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

A Brief Introduction to MySQL

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

SQL Injection. SQL Injection. CSCI 4971 Secure Software Principles. Rensselaer Polytechnic Institute. Spring

Comparing SQL and NOSQL databases

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database.

TRANSFORM BIG DATA INTO ACTIONABLE INFORMATION

Zend Framework Database Access

Relational Database Concepts

MYSQL DATABASE ACCESS WITH PHP

CSCC09F Programming on the Web. Mongo DB

Tuning Tableau and Your Database for Great Performance PRESENT ED BY

Enabling development teams to move fast. PostgreSQL at Zalando

How graph databases started the multi-model revolution

Integration Architecture & (Hybrid) Cloud Scenarios on the Microsoft Business Platform. Gijs in t Veld CTO BizTalk Server MVP BTUG NL, June 7 th 2012

Programming Autodesk PLM 360 Using REST. Doug Redmond Software Engineer, Autodesk

Harnessing the Potential Raj Nair

The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.

Copyright 2013 Consona Corporation. All rights reserved

Big Data SQL and Query Franchising

L7_L10. MongoDB. Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD.

Chapter 9 Java and SQL. Wang Yang wyang@njnet.edu.cn

SQL. Short introduction

Simba Apache Cassandra ODBC Driver

Introduction to Database. Systems HANS- PETTER HALVORSEN,

SQL injection: Not only AND 1=1. The OWASP Foundation. Bernardo Damele A. G. Penetration Tester Portcullis Computer Security Ltd

Database Migration Plugin - Reference Documentation

Developing SQL and PL/SQL with JDeveloper

II. PREVIOUS RELATED WORK

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone

Reverse Engineering in Data Integration Software

Data storing and data access

sql-schema-comparer: Support of Multi-Language Refactoring with Relational Databases

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze

Wave Analytics External Data API Developer Guide

Efficient Management of System Logs using a Cloud Radoslav Bodó, Daniel Kouřil CESNET. ISGC 2013, March 2013

Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute

Preparing Your Data For Cloud

An Eclipse Plug-In for Visualizing Java Code Dependencies on Relational Databases

Introduction to Cloud Computing

Lag Sucks! R.J. Lorimer Director of Platform Engineering Electrotank, Inc. Robert Greene Vice President of Technology Versant Corporation

NoSQL Roadshow Berlin Kai Spichale

EDI Process Specification

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Getting Started with SandStorm NoSQL Benchmark

InfiniteGraph: The Distributed Graph Database

LSINF1124 Projet de programmation

Inside the Digital Commerce Engine. The architecture and deployment of the Elastic Path Digital Commerce Engine

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Software Architecture Document

DbSchema Tutorial with Introduction in MongoDB

Perl & NoSQL Focus on MongoDB. Jean-Marie Gouarné jmgdoc@cpan.org

Moving From Hadoop to Spark

Replicating to everything

Sisense. Product Highlights.

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Roadmap Talend : découvrez les futures fonctionnalités de Talend

Database migration. from Sybase ASE to PostgreSQL. Achim Eisele and Jens Wilke. 1&1 Internet AG

Integrating Big Data into the Computing Curricula

Using SQL Server Management Studio

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

Transcription:

Unified access to all your data points with Apache MetaModel

Who am I? Kasper Sørensen, dad, geek, guitarist @kaspersor Long-time developer and PMC member of: Founder also of another nice open source project: Principal Software Engineer @

Session agenda - Introduction to Apache MetaModel Use case: Query composition The role of Apache MetaModel in Big Data (New) architectural possibilities with Apache MetaModel

Introduction to Apache MetaModel

Helicopter view You can look at Apache MetaModel by it's formal description: " a uniform connector and query API to many very different datastore types " But let's start with a problem to solve...

A problem How to process multiple data sources while: - Staying agnostic to the database engine. Respecting the metadata of the underlying database. Not repeating yourself. Avoiding fragility towards metadata changes. Mutating the data structure when needed. We used to rely on ORM frameworks to handle the bulk of these...

ORM - Queries via domain models Metadat a (assum public class Person { ed) public String getname() {...} } Query (t ype-safe ORM.query(Person.class).where(Person::getName).eq("John Doe"); )

Why an ORM might not work Requires a domain model - While many data-oriented apps are agnostic to a domain model. Sometimes the data itself is the domain. Model cannot change at runtime - Usually needed especially if your app creates new tables/entities Type-safety through metadata assumptions - This is quite common, yet it might be a problem that the model is statically mapped and cannot survive e.g. column renames etc.

Alternative to ORM: Use JDBC Metadata discoverable via java.sql.databasemetadata. Queries can be assembled safely with a bit of String magic: "SELECT " + columnnames + " FROM " + tablename +...

Alternative to ORM: Use JDBC Wrong! It turns out that not all databases have the same SQL "dialect". they also don't all implement DatabaseMetaData the same way. you cannot use this on much else than relational databases that use SQL. What about NoSQL, various file formats, web services etc.

MetaModel - Queries via metadata model Metadat DataContext dc = a (discov ered) Table[] tables = dc.getdefaultschema().gettables(); Table table = tables[0]; Column[] primarykeys = table.getprimarykeys(); Query (m etadata- safe) dc.query().from(table).selectall().where(primarykeys[0]).eq(42); dc.query().from("person").selectall().where("id").eq(42); Query (u nsafe)

MetaModel - Connectivity Sometimes we have SQL, sometimes we have another native query engine (e.g. Lucene) and sometimes we use MetaModel's own query engine. (A few more connectors available via the MetaModel-extras (LGPL) project)

MetaModel updates UpdateableDataContext dc = dc.executeupdate(new UpdateScript() { // multiple updates go here - transactional characteristics (ACID, synchronization etc.) // as per the capabilities of the concrete DataContext implementation. }); dc.executeupdate(new BatchUpdateScript() { // multiple "batch/bulk" (non atomic) updates go here }); // if I only want to do a single update, there are convenience classes for this InsertInto insert = new InsertInto(table).value(nameColumn, "John Doe"); dc.executeupdate(insert);

Use case: Query composition

MetaModel schema model

MetaModel query and schema model

Query representation for developers? "SELECT * FROM foo" Query q = new Query(); q.from(table).selectall(); Query as a string - Easy to write. - Easy to read. - Prone to parsing errors. - Prone to typos etc. - Fails at runtime. Query as an object - More involved code to read and write. - Lends itself to inspection and mutation. - Fails at compile time.

Context-based Query optimization You might not always need to pass data around between components Sometimes you can just pass the query around! This STATUS=DELIVERED filter never actually executes, it just updates the query on the ORDERS table.

The role of Apache MetaModel in Big Data

So how does this all relate to Big Data?

Big Data and the need for metadata Variety - Not only structured data Social Sensors Many new sources, but also the old : - Relational - NoSQL - Files - Cloud

Volume

Velocity Volume

Variety Velocity Volume

Veracity? Variety Velocity Volume

Query language examples SQL SELECT * FROM customers WHERE country_code = 'GB' OR country_code IS NULL MongoDB db.customers.find({ $or: [ {"country.code": "GB"}, {"country.code": {$exists: false}} ] }) CSV for (line : customers.csv) { values = parse(line); country = values[country_index]; if (country == null "GB".equals(country) { emit(line); } }

Query language examples Any datastore datacontext.query().from(customers).selectall().where(countrycode).eq("gb").or(countrycode).isnull();

Make it easy to ingest data in your lake

Metadata to enable automation Big Data Variety and Veracity means we will be handling: Different physical formats of the same data Different query engines Different quality-levels of data How can we automate data ingestion in such a landscape?

Metadata to enable automation Big Data Variety and Veracity means we will be handling: Different physical formats of the same data We need a uniform metamodel for all the datastores. And enough metadata to infer the ingestion transformations needed. Different query engines We need a uniform query API based on the metamodel. Different quality-levels of data We need our ingestion target to be aware of the ingestion sources.

Traditional view on metadata user id (pk) CHAR(32) java.lang.string not null username VARCHAR(64) java.lang.string not null, unique real_name VARCHAR(256) java.lang.string not null address VARCHAR(256) java.lang.string nullable age INT int nullable

Traditional view on metadata user id (pk) CHAR(32) java.lang.string not null username VARCHAR(64) java.lang.string not null, unique real_name VARCHAR(256) java.lang.string not null address VARCHAR(256) java.lang.string nullable age INT int nullable id (pk) BIGINT long not null firstname VARCHAR(128) java.lang.string not null lastname VARCHAR(128) java.lang.string not null street VARCHAR(64) java.lang.string not null house number INT int not null customer

A more elaborate metadata view user 32 char UUID id (pk) CHAR(32) java.lang.string not null username VARCHAR(64) java.lang.string not null, unique real_name VARCHAR(256) java.lang.string not null address VARCHAR(256) java.lang.string nullable age INT int nullable Address (unstructured) Person age Numeric ratio variable customer id (pk) Person full name BIGINT long not null Person first name firstname VARCHAR(128) java.lang.string not null lastname VARCHAR(128) java.lang.string not null street VARCHAR(64) java.lang.string not null house number INT int not null Person last name Address part - street Address part - house number Numeric nominal variable Same realworld entity?

Elaborate metadata and querying Static Manual SELECT firstname, lastname FROM customer SELECT real_name FROM user SELECT (columns with header 'name') FROM (customer and user) SELECT (person name columns) FROM (customer and user) Dynamic SELECT EXTRACT(full name FROM (name columns)) Automated FROM (tables containing person names) (or supervised)

Elaborate metadata and querying Static Manual SELECT street, house_number FROM customer SELECT address FROM user SELECT (Address part columns) FROM (customer and user) Dynamic SELECT EXTRACT(street, hno, city, zip) FROM (Address part columns)) Automated FROM (tables containing addresses) (or supervised)

Elaborate metadata and querying Static Individual/specific DB connectors Manual JDBC Apache MetaModel (today) Dynamic Automated (or supervised) Apache MetaModel (future)

What about speed? Performance - - Performance is important in development of MetaModel. But not in favor of uniformness. In some cases the metadata may benefit performance by automatically tweaking query parameters (e.g. fetching strategies). We usually expose the native connector objects too. Time to market - - MetaModel makes it easy to cover all the data sources with the same codebase. Typical 80/20 trade-off scenario. Avoid premature optimization.

(New) architectural possibilities with Apache MetaModel

Data Integration scenario Copy (ETL) ry Que

Data Federation scenario Qu ery

Let's see some code Query a remote CSV file https://github.com/kaspersorensen/apachebigdatametamodelexample

Thank you Questions?