Data sharing in the BigData era

Similar documents
Data sharing in the Big Data era

Big Data Challenges in Bioinformatics

Big Data and Analytics: Challenges and Opportunities

Why big data? Lessons from a Decade+ Experiment in Big Data

DISTRIBUTED MULTIMEDIA SYSTEMS

Increasing Business Productivity and Value in Financial Services with Secure Big Data Architecture

SOLUTION BRIEF. JUST THE FAQs: Moving Big Data with Bulk Load.

White Paper November Technical Comparison of Perspectium Replicator vs Traditional Enterprise Service Buses

Object Relational Database Mapping. Alex Boughton Spring 2011

Reference Architecture, Requirements, Gaps, Roles

Learning Management Redefined. Acadox Infrastructure & Architecture

CRM Magic with Data Migration & Integration

How To Run A Cloud Computer System

BSC vision on Big Data and extreme scale computing

Driving More Value From OpenVMS Critical Infrastructure in Local and Global Datacenters: A CASE STUDY. Presented by: J. Barry Thompson, CTO Tervela

Technology Megatrends & The Role for Geospatial

COMP9321 Web Application Engineering

You Have Your Data, Now What?

iservdb The database closest to you IDEAS Institute

SavvyDox Publishing Augmenting SharePoint and Office 365 Document Content Management Systems

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

To Java SE 8, and Beyond (Plan B)

Actionable information for security incident response

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

2.1.5 Storing your application s structured data in a cloud database

Emerging Geospatial Trends The Convergence of Technologies. Jim Steiner Vice President, Product Management

designing and exploiting BIG DATA PLATFORM BIG DATA PLATFORM SQL

CLOUD TECH SOLUTION AT INTEL INFORMATION TECHNOLOGY ICApp Platform as a Service

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Big Data Analytics. Chances and Challenges. Volker Markl

Storing Data: Disks and Files

: Application Layer. Factor the Content. Bernd Paysan. EuroForth 2011, Vienna

Geodatabase Programming with SQL

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

ENTERPRISE DOCUMENT MANAGEMENT SYSTEM

JOURNAL OF OBJECT TECHNOLOGY

SaaS or On-Premise? How to Select the Right Paths for Your Enterprise. David Linthicum

IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications

EDG Project: Database Management Services

Apache Hadoop. Alexandru Costan

Assignment # 1 (Cloud Computing Security)

Tuning Tableau Server for High Performance

Big Data Integration: A Buyer's Guide

Looking Ahead The Path to Moving Security into the Cloud

Build more and grow more with Cloudant DBaaS

REAL-TIME BIG DATA ANALYTICS

Should HR Care About Big Data? E D COHEN, L E A R NING I NDUSTRY CONSULTA NT

Overview. The Cloud. Characteristics and usage of the cloud Realities and risks of the cloud

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Building an Internal Cloud that is ready for the external Cloud

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Mobile Cloud Computing

Transactions and ACID in MongoDB

Use Cases for Argonaut Project. Version 1.1

Issues in Big-Data Database Systems

Using Object Database db4o as Storage Provider in Voldemort

The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Towards Elastic Application Model for Augmenting Computing Capabilities of Mobile Platforms. Mobilware 2010

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

f...-. I enterprise Amazon SimpIeDB Developer Guide Scale your application's database on the cloud using Amazon SimpIeDB Prabhakar Chaganti Rich Helms

DATA WAREHOUSING AND OLAP TECHNOLOGY

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

HO5604 Deploying MongoDB. A Scalable, Distributed Database with SUSE Cloud. Alejandro Bonilla. Sales Engineer abonilla@suse.com

Mr. Apichon Witayangkurn Department of Civil Engineering The University of Tokyo

The Next Wave of Data Management. Is Big Data The New Normal?

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Building Mobile APIs with Services

MongoDB Developer and Administrator Certification Course Agenda

"LET S MIGRATE OUR ENTERPRISE APPLICATION TO BIG DATA TECHNOLOGY IN THE CLOUD" - WHAT DOES THAT MEAN IN PRACTICE?

Fast Innovation requires Fast IT

Business Intelligence for Big Data

Positioning Performance Testing to Cut Costs of Cloud Computing

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

25 May Code 3C3 Peeling the Layers of the 'Performance Onion John Murphy, Andrew Lee and Liam Murphy

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Office 365 Migration Performance & Server Requirements

ANALYTICS IN BIG DATA ERA

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Living Objects: Towards Flexible Big Data Sharing

Java/Scala Engineer Internet of Iot Competitors

The end. Carl Nettelblad

The Cloud to the rescue!

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Making Optimal Use of JMX in Custom Application Monitoring Systems

Analysis of Cloud Solutions for Asset Management

High Frequency Trading and NoSQL. Peter Lawrey CEO, Principal Consultant Higher Frequency Trading

Meet AkzoNobel Leading market positions delivering leading performance

Big Data Keep it Simple

Tufts University. Department of Computer Science. COMP 116 Introduction to Computer Security Fall 2014 Final Project. Guocui Gao

Distributed File Systems

Transcription:

www.bsc.es Data sharing in the BigData era Toni Cortes and Anna Queralt BigDataCloud 2014, Europar 2014, Porto

! Motivation and background! Vision! Technological details! Conclusions Agenda

Before everything started! What ignited our research bigbang Different data models: Persistent vs. non persistent New storage devices: byte addressable Sharing is what really matters! And then dataclay came to life (more details on how all fits together In the next 50 minutes)

! BigData problems! Big data problems (3Vs) Volume I know I am being over simplistic!! But! Variety Velocity! Volume: money. where is the problem then?!! Velocity: money (replication HW, model, )! Variety: more a data-model problem than storage technology

Let me be a bit more controversial! If we store the weigh of all rain drops! Is this big data? big volume, velocity (many drops per second), and variety (info from many sources)! Can we extract any valuable information?! No value, no big data! Real value comes from sharing and combining

Why sharing data is important?! Cooperation is the way to success! Key information comes from merging data from different sources! Data sources: public and open or private (not shared)

Some examples of what we want to achieve! Newspaper archival Control over data! Geospatial Share and enrich data! Health Privacy and flexibility

How is data shared today?! Real sharing: all actors have full access to infrastructure Huge trust alliances or irrelevant data Very flexible! Data copies: owner decides what can be copied Unnecessary data movement Stale data Owner loses control over data Very flexible! Data services: owner decides what and how data is shared Very restrictive Changes imply data provider involvement Owner keeps full control

! Motivation and background! Vision Agenda Self containment and enrichment New programming model! Technological details! Conclusions

Our vision! Enable all actors to Share an infrastructure Merge all data in a single data set Upload computations to be shared See different views of the data! Key idea: self-contained and data enrichment by 3 rd parties

Key technology: self-contained objects! Self-contained objects Data Code Behavior policies! But this looks much like a data service

Self-contained objects! Push the idea of data services to the limit Client App Client App Data service Data store Functions Security, Integrity, Functions Security,... Data Data store Data Data Data Data

Key-technology: 3 rd party enrichments! Self-contained objects seem to be a new technology to offer data services in a different way! Then we need something else something to make it really flexible!

3 rd -party data enrichment! By enrichment we understand: Adding new information to existing datasets Adding new code to existing datasets! This enrichment should Be possible during the life of data Not be limited to the data owner Enable different views of the data to different users/clients Not everybody should be forced to see the same enrichments Several enrichments should be available concurrently

Then! Data can be enriched both with data and code, in provider infrastructure! Code can be executed in the provider infrastructure Data-provider Infrastructure Enrichment Client App

Data integration in a single infrastructure?! Using a single infrastructure may become a bottleneck! Security and privacy policies should be part of the data! Thus, data could be offloaded to other infrastructures Without breaking the data policies! Data owner enables 3 rd party enrichment and! does not lose control

And now behavior policies! Behavior policies include Privacy Security Integrity Life cycle Etc.! Who manages them Traditionally: platform and/or infrastructure Self-contained objects: the object itself If each object manages its own behavior, the system is far more scalable

Implementing behavior policies! How it is implemented Policies implemented as part of object methods! How are policies defined Defined using traditional declarative languages (rules) Compiled and injected into class methods

Moreover! Efficient usage of resources Data and code can be offloaded to resources not accessible by the data provider Also for storage è reduce data movements if data is placed close to client Client Infrastructure Provider Infrastructure Functions Security,... Data Cloud

! Motivation and background! Vision Agenda Self containment and enrichment New programming model! Technological details! Conclusions

Why persistent data is different than volatile?! Today We have one data model for volatile data Traditional data structures and/or objects We have a different data model for the persistent data Relational database, NoSQL database, files! Future Store data in the same way as when volatile Store objects and relations

Data selection (Queries make no sense anymore)! Enrichment enables accessing persistent data as if in memory! In memory: Data never queried Data linked according to needs of program Next data item found by following a link, not a query! Persistent data should behave in a similar way Following a link is faster than a query over the whole dataset Programs do not need to make any differences whether Data is in memory or in persistent storage Enrichments enable data to be linked in different ways

Agenda! Motivation and background! Vision! Technological details! Conclusions

! dataclay: Storage platform based on objects Self-contained persistent objects (data and code) Currently a prototype for Java applications dataclay infrastructure Stubs Client Stubs Middleware Client 24

! Data API Make object persistent Delete persistent object Retrieve objects By oid By query (simple) Execute method! Management API New class New enrichment Get classes (stubs) 25

Management App A... dataclay.getclasses(mycredentials);... Data Provider App... dataclay.newclass( Molecule, path);... Molecule.class enrichme nt.class Management App B... dataclay.newenrichment(mycredentials, Molecule, enrichmentpath);... dataclay.getclasses(mycredentials);... Molecule.class Original class Enrichment Molecule.class Stub class (original) dataclay Stub class (enriched) User A User B Consumer App A... Molecule m;... m.makepersistent();... Consumer App B... Molecule m;... result = m.mymethod();... 26

Original Molecule class public class Molecule { public void setnext(molecule nextmolecule) { this.next = nextmolecule; } public Molecule getnext() { return this.next(); } public Point getcenterofmass() { return this.centerofmass(); } } Molecule stub class public class Molecule { public Molecule(ObjectID oid) { this = (Molecule) dataclay.getobject(oid); } public ObjectID makepersistent() { return dataclay.makepersistent(this); } public Molecule[] getalike() { return dataclay.querybyexample(this); } //original Molecule methods public void setnext(molecule nextmolecule) { dataclay.executeremoteimpl( ); } } 27

//We create the atoms of a molecule Atom[] atoms = new Atom[2]; atoms[0] = new Atom("H",0,1,0,1); atoms[1] = new Atom("O",0,0,1,1); Molecule newmolecule = new Molecule("NewWater", atoms); objectid = newmolecule.makepersistent(); New object and subobjects are stored Molecule lastmolecule = new Molecule(previousID); //Add the new molecule as a member of the list lastmolecule.setnext(newmolecule); Updated object is implicitly stored Get persistent object by OID

// Query for molecules Molecule samplemolecule = new Molecule(); samplemolecule.setname("water ); Molecule[] molecules = samplemolecule.getalike(); Molecule currentmolecule = molecules[0]; while (currentmolecule!= null) { Get persistent object by query //Calculate center of mass of molecule i Point centerofmass = currentmolecule.getcenterofmass(); sumx += centerofmass.getx() * centerofmass.getmass(); sumy += centerofmass.gety() * centerofmass.getmass(); sumz += centerofmass.getz() * centerofmass.getmass(); summassofmols += centerofmass.getmass(); System.out.println("Getting next molecule..."); All methods of a persistent class currentmolecule = currentmolecule.getnext(); are executed remotely } centerx = sumx / summassofmols; centery = sumy / summassofmols ; centerz = sumz / summassofmols; Get persistent object through a method

// Query for molecules Molecule samplemolecule = new Molecule(); samplemolecule.setname("water ); Molecule[] molecules = samplemolecule.getalike(); Molecule currentmolecule = molecule[0]; while (currentmolecule!= null) { Class enriched by user U public class Molecule { public Point getcentroid(){ } } //Calculate center of mass of molecule i Point centroid = currentmolecule.getcentroid(); sumx += centroid.getx(); sumy += centroid.gety(); sumz += centroid.getz(); nummols++; System.out.println("Getting next molecule..."); currentmolecule = currentmolecule.getnext(); } centerx = sumx / nummols; centery = sumy / nummols; centerz = sumz / nummols; The new method is available (only for U)

Index! Motivation and background! Vision! Technological details! Conclusions

New business models to share data! Today In order to exploit data you need To have the data And the infrastructure to store and compute it Or subcontract it, but may lose control over the data! Our proposal enables to decouple The data provider The infrastructure provider And the different service providers

dataclay! Storage platform that provides flexible big data sharing! Today Basic enrichment functionality Reasonable performance Similar to state of the art OODB! Near future Higher performance Security Scalability

Thanks to! Original team Anna Queralt Jonathan Martí Daniel Gasull! Recently joined Juanjo Costa Alex Barceló! Former team members Ernest Artiaga