BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &



Transcription:

BIG DATA: Alignment of Supply & Demand
Nuria de Lama, Representative of Atos Research & Innovation to the EC
8th February, Luxembourg (slide stamp: 04-08-2011)
Atos Research and Innovation
Your business technologists. Powering progress

Table of Contents
1. Setting the scene: introduction to Big Data
2. Business potential
3. Open Source Big Data Management at Atos
4. New demanding environments for technology adoption: the Olympic Games
5. The opportunity of Future Internet
   - The Future Internet Public Private Partnership
   - The FI-WARE Project (Technology Foundation of the FI PPP)
6. Final remarks

A definition of Big Data
Big Data is a trend related to the explosion of available data due to the augmentation of potential data sources beyond traditional structured data.
[Figure: Evolution of data created and available storage capacity, 2006 to 2011. Series: Available storage, Information created. Source: IDC]

New sources of Big Data
- Unstructured data: multi-format, from multiple sources
- Real-time data: the value lies in the real-time aspects of the data, which often comes from sensors
- Exhaust data: technical logs that had not previously been considered
- Linked and shared data: data located in multiple data silos, which gets its meaning through relationships
- Social data: data coming from social networks and related systems
- Open data: data made publicly available by governments or public institutions

Big Data technical challenges
Storing data
- Traditional RDBMS not suitable
- Requires massively distributed storage (including NoSQL databases)
Processing data
- Requires a massively distributed computing framework: MapReduce, Hadoop
- Real-time Big Data processing: still emergent
Analysing data
- Data mining / machine learning / data visualization
- Requires new, specific implementations over distributed computing frameworks
Global issues (clear needs in data usage & exploitation):
- Tools are still immature
- Specific, rare skills and knowledge are required

Storing Big Data: the NoSQL movement
NoSQL: "Not Only SQL"
Give up the relational model
4 kinds of databases:
- Key-value stores: distributed, persistent hashtables. Data is stored in small chunks. Voldemort, Membase.
- Column-oriented data stores: use two-dimensional keys (rows and columns); data is stored by column. Optimized for accessing consecutive elements. Google BigTable, Cassandra, HBase, Hypertable.
- Document-oriented data stores: key-value stores in which the value is a semi-structured document (typically JSON or XML). MongoDB, CouchDB.
- Graph-oriented databases: store and query graph-structured data. Neo4j.
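As a rough illustration of how the four kinds of databases differ in the shape of the data they hold, the sketch below models each one with plain Python structures. It is only an analogy: none of the listed products is used, and every key, value and the `followers_of` helper are hypothetical sample names.

```python
# Illustrative only: plain Python structures standing in for each NoSQL data model.
# All keys and values are hypothetical sample data, not any product's API.

# Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": b"serialized-session-bytes"}

# Column-oriented store: a value is addressed by a (row key, column name) pair.
column_store = {
    "user:1": {"name": "Ada", "city": "Madrid"},
    "user:2": {"name": "Linus"},  # rows need not share the same columns
}

# Document store: key-value, but the value is a semi-structured document.
document_store = {
    "order:7": {"customer": "user:1", "items": [{"sku": "A1", "qty": 2}]},
}

# Graph database: nodes plus labelled edges between them.
graph = {
    "nodes": {"user:1", "user:2"},
    "edges": [("user:1", "follows", "user:2")],
}


def followers_of(graph, node):
    """Return the nodes that have a 'follows' edge pointing at `node`."""
    return [src for src, label, dst in graph["edges"]
            if label == "follows" and dst == node]


print(column_store["user:1"]["city"])                 # two-dimensional key lookup
print(document_store["order:7"]["items"][0]["qty"])   # navigating inside a document
print(followers_of(graph, "user:2"))                  # relationship query
```

The point of the comparison is the access pattern: a key-value store can only fetch the whole value, while the other three models let a query reach inside the stored structure.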

A comparison of open source NoSQL storage solutions

| | Cassandra | HBase | Voldemort | Membase | MongoDB | CouchDB |
|---|---|---|---|---|---|---|
| Type | Column oriented | Column oriented | Key-value | Key-value | Document oriented | Document oriented |
| Maturity | *** | *** | *** | * | *** | ** |
| Support | **** | **** | *** | * | *** | *** |
| Typical use | Massive amount of data | Massive amount of data | Heavy load | Heavy load | Rich querying, heavy load | Rich querying |
| Typical cluster size | Up to hundreds of servers | Up to hundreds of servers | Up to tens of servers | Up to tens of servers | Up to tens of servers | Some servers |
| Specific multisite management | Yes | Yes | No | No | Yes | No |
| MapReduce processing support | Yes | Yes | No | No | Yes | Yes |
| Performance | *** | *** | ***** | *** | *** | * |

Batch computations on Big Data: the MapReduce approach
MapReduce is a framework to distribute processing tasks across a large cluster of computers.
Two steps:
- Map step: the job is divided into independent subtasks executed in parallel by Map processes
- Reduce step: results are recursively merged until the final result is computed
Map processes are executed where the data is stored, which reduces data transfer.
Algorithms must be adapted to the MapReduce method.
Hadoop: the main MapReduce framework
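The two steps can be sketched with the classic word-count example. This is a minimal single-process illustration, not Hadoop code: the function names (`map_step`, `shuffle`, `reduce_step`, `word_count`) and the sample chunks are assumptions made for the example, and the map calls run sequentially here, whereas a real framework would run them in parallel on the nodes that hold each chunk.

```python
from collections import defaultdict
from functools import reduce


def map_step(chunk):
    """Map: turn one chunk of text into independent (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate values by key, as the framework does between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(values):
    """Reduce: merge the partial counts pairwise until one result remains."""
    return reduce(lambda a, b: a + b, values)


def word_count(chunks):
    # Each chunk is mapped independently of the others (sequentially in this sketch).
    mapped = [pair for chunk in chunks for pair in map_step(chunk)]
    return {word: reduce_step(counts) for word, counts in shuffle(mapped).items()}


chunks = ["big data big", "data storage"]  # hypothetical input split into two chunks
print(word_count(chunks))  # {'big': 2, 'data': 2, 'storage': 1}
```

Because the merge in the reduce step is associative, partial results can be combined in any grouping, which is what allows them to be merged recursively across machines.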

Towards real-time Big Data: Complex Event Processing
Complex Event Processing (CEP) consists of processing, filtering, aggregating and combining a very large number of events (>100,000/s) in real time.
2 categories of Complex Event Processing:
- Computation-oriented CEP: processing of both historical and real-time data
- Detection-oriented CEP: filtering and rule matching on real-time data
3 concepts associated with Complex Event Processing:
- Events: occurrence of an action, threshold
- Requests: used to trim, correlate and aggregate events
- Listener: called when a set of criteria is met, to send the proper notifications
Main products in the field: Yahoo! S4, Esper, StreamBase
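The three concepts (events, requests, listeners) can be illustrated with a small detection-oriented sketch. It does not use S4, Esper or StreamBase: the `ThresholdRequest` class, the event fields (`ts`, `type`) and the sample rule (three failed logins within ten seconds) are all hypothetical, chosen only to show how a request filters and aggregates events and how a listener is notified.

```python
from collections import deque


class ThresholdRequest:
    """Hypothetical 'request': counts matching events inside a sliding time window."""

    def __init__(self, predicate, window_seconds, min_count):
        self.predicate = predicate          # filter applied to every incoming event
        self.window_seconds = window_seconds
        self.min_count = min_count
        self.timestamps = deque()           # timestamps of matching events in the window
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

    def on_event(self, event):
        if not self.predicate(event):
            return
        now = event["ts"]
        self.timestamps.append(now)
        # Drop matching events that have fallen out of the time window.
        while now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        # Notify listeners once the criteria are met.
        if len(self.timestamps) >= self.min_count:
            for callback in self.listeners:
                callback(event, len(self.timestamps))


alerts = []
request = ThresholdRequest(lambda e: e["type"] == "login_failed",
                           window_seconds=10, min_count=3)
request.add_listener(lambda event, count: alerts.append((event["ts"], count)))

events = [  # hypothetical event stream, ordered by timestamp (seconds)
    {"ts": 0, "type": "login_failed"},
    {"ts": 2, "type": "login_ok"},
    {"ts": 4, "type": "login_failed"},
    {"ts": 9, "type": "login_failed"},
    {"ts": 30, "type": "login_failed"},
]
for event in events:
    request.on_event(event)

print(alerts)  # [(9, 3)]
```

Only the third failed login triggers the listener; by the time the event at 30 seconds arrives, the earlier ones have left the window. A production engine would apply the same idea to many requests at once and at much higher event rates.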

Big Data elements together: a data brokerage platform
[Diagram. Labels: Applications; Read / Write / Update / Delete APIs; Analytics (all data); Data brokerage platform (analytics & decision); Preparing; Caching external data, unstructured & non-committed data; distribute content and/or policy based in band; Private cloud: internal DBMS, cache, self data, index, search, metadata; Private / public cloud: relational / NoSQL DBMS, cache, self data, index, search, metadata, distributed FS; RT data API; Static data API]

Business potential (Source: Big data: The next frontier for innovation, competition, and productivity, McKinsey)

Big Data vs. Application domains/sectors

Open Source Big Data Management at Atos (I)
Atos Worldline (HTTS)
- Atos Worldline provides e-commerce solutions to several major e-sellers, in particular to the majority of French mass retail sellers.
- One of the main features of this kind of customer is the massive number of users that their e-commerce websites have to handle simultaneously.
- For its next e-commerce solution, Massilia, Atos Worldline is investigating the use of NoSQL solutions for some data storage tasks.

Open Source Big Data Management at Atos (II)
Atos Research & Innovation
- BUSCAMEDIA: the main objective is to provide a semantic tool to index, access, and locate media content. To perform the automatic categorization and annotation of media content, the technologies applied are based on Google's MapReduce concept, using Hadoop to implement this functionality (http://www.cenitbuscamedia.es/)
- BEinGRID Project: 25 business experiments and a toolset repository of grid middleware upper layers. TAF-GRIDs experiment: detecting telecom fraud, where customers' usage records were exchanged, captured, stored, and analyzed among different telecom operators. The technologies used to perform the experiment were Globus Toolkit, for computational grid jobs, and OGSA-DAI, for data management. http://www.beingrid.eu
Atos Worldgrid
- Open Node: definition of a NoSQL storage mechanism for SmartGrid. In the context of Open Architecture for Secondary Nodes of the Electricity SmartGrid, a repository is being implemented using an Open Source Big Data solution. Cassandra has been chosen as the best solution for this implementation.

Atos and the International Olympic Committee
Atos has been the official IT provider for the International Olympic Committee since Barcelona 1992.
Flawless IT, on time, on budget:
- Designing, building, and operating the critical IT infrastructure and systems that ensure the smooth and secure running of the Olympic Games IT and the immediate availability of results
Every two years a new organization:
- With 4 billion customers
- Operating 24/7
- Creating a cohesive team of highly trained professionals out of ever changing partners
- Successfully operating in a new territory

Olympic Games: facts and figures

Technological complexity: demanding data storage and processing

Organizational complexity: the preparatory phase also manages a lot of data

The FI PPP: Positioning in the FI landscape

FI PPP Programme Implementation

FI-WARE: Some figures and data
Main data:
- 26 partners
- 5 universities
- 4248 person-months (excl. open calls)
- Total funding: 41 M€
- Open calls: 12.3 M€
- Total budget: 66.4 M€
- Three years duration

FI-WARE: Objectives

FI-WARE: Collaboration with Usage Areas

FI-WARE and Big Data
The FI-WARE project addresses, among others, the following Data Management topics:
- Publish/Subscribe
- Complex Event Processing (CEP)
- Multimedia analysis
- Unstructured data analysis
- Meta-data pre-processing
- Semantic annotation and application support
- Big Data analysis, based on:
  - Hadoop: heterogeneous MapReduce platform mainly used for ad-hoc data exploration of large sets of data
  - MongoDB: a scalable, high-performance, open source, document-oriented database
  - SAMSON Platform: high-performance streaming MapReduce platform used for the near-real-time analysis of streaming data

Additional resources
- Project website: http://www.fi-ware.eu/
- The wiki: http://forge.fi-ware.eu/plugins/mediawiki/wiki/fiware/
  - The Vision of FI-WARE
  - The FI-WARE Chapters and Respective Technical Content
  - The FI-WARE Agile Development Methodology, Status, and Progress
- FI-WARE High Level Description: public deliverable available at https://forge.fi-ware.eu/
- FI PPP in general: http://www.fi-ppp.eu/

Final Remarks
- Combine a long-term strategy for Big Data research with well-focused short-term projects/experiments focused on:
  - Industrial sectors (public administration, manufacturing, health, retail, etc.)
  - Evaluation of RoI
  - Gathering new requirements (instead of starting with a technology-push approach)
- Look at (and solve) issues such as privacy & security, other regulatory aspects, and economic elements like the suitability of organizational structures
- Define incentives for data usage and added value creation (e.g. http://ourservices.eu/ showcasing open government and open data)
- Collaborate with other initiatives that could bring new resources, such as the FI PPP
- Technology transfer to industrial organizations (energy, manufacturing)

Thanks
For more information please contact:
Nuria de Lama, Representative of Atos Research & Innovation to the EC
T +34 91214 9321
nuria.delama@atosresearch.eu
Atos (Spain), Albarracín 25, E-28037 Madrid
www.atos.net

Atos, the Atos logo, Atos Consulting, Atos Worldline, Atos Sphere, Atos Cloud and Atos WorldGrid are registered trademarks of Atos SA. July 2011. © 2011 Atos. Confidential information owned by Atos, to be used by the recipient only. This document, or any part of it, may not be reproduced, copied, circulated and/or distributed nor quoted without prior written approval from Atos. For internal use only.