BIG DATA AGGREGATOR STASINOS KONSTANTOPOULOS NCSR DEMOKRITOS, GREECE. Big Data Europe



Similar documents
Fraunhofer FOKUS. Fraunhofer Institute for Open Communication Systems Kaiserin-Augusta-Allee Berlin, Germany.

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

ON DEMAND ACCESS TO BIG DATA. Peter Haase fluid Operations AG

MarkLogic 8: Samplestack

Linked Data Publishing with Drupal

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

LDIF - Linked Data Integration Framework

TopBraid Insight for Life Sciences

How to avoid building a data swamp

GetLOD - Linked Open Data and Spatial Data Infrastructures

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

DISCOVERING RESUME INFORMATION USING LINKED DATA

Publishing Linked Data Requires More than Just Using a Tool

From Distributed Computing to Distributed Artificial Intelligence

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

MASHUPS FOR THE INTERNET OF THINGS

DBpedia German: Extensions and Applications

HadoopRDF : A Scalable RDF Data Analysis System

An industry perspective on deployed semantic interoperability solutions

PICASSO Big Data Expert Group

BYODs & FAIR Data Stewardship

Linked Open Data Infrastructure for Public Sector Information: Example from Serbia

Semantic SharePoint. Technical Briefing. Helmut Nagy, Semantic Web Company Andreas Blumauer, Semantic Web Company

Collaborative Open Market to Place Objects at your Service

CREATING AN INTERNAL CLOUD: EPAM DEVELOPS A CUSTOM SOLUTION. Time-consuming infrastructure configuration and maintenance

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Hadoop & Spark Using Amazon EMR

RDF Dataset Management Framework for Data.go.th

Models and Architecture for Smart Data Management

- a Humanities Asset Management System. Georg Vogeler & Martina Semlak

D5.4.4 Integrated SemaGrow Stack API components

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

Data Publishing with DaPaaS

Amit Sheth & Ajith Ranabahu, Presented by Mohammad Hossein Danesh

D5.3.2b Automatic Rigorous Testing Components

MarkLogic Semantics in Healthcare and Life Sciences for LIDER COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Big Data Analytics Nokia

ORACLE FINANCIAL SERVICES ANALYTICAL APPLICATIONS INFRASTRUCTURE

Openbus Documentation

MongoDB Developer and Administrator Certification Course Agenda

ASTERIX: An Open Source System for Big Data Management and Analysis (Demo) :: Presenter :: Yassmeen Abu Hasson

White Paper November Technical Comparison of Perspectium Replicator vs Traditional Enterprise Service Buses

A Scalable Data Transformation Framework using the Hadoop Ecosystem

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

Developing Windows Azure and Web Services

Geospatial Data and the Semantic Web. The GeoKnow Project. Sebastian Hellmann AKSW/KILT research group, Leipzig University & DBpedia Association

Data collection architecture for Big Data

Deploying a Geospatial Cloud

Automation & Open Source. How to tame the Cloud?

Service Oriented Architecture

Apache Sentry. Prasad Mujumdar

How To Use Semantics In A System

An Enhanced Visualization Service based on Geospatial and Statistical Linked Open Data

Scope. Cognescent SBI Semantic Business Intelligence

Introduction to Cloud Computing

Low-cost Open Data As-a-Service in the Cloud

Fast Innovation requires Fast IT

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

Software Architecture Document

Cloud application services (SaaS) Multi-Tenant Data Architecture Shailesh Paliwal Infosys Technologies Limited

ATLAS job monitoring in the Dashboard Framework

SaaS & Cloud Application Development & Delivery

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

Epimorphics Linked Data Publishing Platform

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

Linked Statistical Data Analysis

Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems

Semantic Web Success Story

bigdata Managing Scale in Ontological Systems

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS

Introduction to Arvados. A Curoverse White Paper

FIWARE Lab Solution for Managing Resources & Services in a Cloud Federation

Tools for Web Archiving: The Java/Open Source Tools to Crawl, Access & Search the Web. NLA Gordon Mohr March 28, 2012

Cloud and Big Data Standardisation

SAP HANA Cloud Platform for SuccessFactors High Level Overview August 2013

COMPONENTS in a database environment

OpenStack CI: flow, tools and more

VIVO Dashboard A Drupal-based tool for harvesting and executing sophisticated queries against data from a VIVO instance

EIDA WFCatalog Service!!! Luca Trani and the EIDA Team

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Federated, Generic Configuration Management for Engineering Data

Redefining Static Analysis A Standards Approach. Mike Oara CTO, Hatha Systems

Drupal.

Transcription:

BIG DATA AGGREGATOR STASINOS KONSTANTOPOULOS NCSR DEMOKRITOS, GREECE Big Data Europe

The Big Data Aggregator The Big Data Aggregator: o A general-purpose architecture for processing Big Data o An implementation of the core architecture v Integrating existing mature components o An ecosystem of tools around the core system v Driven by our use cases across all Horizon 2020 challenges

Conceptual basis Big Data Aggregator architecture builds upon the Lambda Architecture o Generic, scalable and fault-tolerant data processing architecture Batch layer o Time-consuming computations o Physically available data o Typically in large chunks

Conceptual basis Big Data Aggregator architecture builds upon the Lambda Architecture o Generic, scalable and fault-tolerant data processing architecture Speed layer o Computations expected to provide results in real time o Smaller amounts of data o Often streams

Conceptual basis Big Data Aggregator architecture builds upon the Lambda Architecture o Generic, scalable and fault-tolerant data processing architecture Data serving layer o Data input and data consumption o Results offered for consumption as views, predefined queries required by the application o Views combine batch and speed layer results to offer a unified view to the application

Big Data Aggregator Conceptual Architecture A Lambda Architecture for the Semantic Web o Generic, scalable and fault-tolerant data processing architecture o In the presence of semantic knowledge about the data o Maintaining metadata about provenance v Especially when pooling together multiple data sources v Including non-trivial data integration where substantial transformations are carried out

Big Data Aggregator Conceptual Architecture Background knowhow Bulk database Reporting API Background aggregator Bulk Data aggregator Real time aggregator Data Search index Dataset Meta data Data serving stores Data Serving API

Big Data Aggregator Conceptual Architecture System Admin User Interface Producer UI Background knowhow Bulk database Reporting API Background aggregator Bulk Data aggregator Real time aggregator Data Search index Dataset Meta data Data serving stores Data Serving API End user UI

Semantic Web aspects: Background Background knowledge o Integrating different pieces of background o Making it available to data processing Vertical links: o E.g., stream processor receives aggregated background

Semantic Web aspects: Provenance Provenance and other metadata o Metadata about data sources providing to this computation o Metadata travels down the processing pipeline without getting disassociated from the data it describes o Metadata is available as a data serving view Granularity o Per result tuple can become Big Data by itself o Per resultset can be less useful v Invalidates enormous processing for the slightest now-invalid input o Something inbetween or user configuration? v To be discussed

Semantic Web aspects: Data Semantic Web data o All component interfaces exchange RDF o Data serving API supports LD/SW formats v JSON, SPARQL & co o Besides ingesting RDF data and LD, run-time accessing SPARQL endpoints v At lease for the purposes of dynamically ingesting

Big Data Aggregator Conceptual Architecture Background knowhow Bulk database Reporting API Background aggregator Bulk Data aggregator RDF data procedding Real time aggregator Knowledge Knowledge Data Search index Dataset Meta data Data serving stores Data Serving API SPARQL JSON JSON-LD

From source code to deployed instance Source code management Github Packaging Debian Provisioning Docker Auto-deployment Chef Puppet Salt Ansible Swarm mesos

Starting Point Horton Works, http://hortonworks.com o Integrated suite of Big Data processing tools LOD2 stack, http://www.lod2.eu o Tools for the Data Web o Ontologies o Automatically interlinking and fusing Web data o Provenance, privacy, security, quality o Searching, browsing, and authoring of Linked Data. SemaGrow, http://www.semagrow.eu o Federated SPARQL querying o Data integration o Optimized query execution v Including over uncooperative endpoints o Provenance metadata GeoKnow, http://www.geoknow.eu o Geospatial and LD integration o Data provenance o Adaptive geospatial exploration, authoring and curation

Thank you for your attention Questions?