Big Data Standardisation in Industry and Research

Similar documents
Cloud and Big Data Standardisation

Overview NIST Big Data Working Group Activities

Big Data Architecture Research at UvA

Instructional Model for Building effective Big Data Curricula for Online and Campus Education

Defining Architecture Components of the Big Data Ecosystem

Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University of Liverpool)

Big Security for Big Data: Addressing Security Challenges for the Big Data Infrastructure

Defining the Big Data Architecture Framework (BDAF)

Defining Architecture Components of the Big Data Ecosystem

NIST Big Data Phase I Public Working Group

Instructional Model for Building effective Big Data Curricula for Online and Campus Education

Architecture Framework and Components for the Big Data Ecosystem

Research Data Alliance: Current Activities and Expected Impact. SGBD Workshop, May 2014 Herman Stehouwer

Survey of Big Data Architecture and Framework from the Industry

Big Data and Data Intensive Science:

DRAFT NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey

Open Cloud exchange (OCX)

Big Data and Data Intensive (Science) Technologies

NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey

NIST Big Data Public Working Group

Big Data Challenges for e-science Infrastructure

ISO/IEC JTC1 SC32. Next Generation Analytics Study Group

EDISON Education for Data Intensive Science to Open New science frontiers

EDISON: Coordination and cooperation to establish new profession of Data Scientist for European Research and Industry

Big Data Systems and Interoperability

Standard Big Data Architecture and Infrastructure

Towards a common definition and taxonomy of the Internet of Things. Towards a common definition and taxonomy of the Internet of Things...

Cloud Essentials for Architects using OpenStack

NIST Big Data PWG & RDA Big Data Infrastructure WG: Implementation Strategy: Best Practice Guideline for Big Data Application Development

Big Data in Clouds. Cloud based platforms and tools for Big Data applications. Data Centric Computing UvA workshop 3 November 2015.

The InterNational Committee for Information Technology Standards INCITS Big Data

Data Centric Systems (DCS)

Cloud Computing Standards: Overview and ITU-T positioning

OpenAIRE Research Data Management Briefing paper

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Reference Architecture, Requirements, Gaps, Roles

Defining Generic Architecture for Cloud Infrastructure as a Service Model

Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics

EL Program: Smart Manufacturing Systems Design and Analysis

Potential standardization items for the cloud computing in SC32

Cloud Computing Standards: Overview and first achievements in ITU-T SG13.

Azure Data Lake Analytics

Cloud Standards - A Telco Perspective

locuz.com Big Data Services

Addressing Big Data Issues in Scientific Data Infrastructure

Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

The challenge of managing research data. Axel Berg

ITU- T Focus Group Cloud Compu2ng

Big Data & Security. Aljosa Pasic 12/02/2015

Agenda 4/21/ Big Data Level Set 2. Who are we? 3. What do we do? 4. What have we done so far? 5. What are we working on? 6.

Synergies between the Big Data Value (BDV) Public Private Partnership and the Helix Nebula Initiative (HNI)

Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

NIST Cloud Computing Program Activities

A Big Picture for Big Data

Horizon Research e-infrastructures Excellence in Science Work Programme Wim Jansen. DG CONNECT European Commission

Big Data, Integration and Governance: Ask the Experts

Building an Ecosystem to Accelerate Data-Driven Innovation

A Roadmap for Future Architectures and Services for Manufacturing. Carsten Rückriegel Road4FAME-EU-Consultation Meeting Brussels, May, 22 nd 2015

Environment Canada Data Management Program. Paul Paciorek Corporate Services Branch May 7, 2014

Data collection architecture for Big Data

The Platform is the Planet

Data Science Body of Knowledge (DS-BoK): Approach and Initial

Addressing Big Data Issues in Scientific Data Infrastructure

Standardised SLAs: how far can we go? DIHC, Euro-Par 2013, Aachan John Kennedy Intel Labs Europe

Securing Big Data Learning and Differences from Cloud Security

European Data Infrastructure - EUDAT Data Services & Tools

The standards landscape in cloud

Top Ten Security and Privacy Challenges for Big Data and Smartgrids. Arnab Roy Fujitsu Laboratories of America

Future Trends in Big Data

Key Challenges in Cloud Computing to Enable Future Internet of Things

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

Big Data Analytics Nokia

Public Cloud Workshop Offerings

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Logical Data Models for Cloud Computing Architectures

DGE /DG Connect

Global Research Data Infrastructure: Path Forward for Progress. Dr. Chris Greer Senior Executive for Cyber Physical Systems

NSF Workshop: High Priority Research Areas on Integrated Sensor, Control and Platform Modeling for Smart Manufacturing

Big Data for Government Symposium

Big Data and Education: Education and Skills Development in Big Data and Data Intensive Science

H2020-EUJ-2016: EU-Japan Joint Call. EUJ : IoT/Cloud/Big Data platforms in social application contexts

Requirements Specifications for: The Management Action Record System (MARS) for the African Development Bank

Open Data and Big Data at NOAA

The role of standards in driving cloud computing adoption

Towards a Big Data Taxonomy. Bill Mandrick, PhD Data Tactics Version 26_August_2013

Architecting Your Company. Ann Winblad Co-Founder and Managing Director

Big Data in the Cloud for Education Institutions

Cloud 101. Mike Gangl, Caltech/JPL, 2015 California Institute of Technology. Government sponsorship acknowledged

PHEME Veracity: The 4 th Challenge of Big Data

How to avoid building a data swamp

ICT : Internet of Things and Platforms for Connected Smart Objects

DEFINING GENERIC ARCHITECTURE FOR CLOUD IAAS PROVISIONING MODEL

Standards for Big Data in the Cloud

OUTSIDE-IN Transforming Enterprise IT

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

Research Data Alliance - Research Data Sharing without barriers Big Data & Open Data Workshop 2014, Brussels 7-8 May 2014

Exploiting the power of Big Data

BIG DATA-AS-A-SERVICE

Transcription:

Big Data Standardisation in Industry and Research EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University of Amsterdam

Outline Standardisation on Big Data Overview Research Data Alliance (RDA) and related initiatives PID and ORCID NIST Big Data Working Group (NBD-WG) activities and deliverables Conceptual approach: Big Data Architecture Framework (BDAF) by UvA 15 October 2013, ICS2013 Big Data Standardisation Slide_2

Big Data Standardisation Initiatives First attempts by industry associations: ODCA, TMF Big Data and Data Analytics architectures By the major data analytics providers, e.g. IBM, LexisNexis By the major Cloud Service Providers: AWS Big Data Services, Microsoft Azure HDInsight, LexisNexis HPCC Systems Research Data Alliance (RDA) Valuable work on Data Models, Metadata Registries, Trusted Registries Research community initiatives PID (Persistent Identifier) ORCID (Open Researcher and Contributor ID) Open Access to Research Data and Information NIST Big Data Working Group (NBD-WG) Big Data Definition and Reference Architecture Big Data technology roadmap Standardisation goals Common vocabulary Capabilities Stakeholders and actors Technology Roadmap 15 October 2013, ICS2013 Big Data Standardisation 3

Research Data Alliance First Steps http://www.rd-alliance.org/ Joint initiative EC, NSF, NIST: launched October 2012 RDA1 March 2013 (Gothenburg), RDA2 Sept 2013 (Washington), RDA3 March 2014 (Dublin), RDA4 Sept 2014 (Amsterdam) Positioned as a community forum and not standardisation body Working Groups created Data Foundation and Terminology Harmonization and Use of PID Information Types Data Type Registries Metadata Practical Policy (based on irods community practice) UPC (Universal Product Code) Code for Data Publication/Data Citation/Linking Repository Audit and Certification, Legal Interoperability Big Data Analytics (evaluation and study) Data Intensive Science Education and Skills development Number of application domains 15 October 2013, ICS2013 Big Data Standardisation 4

Persistent Identifier (PID) PID Persistent Identifier for Digital Objects Managed by European PID Consortium (EPIC) http://www.pidconsortium.eu/ Superset of DOI - Digital Object Identifier (http://www.doi.org/) Handle System by CNRI (Corporation for National Research Initiatives) for resolving DOI (http://www.handle.net/) PID provides a mechanism to link data during the whole research data transformation (life)cycle EPIC RESTful Web Service API published May 2013 15 October 2013, ICS2013 Big Data Standardisation 5

NIST Big Data Working Group (NBD-WG) First deliverables target September 2013 30 September Workshop and F2F meeting Activities: Conference calls every day 17-19:00 (CET) by subgroup - http://bigdatawg.nist.gov/home.php Big Data Definition and Taxonomies Requirements and use cases Big Data Security Reference Architecture Technology Roadmap BigdataWG mailing list and useful documents Input documents http://bigdatawg.nist.gov/show_inputdoc2.php Big Data Reference Architecture http://bigdatawg.nist.gov/_uploadfiles/m0226_v2_1885676266.docx Big Data definition and taxonomy http://bigdatawg.nist.gov/_uploadfiles/m0024_v1_6763872254.docx Prospective ISO Big Data Study Committee to be started 15 October 2013, ICS2013 Big Data Standardisation 6

Data Provider NIST Big Data Reference Architecture Draft version 0.7, 26 Sept 2013 DATA SW Data Consumer S e c u r i t y & P r i v a c y M a n a g e m e n t I T VA L U E C H A I N DATA SW Big Data Application Provider Collection I N F O R M AT I O N VA L U E C H A I N Curation System Orchestrator Analytics Big Data Framework Provider Processing Frameworks (analytic tools, etc.) Horizontally Scalable Platforms (databases, etc.) Horizontally Scalable Infrastructures Horizontally Scalable (VM clusters) Visualization Vertically Scalable Vertically Scalable Vertically Scalable Access DATA SW Main Component Data Provider Big Data Application Provider Big Data Framework Provider Data Consumer System Orchestrator Physical and Virtual Resources (networking, computing, etc.) K E Y : Service Use DATA SW Big Data Information Flow SW Tools and Algorithms Transfer 15 October 2013, ICS2013 Big Data Standardisation 7

http://mattturck.com/2012/10/15/a-chart-of-the-big-data-ecosystem-take-2/ [Dec 2012] 15 October 2013, ICS2013 Big Data Standardisation 8

Conceptual approach: Big Data Architecture Framework (BDAF) by UvA Big Data definition: From 5+1Vs to 5 parts Big Data Architecture Framework (BDAF) components Data Lifecycle Management model Partly implemented in NIST Big Data definition and Architecture Big Data Architecture Framework (BDAF) by UvA Architecture Framework and Components for the Big Data Ecosystem. SNE Technical Report 2013-02, Version 0.2, 12 September http://www.uazone.org/demch/worksinprogress/sne-2013-02-techreport-bdaf-draft02.pdf 15 October 2013, ICS2013 Big Data Standardisation 9

Improved: 5+1 V s of Big Data Volume Variety Structured Unstructured Multi-factor Probabilistic Linked Dynamic Changing data Changing model Linkage Variability Terabytes Records/Arch Tables, Files Distributed 6 Vs of Big Data Trustworthiness Authenticity Origin, Reputation Availability Accountability Velocity Batch Real/near-time Processes Streams Value Correlations Statistical Events Hypothetical Generic Big Data Properties Volume Variety Velocity Acquired Properties (after entering system) Value Veracity Variability Commonly accepted 3V s of Big Data Veracity 15 October 2013, ICS2013 Big Data Standardisation 10

Big Data Definition: From 5+1V to 5 Parts (1) (1) Big Data Properties: 5V Volume, Variety, Velocity, Value, Veracity Additionally: Data Dynamicity (Variability) (2) New Data Models Data linking, provenance and referral integrity Data Lifecycle and Variability/Evolution (3) New Analytics Real-time/streaming analytics, interactive and machine learning analytics (4) New Infrastructure and Tools High performance Computing, Storage, Network Heterogeneous multi-provider services integration New Data Centric (multi-stakeholder) service models New Data Centric security models for trusted infrastructure and data processing and storage (5) Source and Target High velocity/speed data capture from variety of sensors and data sources Data delivery to different visualisation and actionable systems and consumers Full digitised input and output, (ubiquitous) sensor networks, full digital control 15 October 2013, ICS2013 Big Data Standardisation 11

Defining Big Data Architecture Framework Architecture vs Ecosystem Big Data undergo a number of transformations during their lifecycle Big Data fuel the whole transformation/value chain Data sources and data consumers, target data usage Multi-dimensional relations between Data models and data driven processes Infrastructure components and data centric services Architecture vs Architecture Framework (Stack) To separate concerns and factors Control and Management functions, orthogonal factors Architecture Framework components are inter-related 15 October 2013, ICS2013 Big Data Standardisation 12

Big Data Architecture Framework (BDAF) for Big Data Ecosystem (BDE) (1) Data Models, Structures, Types Data formats, non/relational, file systems, etc. (2) Big Data Management Big Data Lifecycle (Management) Model Big Data transformation/staging Provenance, Curation, Archiving (3) Big Data Analytics and Tools Big Data Applications Target use, presentation, visualisation (4) Big Data Infrastructure (BDI) Storage, Compute, (High Performance Computing,) Network Sensor network, target/actionable devices Big Data Operational support (5) Big Data Security Data security in-rest, in-move, trusted processing environments 15 October 2013, ICS2013 Big Data Standardisation 13

Big Data Infrastructure and Analytics Tools Big Data Infrastructure Heterogeneous multi-provider inter-cloud infrastructure Data management infrastructure Federated Access and Delivery Infrastructure (FADI) Advanced high performance (programmable) network Security infrastructure Big Data Analytics High Performance Computer Clusters (HPCC) Analytics/processing: Realtime, Interactive, Batch, Streaming Big Data Analytics tools and applications 15 October 2013, ICS2013 Big Data Standardisation 14

Summary and Topics for discussion Big Data is a multifaceted technology domain with complex ecosystem like relations between components There is no currently consistent Big Data definition and standardisation activity NIST BD-WG is a first such attempt Cloud Computing is a natural platform for Big Data infrastructure and services and yet to evolve to meet Big Data requirements Big Data technologies yet to move closer to a general user and create more Open Source products 15 October 2013, ICS2013 Big Data Standardisation 15