Overview of state of art in Data management. Stefano Cozzini CNR/IOM and exact lab srl

Similar documents
Data-Intensive Science and Scientific Data Infrastructure

Global Scientific Data Infrastructures: The Big Data Challenges. Capri, May, 2011

How To Useuk Data Service

OpenAIRE Research Data Management Briefing paper

Big Data in the context of Preservation and Value Adding

Part I Courses Syllabus

Local Loading. The OCUL, Scholars Portal, and Publisher Relationship

Report of the DTL focus meeting on Life Science Data Repositories

SURFsara Data Services

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

Data Intensive Research Initiative for South Africa (DIRISA)

IBM Solution Framework for Lifecycle Management of Research Data IBM Corporation

Data Management Plans - How to Treat Digital Sources

BIG DATA: DATA EVERYWHERE

LIBER Case Study: Author: Mijke Jetten, University Library, Radboud University,

Research Data Management Policy. Glasgow School of Art

Exploitation of ISS scientific data

SCAR report. SCAR Data Policy ISSN International Council for Science. No 39 June Scientific Committee on Antarctic Research

Databases & Data Infrastructure. Kerstin Lehnert

Considerations for Research Data Management

Horizon Research e-infrastructures Excellence in Science Work Programme Wim Jansen. DG CONNECT European Commission

European Data Infrastructure - EUDAT Data Services & Tools

Jochen Schirrwagen, Najko Jahn. Bielefeld University Library, Germany. Research in Context

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

A Policy Framework for Canadian Digital Infrastructure 1

Research Data Management Services. Katherine McNeill Social Sciences Librarians Boot Camp June 1, 2012

Scholarly Use of Web Archives

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

SHARING RESEARCH DATA POLICY, INFRASTRUCTURE, PEOPLE

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014

Environment Canada Data Management Program. Paul Paciorek Corporate Services Branch May 7, 2014

Archive I. Metadata. 26. May 2015

Metadata as the Rosetta Stone to Data Management

EDISON Education for Data Intensive Science to Open New science frontiers

Digital Preservation and Access - The 5 Opens Project

Manjula Ambur NASA Langley Research Center April 2014

ESA Earth Observation Big Data R&D Past, Present, & Future Activities

NSF Data Management Plan Template Duke University Libraries Data and GIS Services

Rationale and vision for E2E data standards: the need for a MDR

RESEARCH DATA MANAGEMENT POLICY

e-irg workshop Dublin May 2013 Track 1: Coordination of e-infrastructures

Library and University Collections Vision, Values, Strategic Goals and Key Performance Areas,

Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen

CIP s Open Data & Data Management Guidelines and Procedures

Software Design Proposal Scientific Data Management System

Long-term preservation in Europe. The strategy of the Alliance for Permanent Access

ODUM INSTITUTE ARCHIVE SERVICES OVERVIEW IASSIST 2015

data.bris: collecting and organising repository metadata, an institutional case study

curation, analyses and interpretation of massive datasets opportunities are varied across disciplines

EUROPEAN COMMISSION Directorate-General for Research & Innovation. Guidelines on Data Management in Horizon 2020

Service Road Map for ANDS Core Infrastructure and Applications Programs

Cambridge University Library. Working together: a strategic framework

Workspaces Concept and functional aspects

THE EFFECTS OF BIG DATA ON INFRASTRUCTURE. Sakkie Janse van Rensburg Dr Dale Peters UCT 1

Intro to Data Management. Chris Jordan Data Management and Collections Group Texas Advanced Computing Center

Open Access to Manuscripts, Open Science, and Big Data

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

Master Class 4 - Introduction to Intellectual Property and Copyright Wilna Macmillan, Director of Client Services, Library

Internet Technologies for Digital Libraries

Biodiversity Data Exchange Using PRAGMA Cloud

USGS Community for Data Integration

Data Management at UT

Outgoing VDI Gateways:

February 22, 2013 MEMORANDUM FOR THE HEADS OF EXECUTIVE DEPARTMENTS AND AGENCIES

Research Data Management Guide

Environmental Data Management:

2014 NASCIO Recognition Award Submission

How To Understand And Understand The Science Of Astronomy

Standard Big Data Architecture and Infrastructure

DMBI: Data Management for Bio-Imaging.

A Capability Maturity Model for Scientific Data Management

The mission of academic libraries is to support research, education,

Benefits of managing and sharing your data

INDIGO DataCloud. Technical Overview RIA INFN-Bari

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Image Data, RDA and Practical Policies

A Vision for Operational Analytics as the Enabler for Business Focused Hybrid Cloud Operations

The National Consortium for Data Science (NCDS)

HARNESSING BIG DATA WITHIN THE FEDERAL GOVERNMENT FINDINGS AND RECOMMENDATIONS OF ATARC S BIG DATA INNOVATION LAB DECEMBER, 2015

Using Message Brokering and Data Mediation to use Distributed Data Networks of Earth Science Data to Enhance Global Maritime Situational Awareness.

Considering the Way Forward for Data Science and International Climate Science

An Introduction to Managing Research Data

Enhanced Research Data Management and Publication with Globus

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

Demystifying Big Data Government Agencies & The Big Data Phenomenon

Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future

Research Data Management - The Essentials

Service-Oriented Architectures

Sy Computing Services, Inc. TOP REASONS TO MOVE TO MICROSOFT EXCHANGE Prepared By:

NITRD: National Big Data Strategic Plan. Summary of Request for Information Responses

RFI Summary: Executive Summary

Reaping the Rewards of Big Data

SAP Agile Data Preparation

The distribution of marine OpenData via distributed data networks and Web APIs. The example of ERDDAP, the message broker and data mediator from NOAA

University-wide Data Management Services: Cross-campus Collaborations at Indiana University

31 December Dear Sir:

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Transcription:

Overview of state of art in Data management Stefano Cozzini CNR/IOM and exact lab srl

AIM of this short talk Frame the problem and the discussion around DATA: What are big data? Which kind of challenges ahead of us? Disclaimer: Slides and numbers are taken around: there are a lot of data discussing big data J

Big Data: a buzzword..

Big Data: general definikon

The 3 V s of big data.. Velocity Data are produced at speed higher than the speed you are able to move/analyze and understand them.. Variety Data range from simulakon to remote sensing informakon, from instruments to market analysis etc.. datasets come in a variety of data formats and span a variety of metadata standards Volume The amount of data will increase of factor 61 in the next 10 years The amount of data is eskmated to exceed the size of available data infrastructure to store them by 60%. [from The 2011 IDC Digital Universe Study]

Data- intensive science A fourth paradigm a[er experiment, theory, and computakon.. It involves colleckng, exploring, visualizing, combining, subse\ng, analyzing, and using huge data colleckons Scientific Data Infrastructure 6 4

Challenges & Requirements Challenges: Deluge of observakonal data, exaflood of simulakon model outputs Need for collaborakon among groups, disciplines, communikes Finding insights and discoveries in a Sea of Data Requirements: New tools, techniques, and infrastructure Standards for interoperability InsKtuKonal support for data stewardship, curakon

IOPS vs FLOPS HPC is today compute- centric scienkfic compukng needs data accessibility rather than compukng speed Source: IDC DirecKon 2013

Roles in Data- intensive Science Scien&sts/researchers: acquire, generate, analyze, check, organize, format, document, share, publish research data Data users: access, understand, integrate, visualize, analyze, subset, and combine data Data scien&sts: develop infrastructure, standards, convenkons, frameworks, data models, Web- based technologies So2ware developers: develop tools, formats, interfaces, libraries, services Data curators: preserve data content and integrity of science data and metadata in archives Research funding agencies, professional socie&es, governments: encourage free and open access to research data, advocate eliminakon of most access restrickons Scientific Data Infrastructure 9

Infrastructure for sharing scienkfic ApplicaKons depend on lower layers Sharing requires agreements formats protocols convenkons Data needs metadata Data Scientific Data Infrastructure 10 6

Data infrastructure crucial aspects Large datasets: Deal with large datasets and large data rates from experiments Reliability : Increase the level of QoS and SLA, e.g. increasing reliability by replicakng data sources and increasing accessibility by copying source to several places AccounKng Allow monitoring and checking of resource usage IntegraKon Provide the same set of services that are understandable (compakble) between domains InteroperaKon InteroperaKon through common standard schemes Access Broadband data access Allow transparent and secure remote access to data Data preservakon Allow long- term availability of data High quality Quality of data to enable advanced and cross- disciplinary access and enrichment operakons Economic juskficakon As the scienkfic community is operakng on increasingly larger datasets and want to preserve the informakon concerned, the infrastructure provided should have a clear roadmap of technology exchange and backwards compakbility. Access control Provide the infrastructure to allow for fine- grained access control Source: e- irg blue paper on data management 2012 hhp://www.e- irg.eu/images/stories/disseminakon/e- irg- blue_paper_on_data_management_v_final.pdf

Data are not just for science.. High- end commercial analykcs pushing up into academia. The journey from science data to industry& commerce can be relakvely short..and plenty of ethical legal and societal implicakons