Data sharing and Big Data in the physical sciences. 2 October 2015

1 Data sharing and Big Data in the physical sciences 2 October 2015

2 Content
- Digital curation: Data and metadata
- Why consider the physical sciences?
- Astronomy: Video
- Physics: LHC for example. Video
- The Research Data Alliance (RDA)
ITNPD4: Applications of Big Data

3 Digital curation (from JISC Digital Curation Centre)
Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle. The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. Meanwhile, curated data in trusted digital repositories may be shared among the wider UK research community. As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research.

4 Data and metadata
- Metadata is "data about data": that is, descriptions of data.
- May be structural metadata: data about the format of some other (large set of) data
  - Structure of files, structures of containers of data
- and/or descriptive metadata
  - E.g. where and when was this record created
- May be subclassed into
  - Guide metadata: helps people to find particular data
  - Business metadata: further information about the dataset content
- The descriptive metadata required is very data-type and task dependent

5 Data without metadata is generally not usable at all.
- Structural metadata is required to enable syntactically correct reading of the data
- Descriptive metadata is required to enable semantically correct processing of the data
- Metadata standards: see and despair!
- Metadata usage: everywhere, e.g. photographs, music, scientific datasets, ...
- Long-term usage of data requires explicit metadata.
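A minimal Python sketch may make the two roles concrete. Everything in it is invented for illustration (the record layout, the field names, the units and the instrument label); it is not taken from any real dataset or standard. The structural metadata tells the reader how to split the bytes into values, and the descriptive metadata tells a person or program what those values mean.

```python
import struct

# Hypothetical structural metadata: HOW the bytes are laid out.
structural = {
    "byte_order": "<",  # little-endian, no padding
    "fields": [("timestamp", "I"), ("temperature", "f")],  # uint32, float32
}

# Hypothetical descriptive metadata: WHAT the values mean.
descriptive = {
    "timestamp": "seconds since 1970-01-01 UTC",
    "temperature": "degrees Celsius",
    "instrument": "example sensor",
}


def record_format(meta):
    """Build a struct format string from the structural metadata."""
    return meta["byte_order"] + "".join(code for _, code in meta["fields"])


def write_records(rows, meta):
    """Pack each row into bytes according to the structural metadata."""
    fmt = record_format(meta)
    return b"".join(struct.pack(fmt, *row) for row in rows)


def read_records(blob, meta):
    """Unpack bytes into named fields; impossible without the layout."""
    fmt = record_format(meta)
    names = [name for name, _ in meta["fields"]]
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, blob)]


raw = write_records([(1443744000, 12.5), (1443747600, 13.25)], structural)
records = read_records(raw, structural)
for rec in records:
    # The unit comes from the descriptive metadata, not from the bytes.
    print(rec["timestamp"], rec["temperature"], descriptive["temperature"])
```

Without `structural`, `raw` is just 16 opaque bytes; without `descriptive`, the numbers can be read but not interpreted correctly.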

6 Why the physical sciences?
- Concerned with external reality
- Lots of scientists working on related problems
  - Working on the same systems
  - Interested in similar problems
- (Possibly) not concerned with secrecy
  - Depending on whether they are in a university or at a military establishment!
- Long history of sharing data and papers
  - Which was the root of the WWW itself
- Very early adopters of big data and data sharing technologies
  - And of data curation and metadata as well

7 Astronomy
- Long history of sharing datasets.
- See slides from documents/webpage/pga_ pdf

8 Physics
- Berners-Lee created the WWW (essentially an application that runs on top of the Internet)
  - In order to allow physicists at CERN and elsewhere to share papers
  - And later, datasets as well
- Physicists are used to dealing with large data volumes
  - And helped to develop many of the computational techniques associated with Big Data.

9 Big data and the Large Hadron Collider (LHC)
The LHC produces millions of collisions every second in each detector, generating approximately one petabyte of data per second. None of today's computing systems are capable of recording such rates, so sophisticated selection systems are used for a first fast electronic pre-selection, only passing one out of 10,000 events. Tens of thousands of processor cores then select 1% of the remaining events for analysis. Even after such drastic data reduction, the four big experiments, ALICE, ATLAS, CMS and LHCb, together need to store over 25 petabytes per year. The LHC data are aggregated in the CERN Data Centre, where initial data reconstruction is performed, and a copy is archived to long-term tape storage. Another copy is sent to several large data centres around the world.
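The reduction chain quoted above can be checked with a short back-of-envelope calculation. This sketch assumes decimal units (1 PB = 10^15 bytes), that data volume scales directly with the number of events kept, and, purely for comparison, a year of uninterrupted running. None of these assumptions comes from the slides.

```python
# Back-of-envelope check of the quoted LHC data-reduction figures.
PB = 10**15  # bytes per petabyte (decimal convention, assumed)
GB = 10**9   # bytes per gigabyte

raw_rate = 1 * PB                         # ~1 PB/s generated
after_preselection = raw_rate // 10_000   # electronics pass 1 event in 10,000
after_software = after_preselection // 100  # processor farm keeps 1% of those

seconds_per_year = 365 * 24 * 3600
# Upper bound: assumes collisions all year round, which is an assumption
# made here for illustration only.
yearly_pb = after_software * seconds_per_year / PB

print("after pre-selection:", after_preselection / GB, "GB/s")
print("after software selection:", after_software / GB, "GB/s")
print("kept fraction:", after_software / raw_rate)
print("continuous-running volume:", yearly_pb, "PB/year")
```

Under these assumptions the two stages together keep about one event in a million, leaving roughly 1 GB/s. A full year at that rate would come to about 31.5 PB, the same order of magnitude as the "over 25 petabytes per year" quoted above.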

10 LHC and big data
- About 20 to 25 PB/year (after discarding more than 99% of the collision data)
- 150 data centres worldwide
- Structured data: ROOT structured files
  - Compressed binary files: efficient storage
  - Self-documenting: helps with changing requirements
  - Open-source: not proprietary at all
  - Backwards-compatible: data will be re-analysed years hence
  - Including metadata
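The properties listed for the file format (compressed, self-documenting, backwards-compatible, carrying its own metadata) can be illustrated with a toy container in Python. This is emphatically not the ROOT file format or its API. The `SDF1` signature, the header layout, the schema fields and the default value are all hypothetical, chosen only to show how a reader can open an older file written before a field existed.

```python
import json
import struct
import zlib

MAGIC = b"SDF1"  # hypothetical 4-byte file signature

# Hypothetical schema understood by today's reader; "quality" was added later.
CURRENT_SCHEMA = {
    "energy": {"unit": "GeV"},
    "quality": {"unit": None, "default": "unknown"},
}


def dump(records, schema, version):
    """Write records with an embedded, human-readable schema header."""
    header = json.dumps({"version": version, "schema": schema}).encode("utf-8")
    payload = zlib.compress(json.dumps(records).encode("utf-8"))
    # Layout: magic, header length (uint32 little-endian), header, payload.
    return MAGIC + struct.pack("<I", len(header)) + header + payload


def load(blob):
    """Read any version of the file, filling fields that old files lack."""
    if blob[:4] != MAGIC:
        raise ValueError("not a self-describing file")
    (header_len,) = struct.unpack("<I", blob[4:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    records = json.loads(zlib.decompress(blob[8 + header_len:]).decode("utf-8"))
    defaults = {name: spec.get("default") for name, spec in CURRENT_SCHEMA.items()}
    # Values stored in the file override the defaults.
    return header, [{**defaults, **rec} for rec in records]


# A "version 1" file written before the quality field existed.
old_blob = dump([{"energy": 13.0}], {"energy": {"unit": "GeV"}}, version=1)
header, recs = load(old_blob)
print(header["version"], header["schema"], recs)
```

Because the schema travels inside the file, a reader written years later can still discover what each field is and supply sensible defaults for fields that did not exist when the data were taken.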

11 Research Data Alliance (RDA)
- Cross-disciplinary international organisation
- The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data.
- The RDA vision is researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.
- Breadth: see


Implementing selection and appraisal policies at the UK Data Archive

Implementing selection and appraisal policies at the UK Data Archive Implementing selection and appraisal policies at the UK Data Archive K. Schürer UK Data Archive and Economic and Social Data Service DCC and DPC Workshop, Oxford 4 th July 2006 UKDA history & overview

More information

How to avoid building a data swamp

How to avoid building a data swamp How to avoid building a data swamp Case studies in Hadoop data management and governance Mark Donsky, Product Management, Cloudera Naren Korenu, Engineering, Cloudera 1 Abstract DELETE How can you make

More information

Research Data Management Plan

Research Data Management Plan Please consult the guidance notes for examples of answers to help you to complete this template Overview Researchers Name: Title of Research Project: Length of Project: e.g. 6 months, 3 years, etc A brief

More information

Archive and Preservation in the Cloud - Business Case, Challenges and Best Practices. Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP

Archive and Preservation in the Cloud - Business Case, Challenges and Best Practices. Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP Archive and Preservation in the Cloud - Business Case, Challenges and Best Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP SNIA Legal Notice The material contained in this tutorial is copyrighted

More information

Site specific monitoring of multiple information systems the HappyFace Project

Site specific monitoring of multiple information systems the HappyFace Project Home Search Collections Journals About Contact us My IOPscience Site specific monitoring of multiple information systems the HappyFace Project This content has been downloaded from IOPscience. Please scroll

More information

E-mail: guido.negri@cern.ch, shank@bu.edu, dario.barberis@cern.ch, kors.bos@cern.ch, alexei.klimentov@cern.ch, massimo.lamanna@cern.

E-mail: guido.negri@cern.ch, shank@bu.edu, dario.barberis@cern.ch, kors.bos@cern.ch, alexei.klimentov@cern.ch, massimo.lamanna@cern. *a, J. Shank b, D. Barberis c, K. Bos d, A. Klimentov e and M. Lamanna a a CERN Switzerland b Boston University c Università & INFN Genova d NIKHEF Amsterdam e BNL Brookhaven National Laboratories E-mail:

More information

CiCS INFORMATION TECHNOLOGY STRATEGY 2008 2013 1. PURPOSE 2. CONTEXT. 2.1 Information Technology. 2.2 Changing Environment. 2.

CiCS INFORMATION TECHNOLOGY STRATEGY 2008 2013 1. PURPOSE 2. CONTEXT. 2.1 Information Technology. 2.2 Changing Environment. 2. CiCS INFORMATION TECHNOLOGY STRATEGY 2008 2013 1. PURPOSE This strategy replaces the previous IT strategy and is an integral part of the overall strategy being developed by Corporate Information and Computing

More information

IBM Solution Framework for Lifecycle Management of Research Data. 2008 IBM Corporation

IBM Solution Framework for Lifecycle Management of Research Data. 2008 IBM Corporation IBM Solution Framework for Lifecycle Management of Research Data Aspects of Lifecycle Management Research Utilization of research paper Usage history Metadata enrichment Usage Pattern / Citation Collaboration

More information

SHARING RESEARCH DATA POLICY, INFRASTRUCTURE, PEOPLE

SHARING RESEARCH DATA POLICY, INFRASTRUCTURE, PEOPLE SHARING RESEARCH DATA POLICY, INFRASTRUCTURE, PEOPLE... VEERLE VAN DEN EYNDEN... MANAGER RESEARCH DATA MANAGEMENT SUPPORT SERVICES UNIVERSITY OF ESSEX, UK... CHPC NATIONAL MEETING 2010 7 DECEMBER 2010

More information

RESEARCH DATA MANAGEMENT POLICY

RESEARCH DATA MANAGEMENT POLICY Document Title Version 1.1 Document Review Date March 2016 Document Owner Revision Timetable / Process RESEARCH DATA MANAGEMENT POLICY RESEARCH DATA MANAGEMENT POLICY Director of the Research Office Regular

More information

Service Road Map for ANDS Core Infrastructure and Applications Programs

Service Road Map for ANDS Core Infrastructure and Applications Programs Service Road Map for ANDS Core and Applications Programs Version 1.0 public exposure draft 31-March 2010 Document Target Audience This is a high level reference guide designed to communicate to ANDS external

More information

Research Data Management Guide

Research Data Management Guide Research Data Management Guide Research Data Management at Imperial WHAT IS RESEARCH DATA MANAGEMENT (RDM)? Research data management is the planning, organisation and preservation of the evidence that

More information

The challenge of managing research data. Axel Berg

The challenge of managing research data. Axel Berg The challenge of managing research data Axel Berg Context The data deluge cannot be stopped Without adequate data management: - the ever-growing amounts and complexity of data will be non-controllable

More information

Digital Asset Management Developing your Institutional Repository

Digital Asset Management Developing your Institutional Repository Digital Asset Management Developing your Institutional Repository Manny Bekier Director, Biomedical Communications Clinical Instructor, School of Public Health SUNY Downstate Medical Center Why DAM? We

More information

The Key Elements of Digital Asset Management

The Key Elements of Digital Asset Management The Key Elements of Digital Asset Management The last decade has seen an enormous growth in the amount of digital content, stored on both public and private computer systems. This content ranges from professionally

More information

IFI Irish Film Archive Digital Preservation & Access Strategy

IFI Irish Film Archive Digital Preservation & Access Strategy IFI Irish Film Archive Digital Preservation & Access Strategy Acknowledgements: This strategy document has been produced by the IFI Irish Film Archive team, and was written up by Kasandra O Connell, following

More information

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise WHITE PAPER A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise NOTICE This White Paper may contain proprietary information

More information

Library & Academic Computing Committee 18 October 2012 PAPER 05. College of Humanities and Social Sciences. Research data storage and management

Library & Academic Computing Committee 18 October 2012 PAPER 05. College of Humanities and Social Sciences. Research data storage and management College of Humanities and Social Sciences Research data storage and management Amendment & Authorisation History Ver Date Changes Name Author A Initial version FM FM E:\today\LACC\RDS-M (2).docx Created

More information

Research Support and European Research Libraries

Research Support and European Research Libraries Research Support and European Research Libraries Hans Geleijnse Library Strategy Adviser former Director Library and IT Services TiU past President of LIBER 18 April 2012 HG 1 Highlights from the report

More information

From Big Data to Smart Data Thomas Hahn

From Big Data to Smart Data Thomas Hahn Siemens Future Forum @ HANNOVER MESSE 2014 From Big to Smart Hannover Messe 2014 The Evolution of Big Digital data ~ 1960 warehousing ~1986 ~1993 Big data analytics Mining ~2015 Stream processing Digital

More information