Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen



Similar documents
Image Data, RDA and Practical Policies

KIT Site Report. Andreas Petzold. STEINBUCH CENTRE FOR COMPUTING - SCC

Running the scientific data archive

NT1: An example for future EISCAT_3D data centre and archiving?

The cloud storage service bwsync&share at KIT

Steinbuch Centre for Computing (SCC) The Information Technology Centre of KIT

Big Data Architectures and Technologies

Software Entwicklungen für das LSDF Datenmanagement

SURFsara Data Services

Big Data and Storage Management at the Large Hadron Collider

Storage CloudSim: A Simulation Environment for Cloud Object Storage Infrastructures

The challenge of managing research data. Axel Berg

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

Why long time storage does not equate to archive

KIT Site Report. Andreas Petzold. STEINBUCH CENTRE FOR COMPUTING - SCC

dcache, list of topics

GridKa Database Services

Smart Data Innovation Lab (SDIL)

ITIL and Grid services at GridKa CHEP 2009, March, Prague

Intro to Data Management. Chris Jordan Data Management and Collections Group Texas Advanced Computing Center

CMIP6 Data Management at DKRZ

Workprogramme

Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft. Global Grid User Support - GGUS - within the LCG & EGEE environment

Delivering a Campus Research Data Service with Globus. GlobusWorld 2014 Keynote

Digital Preservation Lifecycle Management

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

A Service for Data-Intensive Computations on Virtual Clusters

European Data Infrastructure - EUDAT Data Services & Tools

Scalable Services for Digital Preservation

Workspaces Concept and functional aspects

This vision will be accomplished by targeting 3 Objectives that in time are further split is several lower level sub-objectives:

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery

A Binary Tree SMART CTO Webinar. Analyzing and Rationalizing Big Data in Messaging

How To Build A Cloud Storage System

The ANKA Archiving System

How To Use Bwgrid

Overview of state of art in Data management. Stefano Cozzini CNR/IOM and exact lab srl

Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee

Integrating a heterogeneous and shared Linux cluster into grids

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

Cloud Computing mit mathematischen Anwendungen

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

SUPPORT FOR CMS EXPERIMENT AT TIER1 CENTER IN GERMANY

Human Brain Project -

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

and Deployment Roadmap for Satellite Ground Systems

NextGen Infrastructure for Big DATA Analytics.

CHAMELEON: A LARGE-SCALE, RECONFIGURABLE EXPERIMENTAL ENVIRONMENT FOR CLOUD RESEARCH

Data Management Plan (DMP) for Particle Physics Experiments prepared for the 2015 Consolidated Grants Round. Detailed Version

Introduction to Cloud Computing. Lecture 02 History of Enterprise Computing Kaya Oğuz

Exploitation of ISS scientific data

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

Big Workflow: More than Just Intelligent Workload Management for Big Data

Protecting Big Data Data Protection Solutions for the Business Data Lake

Betriebssystem-Virtualisierung auf einem Rechencluster am SCC mit heterogenem Anwendungsprofil

Virtual InfiniBand Clusters for HPC Clouds

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

Globus Research Data Management: Introduction and Service Overview. Steve Tuecke Vas Vasiliadis

Global Grid User Support - GGUS - in the LCG & EGEE environment

irods at CC-IN2P3: managing petabytes of data

Patrick Fuhrmann. The DESY Storage Cloud

Linking raw data with scientific workflow and software repository: some early

Interagency Science Working Group. National Archives and Records Administration

Computing Strategic Review. December 2015

Global Scientific Data Infrastructures: The Big Data Challenges. Capri, May, 2011

IBM ELASTIC STORAGE SEAN LEE

Key Considerations for Managing Big Data in the Life Science Industry

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

IBM Solution Framework for Lifecycle Management of Research Data IBM Corporation

Cray: Enabling Real-Time Discovery in Big Data

Invenio: A Modern Digital Library for Grey Literature

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

Leveraging OpenStack Private Clouds

Computing at the HL-LHC

Shibbolized irods (and why it matters)

Preview of a Novel Architecture for Large Scale Storage

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Scaling up to Production

Project Number: Project Title: Human Brain Project. HBP_SP13_EPFL_ _D13.3.2_Final.docx

A Web-based Portal to Access and Manage WNoDeS Virtualized Cloud Resources

High Availability Databases based on Oracle 10g RAC on Linux

dcache, Software for Big Data

CMS: Challenges in Advanced Computing Techniques (Big Data, Data reduction, Data Analytics)

Scientific Cloud Computing Infrastructure for Europe Strategic Plan. Bob Jones,


Data Requirements from NERSC Requirements Reviews

The Mantid Project. The challenges of delivering flexible HPC for novice end users. Nicholas Draper SOS18

Data management challenges in todays Healthcare and Life Sciences ecosystems

Grid Computing vs Cloud

Considerations for Research Data Management

(Possible) HEP Use Case for NDN. Phil DeMar; Wenji Wu NDNComm (UCLA) Sept. 28, 2015

How to avoid building a data swamp

Italian Scientific Big Data Initiative

GridKa site report. Manfred Alef, Andreas Heiss, Jos van Wezel. Steinbuch Centre for Computing

Simple. Extensible. Open.

ADVANCEMENTS IN BIG DATA PROCESSING IN THE ATLAS AND CMS EXPERIMENTS 1. A.V. Vaniachine on behalf of the ATLAS and CMS Collaborations

Workshop Agenda Feb 25th 2015

Globus Research Data Management: Introduction and Service Overview

Architecture & Experience

Cluster, Grid, Cloud Concepts

Transcription:

Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen Achim Streit Steinbuch Centre for Computing (SCC) KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der Helmholtz-Gemeinschaft www.kit.edu

The Challenge of Big Data in Science security volume variety theory insight big exploration 4th paradigm DATA SCIENCE in analysis federation algorithms visualisation simulation value velocity experiment veracity education life cycle management facility 2 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

The Challenge of Big Data in Science theory insight big security How to shape curricula analysis for future data scientists? volume 4th paradigm DATA SCIENCE in How to learn from data with novel methods? algorithms How to archive and preserve data? value education How to advance visualisation algorithms for large-scale data? exploration How to address preservation and curation with tools and processes? visualisation How to enhance generic techniques by orders of magnitude? velocity life cycle variety experiment federation simulation management How to protect valuable data? facility How to easily share data in global collaborations? How to optimise meta-data handling? veracity 3 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Collaboration is key 4 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

GridKa German Tier-1 Center in WLCG Supports all LHC experiments + Belle II + several small {HE/AP}P communities Benchmarked reliability of 99.5% Resources >10,000 cores, utilization >95% Disk space: 12 PB, tape space: 17 PB 6x10 Gbit/s network connectivity 14% of LHC data permanently stored at GridKa Serves > 20 T2 centres in 6 countries Services File transfer Regional workload management, file catalog Annual international GridKa School Global Grid User Support (GGUS) for WLCG 5 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

GridKa Experiences Evolving demands and usage patterns No common workflows Hardware commodity, software not Hierarchical storage with tape challenging On-site experiment representation highly useful Adoption of grid computing outside of HEP rare Data access a central issue Random data access by compute jobs Reprocessing Hierarchical model relaxed Hierarchy Mesh Courtesy of Ian Bird, CERN 6 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Large Scale Data Facility Main goals Provision of storage for multiple research groups Systemsbiology, climatology, synchrotron research/materials science, humanities, geo/earth science, Basis for BW-wide data services Resources and access 6 PB of on-line storage (ext. to > 10 PB) 6 PB of archival storage 100 GbE connection between LSDF@KIT and U-Heidelberg Hadoop analysis cluster of 58*8 cores Connection to HPC clusters Jointly funded by Helmholtz Association and state of Baden-Württemberg 7 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

LSDF Experiences High demand for storage, analysis and archival Research groups vary in Research topics (from genetic sequencing to geophysics) Size IT expertise Need for services and protocols Important needs common to many user groups Sharing data with other groups Data safety and preservation Consulting Many small groups depend on LSDF 8 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

bwlsdf Services for users in the state of Baden-Württemberg bwsync&share For employees and students at Universities in BaWü Dropbox-like data storage KIT cloud on-premise solution Access via webbrowser, clients for windows, linux and mobile phones Web based SAML authentification (Shibboleth) - bwidm Data can be shared with users not registered for the service Service started on 1.1.2014 Available to 450.000 users bwfilestorage Via SFTP/SCP SAML authentication 9 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Large-Scale Data Management and Analysis Dual Approach Data Life Cycle Labs Joint R&D with scientific user communities Optimization of the data life cycle Community-specific data analysis tools and services Data Services Integration Team Generic methods R&D Data analysis tools and services common to several DLCLs Interface between federated data infrastructures and DLCLs/communities 10 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Facts and Figures Helmholtz portfolio extension Initial project duration: 2012-2016 Partners: Sustainability Inclusion of activities into Helmholtz programme-oriented funding (PoF) Supercomputing & Big Data from 2015 onwards Cross-programme iniative with other Helmholtz research fields Annual international symposium 11 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Selected Scientific Highlights DLCL Key Technologies (KIT, U-Heidelberg, U-Dresden) Light Sheet Microscopy Mikut et al. (2013), Automated processing of zebrafish imaging data - a survey, Zebrafish 10(3), DOI:10.1089/zeb.2013.0886 Experimental Microscope 16 TB/d Data Acquisition System Data Ingest Client 400 MB/s Repository Automatic Processing Analysis DLCL Energy (KIT, U-Ulm) Secure data management for tech. & eco. data analysis Electrical Data Recorder: 3-phase voltage measurement EDR-Device 1 EDR-Device n 8.7 GB/d 8.7 GB/d Web Server Generic Data Services LSDF (HDFS/GPFS) Metadata Storage Data Analysis (easimov modeling and analysis toolbox) 12 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

LSDMA Experiences Communities differ in Previous knowledge Level of specification of the data life cycle Tools and services used Within communities Focus on data analysis High fluctuation of computing experts Needs driven by Increasing amount of data Cooperation between groups Policies Open access/data Long-term preservation Lessons learned Interoperable AAI crucial Data privacy very challenging, both legally and technically Communities need evolution, not revolution Needs can be very specific 13 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Re3data.org by KIT-BIB, Frank Scholze Goals Linking existing research data repositories In-depth analysis of quality requirements for research data repositories Define a draft set of criteria for their quality assurance Currently 634 research data repositories from around the world covering all academic disciplines are listed 586 of these are described using the re3data.org schema, http://doi.org/10.2312/re3.005 14 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing

Smart Data innovation Lab: A Data Hub for Industry Governance Data Innovation Communities Industry 4.0 Energy Smart Cities Medicine Working Group Data Curation Working Group Legal Affairs Cross Topic Communities Working Group Facility Operation and Tools 15 26.3.2014 ZKI Frühjahrstagung 2014 Steinbuch Centre for Computing