Software Entwicklungen für das LSDF Datenmanagement

Size: px
Start display at page:

Download "Software Entwicklungen für das LSDF Datenmanagement"

Transcription

1 Software Entwicklungen für das LSDF Datenmanagement Rainer Stotzka, V. Hartmann, T. Jejkal,, P. Neuberger, S. Ochsenreither, F. Rindone, T. Schmidt, H. Pasic J. van Wezel, A. Garcia, R. Kupsch, S. Bourov, M. Hardt Steinbuch Centre for Computing KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

2 Access to Data Infrastructures Virtual Research Communities share resources across borders (computing centers, countries): Computing Storage Networking Facilities Services, etc. KIT LSDF NEW: Data as a Service 2

3 Requirements Storage Availability and reliability (24/7) Scalability (5-10 PB/a) Sustainability (>> 10 a) Performance and throughput (> 1 TB/h per application) Collaborative data networks Distribution Accessibility Security (worldwide) (multiple protocols: Grid, Cloud, Web, ) (X.509 certificates) Tools and applications (software) Flexibility Programming interfaces (API) User interfaces (multiple communities with a huge variety of requirements) (easy-to-use) 3

4 LSDF objectives Dedicated for science data ExaByte scale data To archive data, long term sustainability (10 yrs.?) To enable scientists to gain better scientific results by providing Data intensive analysis Added value services for data intensive processing To provide high performance access, high throughput Barrier free access (easy-to-use) Sustainability and interoperability 4 Guidelines : PARADE White Paper (2009): Strategy for a European Data Infrastructure ESFRI Data Management Task Force (2009): e-irg Report on Data Management OAIS (2002): Reference Model for an Open Archival Information System High Level Experts Group (2010): Riding the Wave European Commission Report on Scientific Data HLEG-SD (2010?): Note on Data Services infrastructure Microsoft Research (2009): The Fourth Paradigm: Data-Intensive Scientific Discovery

5 Software Development Infrastructure LSDF Development of software, technologies and algorithms LSDF Software and Service Development ADALAPI DataBrowser Meta data Data intensive applications ADALAPI DataBrowser Meta data Workflow Scientific experiments, applications, communities Development of services to support scientific communities 5

6 ADALAPI ADALAPI Abstract Data Access Layer Application Programming Interface Java class library Seamless application access to LSDF Independent of transfer protocol and location Protocols and filesystems local files, gsiftp sftp http(s) hdfs Authentifikation: X.509 certificates, user/passwd Performance up to 85 MB/s, 1 GE, gsiftp Client software Applications Tools Scientific exp. DataBrowser DAQ Visualization LSDF Storage Infrastructure Grid Cloud Workstations 6

7 DataBrowser DataBrowser API: GUI: Data and meta data organization File, data and project explorer Easy-to-use Extensible World-wide access Stable Functions: Data management Queries in meta data cataloges Up-/Download Control of data analysis + vis. workflows 7

8 Example: Adapted DataBrowser for Toxicology 8

9 Why is meta data necessary? Meta data Meta data describe the contents of data Everybody uses meta data: File name and extension (e.g. rainer.jpg, budget.xls, Readme.doc) Location (e.g. / /EU-projects/2010/Fishy/budget.xls) Personal know-how Sufficient for small file systems Have you ever tried to locate a file or info-somewhere-in-a-file-system 15 years old? in the file system of a colleague? in a 100 PetaByte file system? 9

10 Model of the LSDF meta data management Idea: Clear separation between Data (files), Meta data File Logical Logical File Catalog File Catalogs DB DB DB DB DB Storage 10

11 Model of the LSDF meta data management Idea: Clear separation between Data (files), Data organization (directory structure) Meta data My project dir dir dir dir dir Logical Directory Catalog DB File Logical Logical File Catalog File Catalogs DB DB DB DB DB 11

12 Model of the LSDF meta data management Idea: Clear separation between Data (files), Data organization (directory structure) and Associated meta data Logical Project Catalog Logical Directory Catalog File Logical Logical File Catalog File Catalogs DB DB DB DB DB DB DB Meta data name owners access rights date community (sub)subcommunity measurement type device, instrument Meta data structure depends on project, instruments, time, 12

13 Hierarchical Catalog System (Repository) APIs and Tools Meta data Sustainable Easily extensible Independent of data formats Enhanced performance: distribution of access Safety by redundancy Use of open standards Catalogs Meta data scheme repository Zebrafish I Zebrafish II ANKA BL1 Material research Digital objects in Arts and Humanities Generic file tree Logical Project Catalog LPN LDN, meta data Logical Directory Catalog LDN LDN, LFN File Logical Logical File Catalog File Catalogs LFN LFN Physical File File Name LFN LFN Physical File File Name LFN Physical File Name DB DB DB DB DB DB DB LSDF Systems Computing Storage 13

14 Additional Data Services How do I insert a new scientific project? Data and meta data organization experts for projects with specific needs Generic meta data format for simple file trees How do I transfer my data to a different location? Do I loose my meta data? Import-export to standard data and meta data formats Archive-in-a-box (Web installer or DVD, zip-archive, etc.) 14

15 Results Community Services Complex image analysis chain: DataBrowser Meta data Workflow 3D image stack, time series, Leica Image Format data set size: 100 GB Transfer to LSDF Automatic data conversion to RAW LSDF storage and online processing Storage Image processing and analysis Offline processing Computing Computing 15

16 Data Intensive Computing Workflow Visualization of huge data sets: Maximum projection, arbitrary viewpoint HeadNode - Job preparation and distribution Computing Nodes - Load data, compute rotation and projection, write results 1 projection 36 projections 2.8 TB read, 1.7 TB write, rotations and projections, 2 h 16

17 Scientific communities Systems biology (ITG, BioQuant, Immunogenetics) Vertebrate development studies and Deconvolution (5000 data sets <180 min.) Synchroton facilities and beamlines ANKA data storage HGF Programme Photon-Neutron-Ion High Data Rate Initiative Climate research Material research Arts and humanities»il Cenacolo«von Da Vinci ( )»L ultima cena«von Julius Romanus (1754) 17

18 Conclusions LSDF is a powerful structure more than data storage and cluster computing Design for future requirements R&D in progress ExaByte storage + interactivity LSDF offers Sustainability and safety Flexibility for future requirements Support Interactivity Software and tools Community-specific services To gain faster and better scientific results 18

Image Data, RDA and Practical Policies

Image Data, RDA and Practical Policies Image Data, RDA and Practical Policies Rainer Stotzka and many others KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu Data Life Cycle Lab

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

CHESS DAQ* Introduction

CHESS DAQ* Introduction CHESS DAQ* Introduction Werner Sun (for the CLASSE IT group), Cornell University * DAQ = data acquisition https://en.wikipedia.org/wiki/data_acquisition Big Data @ CHESS Historically, low data volumes:

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007 Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the

More information

Flexible Scalable Hardware independent. Solutions for Long Term Archiving

Flexible Scalable Hardware independent. Solutions for Long Term Archiving Flexible Scalable Hardware independent Solutions for Long Term Archiving More than 20 years of experience in archival storage 2 OA HPA 2010 1992 2000 2004 2007 Mainframe Tape Libraries Open System Tape

More information

Client Overview. Engagement Situation. Key Requirements

Client Overview. Engagement Situation. Key Requirements Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision

More information

A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King [email protected] Planets Project Permanent

More information

Using Data Mining and Machine Learning in Retail

Using Data Mining and Machine Learning in Retail Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings Over a Century of Innovation A Fortune

More information

Product Brief: XenData X2500 LTO-6 Digital Video Archive System

Product Brief: XenData X2500 LTO-6 Digital Video Archive System Product Brief: XenData X2500 LTO-6 Digital Video Archive System Updated: March 21, 2013 Overview The XenData X2500 system includes XenData6 Workstation software which provides the archive, restore and

More information

2012 LABVANTAGE Solutions, Inc. All Rights Reserved.

2012 LABVANTAGE Solutions, Inc. All Rights Reserved. LABVANTAGE Architecture 2012 LABVANTAGE Solutions, Inc. All Rights Reserved. DOCUMENT PURPOSE AND SCOPE This document provides an overview of the LABVANTAGE hardware and software architecture. It is written

More information

Processing big data by WS- PGRADE/gUSE and Data Avenue

Processing big data by WS- PGRADE/gUSE and Data Avenue Processing big data by WS- PGRADE/gUSE and Data Avenue http://www.sci-bus.eu Peter Kacsuk, Zoltan Farkas, Krisztian Karoczkai, Istvan Marton, Akos Hajnal, Tamas Pinter MTA SZTAKI SCI-BUS is supported by

More information

Data-Intensive Science and Scientific Data Infrastructure

Data-Intensive Science and Scientific Data Infrastructure Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence The Rise of Industrial Big Data Brian Courtney General Manager Industrial Data Intelligence Agenda Introduction Big Data for the industrial sector Case in point: Big data saves millions at GE Energy Seeking

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of

More information

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

Deploying a distributed data storage system on the UK National Grid Service using federated SRB Deploying a distributed data storage system on the UK National Grid Service using federated SRB Manandhar A.S., Kleese K., Berrisford P., Brown G.D. CCLRC e-science Center Abstract As Grid enabled applications

More information

The cloud storage service bwsync&share at KIT

The cloud storage service bwsync&share at KIT The cloud storage service bwsync&share at KIT Alexander Yasnogor, Nico Schlitter, Andreas Petzold @CERN, Workshop on Cloud Services for File Synchronisation and Sharing STEINBUCH CENTRE FOR COMPUTING -

More information

Why long time storage does not equate to archive

Why long time storage does not equate to archive Why long time storage does not equate to archive Jos van Wezel HUF Toronto 2015 STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

Data Storage. Vendor Neutral Data Archiving. May 2015 Sue Montagna. Imagination at work. GE Proprietary Information

Data Storage. Vendor Neutral Data Archiving. May 2015 Sue Montagna. Imagination at work. GE Proprietary Information Data Storage Vendor Neutral Data Archiving May 2015 Sue Montagna Imagination at work GE Proprietary Information Vendor Neutral Archiving Storing data in a standard format with a standard interface, such

More information

Jitterbit Technical Overview : Microsoft Dynamics CRM

Jitterbit Technical Overview : Microsoft Dynamics CRM Jitterbit allows you to easily integrate Microsoft Dynamics CRM with any cloud, mobile or on premise application. Jitterbit s intuitive Studio delivers the easiest way of designing and running modern integrations

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

EREBOS: CosmoSim Database. CLUES Research Environment. Harry Enke (Kristin Riebe, Jochen Klar, Adrian Partl) CLUES Meeting 2015, Copenhagen

EREBOS: CosmoSim Database. CLUES Research Environment. Harry Enke (Kristin Riebe, Jochen Klar, Adrian Partl) CLUES Meeting 2015, Copenhagen EREBOS: CLUES Research Environment CosmoSim Database Harry Enke (Kristin Riebe, Jochen Klar, Adrian Partl) CLUES Meeting 2015, Copenhagen Collaborative Research Environment (CRE) Elements: - huge data

More information

Introduction to Arvados. A Curoverse White Paper

Introduction to Arvados. A Curoverse White Paper Introduction to Arvados A Curoverse White Paper Contents Arvados in a Nutshell... 4 Why Teams Choose Arvados... 4 The Technical Architecture... 6 System Capabilities... 7 Commitment to Open Source... 12

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Power Grid Time Series Data Analysis with Pig on a Hadoop Cluster compared to Multi Core Systems

Power Grid Time Series Data Analysis with Pig on a Hadoop Cluster compared to Multi Core Systems Power Grid Time Series Data Analysis with Pig on a Hadoop Cluster compared to Multi Core Systems Felix Bach *, Hueseyin K. Çakmak *, Heiko Maass, Uwe Kuehnapfel Institute for Applied Computer Sciences

More information

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Center for Information Services and High Performance Computing (ZIH) Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Richard Grunzke*, Jens Krüger, Sandra Gesing, Sonja

More information

European Data Infrastructure - EUDAT Data Services & Tools

European Data Infrastructure - EUDAT Data Services & Tools European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.3 Selected Standards

More information

DISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study

DISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study DISTRIBUTED SYSTEMS AND CLOUD COMPUTING A Comparative Study Geographically distributed resources, such as storage devices, data sources, and computing power, are interconnected as a single, unified resource

More information

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform: Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet s ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.

More information

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT SS Data & Storage CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT HEPiX Fall 2012 Workshop October 15-19, 2012 Institute of High Energy Physics, Beijing, China SS Outline

More information

HPC Storage Solutions at transtec. Parallel NFS with Panasas ActiveStor

HPC Storage Solutions at transtec. Parallel NFS with Panasas ActiveStor HPC Storage Solutions at transtec Parallel NFS with Panasas ActiveStor HIGH PERFORMANCE COMPUTING AT TRANSTEC More than 30 Years of Experience in Scientific Computing 1980: transtec founded, a reseller

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically

More information

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved. THE EMC ISILON STORY Big Data In The Enterprise 2012 1 Big Data In The Enterprise Isilon Overview Isilon Technology Summary 2 What is Big Data? 3 The Big Data Challenge File Shares 90 and Archives 80 Bioinformatics

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai

Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai GSAW 2010 Working Group Session 11D Eucalyptus-Based Event Correlation Nehal Desai Member of the Tech. Staff, CSD/CSTS/CSRD, The Aerospace Corporation Dr. Craig A. Lee, [email protected] Senior Scientist, CSD/CSTS/CSRD,

More information

Using an In-Memory Data Grid for Near Real-Time Data Analysis

Using an In-Memory Data Grid for Near Real-Time Data Analysis SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses

More information

Software-defined Storage Architecture for Analytics Computing

Software-defined Storage Architecture for Analytics Computing Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture

More information

A Survey Study on Monitoring Service for Grid

A Survey Study on Monitoring Service for Grid A Survey Study on Monitoring Service for Grid Erkang You [email protected] ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide

More information

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database

More information

Advancements in Storage QoS Management in National Data Storage

Advancements in Storage QoS Management in National Data Storage Advancements in Storage QoS Management in National Data Storage Darin Nikolow 1, Renata Słota 1, Stanisław Polak 1 and Jacek Kitowski 1,2 1 AGH University of Science and Technology, Faculty of Computer

More information

Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015

Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015 Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015 Table of Contents Introduction... 4 Certified Products... 4 Key Findings... 5 Solution

More information

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington GEOG 482/582 : GIS Data Management Lesson 10: Enterprise GIS Data Management Strategies Overview Learning Objective Questions: 1. What are challenges for multi-user database environments? 2. What is Enterprise

More information

Product Overview. Contents

Product Overview. Contents Contents Product Overview Copyright 2002 - Xware AB. All rights reserved. xtrade is a registered trademark of Xware AB. Symphonia is a trademark of Orion Systems New Zealand Ltd. All rights reserved. 2

More information

The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms

The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms ICOODB 2010 - Frankfurt, Deutschland The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms Leon Guzenda - Objectivity, Inc. 1 AGENDA Historical Overview Inherent

More information

SURFsara Data Services

SURFsara Data Services SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Mark van de Sanden The world of the many Many different users (well organised (international) user communities, research groups, universities,

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud

Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud Laurence Liew General Manager, APAC Economics Is Driving Big Data Analytics to the Cloud Big Data 101 The Analytics Stack Economics of Big Data Convergence of the 3 forces Big Data Analytics in the Cloud

More information

IBM WebSphere Enterprise Service Bus, Version 6.0.1

IBM WebSphere Enterprise Service Bus, Version 6.0.1 Powering your service oriented architecture IBM WebSphere Enterprise Service Bus, Version 6.0.1 Highlights Supports a variety of messaging Requires minimal standards including JMS, Version 1.1 programming

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Case Study : 3 different hadoop cluster deployments

Case Study : 3 different hadoop cluster deployments Case Study : 3 different hadoop cluster deployments Lee moon soo [email protected] HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer

More information

Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A

Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A Identifier: Date: Activity: Authors: Status: Link: Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A J O I N T A C T I O N ( S A 1, J R A 3 ) F I

More information

NEXT GENERATION ARCHIVE MIGRATION TOOLS

NEXT GENERATION ARCHIVE MIGRATION TOOLS NEXT GENERATION ARCHIVE MIGRATION TOOLS Cloud Ready, Scalable, & Highly Customizable - Migrate 6.0 Ensures Faster & Smarter Migrations EXECUTIVE SUMMARY Data migrations and the products used to perform

More information

Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke

Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke Kalabie: Extending the OpenLAB Architecture Agilent User Interface

More information

New Features in Oracle Application Express 4.1. Oracle Application Express Websheets. Oracle Database Cloud Service

New Features in Oracle Application Express 4.1. Oracle Application Express Websheets. Oracle Database Cloud Service Date and Time- Europe/Middle East/Africa Tuesday June 12, 2012 09:00 13:00 BST 10:00 14:00 CEST 12:00 16:00 GST Corresponding UTC (GMT) 08:00:00 Agenda Time Track and Keynote/Session Title 9:00 AM Keynote

More information

Open Cirrus: Towards an Open Source Cloud Stack

Open Cirrus: Towards an Open Source Cloud Stack Open Cirrus: Towards an Open Source Cloud Stack Karlsruhe Institute of Technology (KIT) HPC2010, Cetraro, June 2010 Marcel Kunze KIT University of the State of Baden-Württemberg and National Laboratory

More information

Percipient StorAGe for Exascale Data Centric Computing

Percipient StorAGe for Exascale Data Centric Computing Percipient StorAGe for Exascale Data Centric Computing per cip i ent (pr-sp-nt) adj. Having the power of perceiving, especially perceiving keenly and readily. n. One that perceives. Introducing: Seagate

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

Analisi di un servizio SRM: StoRM

Analisi di un servizio SRM: StoRM 27 November 2007 General Parallel File System (GPFS) The StoRM service Deployment configuration Authorization and ACLs Conclusions. Definition of terms Definition of terms 1/2 Distributed File System The

More information

Evaluating MapReduce and Hadoop for Science

Evaluating MapReduce and Hadoop for Science Evaluating MapReduce and Hadoop for Science Lavanya Ramakrishnan [email protected] Lawrence Berkeley National Lab Computation and Data are critical parts of the scientific process Three Pillars of

More information

Object Storage: Out of the Shadows and into the Spotlight

Object Storage: Out of the Shadows and into the Spotlight Technology Insight Paper Object Storage: Out of the Shadows and into the Spotlight By John Webster December 12, 2012 Enabling you to make the best technology decisions Object Storage: Out of the Shadows

More information

XenData Archive Series Software Technical Overview

XenData Archive Series Software Technical Overview XenData White Paper XenData Archive Series Software Technical Overview Advanced and Video Editions, Version 4.0 December 2006 XenData Archive Series software manages digital assets on data tape and magnetic

More information

EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise

EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise ESSENTIALS Easy-to-use, single volume, single file system architecture Highly scalable with

More information

Performance Testing of Big Data Applications

Performance Testing of Big Data Applications Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies [email protected] Shirish Bhale: Director of Engineering

More information

A Grid Architecture for Manufacturing Database System

A Grid Architecture for Manufacturing Database System Database Systems Journal vol. II, no. 2/2011 23 A Grid Architecture for Manufacturing Database System Laurentiu CIOVICĂ, Constantin Daniel AVRAM Economic Informatics Department, Academy of Economic Studies

More information

Aspera Direct-to-Cloud Storage WHITE PAPER

Aspera Direct-to-Cloud Storage WHITE PAPER Transport Direct-to-Cloud Storage and Support for Third Party April 2014 WHITE PAPER TABLE OF CONTENTS OVERVIEW 3 1 - THE PROBLEM 3 2 - A FUNDAMENTAL SOLUTION - ASPERA DIRECT-TO-CLOUD TRANSPORT 5 3 - VALIDATION

More information

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Clear the way for new business opportunities. Unlock the power of data. Overcoming storage limitations Unpredictable data growth

More information

DataNet Flexible Metadata Overlay over File Resources

DataNet Flexible Metadata Overlay over File Resources 1 DataNet Flexible Metadata Overlay over File Resources Daniel Harężlak 1, Marek Kasztelnik 1, Maciej Pawlik 1, Bartosz Wilk 1, Marian Bubak 1,2 1 ACC Cyfronet AGH, 2 AGH University of Science and Technology,

More information

Data Management System - Developer Guide

Data Management System - Developer Guide Data Management System - Developer Guide Marcin Wolski Pawel Spychala Maciej Labedzki Data Management System - Developer Guide by Marcin

More information

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Data Warehouse and Hive Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Decision support systems Decision Support Systems allowed managers, supervisors, and executives to once again see the

More information

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Das HappyFace Meta-Monitoring Framework

Das HappyFace Meta-Monitoring Framework Das HappyFace Meta-Monitoring Framework B. Berge, M. Heinrich, G. Quast, A. Scheurer, M. Zvada, DPG Frühjahrstagung Karlsruhe, 28. März 1. April 2011 KIT University of the State of Baden-Wuerttemberg and

More information

Big Data Services at DKRZ

Big Data Services at DKRZ Big Data Services at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scientific Computing Research Group MPI-M Seminar Hamburg, March 31st, 2015 Big Data in Climate Research Big data is an all-encompassing

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information