Software Entwicklungen für das LSDF Datenmanagement
|
|
|
- Jonathan Holmes
- 10 years ago
- Views:
Transcription
1 Software Entwicklungen für das LSDF Datenmanagement Rainer Stotzka, V. Hartmann, T. Jejkal,, P. Neuberger, S. Ochsenreither, F. Rindone, T. Schmidt, H. Pasic J. van Wezel, A. Garcia, R. Kupsch, S. Bourov, M. Hardt Steinbuch Centre for Computing KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
2 Access to Data Infrastructures Virtual Research Communities share resources across borders (computing centers, countries): Computing Storage Networking Facilities Services, etc. KIT LSDF NEW: Data as a Service 2
3 Requirements Storage Availability and reliability (24/7) Scalability (5-10 PB/a) Sustainability (>> 10 a) Performance and throughput (> 1 TB/h per application) Collaborative data networks Distribution Accessibility Security (worldwide) (multiple protocols: Grid, Cloud, Web, ) (X.509 certificates) Tools and applications (software) Flexibility Programming interfaces (API) User interfaces (multiple communities with a huge variety of requirements) (easy-to-use) 3
4 LSDF objectives Dedicated for science data ExaByte scale data To archive data, long term sustainability (10 yrs.?) To enable scientists to gain better scientific results by providing Data intensive analysis Added value services for data intensive processing To provide high performance access, high throughput Barrier free access (easy-to-use) Sustainability and interoperability 4 Guidelines : PARADE White Paper (2009): Strategy for a European Data Infrastructure ESFRI Data Management Task Force (2009): e-irg Report on Data Management OAIS (2002): Reference Model for an Open Archival Information System High Level Experts Group (2010): Riding the Wave European Commission Report on Scientific Data HLEG-SD (2010?): Note on Data Services infrastructure Microsoft Research (2009): The Fourth Paradigm: Data-Intensive Scientific Discovery
5 Software Development Infrastructure LSDF Development of software, technologies and algorithms LSDF Software and Service Development ADALAPI DataBrowser Meta data Data intensive applications ADALAPI DataBrowser Meta data Workflow Scientific experiments, applications, communities Development of services to support scientific communities 5
6 ADALAPI ADALAPI Abstract Data Access Layer Application Programming Interface Java class library Seamless application access to LSDF Independent of transfer protocol and location Protocols and filesystems local files, gsiftp sftp http(s) hdfs Authentifikation: X.509 certificates, user/passwd Performance up to 85 MB/s, 1 GE, gsiftp Client software Applications Tools Scientific exp. DataBrowser DAQ Visualization LSDF Storage Infrastructure Grid Cloud Workstations 6
7 DataBrowser DataBrowser API: GUI: Data and meta data organization File, data and project explorer Easy-to-use Extensible World-wide access Stable Functions: Data management Queries in meta data cataloges Up-/Download Control of data analysis + vis. workflows 7
8 Example: Adapted DataBrowser for Toxicology 8
9 Why is meta data necessary? Meta data Meta data describe the contents of data Everybody uses meta data: File name and extension (e.g. rainer.jpg, budget.xls, Readme.doc) Location (e.g. / /EU-projects/2010/Fishy/budget.xls) Personal know-how Sufficient for small file systems Have you ever tried to locate a file or info-somewhere-in-a-file-system 15 years old? in the file system of a colleague? in a 100 PetaByte file system? 9
10 Model of the LSDF meta data management Idea: Clear separation between Data (files), Meta data File Logical Logical File Catalog File Catalogs DB DB DB DB DB Storage 10
11 Model of the LSDF meta data management Idea: Clear separation between Data (files), Data organization (directory structure) Meta data My project dir dir dir dir dir Logical Directory Catalog DB File Logical Logical File Catalog File Catalogs DB DB DB DB DB 11
12 Model of the LSDF meta data management Idea: Clear separation between Data (files), Data organization (directory structure) and Associated meta data Logical Project Catalog Logical Directory Catalog File Logical Logical File Catalog File Catalogs DB DB DB DB DB DB DB Meta data name owners access rights date community (sub)subcommunity measurement type device, instrument Meta data structure depends on project, instruments, time, 12
13 Hierarchical Catalog System (Repository) APIs and Tools Meta data Sustainable Easily extensible Independent of data formats Enhanced performance: distribution of access Safety by redundancy Use of open standards Catalogs Meta data scheme repository Zebrafish I Zebrafish II ANKA BL1 Material research Digital objects in Arts and Humanities Generic file tree Logical Project Catalog LPN LDN, meta data Logical Directory Catalog LDN LDN, LFN File Logical Logical File Catalog File Catalogs LFN LFN Physical File File Name LFN LFN Physical File File Name LFN Physical File Name DB DB DB DB DB DB DB LSDF Systems Computing Storage 13
14 Additional Data Services How do I insert a new scientific project? Data and meta data organization experts for projects with specific needs Generic meta data format for simple file trees How do I transfer my data to a different location? Do I loose my meta data? Import-export to standard data and meta data formats Archive-in-a-box (Web installer or DVD, zip-archive, etc.) 14
15 Results Community Services Complex image analysis chain: DataBrowser Meta data Workflow 3D image stack, time series, Leica Image Format data set size: 100 GB Transfer to LSDF Automatic data conversion to RAW LSDF storage and online processing Storage Image processing and analysis Offline processing Computing Computing 15
16 Data Intensive Computing Workflow Visualization of huge data sets: Maximum projection, arbitrary viewpoint HeadNode - Job preparation and distribution Computing Nodes - Load data, compute rotation and projection, write results 1 projection 36 projections 2.8 TB read, 1.7 TB write, rotations and projections, 2 h 16
17 Scientific communities Systems biology (ITG, BioQuant, Immunogenetics) Vertebrate development studies and Deconvolution (5000 data sets <180 min.) Synchroton facilities and beamlines ANKA data storage HGF Programme Photon-Neutron-Ion High Data Rate Initiative Climate research Material research Arts and humanities»il Cenacolo«von Da Vinci ( )»L ultima cena«von Julius Romanus (1754) 17
18 Conclusions LSDF is a powerful structure more than data storage and cluster computing Design for future requirements R&D in progress ExaByte storage + interactivity LSDF offers Sustainability and safety Flexibility for future requirements Support Interactivity Software and tools Community-specific services To gain faster and better scientific results 18
Image Data, RDA and Practical Policies
Image Data, RDA and Practical Policies Rainer Stotzka and many others KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu Data Life Cycle Lab
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
CHESS DAQ* Introduction
CHESS DAQ* Introduction Werner Sun (for the CLASSE IT group), Cornell University * DAQ = data acquisition https://en.wikipedia.org/wiki/data_acquisition Big Data @ CHESS Historically, low data volumes:
How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007
Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the
Flexible Scalable Hardware independent. Solutions for Long Term Archiving
Flexible Scalable Hardware independent Solutions for Long Term Archiving More than 20 years of experience in archival storage 2 OA HPA 2010 1992 2000 2004 2007 Mainframe Tape Libraries Open System Tape
Client Overview. Engagement Situation. Key Requirements
Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision
A Service for Data-Intensive Computations on Virtual Clusters
A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King [email protected] Planets Project Permanent
Using Data Mining and Machine Learning in Retail
Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings Over a Century of Innovation A Fortune
Product Brief: XenData X2500 LTO-6 Digital Video Archive System
Product Brief: XenData X2500 LTO-6 Digital Video Archive System Updated: March 21, 2013 Overview The XenData X2500 system includes XenData6 Workstation software which provides the archive, restore and
2012 LABVANTAGE Solutions, Inc. All Rights Reserved.
LABVANTAGE Architecture 2012 LABVANTAGE Solutions, Inc. All Rights Reserved. DOCUMENT PURPOSE AND SCOPE This document provides an overview of the LABVANTAGE hardware and software architecture. It is written
Processing big data by WS- PGRADE/gUSE and Data Avenue
Processing big data by WS- PGRADE/gUSE and Data Avenue http://www.sci-bus.eu Peter Kacsuk, Zoltan Farkas, Krisztian Karoczkai, Istvan Marton, Akos Hajnal, Tamas Pinter MTA SZTAKI SCI-BUS is supported by
Data-Intensive Science and Scientific Data Infrastructure
Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific
Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014
White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page
The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence
The Rise of Industrial Big Data Brian Courtney General Manager Industrial Data Intelligence Agenda Introduction Big Data for the industrial sector Case in point: Big data saves millions at GE Energy Seeking
Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences
Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of
Deploying a distributed data storage system on the UK National Grid Service using federated SRB
Deploying a distributed data storage system on the UK National Grid Service using federated SRB Manandhar A.S., Kleese K., Berrisford P., Brown G.D. CCLRC e-science Center Abstract As Grid enabled applications
The cloud storage service bwsync&share at KIT
The cloud storage service bwsync&share at KIT Alexander Yasnogor, Nico Schlitter, Andreas Petzold @CERN, Workshop on Cloud Services for File Synchronisation and Sharing STEINBUCH CENTRE FOR COMPUTING -
Why long time storage does not equate to archive
Why long time storage does not equate to archive Jos van Wezel HUF Toronto 2015 STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz
Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
NextGen Infrastructure for Big DATA Analytics.
NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures
Data Storage. Vendor Neutral Data Archiving. May 2015 Sue Montagna. Imagination at work. GE Proprietary Information
Data Storage Vendor Neutral Data Archiving May 2015 Sue Montagna Imagination at work GE Proprietary Information Vendor Neutral Archiving Storing data in a standard format with a standard interface, such
Jitterbit Technical Overview : Microsoft Dynamics CRM
Jitterbit allows you to easily integrate Microsoft Dynamics CRM with any cloud, mobile or on premise application. Jitterbit s intuitive Studio delivers the easiest way of designing and running modern integrations
A very short Intro to Hadoop
4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,
EREBOS: CosmoSim Database. CLUES Research Environment. Harry Enke (Kristin Riebe, Jochen Klar, Adrian Partl) CLUES Meeting 2015, Copenhagen
EREBOS: CLUES Research Environment CosmoSim Database Harry Enke (Kristin Riebe, Jochen Klar, Adrian Partl) CLUES Meeting 2015, Copenhagen Collaborative Research Environment (CRE) Elements: - huge data
Introduction to Arvados. A Curoverse White Paper
Introduction to Arvados A Curoverse White Paper Contents Arvados in a Nutshell... 4 Why Teams Choose Arvados... 4 The Technical Architecture... 6 System Capabilities... 7 Commitment to Open Source... 12
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Power Grid Time Series Data Analysis with Pig on a Hadoop Cluster compared to Multi Core Systems
Power Grid Time Series Data Analysis with Pig on a Hadoop Cluster compared to Multi Core Systems Felix Bach *, Hueseyin K. Çakmak *, Heiko Maass, Uwe Kuehnapfel Institute for Applied Computer Sciences
Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery
Center for Information Services and High Performance Computing (ZIH) Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Richard Grunzke*, Jens Krüger, Sandra Gesing, Sonja
European Data Infrastructure - EUDAT Data Services & Tools
European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28
Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences
Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.3 Selected Standards
DISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study
DISTRIBUTED SYSTEMS AND CLOUD COMPUTING A Comparative Study Geographically distributed resources, such as storage devices, data sources, and computing power, are interconnected as a single, unified resource
IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:
Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet s ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.
CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT
SS Data & Storage CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT HEPiX Fall 2012 Workshop October 15-19, 2012 Institute of High Energy Physics, Beijing, China SS Outline
HPC Storage Solutions at transtec. Parallel NFS with Panasas ActiveStor
HPC Storage Solutions at transtec Parallel NFS with Panasas ActiveStor HIGH PERFORMANCE COMPUTING AT TRANSTEC More than 30 Years of Experience in Scientific Computing 1980: transtec founded, a reseller
Data-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
Scala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically
THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.
THE EMC ISILON STORY Big Data In The Enterprise 2012 1 Big Data In The Enterprise Isilon Overview Isilon Technology Summary 2 What is Big Data? 3 The Big Data Challenge File Shares 90 and Archives 80 Bioinformatics
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai
GSAW 2010 Working Group Session 11D Eucalyptus-Based Event Correlation Nehal Desai Member of the Tech. Staff, CSD/CSTS/CSRD, The Aerospace Corporation Dr. Craig A. Lee, [email protected] Senior Scientist, CSD/CSTS/CSRD,
Using an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses
Software-defined Storage Architecture for Analytics Computing
Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture
A Survey Study on Monitoring Service for Grid
A Survey Study on Monitoring Service for Grid Erkang You [email protected] ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database
Advancements in Storage QoS Management in National Data Storage
Advancements in Storage QoS Management in National Data Storage Darin Nikolow 1, Renata Słota 1, Stanisław Polak 1 and Jacek Kitowski 1,2 1 AGH University of Science and Technology, Faculty of Computer
Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015
Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015 Table of Contents Introduction... 4 Certified Products... 4 Key Findings... 5 Solution
GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington
GEOG 482/582 : GIS Data Management Lesson 10: Enterprise GIS Data Management Strategies Overview Learning Objective Questions: 1. What are challenges for multi-user database environments? 2. What is Enterprise
Product Overview. Contents
Contents Product Overview Copyright 2002 - Xware AB. All rights reserved. xtrade is a registered trademark of Xware AB. Symphonia is a trademark of Orion Systems New Zealand Ltd. All rights reserved. 2
The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms
ICOODB 2010 - Frankfurt, Deutschland The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms Leon Guzenda - Objectivity, Inc. 1 AGENDA Historical Overview Inherent
SURFsara Data Services
SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Mark van de Sanden The world of the many Many different users (well organised (international) user communities, research groups, universities,
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud
Laurence Liew General Manager, APAC Economics Is Driving Big Data Analytics to the Cloud Big Data 101 The Analytics Stack Economics of Big Data Convergence of the 3 forces Big Data Analytics in the Cloud
IBM WebSphere Enterprise Service Bus, Version 6.0.1
Powering your service oriented architecture IBM WebSphere Enterprise Service Bus, Version 6.0.1 Highlights Supports a variety of messaging Requires minimal standards including JMS, Version 1.1 programming
In Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
Case Study : 3 different hadoop cluster deployments
Case Study : 3 different hadoop cluster deployments Lee moon soo [email protected] HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer
Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A
Identifier: Date: Activity: Authors: Status: Link: Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A J O I N T A C T I O N ( S A 1, J R A 3 ) F I
NEXT GENERATION ARCHIVE MIGRATION TOOLS
NEXT GENERATION ARCHIVE MIGRATION TOOLS Cloud Ready, Scalable, & Highly Customizable - Migrate 6.0 Ensures Faster & Smarter Migrations EXECUTIVE SUMMARY Data migrations and the products used to perform
Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke
Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke Kalabie: Extending the OpenLAB Architecture Agilent User Interface
New Features in Oracle Application Express 4.1. Oracle Application Express Websheets. Oracle Database Cloud Service
Date and Time- Europe/Middle East/Africa Tuesday June 12, 2012 09:00 13:00 BST 10:00 14:00 CEST 12:00 16:00 GST Corresponding UTC (GMT) 08:00:00 Agenda Time Track and Keynote/Session Title 9:00 AM Keynote
Open Cirrus: Towards an Open Source Cloud Stack
Open Cirrus: Towards an Open Source Cloud Stack Karlsruhe Institute of Technology (KIT) HPC2010, Cetraro, June 2010 Marcel Kunze KIT University of the State of Baden-Württemberg and National Laboratory
Percipient StorAGe for Exascale Data Centric Computing
Percipient StorAGe for Exascale Data Centric Computing per cip i ent (pr-sp-nt) adj. Having the power of perceiving, especially perceiving keenly and readily. n. One that perceives. Introducing: Seagate
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
A Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
Analisi di un servizio SRM: StoRM
27 November 2007 General Parallel File System (GPFS) The StoRM service Deployment configuration Authorization and ACLs Conclusions. Definition of terms Definition of terms 1/2 Distributed File System The
Evaluating MapReduce and Hadoop for Science
Evaluating MapReduce and Hadoop for Science Lavanya Ramakrishnan [email protected] Lawrence Berkeley National Lab Computation and Data are critical parts of the scientific process Three Pillars of
Object Storage: Out of the Shadows and into the Spotlight
Technology Insight Paper Object Storage: Out of the Shadows and into the Spotlight By John Webster December 12, 2012 Enabling you to make the best technology decisions Object Storage: Out of the Shadows
XenData Archive Series Software Technical Overview
XenData White Paper XenData Archive Series Software Technical Overview Advanced and Video Editions, Version 4.0 December 2006 XenData Archive Series software manages digital assets on data tape and magnetic
EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise
EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise ESSENTIALS Easy-to-use, single volume, single file system architecture Highly scalable with
Performance Testing of Big Data Applications
Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies [email protected] Shirish Bhale: Director of Engineering
A Grid Architecture for Manufacturing Database System
Database Systems Journal vol. II, no. 2/2011 23 A Grid Architecture for Manufacturing Database System Laurentiu CIOVICĂ, Constantin Daniel AVRAM Economic Informatics Department, Academy of Economic Studies
Aspera Direct-to-Cloud Storage WHITE PAPER
Transport Direct-to-Cloud Storage and Support for Third Party April 2014 WHITE PAPER TABLE OF CONTENTS OVERVIEW 3 1 - THE PROBLEM 3 2 - A FUNDAMENTAL SOLUTION - ASPERA DIRECT-TO-CLOUD TRANSPORT 5 3 - VALIDATION
Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000
Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Clear the way for new business opportunities. Unlock the power of data. Overcoming storage limitations Unpredictable data growth
DataNet Flexible Metadata Overlay over File Resources
1 DataNet Flexible Metadata Overlay over File Resources Daniel Harężlak 1, Marek Kasztelnik 1, Maciej Pawlik 1, Bartosz Wilk 1, Marian Bubak 1,2 1 ACC Cyfronet AGH, 2 AGH University of Science and Technology,
Data Management System - Developer Guide
Data Management System - Developer Guide Marcin Wolski Pawel Spychala Maciej Labedzki Data Management System - Developer Guide by Marcin
Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze
Data Warehouse and Hive Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Decision support systems Decision Support Systems allowed managers, supervisors, and executives to once again see the
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
Das HappyFace Meta-Monitoring Framework
Das HappyFace Meta-Monitoring Framework B. Berge, M. Heinrich, G. Quast, A. Scheurer, M. Zvada, DPG Frühjahrstagung Karlsruhe, 28. März 1. April 2011 KIT University of the State of Baden-Wuerttemberg and
Big Data Services at DKRZ
Big Data Services at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scientific Computing Research Group MPI-M Seminar Hamburg, March 31st, 2015 Big Data in Climate Research Big data is an all-encompassing
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
