Scalable Services for Digital Preservation
- Margaret Adams
Transcription
1 Scalable Services for Digital Preservation: A Perspective on Cloud Computing. Rainer Schmidt, Christian Sadilek, and Ross King
2 Digital Preservation (DP)
- Providing long-term access to growing collections of digital assets.
- Not just a question of storage.
- Not just a question of files.
- Software preservation rather than hardware preservation.
- Prevent objects from becoming uninterpretable bit-streams.
- Requires the establishment of research infrastructures and networks; not an out-of-the-box solution.
- A number of large EU projects and initiatives are dealing with the implications of digital preservation (FP6/FP7: Planets, CASPAR, DPE, SHAMAN).
3 Planets: Permanent Long-term Access through NETworked Services
- Addresses the problem of digital preservation, driven by national libraries and archives.
- Project instrument: FP6 Integrated Project, 5th IST Call.
- Consortium: 16 organisations from 7 countries.
- Duration: 48 months, June 2006 to May 2010.
- Budget: 14 million euro.
4 Why does DP need HPC resources?
- Digital object management systems, repositories, or archives are designed for storing and providing access to large amounts of data.
  - Many different data streams and metadata models.
  - Not designed to support continuous data modification.
  - Focus on storage; no support for HPC.
- Digital preservation is an e-science problem:
  - Processing vast amounts of complex data (e.g. analyse, migrate).
  - Experiments in distributed and heterogeneous environments.
  - Crossing administrative and institutional boundaries.
5 Towards Grid and Cloud Computing
- The Planets preservation architecture is based on services.
- Supports interoperability and a distributed environment.
- Sufficient for controlled experiments (Testbed).
- Not sufficient for handling a production environment:
  - Massive, uncontrolled user requests.
  - Mass migration of hundreds of terabytes of data.
- Content holders are faced with losing vast amounts of data.
- They do not hold sufficient computational resources in-house.
- There is a clear requirement for incorporating HPC facilities -> Grid and Cloud Computing.
6 Execution Tiered Architecture
[Architecture diagram. Component labels: Preservation Planning and Workflow Generation; Workbench for Testbed Experiments; Web Portal; Browser Clients; Service Registry; Workflow Def.; Data + Metadata Registry; Execution Engine; Planets Instance; Tools + Format Registry; Data Model; Planets App. Services; 3rd Party Tool Services; Grid/Cloud Execution Services; Web/Grid Services; Experimental Environment (qualitative results); Production Environment (quantitative results); Resources. Edge labels: reference, lookup, maintain.]
7 Integrating Clouds and Virtual Clusters
- Basic idea: extending the Planets SOA with Grid services.
- The Planets IF Job Submission Services (JSS):
  - Allow job submission to a PC cluster (e.g. Hadoop, Condor).
  - Use standard Grid protocols and interfaces (SOAP, HPC-BP, JSDL).
- Cluster nodes are instantiated from pre-configured system images.
  - Most preservation tools are 3rd-party applications.
  - Software needs to be preinstalled on cluster nodes.
- Cluster and JSS can be instantiated in-house (e.g. a PC lab) or on top of (leased) cloud resources (AWS EC2).
- Computation can be moved to the data or vice versa.
8 Experimental Architecture
[Architecture diagram. Labels: Virtual Cluster (Apache Hadoop); Virtual Node (Xen); HPCBP; Cloud Infrastructure (EC2); JSDL; Job; Job Description File; JSS; Storage Infrastructure (S3); Raw Data; Data Transfer Service.]
9 Mass Migration of Digital Objects
- Map-Reduce implements a framework and programming model for processing large documents (sorting, searching, indexing) on multiple nodes:
  - Automated decomposition (split).
  - Mapping to intermediary pairs (map), optionally (combine).
  - Merge output (reduce).
- Provides an implementation for data-parallel, I/O-intensive problems.
- Example: conversion of a digital object (e.g. website, folder, archive):
  - Decompose into atomic pieces (e.g. file, image, movie).
  - On each node, convert the piece to the target format.
  - Merge pieces and create a new data object.
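The split, map, and reduce steps on this slide can be illustrated with a small single-process sketch. It is not the Planets or Hadoop implementation: the names `split`, `convert`, `map_piece`, `reduce_pieces`, and `migrate` are hypothetical, and `convert` is only a stand-in for a real migration tool such as ps2pdf.

```python
from collections import defaultdict

# Hypothetical input: compound object id -> list of (file name, payload).
collection = {
    "thesis": [("ch1.ps", b"a"), ("ch2.ps", b"b")],
    "report": [("main.ps", b"c")],
}


def split(objects):
    """Decompose each compound object into atomic (object_id, name, payload) pieces."""
    for object_id, files in objects.items():
        for name, payload in files:
            yield object_id, name, payload


def convert(name, payload):
    """Stand-in for a real conversion tool (assumption): rename and tag the payload."""
    stem = name.rsplit(".", 1)[0]
    return stem + ".pdf", b"%PDF-stub:" + payload


def map_piece(piece):
    """Map step: convert one piece and key the result by its parent object."""
    object_id, name, payload = piece
    return object_id, convert(name, payload)


def reduce_pieces(pairs):
    """Reduce step: merge converted pieces back into one object per key."""
    merged = defaultdict(list)
    for object_id, converted in pairs:
        merged[object_id].append(converted)
    return {object_id: sorted(files) for object_id, files in merged.items()}


def migrate(objects):
    """Run split -> map -> reduce in a single process."""
    return reduce_pieces(map_piece(piece) for piece in split(objects))


migrated = migrate(collection)
```

On a real cluster the map calls would run on different nodes and the framework would group the intermediate pairs by key before the reduce step; the sketch keeps everything in one process so the data flow stays visible.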
10 Experimental Results - Setup
- Amazon Elastic Compute Cloud (EC2): cluster nodes, custom image based on Fedora 8 i386.
- Amazon Simple Storage Service (S3): max. 1 TB I/O, ~32.5 MB/s download / ~13.8 MB/s upload (cloud-internal).
- Apache Hadoop (v0.18): MapReduce implementation.
- Preinstalled command-line tools (e.g. ps2pdf).
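To get a feel for how much of a job could be spent on S3 I/O, the quoted transfer rates can be turned into a rough estimate. This is a back-of-the-envelope sketch only: it assumes the rates apply to one sequential stream, ignores per-request overhead and parallel transfers, and uses a file count of 1000, which mirrors the "1000 x 0,07 MB" label on slide 12 and is otherwise an assumption. The function name `transfer_seconds` is hypothetical.

```python
# Rates quoted on the setup slide (cloud-internal S3 access), in MB/s.
DOWNLOAD_MB_S = 32.5
UPLOAD_MB_S = 13.8


def transfer_seconds(file_size_mb, file_count, rate_mb_s):
    """Time to move file_count files of file_size_mb each at a fixed rate."""
    return file_size_mb * file_count / rate_mb_s


# File sizes used in the experiments; the count of 1000 is an assumption.
for size_mb in (0.07, 7.5, 250.0):
    down = transfer_seconds(size_mb, 1000, DOWNLOAD_MB_S)
    up = transfer_seconds(size_mb, 1000, UPLOAD_MB_S)
    print(f"{size_mb} MB files: download {down:.0f} s, upload {up:.0f} s")
```

Under these assumptions the small files are dominated by anything but raw bandwidth, while the 250 MB files would take hours over a single stream, which is one reason to spread the transfers across nodes.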
11 Experimental Results 1: Scaling Job Size
[Plot: time [min] (axis 0 to 10) against number of tasks; number of nodes = 5. Series: EC2 0.07 MB, EC2 7.5 MB, EC2 250 MB, SLE 0.07 MB, SLE 7.5 MB, SLE 250 MB. Annotations: x(1k) = 3.5, x(1k) = 4.4, and a third x(1k) value that is cut off after "3,". Definition on the slide: x(1k) = t_seq / t_parallel and tasks = (value missing).]
12 Experimental Results 2: Scaling #nodes
[Plot: time [min] (axis 0 to 40) against number of nodes, for EC2 and SLE with 0.07 MB files (SLE: 1000 x 0.07 MB). Annotated points, with n = nodes, t = time in minutes, s = speedup, e = efficiency: n=1, t=36, s=1, e=100%; n=1 (local), t=26; n=5, t=4.5, s=4.5, e=90%; n=10, t=4.5, s=7.6, e=75%; n=50, t=1.68, s=21, e=43%; n=100, t=1.03, s=35, e=35%.]
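The s and e annotations can be read as speedup and parallel efficiency. A minimal sketch, assuming the usual definitions s = t(1) / t(n) and e = s / n (the slide does not spell them out), with t(1) = 36 min taken from the single-node EC2 point; the function names are hypothetical:

```python
def speedup(t_one_node, t_n_nodes):
    """Speedup s = t(1) / t(n)."""
    return t_one_node / t_n_nodes


def efficiency(t_one_node, t_n_nodes, nodes):
    """Parallel efficiency e = s / n."""
    return speedup(t_one_node, t_n_nodes) / nodes


T_ONE_NODE = 36.0  # minutes, the n=1 EC2 point on the slide

# (nodes, minutes) pairs as annotated on the slide.
runs = [(50, 1.68), (100, 1.03)]

for nodes, minutes in runs:
    s = speedup(T_ONE_NODE, minutes)
    e = efficiency(T_ONE_NODE, minutes, nodes)
    print(f"n={nodes}: s={s:.1f}, e={e:.0%}")
```

With these definitions the n=50 and n=100 points come out at roughly s=21, e=43% and s=35, e=35%, matching the annotations. The falling efficiency shows that fixed overheads weigh more heavily as the per-node share of work shrinks.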
13 Conclusion
- Preservation systems need to employ HPC resources.
- Content holders and data repository systems are not ready to utilize computational Grids.
- There is a need to bridge research communities in the areas of digital preservation and e-science.
- Cloud Computing provides a powerful solution for getting on-demand access to appropriate HPC resources.
- Many integration issues remain: security, legal aspects, reliability, standardization.
- The Planets IF Job Submission Service is a first step: submission to a virtual cluster of DP nodes based on Grid protocols/interfaces.
14 Fin
15 The Planets Service Framework
- Defines a Service-Oriented Architecture for digital preservation: a set of preservation services, interfaces, and a common data model.
- Implements common services: authentication and authorization, monitoring, logging, notification, service registration and lookup.
- Provides a workflow enactment service and engine: component-based, XML serialization.
- APIs for applications that use Planets: Testbed experiments, executing preservation plans.
16
<?xml version="1.0" encoding="utf-8"?>
<jsdl:jobdefinition xmlns=" xmlns:jsdl=" xmlns:jsdl-posix=" xmlns:xsi=" xsi:schemalocation=" jsdl.xsd ">
  <jsdl:jobdescription>
    <jsdl:jobidentification>
      <jsdl:jobname>start vi</jsdl:jobname>
    </jsdl:jobidentification>
    <jsdl:application>
      <jsdl:applicationname>ls</jsdl:applicationname>
      <jsdl-posix:posixapplication>
        <jsdl-posix:executable>/bin/ls</jsdl-posix:executable>
        <jsdl-posix:argument>-la file.txt</jsdl-posix:argument>
        <jsdl-posix:environment name="ld_library_path">/usr/local/lib</jsdl-posix:environment>
        <jsdl-posix:input>/dev/null</jsdl-posix:input>
        <jsdl-posix:output>stdout.${job_id}</jsdl-posix:output>
        <jsdl-posix:error>stderr.${job_id}</jsdl-posix:error>
      </jsdl-posix:posixapplication>
    </jsdl:application>
  </jsdl:jobdescription>
</jsdl:jobdefinition>
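A job description of this shape can also be generated programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`. The two namespace URIs and the mixed-case element names follow the JSDL 1.0 specification as best recalled and are assumptions here, because the transcript lost the namespace values and lower-cased the tags; verify them against the specification before use. The helper `build_job`, the job name, and the ps2pdf path are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed JSDL 1.0 namespaces (not recoverable from the transcript).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def build_job(job_name, executable, arguments, stdout, stderr):
    """Build a JSDL JobDefinition element for a POSIX command-line job."""
    def j(tag):
        return f"{{{JSDL_NS}}}{tag}"

    def p(tag):
        return f"{{{POSIX_NS}}}{tag}"

    root = ET.Element(j("JobDefinition"))
    description = ET.SubElement(root, j("JobDescription"))
    identification = ET.SubElement(description, j("JobIdentification"))
    ET.SubElement(identification, j("JobName")).text = job_name

    application = ET.SubElement(description, j("Application"))
    app_name = executable.rsplit("/", 1)[-1]
    ET.SubElement(application, j("ApplicationName")).text = app_name

    posix = ET.SubElement(application, p("POSIXApplication"))
    ET.SubElement(posix, p("Executable")).text = executable
    for argument in arguments:  # one Argument element per argument
        ET.SubElement(posix, p("Argument")).text = argument
    ET.SubElement(posix, p("Output")).text = stdout
    ET.SubElement(posix, p("Error")).text = stderr
    return root


# Hypothetical migration job: convert one PostScript file to PDF.
job = build_job("convert-ps", "/usr/bin/ps2pdf", ["in.ps", "out.pdf"],
                "stdout.txt", "stderr.txt")
xml_text = ET.tostring(job, encoding="unicode")
```

Each command-line argument gets its own `Argument` element here, rather than being packed into a single string as in the slide's `-la file.txt` example.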