Hierarchy storage in Tachyon.
- Philippa Wiggins
- 9 years ago
Contents
Introduction
Design consideration
  Feature overview
  Usage design
System architecture
  Read/Write workflow in hierarchy storage
  Components
Non-goal
Constraints and Design tradeoffs
Future plans

Introduction

In most cases, memory is too small to hold all hot data, so some cached data must be flushed to the next storage level when space runs out. Currently, Tachyon addresses this with the CACHE_THROUGH write type, which keeps part of the data in memory and persists the entire data set in an underlying file system with enough capacity. This keeps the entire data set accessible and reliable when memory cannot hold it, but it has two drawbacks in cost and performance:

1. It can introduce network overhead when the underlying file system is distributed and keeps multiple replicas.
2. In local file system mode, flushing the whole data set (both the cached and non-cached parts) to HDD causes a large performance penalty.

The ideal approach is to split the data in two: one part in main memory and the rest on a fast secondary storage device with higher capacity. Hierarchy storage offers a trade-off between capacity and performance. Fast storage devices such as SSDs can serve as a secondary tier that sits between memory and HDD. Cached data can then be placed not only in memory but also, hierarchically, on external storage. All newly incoming data is cached on the first tier. When that tier is full, it swaps old data out to the next tier in the hierarchy. These are the main reasons for adding hierarchy storage to Tachyon.
Design consideration

Feature overview

In general, hierarchy storage in Tachyon adds several storage layers to the existing single memory cache tier. Newly arriving data is always cached on the top-level storage (such as memory) for speed. If that level runs out of space, old data is swapped out to its successor. The successor should have more storage space, at the cost of lower read/write performance. To retrieve cached data, the end user can read block files from any storage layer in the hierarchy. This increases the available cache space in exchange for some read/write performance.

[Figure: storage tiers, Mem, SSD, HDD, ...]

Usage design

To support hierarchy storage, Tachyon introduces two sets of configuration items, for the worker conf and the common conf respectively.

Worker Conf

1. Configure the storage tiers and their storage directory lists. The admin specifies the storage tiers for each worker with the following configuration. Each storage level must have at least one storage folder. The maximum number of storage levels per worker is 5. You can enable storage tiers 0 to N (N < 5) in order. Any layer that is not configured is omitted. The storage directories within a single layer are delimited by commas. If the folders do not exist, the worker tries to create them during the initialization phase.

    data.level1.dirs= dir1, dir2 ; #high speed layer / small capacity
    data.level2.dirs= dir1, dir2 ; #medium speed layer / medium capacity
    data.level3.dirs= dir1, dir2 ; #low speed layer / large capacity

2. Configure the upper bound of storage capacity. Each storage layer has a specific quota for every storage dir, separated by commas. If the number of quota entries does not match the number of storage folders, the configuration is invalid.

    data.level1.dir.quota= 100G, 200G ; #high speed layer / small capacity
    data.level2.dir.quota= 500G, 400G ; #medium speed layer / medium capacity
    data.level3.dir.quota= 1T, 1T ; #low speed layer / large capacity

3. Configure the storage tier's alias. Hierarchy storage introduces several storage layers; some may be local, and some may use memory or another fast storage device. To tell whether a block is in memory, and to stay compatible with previous Tachyon versions, each tier must specify a storage alias. The default alias for each tier is unknown. The alias also translates the storage level into a readable name.

    data.level1.dir.alias= mem ;
    data.level2.dir.alias= ssd ;
    data.level3.dir.alias= hdd ;
    data.level4.dir.alias= hdfs ;
    data.level5.dir.alias= s3 ;

The storage aliases are pre-defined in Tachyon (mem, ssd, hdd, hdfs, s3, clusterfs). If you specify an unknown type, you need to register it beforehand; otherwise the Tachyon hierarchy store won't recognize it. All data stored in the mem layer is regarded as cached in memory, and IsInMemory() returns true for it.

System architecture

Read/Write workflow in hierarchy storage

Every storage layer is represented as a StorageTier. Each worker maintains an array of StorageTier objects, and a tier's child tier can be retrieved from that array. Every StorageTier consists of several StorageDirs, and every StorageDir needs a StorageBlockReader or StorageBlockWriter to read and write block files. Different Reader and Writer implementations define the concrete read/write behavior. A StorageDir obtains the proper StorageBlockReader or StorageBlockWriter by analyzing the scheme of its dir path.
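
The scheme-based selection described above could look roughly like the following. This is a minimal sketch, not Tachyon's implementation: the class StorageDirSchemes, the Backend enum, and the rule that a path without a scheme is treated as a local file are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: none of these names come from Tachyon itself.
final class StorageDirSchemes {
    /** Kinds of block reader/writer backends a StorageDir might select. */
    enum Backend { LOCAL_FS, HDFS, S3, UNSUPPORTED }

    /** Returns the scheme of a directory path; scheme-less paths are assumed to be local. */
    static String schemeOf(String dirPath) {
        String scheme = URI.create(dirPath.trim()).getScheme();
        return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
    }

    /** Maps a directory path to the backend whose reader/writer should handle it. */
    static Backend backendFor(String dirPath) {
        switch (schemeOf(dirPath)) {
            case "file":
                return Backend.LOCAL_FS;
            case "hdfs":
                return Backend.HDFS;
            case "s3":
            case "s3n":
                return Backend.S3;
            default:
                return Backend.UNSUPPORTED;
        }
    }
}
```

With this sketch, a path such as /mnt/ramdisk would map to the local file system backend, while hdfs://namenode:9000/tachyon would map to the HDFS backend.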
Write

When writing cache data to the local worker, the client first requests a blockid from the master. It then requests storage space for that blockid from the local worker. The worker always tries to allocate the space in the very first storage tier (level0):

a) Randomly pick one StorageDir and check whether it can hold the requested block.
b) If it can, return that StorageDir to the client.
c) Otherwise, try the neighboring StorageDirs in turn until one can fit the requested block size, and return that one.
d) If no StorageDir in the layer has enough space, the worker swaps out some block files (chosen by an elimination algorithm) to the child storage tier until there is enough free space for the new block file.
e) The successor layer then follows the same procedure, starting from a).
f) When the last storage layer is reached, the eliminated files are evicted (or an OutOfSpace exception is thrown).

[Figure: write workflow. The client (1) calls RequestSpace on the worker and gets a StorageDir, then (2) calls getblockwriter().appendcurrentbuffer. The worker holds StorageTier_0 to StorageTier_3, swaps out from one tier to the next, and evicts from the last tier. The worker reports a space counter to the master by heartbeat. A Distributed File System sits below the tiers.]

Read

To read data from a StorageTier, the client first obtains the blockinfo (extended*) to get the location and storage information (i.e., the storageids). The client sends that storage information along with the blockid to the local worker and gets back a StorageDir. It then obtains the corresponding StorageBlockReader and reads the requested data. If the data is not found on the local node, the client sends the request to a remote node and receives the data remotely as usual. The remote worker prepares the requested data and sends it out through the DataServer. If the data read from the network needs to be re-cached on the local worker, the workflow follows the write operation described above.
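
The write-path steps a) to f) can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, blocks are tracked as id-to-size entries rather than files, and oldest-first order stands in for the unspecified elimination algorithm.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical model of tiered space allocation; not Tachyon's actual classes.
final class TieredAllocator {
    /** One storage directory with a byte quota; blocks are kept oldest-first. */
    static final class Dir {
        final long capacity;
        long used;
        final LinkedHashMap<Long, Long> blocks = new LinkedHashMap<>(); // blockId -> size

        Dir(long capacity) { this.capacity = capacity; }
        long free() { return capacity - used; }
        void add(long id, long size) { blocks.put(id, size); used += size; }
    }

    /** One tier: its directories plus a link to the next (larger, slower) tier. */
    static final class Tier {
        final List<Dir> dirs = new ArrayList<>();
        final Tier next; // null for the last tier
        final Random random;

        Tier(Tier next, Random random, long... capacities) {
            if (capacities.length == 0) throw new IllegalArgumentException("a tier needs a dir");
            this.next = next;
            this.random = random;
            for (long c : capacities) dirs.add(new Dir(c));
        }

        /** Steps a) to c): start at a random dir, then probe neighbors until one fits. */
        Dir findDir(long size) {
            int start = random.nextInt(dirs.size());
            for (int i = 0; i < dirs.size(); i++) {
                Dir d = dirs.get((start + i) % dirs.size());
                if (d.free() >= size) return d;
            }
            return null;
        }

        /** Steps a) to f): place a block, swapping old blocks out to the next tier if needed. */
        boolean store(long id, long size) {
            Dir d = findDir(size);
            if (d == null) {
                // d) pick a dir that could hold the block once emptied.
                for (Dir c : dirs) {
                    if (c.capacity >= size) { d = c; break; }
                }
                if (d == null) return false; // the OutOfSpace case
                Iterator<Map.Entry<Long, Long>> oldest = d.blocks.entrySet().iterator();
                while (d.free() < size && oldest.hasNext()) {
                    Map.Entry<Long, Long> victim = oldest.next();
                    oldest.remove();
                    d.used -= victim.getValue();
                    // e) the child tier repeats the procedure; f) the last tier just drops it.
                    if (next != null) next.store(victim.getKey(), victim.getValue());
                }
            }
            d.add(id, size);
            return true;
        }

        boolean contains(long id) {
            for (Dir d : dirs) if (d.blocks.containsKey(id)) return true;
            return false;
        }
    }
}
```

In this model, storing a second block that no longer fits in a full top tier pushes the oldest block down to the next tier, and a block larger than every directory in the tier is rejected.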
[Figure: read workflow. The client gets the blockinfo from the master and the clientblockinfo from the worker, then calls getblockreader().readbytebuffer. The worker holds StorageTier_0 to StorageTier_3 above a Distributed File System.]

Components

Data store

1. StorageTier: the storage tier, which manages the cached blocks, the storage containers, and their corresponding information (e.g., the capacity and the used/free space). StorageTier is a linked-list-like structure: every StorageTier points to the next StorageTier instance, unless it is the last storage layer. The WorkerStorage only keeps a reference to the frontier (i.e., the top-level StorageTier), and requests storage space and caches/frees blocks through it.

2. StorageDir: the storage container in every storage layer, which provides basic data manipulation and migration. Cache data readers and writers are created based on the scheme of the StorageDir (see the next item, StorageBlockReader & StorageBlockWriter). StorageDir also handles data migration between layers, or between containers in the same storage tier. For example, moving or copying a block file from one layer to another is common during swap-out and eviction, so StorageDir provides a common way to migrate data from one place to another. If the source and destination StorageDir implementations are the same, it can call the file system's existing move/copy APIs directly. If not, it reads from one StorageDir and writes that data to the destination.

3. StorageBlockReader & StorageBlockWriter: the generic cache reader and writer interfaces, which define the block read/write APIs. To read a cached block, the implementing class must provide ByteBuffer readbytebuffer(int offset, long length); to write a block file, it must provide int appendcurrentbuffer(byte[] buf, int offset, int length). StorageBlockReader and StorageBlockWriter can be customized to support different storage systems, such as a local file system or a shared file system.

[Figure: class diagram]
StorageTier
  fields: mworkerconf, mstoragelevel, mstoragedirs, mstoragedircapcities, mstoragedirfreespace, mstoragedirusedspace, mnexstoragetier
  methods: requestspace(), getstoragedir(), getstoragefile(), freeblocks(), storagetiereviction()
StorageDir
  fields: mstoragedirname
  methods: getblockwriter(), getblockreader(), getfilepath(), _copyblock(), getblocklength(), existsblock(), deleteblock(), moveblock(), copyblock()
  uses: StorageBlockReader, StorageBlockWriter
StorageBlockReader (interface)
  methods: readbytebuffer()
StorageBlockWriter (interface)
  methods: appendcurrentbuffer()
Implementation classes shown: StorageBlockReaderLocalFS, StorageBlockReaderLocalFS

Storage level information

Hierarchy storage returns additional storage level information to indicate whether a block is mem-local, ssd-local, and so on. This helps the computation scheduler allocate resources more deliberately and get better performance. The existing blockinfo is extended with:

1. BlockInfo.getStorageIds: returns a list of storage ids for all nodes storing the block. Each storage id consists of the storage level and its alias, formatted as storagelevel_alias. For example, a storageid of 0_mem means the block is stored in the first storage tier, and that tier is the memory storage layer. Other wrappers can report the storage information to a computation framework through that framework's APIs, for example by following an API design in Hadoop or Spark to report whether a block file is memory-local based on the StorageId.

2. BlockInfo.IsInMemory: returns true if the block is stored in the storage tier named mem.

Non-goal

The hierarchy won't replace the underlying file system, which serves as the persistent storage pool beneath the Tachyon caching tier. Even if a hierarchy storage layer is set up on a persistent device, it is still regarded as temporary cache storage.
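
The storagelevel_alias format from the Storage level information section can be illustrated with a small helper. This is a sketch under stated assumptions: the class StorageIds and its method names are hypothetical, and only the id format and the meaning of the mem alias come from the design.

```java
import java.util.List;

// Hypothetical helper for the "storagelevel_alias" id format; not part of Tachyon.
final class StorageIds {
    /** Builds a storage id such as "0_mem" from a tier level and its alias. */
    static String format(int level, String alias) {
        return level + "_" + alias;
    }

    /** Extracts the numeric storage level that precedes the first underscore. */
    static int levelOf(String storageId) {
        return Integer.parseInt(storageId.substring(0, storageId.indexOf('_')));
    }

    /** Extracts the alias that follows the first underscore. */
    static String aliasOf(String storageId) {
        return storageId.substring(storageId.indexOf('_') + 1);
    }

    /** Mirrors the IsInMemory rule: true if any storage id carries the "mem" alias. */
    static boolean isInMemory(List<String> storageIds) {
        for (String id : storageIds) {
            if ("mem".equals(aliasOf(id))) return true;
        }
        return false;
    }
}
```

A scheduler-side wrapper could use a check like isInMemory to decide whether to report a block as memory-local.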
Constraints and Design tradeoffs

1. The blockinfo only records which storage layer a block goes to. When reading from that layer, the worker has to find the block by checking all StorageDirs one by one. This may not be very efficient, but it avoids extra load on the master node: all blockinfo is stored on the master, so more detail would mean higher memory usage there.

2. Currently, the master node only maintains the memory usage status of each worker. With hierarchy storage, usage information for each storage layer on every worker has to be added. One option is to keep reporting only the total usage, as before. Another is to report more detail for each node.

Future plans

None of the following plans will be covered in the first phase of the implementation.

1. Async-eviction: add a threshold to avoid long blocking during space sweeping. For example, if free space falls below 20%, sweep data asynchronously.

2. Add more elimination algorithms besides LRU, and make them pluggable.

3. Work as a read cache. The user can choose the promotion strategy: none, exclusive, or inclusive. None means reads are served from the storage layer where the block currently lives. Exclusive means no data overlaps between storage tiers: when data is promoted to memory, it is removed from its original tier. Inclusive means the promoted (re-cached) data also stays in its original place, so there are two copies.
   a) NONE: read from the existing storage tier directly.
   b) KEEP (inclusive): promote to RAM and keep the original copy.
   c) SWAPIN (exclusive): promote to RAM and delete the original copy.

4. Dynamic data movement between storage layers, driven by statistics on hot and cold data. This helps increase the cache hit ratio in the higher layers and decrease data exchange between layers.

5. Write to and read from a user-specified storage layer.
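
The three promotion strategies in future plan 3 can be illustrated with a small model. This is only a sketch of the planned behavior: the class PromotionSketch is hypothetical, and each tier is reduced to a set of block ids with tier 0 standing in for memory.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the NONE / KEEP / SWAPIN read strategies; not Tachyon code.
final class PromotionSketch {
    enum Strategy { NONE, KEEP, SWAPIN }

    /**
     * Reads a block and applies the promotion strategy.
     * tiers.get(0) is the top (memory) tier; each set holds the block ids cached there.
     * Returns the level that served the read, or -1 if the block is not cached.
     */
    static int read(List<Set<Long>> tiers, long blockId, Strategy strategy) {
        for (int level = 0; level < tiers.size(); level++) {
            if (!tiers.get(level).contains(blockId)) continue;
            if (level > 0 && strategy != Strategy.NONE) {
                tiers.get(0).add(blockId); // KEEP and SWAPIN both promote to the top tier
                if (strategy == Strategy.SWAPIN) {
                    tiers.get(level).remove(blockId); // exclusive: drop the original copy
                }
            }
            return level;
        }
        return -1;
    }
}
```

Under KEEP the block ends up in both the top tier and its original tier, while under SWAPIN it exists only in the top tier after the read.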
Lesson Objectives To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization AE3B33OSD Lesson 1 / Page 2 What is an Operating System? A
5 HDFS - Hadoop Distributed System
5 HDFS - Hadoop Distributed System 5.1 Definition and Remarks HDFS is a file system designed for storing very large files with streaming data access patterns running on clusters of commoditive hardware.
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
Avid. Avid Interplay Web Services. Version 2.0
Avid Avid Interplay Web Services Version 2.0 Table of Contents Overview... 1 Interplay Web Services Functionality... 2 Asset Management... 2 Workflow Enhancement... 3 Infrastructure... 3 Folder Listing...
(Refer Slide Time: 02:17)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #06 IP Subnetting and Addressing (Not audible: (00:46)) Now,
E-mail Listeners. E-mail Formats. Free Form. Formatted
E-mail Listeners 6 E-mail Formats You use the E-mail Listeners application to receive and process Service Requests and other types of tickets through e-mail in the form of e-mail messages. Using E- mail
FioranoMQ 9. High Availability Guide
FioranoMQ 9 High Availability Guide Copyright (c) 1999-2008, Fiorano Software Technologies Pvt. Ltd., Copyright (c) 2008-2009, Fiorano Software Pty. Ltd. All rights reserved. This software is the confidential
Bigdata High Availability (HA) Architecture
Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources
XCP APP FAILOVER CONFIGURATION FOR WEBLOGIC CLUSTER AND APACHE WEBSERVER
XCP APP FAILOVER CONFIGURATION FOR WEBLOGIC CLUSTER AND APACHE WEBSERVER ABSTRACT This white paper deals with the explanation of configuration of failover of xcp application session across nodes of weblogic
Using NCache for ASP.NET Sessions in Web Farms
Using NCache for ASP.NET Sessions in Web Farms April 22, 2015 Contents 1 Getting Started... 1 1.1 Step 1: Install NCache... 1 1.2 Step 2: Configure for Multiple Network Cards... 1 1.3 Step 3: Configure
Distributed Systems (CS236351) Exercise 3
Distributed Systems (CS236351) Winter, 2014-2015 Exercise 3 Due date: 11/1/15, 23:59 1 System overview In this exercise, you are going to develop another version of the basic resource management service,
Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] June 3 rd, 2008
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed
Workflow Templates Library
Workflow s Library Table of Contents Intro... 2 Active Directory... 3 Application... 5 Cisco... 7 Database... 8 Excel Automation... 9 Files and Folders... 10 FTP Tasks... 13 Incident Management... 14 Security
Integrating VoltDB with Hadoop
The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.
Using Oracle NoSQL Database
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 40291196 Using Oracle NoSQL Database Duration: 4 Days What you will learn In this course, you'll learn what an Oracle NoSQL Database is,
Big Data and Scripting map/reduce in Hadoop
Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb
QStar White Paper. Tiered Storage
QStar White Paper Tiered Storage QStar White Paper Tiered Storage Table of Contents Introduction 1 The Solution 1 QStar Solution 3 Conclusion 4 Copyright 2007 QStar Technologies, Inc. QStar White Paper
Tushar Joshi Turtle Networks Ltd
MySQL Database for High Availability Web Applications Tushar Joshi Turtle Networks Ltd www.turtle.net Overview What is High Availability? Web/Network Architecture Applications MySQL Replication MySQL Clustering
