Hadoop on Nutanix Reference Architecture
Copyright 2012 Nutanix, Inc. Nutanix, Inc. 1735 Technology Drive, Suite 575 San Jose, CA 95110 All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
Table of Contents

1. Executive Summary
2. Introduction
   2.1. Audience
   2.2. Purpose
3. Solution Overview
4. Solution Design
   4.1. Hadoop MapReduce the Traditional Way
   4.2. Hadoop MapReduce the Nutanix Way
   4.3. Nutanix Compute / Storage
   4.4. Network
5. Validation & Benchmarking
6. Further Research
7. Conclusion
8. Appendix: Configuration
9. References
   9.1. Table of Figures
   9.2. Table of Tables
10. About the Author
11. About Nutanix
12. Acknowledgements
1. Executive Summary

The Nutanix Complete Cluster is a scalable virtualization solution for Desktop, Server, and Hadoop virtualization. This document shows the design and scaling of Hadoop MapReduce on the Nutanix Distributed File System (NDFS). It shows the scalability of the Nutanix Complete Cluster and provides detailed performance and configuration information on the scale-out capabilities of the cluster. The Nutanix Complete Cluster consists of modular Blocks that include Compute, Storage and Network. This design greatly reduces cost while increasing performance and scalability.

The Nutanix Distributed File System (NDFS), the core of Nutanix's Complete Cluster, tethers high-performance solid-state storage directly to enterprise applications while preserving the high capacity the SATA HDD tier provides through its adaptive information lifecycle management (ILM) capabilities. NDFS amplifies the power of server-attached flash (NYSE:FIO) in the realm of enterprise virtualization by co-locating high-performance, localized storage IO and Google-like, scale-out distributed redundancy via high-speed 10GbE top-of-rack switches. The base cluster ships with four industry-standard x86 servers bundled with VMware's hypervisor in a 2U, 75-lb, SAN-free server appliance.

The solution and testing provided in this document were completed with Hadoop 0.20 deployed on VMware vSphere on the Nutanix Complete Cluster.
2. Introduction

2.1. Audience

This reference architecture document is part of the Nutanix Solutions Library and is intended for use by individuals responsible for architecting, designing, managing and/or supporting Nutanix infrastructures. Consumers of this document should be familiar with concepts pertaining to VMware vSphere, Hadoop MapReduce, and Nutanix. We have broken down this document to address the key items for each role, focusing on the enablement of a successful design, implementation and transition to operation.

2.2. Purpose

This document will cover the following subject areas:

- Overview of the Nutanix solution
- Overview of Hadoop and its use cases
- The benefits of virtualizing Hadoop on Nutanix
- Architecting a complete Hadoop MapReduce solution on the Nutanix Platform
- Design and configuration considerations when architecting a Hadoop solution on Nutanix
- Benchmarking MapReduce performance on Nutanix

If you're looking for a high-level overview and background on the solution, continue with the Solution Overview section below. If you're looking for the detailed solution, jump to Section 4, Solution Design.
3. Solution Overview

What is the Nutanix Architecture?

Nutanix Complete Cluster is a scale-out cluster of high-performance nodes, or servers, each running a standard hypervisor and containing processors, memory and local storage (including SSDs and hard disk drives). Each node runs virtual machines just like a standard virtual machine host. In addition, local storage from all nodes is virtualized into a unified pool by Nutanix Scale-out Converged Storage (SOCS) (Figure 1). In effect, SOCS acts like an advanced SAN that uses local SSDs and disks from all nodes to store virtual machine data. Virtual machines running on the cluster write data to SOCS as if they were writing to a SAN. SOCS is VM-aware and provides advanced data management features. It brings data closer to virtual machines by storing the data locally on the system, resulting in higher performance at a lower cost. Nutanix Complete Cluster can horizontally scale from a few nodes to a large number of nodes, enabling organizations to scale their infrastructure as their needs grow.

Figure 1 Nutanix Architecture

Nutanix Distributed File System (NDFS) is at the heart of Nutanix clustering technology. Inspired by the Google File System, NDFS delivers a unified pool of storage from all nodes across the cluster, leveraging techniques including striping, replication, auto-tiering, error detection, fail-over and automatic recovery. This pool can then be sliced and diced to be presented as shared-storage resources to VMs for seamless support of features like vMotion, HA and DRS, along with industry-leading data management features. Additional nodes can be added in a plug-and-play manner in this high-performance scale-out architecture to build a cluster that will easily grow as your needs do.
What is Hadoop? [1]

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing and processing of data. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of nodes using a simple programming model. It is designed to scale up from a single node to thousands of nodes, each offering local computation and storage. Rather than depending on reliable hardware to deliver high availability and fault tolerance, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

Hadoop is available in a number of different distributions, some of which we've highlighted below:

Distribution     Considerations & Thoughts
Apache Hadoop    Main Hadoop open-source initiative; constant updates; lack of official support; lack of deployment model
Cloudera         Official support; provisioning & management (Cloudera Manager); stable releases; Hadoop version 0.23 in CDH4
Hortonworks      Official support; provisioning & management (Ambari); stable releases; NameNode / Job Tracker HA
MapR             NFS / Maprfs API [2]; no NameNode

Table 1 Hadoop Distribution Considerations

Other related projects:

Project            Role
Serengeti          Open-source initiative for provisioning virtual Hadoop
Apache Pig         High-level data flow language
Apache Hive        SQL-like language and metadata repository
Apache Zookeeper   Highly reliable distributed coordination service
Apache HBase       Scalable record and table storage with real-time read/write access
Apache Flume       Service for collecting and aggregating log and event data
Apache Sqoop       Data transport engine for relational database integration
Apache Mahout      Library of machine learning algorithms for Hadoop

Table 2 Hadoop Related Projects

What does Hadoop mean for my business? [3]

The amount of data generated in businesses is growing exponentially. Data on business transactions, customer interactions, web logs and so on is accumulating at an unrelenting pace. Turning that data into information and actionable insights is becoming more difficult, as methods based on traditional databases are generally hard to scale up to tera- or petabyte scale.

Google was one of the first companies that recognized this and developed an internal distributed compute architecture (MapReduce) and distributed file system (Google File System or GFS) that allows the massively parallel processing of large amounts of data. It scales up easily by increasing the number of computational nodes. Hadoop is an Apache open-source project that made this computational paradigm available for everyone, with Yahoo being one of its largest contributors and a heavy internal user.

Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, commodity servers configured into Hadoop clusters. With Hadoop, no data is too big. And in today's hyperconnected world, where people and businesses are creating more and more data every day, Hadoop's ability to grow virtually without limits means businesses and organizations can now unlock potential value from all their data.

Figure 2 Business Data Sources

To date, most organizations have created their Hadoop clusters by dedicating and managing a separate set of physical compute nodes and storage. However, this has created both management and capital expense downsides.
Why virtualize Hadoop nodes?

- Increased performance: Virtualized Hadoop nodes have been proven to perform better than their bare-metal cousins [4]
- Hardware utilization: Bare-metal Hadoop deployments average 10-20% CPU utilization, a major waste of hardware resources and datacenter space. Virtualizing Hadoop allows for better hardware utilization and flexibility
- Elastic MapReduce and scaling: Dynamic addition and removal of Hadoop nodes based on load allows you to scale based upon your current needs, not what you expect. Enable supply and demand to be in true synergy
- Allow DevOps & IT Ops to live in harmony: Big Data scientists demand performance, reliability, and a flexible scale model. IT Ops relies on virtualization to tame server sprawl, increase utilization, encapsulate workloads, manage capacity growth, and alleviate disruptive outages caused by hardware downtime. By virtualizing Hadoop, Data Scientists and IT Ops mutually achieve all objectives while preserving autonomy and independence for their respective responsibilities
- Sandboxing of jobs - Make Hadoop and Enterprise Apps play nice: Buggy MapReduce jobs can quickly saturate hardware resources, creating havoc for remaining jobs in the queue. Virtualizing Hadoop clusters encapsulates and sandboxes MapReduce jobs from other important sorting runs and general purpose workloads
- Batch Scheduling & Stacked workloads: Allow all workloads and applications to co-exist, e.g. Hadoop, Virtual Desktops and Servers. Schedule MapReduce job runs during off-peak hours to take advantage of idle night time and weekend hours that would otherwise go to waste, or utilize VMware's resource pooling features to run concurrently
- New Hadoop economics: Bare-metal implementations are expensive and can spiral out of control. The downtime and underutilized CPU that come with physical server workloads can jeopardize project viability. Virtualizing Hadoop reduces complexity and ensures success for sophisticated projects with a scale-out, grow-as-you-go model, a perfect fit for Big Data projects
- Service-defined tiering: Maintain and manage SLAs with resource prioritization and reservations

Why run Hadoop MapReduce on Nutanix?

- Blazing fast performance: Up to 2,000 MB/s of sequential throughput in a compact 2U 4-node cluster. A TeraSort benchmark yields 250 MB/s in the same 2U cluster
- Unified data platform: Run multiple data processing platforms along with Hadoop MapReduce on a single unified data platform (NDFS)
- High Availability: With HDFS the NameNode is a single point of failure. Nutanix has built-in high-availability and replication features to secure all pieces of Hadoop data. With Nutanix you eliminate any single points of failure
- Change Management: Maintain environmental control and separation between development, test, staging and production environments. Nutanix snapshots and fast clones can help in sharing production data with non-production jobs, without requiring full copies and unnecessary data duplication
- Business Continuity and Data Protection: If your Hadoop cluster is gradually becoming mission critical, it needs all the enterprise-grade data management features including backup and DR. With Nutanix these are already provided out of the box and can be managed the same way as they would be for virtual environments
- Data Archiving: Older data that is not heavily used by jobs doesn't need to sit on the highest storage tier. With Nutanix ILM this data is automatically moved down the tiers during times of inactivity and moved back up the tiers in times of heavy access. Along with compression this offers the ideal combination of performance and capacity
- Flash SSDs for NoSQL: The summaries that roll up to a NoSQL database like HBase are used to run business reports and are typically memory and IOPS-heavy. Nutanix has PCIe SSD and SATA SSD tiers coupled with dense memory capacities. With its heat-optimized tiering technology, Nutanix can transparently bring IOPS-heavy workloads to SSD tiers
- Enterprise-grade cluster management: An Apple-like approach to managing large clusters, including a converged GUI that serves as a single pane of glass for servers and storage, alert notifications, and a Bonjour mechanism to auto-detect new nodes in the cluster. Spend more time enhancing your environment, not maintaining it
- High-density Hadoop: Nutanix uses a hyperscale server architecture in which 8 sockets of Intel and up to 1TB of memory fit in a single 2U spread over 4 motherboards. Coupled with data archiving and compression, Nutanix can reduce Hadoop hardware footprints by up to 4x
- Time-sliced clusters: Like public cloud EC2 environments, Nutanix can provide a truly converged cloud infrastructure, allowing you to run your server and desktop virtualization services along with Hadoop on a single converged cloud. Get the efficiency and savings you require with a converged cloud on a truly converged architecture

Nutanix enables you to run multiple solutions all on the same converged infrastructure, while only enhancing performance and consolidation:

- Big Data
- Private Cloud
- End-User Computing
4. Solution Design

4.1. Hadoop MapReduce the Traditional Way

You should consider this approach if:

- You have strict requirements requiring HDFS

The Hadoop architecture consists of two main services: the data layer (HDFS) and the parallel processing layer (MapReduce).

HDFS is composed of two main components: 1) the NameNode and 2) the DataNode. The NameNode is responsible for all metadata operations as well as maintaining the metadata for the namespace. In the event of a NameNode failure, HDFS will be unavailable. The DataNode is responsible for the actual data I/O. In bare-metal deployments the NameNode is a single instance which can be susceptible to failure; however, it can be made highly available in virtual configurations.

MapReduce is composed of two main components: 1) the Job Tracker and 2) the Task Tracker. The Job Tracker is responsible for job submission, scheduling and queue management. The Task Tracker is responsible for the actual MapReduce and task management. In bare-metal deployments the Job Tracker is a single instance which can be susceptible to failure; however, it too can be made highly available in virtual configurations.

Below we show an image of how these service layers are composed and how the clients interact with them:

[Figure: the client sends metadata operations to the HDFS NameNode (metadata: name, replicas, ...; RF=3), submits MapReduce jobs to the JobTracker, and reads / writes data; Node 1 through Node N each run a TaskTracker and a DataNode, exchanging task attempts and task results with the JobTracker, with replication between DataNodes]

Figure 3 Hadoop Flow with HDFS
Hadoop can be deployed on Nutanix the same way as it would be deployed in any virtual scenario. In this case the DataNodes would mount VMDKs hosted on Nutanix NDFS. This does add multiple layers of abstraction for storage, as HDFS would be running on NDFS; however, it provides full Hadoop functionality and the HDFS API.

Below we show the relationship between a Nutanix Node, SVM and Hadoop node. Here you can see the Hadoop node is running HDFS with VMDKs hosted on Nutanix NDFS. This also ensures all DataNode IO to its VMDKs always goes to the local SVM:

Figure 4 Hadoop Layers with HDFS

Below we show a more detailed configuration of how the Hadoop node is running HDFS on NDFS via its local SVM:

[Figure labels: Hadoop VM 1 through Hadoop VM n (Ubuntu OS) with eight 2TB VMDKs; mount points /mnt/temp/d1 to /mnt/temp/d4 and /mnt/data/d1 to /mnt/data/d4; Hadoop directories /hadoop/local/d1 to /hadoop/local/d4 and /hadoop/data/d1 to /hadoop/data/d4; Mapred Local and Mapred DFS; containers RF1 CTR TEMP, RF2 CTR DATA and RF2 CTR VM; Nutanix SVM presenting Nutanix NDFS over iSCSI / NFS as NFS datastores; PV-SCSI and LSI SCSI controllers; Nutanix Node: 2 CPU @ 8 Core (32 Logical w/ HT), 256 GB Memory]

Figure 5 Hadoop Node Architecture with HDFS
4.2. Hadoop MapReduce the Nutanix Way

You should consider this approach if you:

- Do not have strict requirements requiring HDFS
- Are looking for the best possible performance
- Are looking for a truly unified data platform
- Are looking for simplified configuration and management

Nutanix provides the ideal combination of compute and high-performance local storage, providing the best possible architecture for Hadoop and other distributed applications. Nutanix provides a highly distributed filesystem with replication out of the box, enabling us to remove the HDFS services layer and platform MapReduce directly on our native filesystem. With the Nutanix Hadoop solution you also get the benefit of running a mixed workload: use the solution for VDI and Server Virtualization during the day and run batch MapReduce on the same environment later.

Below we show an updated image of how these service layers are composed and how the clients interact with them:

[Figure: the client submits MapReduce jobs to the JobTracker and reads / writes data; Node 1 through Node N each run a TaskTracker alongside a Nutanix SVM (NDFS), exchanging task attempts and task results with the JobTracker, with replication between SVMs]

Figure 6 Hadoop Flow with NDFS

As you can see from the image above, we've removed the HDFS services layer and have the Task Trackers interacting directly with NDFS. Nutanix has data locality and ILM logic to ensure the Task is always interacting with local data, ensuring the highest possible performance. This greatly simplifies the configuration of Hadoop and quickly allows you to get to what your real focus is: efficient parallel processing of massive-scale datasets.
Below we show the relationship between a Nutanix Node, SVM and Hadoop node, which speaks to its local SVM via NFS on a dedicated vSwitch. This configuration is uniform amongst all Hadoop nodes and allows us to further increase the ease of configuration.

Figure 7 Hadoop Layers with NDFS

Below we show a more detailed configuration of how the Hadoop node is mounting NDFS via its local SVM:

[Figure labels: Hadoop VM 1 through Hadoop VM n (Linux OS); mount points /mnt/local and /mnt/dfs; Hadoop directories /hadoop/local/ and /hadoop/data/; Mapred Local and Mapred DFS; containers RF1 CTR TEMP, RF2 CTR DATA and RF2 CTR VM; Nutanix SVM presenting Nutanix NDFS over iSCSI / NFS; PV-SCSI controller; NFS datastore; Nutanix Node: 2 CPU @ 8 Core (32 Logical w/ HT), 256 GB Memory]

Figure 8 Hadoop Node Architecture with NDFS

As you can see from the above, platforming Hadoop MapReduce directly on NDFS greatly simplifies the configuration and management of a Hadoop deployment as compared to its HDFS relative.
4.3. Nutanix Compute / Storage

The Nutanix Complete Cluster provides an ideal combination of high-performance compute and localized storage, with the agility to meet any demand. True to this capability, we performed zero re-configuration or customization of the Nutanix product to optimize for the Hadoop use case.

Below is the Nutanix Storage Pool configuration:

Name    Role                             Disk Tier(s)
SP01    Main storage pool for all data   PCI-e SSD, SATA SSD, SATA

Table 3 Nutanix Storage Pool Configuration

Below is the Nutanix Container configuration:

Name               Role                              Mounted By
CTR-RF2-VM-01      Container for all VMs             ESXi - Datastore
CTR-RF1-TEMP-01    Container for intermediate data   ESXi - Datastore
CTR-RF2-DATA-01    Container for output data         Hadoop Node

Table 4 Nutanix Container Configuration
4.4. Network

Designed for true linear scaling, we leverage a Leaf Spine network architecture. A Leaf Spine architecture consists of two network tiers: an L2 Leaf and an L3 Spine, based upon 40GbE and non-blocking switches. In this architecture performance stays consistent as you scale, because there is a static maximum of two hops from any node in the network.

Below we show a design of a scale-out Leaf Spine network architecture which provides 20Gb active throughput from each node to its L2 Leaf and scalable 80Gb active throughput from each Leaf to Spine switch. The image shows how a Leaf Spine architecture would scale from 1 Nutanix Complete Block to 1000s+ without any impact to available bandwidth:

Figure 9 Leaf Spine Network Architecture
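As a rough sizing aid for the bandwidth figures quoted above, the sketch below compares the node-facing and spine-facing bandwidth of a single leaf. It treats the 80Gb figure as the total uplink capacity of one leaf, which is an assumption, and the `nodes_per_leaf` values are hypothetical; the document does not state how many nodes attach to each leaf.

```python
# Figures taken from the text above.
NODE_TO_LEAF_GBPS = 20   # active throughput from each node to its L2 Leaf
LEAF_TO_SPINE_GBPS = 80  # active throughput from each Leaf towards the Spine
                         # (assumed here to be the leaf's total uplink capacity)


def leaf_oversubscription(nodes_per_leaf: int) -> float:
    """Return node-facing bandwidth divided by spine-facing bandwidth for one leaf.

    A value of 1.0 or less means the leaf uplinks can carry all node traffic
    at once; a value above 1.0 means the uplinks are oversubscribed.
    """
    if nodes_per_leaf <= 0:
        raise ValueError("nodes_per_leaf must be positive")
    return nodes_per_leaf * NODE_TO_LEAF_GBPS / LEAF_TO_SPINE_GBPS


if __name__ == "__main__":
    # Hypothetical attachment counts; 4 corresponds to one 4-node block per leaf.
    for nodes in (4, 8, 16):
        print(f"{nodes} nodes per leaf -> {leaf_oversubscription(nodes):.1f}:1")
```

Under these assumptions one 4-node block per leaf gives a 1:1 ratio, so keeping bandwidth constant while scaling means adding leaf uplink capacity in step with the nodes attached to that leaf.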
5. Validation & Benchmarking

The industry-standard TeraSort benchmark [5] was used for validating the Hadoop MapReduce job performance on the Nutanix Complete Cluster. The solution and testing provided in this document were completed with Cloudera Hadoop 3 Update 4 (CDH3U4) MapReduce on NDFS deployed on VMware vSphere on the Nutanix Complete Cluster.

Test Environment Configuration*

Hardware:
- Storage / Compute: 2 Nutanix Complete Blocks
- Network: Arista 7050Q / 7050S Series Switches

Hadoop Node Configuration:
- OS: Ubuntu 10.04 LTS
- 1 x CDH3U4 Node per Nutanix Node
- 24 vCPU & 80 GB memory
- 4 x 2TB VMDKs for intermediate data
- 1 x NDFS mount for MapReduce output

*More information on the configuration can be found in the Appendix

Test Execution

- Prepare input data by running TeraGen w/ N 100-byte rows
  - N = 10 billion for ~1TB
  - N = 50 billion for ~5TB
  - N = 100 billion for ~10TB
- Perform sort of data by running TeraSort w/ input data set
- Perform validation of data by running TeraValidate w/ output data set

Results

The Nutanix Hadoop Solution provides the highest-density Hadoop MapReduce performance, delivering over 2x the performance of the previous leader (HP [6]) at 250 MB/s sort throughput per 2U block. As the number of blocks scales, the sort throughput scales linearly.
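The TeraGen input sizes listed under Test Execution follow directly from the fixed 100-byte size of each generated row. A minimal sketch of that arithmetic, using decimal terabytes (10^12 bytes); the function names are illustrative only:

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_bytes(rows: int) -> int:
    """Total bytes TeraGen produces for the given number of rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return rows * ROW_BYTES


def to_decimal_tb(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 10**12


if __name__ == "__main__":
    # Row counts from the Test Execution list.
    for rows in (10_000_000_000, 50_000_000_000, 100_000_000_000):
        size_tb = to_decimal_tb(teragen_bytes(rows))
        print(f"N = {rows:,} rows -> {size_tb:.0f} TB")
```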
In the graph below we compare the TeraSort throughput of competing solutions per 2U of rack space. As you can tell, the Nutanix solution's performance is over 200% that of the nearest competitor, showing the density of an industry-leading solution:

[Figure: bar chart "2U Sort Throughput Comparison", sort throughput in MB/s; data labels 250, 110, 94, 55 and 40, with legend order Nutanix, HP, Microsoft, Oracle, SGI]

Figure 10 2U Sort Throughput Comparison [7]

In the graph below we compare the sort throughput of the existing vendor solutions in the market today as the solution scales:

[Figure: chart "TeraSort Throughput Scale Comparison", sort throughput in MB/s (0 to 6000) against number of 2U units (1 to 20) for Nutanix, HP, Microsoft, Oracle and SGI]

Figure 11 TeraSort Throughput Scale Comparison

As you can see, the Nutanix solution provides the densest possible performance and scalability. In a single rack (40U) you can get about 5 GB/s of sort throughput (20 blocks x 250 MB/s), over double what the next leading solution can provide.
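The single-rack figure can be reproduced from the per-block number under the linear-scaling claim made in the Results. The sketch below is a projection only: the test environment in this document used 2 blocks, so values for larger block counts are extrapolated, not measured.

```python
BLOCK_HEIGHT_U = 2          # one Nutanix block occupies 2U
SORT_MBPS_PER_BLOCK = 250   # TeraSort throughput per 2U block, from the Results


def blocks_in_rack(rack_units: int) -> int:
    """Number of whole 2U blocks that fit in the given rack space."""
    if rack_units < 0:
        raise ValueError("rack_units must be non-negative")
    return rack_units // BLOCK_HEIGHT_U


def projected_sort_mbps(blocks: int) -> int:
    """Projected sort throughput in MB/s, assuming perfectly linear scaling."""
    if blocks < 0:
        raise ValueError("blocks must be non-negative")
    return blocks * SORT_MBPS_PER_BLOCK


if __name__ == "__main__":
    blocks = blocks_in_rack(40)
    print(f"{blocks} blocks in 40U -> {projected_sort_mbps(blocks)} MB/s")
```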
6. Further Research

As part of its continuous determination to deliver the best possible solutions, Nutanix will continue to research the following areas:

- Performance Optimizations
- Scale Testing
- Distribution-Specific Reference Architectures (e.g. Cloudera, Hortonworks)
- Deployment integrations with Project Serengeti
- API integrations and enhancements
7. Conclusion

The Nutanix Hadoop solution provides the best of many worlds: industry-leading performance with the ability to run multiple mixed workloads on a single converged architecture. Of the converged architectures in the market today, none can truly provide the optimal configuration for all workloads; however, this is changing with Nutanix. The same platform that brought you revolutionary virtual desktop performance at massive scale can do the exact same with Hadoop.

Below we take a look at the various key items of the solution from key perspectives:

For the Data Scientist
- Best-of-class MapReduce performance - Over 2x what the next leading competitor provides
- Native DFS support / integration with Nutanix Distributed File System (NDFS) - Run MapReduce without HDFS and its NameNode
- Run multiple data processing platforms on a single unified data platform (NDFS)
- Simplified Hadoop configuration and management

For IT
- Ease of integration and management - Manage it the same as you do with existing virtual environments
- True linear scalability - Incrementally and granularly scale your environment to match demand as it grows
- Datacenter consolidation and convergence around a single, simple product

For the Business Leader
- Increased speed of delivery - Get rid of the long procurement cycles
- Enable IT to be a strategic department
- Standardized, Simple and Scalable

Nutanix provides a revolutionary architecture enabling you and your business for the future.
8. Appendix: Configuration

Hardware
- Storage / Compute
  - 2 x Nutanix Complete Block (8 nodes total)
  - Node Configuration
    - CPU: Intel Xeon X5650 @ 2.67 GHz
    - Memory: 96 GB
- Network
  - Arista 7050Q - L3 Spine
  - Arista 7050S - L2 Leaf

Software
- Nutanix: Software build 2.6
- OS: Ubuntu 10.04 LTS
- Hadoop: Cloudera Hadoop 3 Update 4 (CDH3U4)

VM
- Hadoop Node
  - 1 Hadoop Node per Nutanix Node (8 total)
  - CPU: 24 vCPU
  - Memory: 80 GB
  - Storage:
    - 1 x 20GB OS Disk on CTR-RF2-VM-01 NDFS-backed NFS datastore
    - 4 x 2TB TMP VMDKs on CTR-RF1-TEMP-01 NDFS-backed NFS datastore
    - 1 x 30TB NDFS mount to CTR-RF2-DATA-01 NDFS Container

Hadoop Configuration Example w/ HDFS

core-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode:port</value>
      </property>
    </configuration>

mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>jobtracker:port</value>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/hadoop/local/d1,/hadoop/local/d2,/hadoop/local/d3,/hadoop/local/d4</value>
      </property>
    </configuration>

hdfs-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>/hadoop/data/d1/nn,/nfsmount/dfs/nn</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/hadoop/data/d1/dn,/hadoop/data/d2/dn,/hadoop/data/d3/dn,/hadoop/data/d4/dn</value>
      </property>
    </configuration>

Hadoop Configuration Example w/ NDFS

core-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>file:///</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/data/tmp</value>
      </property>
    </configuration>

mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>jobtracker:port</value>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/hadoop/local</value>
      </property>
      <property>
        <name>mapred.system.dir</name>
        <value>/hadoop/data/system</value>
      </property>
    </configuration>
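Because the NDFS configuration is uniform across all Hadoop nodes, the site files lend themselves to being generated rather than edited by hand. The following is a minimal sketch, not part of the tested solution, that renders a dictionary of properties in the same configuration / property / name / value layout used above. The function name `build_site_xml` is hypothetical, and `jobtracker:port` is the document's own placeholder that must be replaced with a real host and port.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties: dict) -> str:
    """Render Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body


# Property values copied from the NDFS example in this appendix.
NDFS_CORE_SITE = {
    "fs.default.name": "file:///",
    "hadoop.tmp.dir": "/hadoop/data/tmp",
}
NDFS_MAPRED_SITE = {
    "mapred.job.tracker": "jobtracker:port",  # placeholder from the document
    "mapred.local.dir": "/hadoop/local",
    "mapred.system.dir": "/hadoop/data/system",
}

if __name__ == "__main__":
    print(build_site_xml(NDFS_CORE_SITE))
    print(build_site_xml(NDFS_MAPRED_SITE))
```

The output omits the xml-stylesheet processing instruction shown in the examples; it only affects how the file is displayed in a browser.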
9. References

9.1. Table of Figures

Figure 1 Nutanix Architecture
Figure 2 Business Data Sources
Figure 3 Hadoop Flow with HDFS
Figure 4 Hadoop Layers with HDFS
Figure 5 Hadoop Node Architecture with HDFS
Figure 6 Hadoop Flow with NDFS
Figure 7 Hadoop Layers with NDFS
Figure 8 Hadoop Node Architecture with NDFS
Figure 9 Leaf Spine Network Architecture
Figure 10 2U Sort Throughput Comparison
Figure 11 TeraSort Throughput Scale Comparison

9.2. Table of Tables

Table 1 Hadoop Distribution Considerations
Table 2 Hadoop Related Projects
Table 3 Nutanix Storage Pool Configuration
Table 4 Nutanix Container Configuration
10. About the Author

Steven Poitras is a Solutions Architect on the Technical Marketing team at Nutanix, Inc. In his role, Steven helps design architectures combining applications with the Nutanix platform, creating solutions that help solve critical business needs and requirements and disrupt the infrastructure space. Prior to joining Nutanix he was one of the key solution architects at the Accenture Technology Labs, where he was focused on the Next Generation Infrastructure (NGI) & Next Generation Datacenter (NGDC) domains. In these spaces he has developed methodologies, reference architectures and frameworks focusing on the design of, and transformation to, agile, scalable, and cost-effective infrastructures which can be consumed in a "service-oriented" or cloud-like manner.
11. About Nutanix

Nutanix is the first company to offer a radically simple compute and storage infrastructure for implementing enterprise-class virtualization without complex and expensive external network storage (SAN or NAS). Founded in 2009 by a team that built scalable systems such as Google File System and enterprise-class systems such as Oracle Database/Exadata, Nutanix is based in San José, California, and is backed by Lightspeed Venture Partners, Khosla Ventures and Blumberg Capital.
12. Acknowledgements
[1] Apache Hadoop. Apache. http://hadoop.apache.org/
[2] Because Hadoop isn't Perfect: 8 ways to replace HDFS. GigaOm. http://gigaom.com/cloud/becausehadoop-isnt-perfect-8-ways-to-replace-hdfs/
[3] A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5. VMware. http://www.vmware.com/files/pdf/vmw-hadoop-performance-vsphere5.pdf
[4] A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5. VMware. http://www.vmware.com/files/pdf/vmw-hadoop-performance-vsphere5.pdf
[5] Sort Benchmark. http://sortbenchmark.org/
[6] HP Unleashes the Power of Hadoop. HP. http://www.hp.com/hpinfo/newsroom/press_kits/2012/hpdiscover2012/hadoop_appliance_fact_sheet.pdf
[7] HP Unleashes the Power of Hadoop. HP. http://www.hp.com/hpinfo/newsroom/press_kits/2012/hpdiscover2012/hadoop_appliance_fact_sheet.pdf
    MinuteSort with Flat Datacenter Storage. Microsoft Research. http://sortbenchmark.org/flatdatacenterstorage2012.pdf
    Sun Fire X2270 M2 Super-Linear Scaling of Hadoop Terasort and CloudBurst Benchmarks. Oracle. https://blogs.oracle.com/bestperf/entry/20090920_x2270m2_hadoop
    Increased Hadoop Optimization Using Intel-based SGI Rackable Solutions. SGI. http://www.sgi.com/pdfs/4333.pdf