BIG DATA TECHNOLOGY ON RED HAT ENTERPRISE LINUX: OPENJDK VS. ORACLE JDK
Organizations that use Apache Hadoop to process very large amounts of data demand the best performance possible from their infrastructures. The Java Virtual Machine (JVM) you choose to pair with your operating system can affect how your Hadoop deployment performs, and can also create or alleviate management headaches for administrators.

In the Principled Technologies labs, we compared the performance of a Hadoop deployment running on Red Hat Enterprise Linux using both OpenJDK and Oracle JDK.1 We found that the open-source OpenJDK performed comparably to Oracle JDK on all Hadoop performance tests we ran. OpenJDK is distributed alongside the operating system and requires no additional actions by administrators, making it a natural choice for organizations wishing to standardize on open-source software deployments that include Red Hat Enterprise Linux and Hadoop.

1 OpenJDK is a trademark of Oracle, Inc.

MARCH 2014
A PRINCIPLED TECHNOLOGIES TEST REPORT
Commissioned by Red Hat, Inc.
APACHE HADOOP PERFORMANCE AND THE JVM
Apache Hadoop is an open-source, highly scalable, and reliable distributed data storage and analytics engine used for storing and processing massive amounts of information. Hadoop is highly available and performs advanced business analytics on data to help organizations make decisions. Core Hadoop components are mostly written in Java and run on top of the JVM. These components include HDFS, the Hadoop Distributed File System; MapReduce, a functional data-processing framework for shared-nothing distributed data algorithms; and a number of Common utilities that integrate HDFS and MapReduce and support workload and job processing.

Because Hadoop performance is closely tied to the performance of the underlying JVM, optimizing the JVM for lower latency and better execution speed is crucial. The choice of JVM can greatly affect the performance of the entire Hadoop deployment. In our hands-on tests, we explored the effect this JVM choice has on Hadoop performance, focusing on two market leaders, OpenJDK 7 and Oracle JDK 7. We evaluated their relative performance on a reference Hortonworks Data Platform (HDP) 2.0 deployment running on Red Hat Enterprise Linux 6.5 on a six-node reference implementation (see Appendix A for system configuration information).

We tested two areas within Hadoop where performance is critical and that are therefore of interest to both IT managers and Big Data scientists wishing to optimize standard Hadoop deployments:
- Core infrastructure
- Machine Learning (ML), essential for Big Data Analytics platforms

To measure Hadoop performance, we used the Intel HiBench Sort and TeraSort benchmarks for measuring infrastructure performance, as well as the Mahout Bayes classification and K-means clustering benchmarks for measuring the performance of machine-learning tasks. Because our goal was to measure only the relative performance of the JVMs for each test that we performed, we used near out-of-box configurations for both the operating system and Hadoop. In addition, we did not configure the HiBench workloads to saturate the resources of the servers; we instead chose workloads that used 60 to 80 percent of the CPU resources. We expect additional operating system, Hadoop, and workload tunings to yield comparable performance boosts for both OpenJDK and Oracle JDK. For details about the hardware we used, see Appendix A. For step-by-step details on how we tested, see Appendix B.
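Because the comparison hinges on which Java runtime actually backs the Hadoop daemons, it is worth confirming the active JDK on each node before testing. The commands below are a general sketch using standard Red Hat Enterprise Linux tooling; they are not a step from the original procedure.

# Confirm which Java runtime a node (and therefore Hadoop) resolves.
java -version                 # prints the vendor and build of the default java
readlink -f $(which java)     # shows the real binary behind the java command
ls -l /usr/java/default       # the JAVA_HOME link that HDP uses (set up in Appendix B)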
WHAT WE FOUND
OpenJDK: Similar performance, better for management. For the Red Hat Enterprise Linux and Hortonworks HDP configurations we tested, we found that Hadoop workloads performed about the same with either the OpenJDK JVM or the Oracle JVM. We expected this behavior based on other performance comparison tests we have reported on in the past.2 From a manageability perspective, we found OpenJDK convenient for Red Hat Enterprise Linux administrators to use in their Hadoop deployments, as it updates automatically as a component of the operating system.

Figure 1 shows the Hadoop performance results we obtained for four HiBench benchmarks. They are plotted relative to the OpenJDK results for easy comparison. OpenJDK and Oracle JDK performed similarly across the four Hadoop workloads, with all differences being under 8 percent. (See Figure 9 for details on the HiBench workloads.)

Figure 1: Relative comparison of OpenJDK and Oracle JDK on four Hadoop workloads: Sort, TeraSort, Bayes Classification, and K-means Clustering (lower is better). The chart, titled Relative Hadoop Performance, plots normalized performance for each workload for OpenJDK and Oracle JDK.

2 See our report Performance Comparison of Multiple JVM implementations with Red Hat Enterprise Linux 6 available at
Using statistics gathered for both JVMs, Figures 2 through 5 show server-resource usage, averaged over the four data nodes, during the median run of the Sort benchmark. We chose to present the resource usage for this test because the difference in benchmark duration was the greatest.

Figure 2: Comparison of average CPU usage during the Sort test for OpenJDK and Oracle JDK (charts: DataNode statistics with OpenJDK and with Oracle JDK). The charts compare average CPU, average total disk reads, average total disk writes, and average outgoing network I/O. As can be observed, all performance graphs have similar shapes for both JDKs.

Figure 3: Comparison of average disk reads in megabytes per second during the Sort test for OpenJDK and Oracle JDK (charts: DataNode statistics with OpenJDK and with Oracle JDK).
Figure 4: Comparison of average disk writes in megabytes per second during the Sort test for OpenJDK and Oracle JDK (charts: DataNode statistics with OpenJDK and with Oracle JDK). We have marked the two computational stages for this workload: the data-preparation/generation stage and the data-transformation/sorting stage. See the section "Running one HiBench test" in Appendix B for more detail.

On the other three workloads, the use of OpenJDK or Oracle JDK in conjunction with Red Hat Enterprise Linux likewise resulted in similar performance.

Figure 5: Comparison of average incoming network I/O in megabytes per second during the Sort test for OpenJDK and Oracle JDK (charts: DataNode statistics with OpenJDK and with Oracle JDK).
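The report does not state which utility produced the DataNode statistics shown in Figures 2 through 5. As one hedged example, the sysstat package that ships with Red Hat Enterprise Linux can capture comparable CPU, disk, and network samples on each data node during a run:

# Assumes the sysstat package is installed (yum install sysstat).
# Sample counters every 5 seconds for one hour; run in the background on each data node during a test.
sar -u 5 720 > /tmp/cpu-$(hostname).log &        # CPU utilization
sar -d -p 5 720 > /tmp/disk-$(hostname).log &    # per-disk reads and writes
sar -n DEV 5 720 > /tmp/net-$(hostname).log &    # per-interface network I/O

Averaging the per-node logs over the four data nodes yields charts of the same shape as those above.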
WHAT WE TESTED
For our tests, we used the YARN version of the Intel HiBench 2.2 suite to gauge the performance of an HDP 2.0 deployment. At this time, only six of the nine benchmarks have been ported to YARN (that is, to Hadoop 2.0 and MapReduce version 2). Of these, we tested Sort, TeraSort, Bayes Classification, and K-means clustering.3

To test the core infrastructure, we used Sort and TeraSort. Both of these tests are typical real-world MapReduce jobs that transform data from one representation into another but differ in the volume of data transformed. The Sort test processes large amounts of textual data produced by the RandomTextWriter text-data generator available in standard Hadoop. TeraSort is a Big Data version of Sort: it sorts 10 billion 100-byte records produced by the TeraGen generator program, also part of the standard Hadoop distribution.

The other two tests we chose, Bayes Classification and K-means Clustering, are part of the Apache Mahout program suite and represent key machine-learning tasks in Big Data Analytics that use the MapReduce framework. Bayes Classification implements the training component of Naïve Bayes, a classification algorithm used in data mining and discovery applications. It takes as input a large set of randomly generated text documents. K-means clustering implements the highly used and well-understood K-means clustering algorithm, operating on a large set of randomly generated numerical multidimensional vectors with specific statistical distributions.

3 The YARN versions of the Intel HiBench benchmarks are available in Intel's GitHub HiBench source code repository. You can download them from

IN CONCLUSION
OpenJDK is an efficient foundation for distributed data processing and analytics using Apache Hadoop. In our testing of a Hortonworks HDP 2.0 distribution running on Red Hat Enterprise Linux 6.5, we found that Hadoop performance using OpenJDK was comparable to the performance using Oracle JDK. Comparable performance paired with automatic updates means that OpenJDK can benefit organizations using Red Hat Enterprise Linux-based Hadoop deployments.
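Before turning to the appendices, note that the generators and sorters named in the workload descriptions above ship with Hadoop itself; HiBench simply drives them for you (see Appendix B). The commands below are an illustration only: the row count and HDFS paths are placeholders, not the settings used in our tests, and the examples JAR path follows the HDP 2.0 layout described in Appendix B.

# Illustrative only: generate rows with TeraGen and sort them with TeraSort.
EXAMPLES=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
hadoop jar $EXAMPLES teragen 10000000 /user/hdfs/teragen-input          # 10 million 100-byte rows
hadoop jar $EXAMPLES terasort /user/hdfs/teragen-input /user/hdfs/terasort-output
hadoop jar $EXAMPLES randomtextwriter /user/hdfs/random-text            # text generator used by the Sort test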
APPENDIX A - SYSTEM CONFIGURATION INFORMATION
Figure 6 provides detailed configuration information for the test systems. Values are listed for the Dell PowerEdge R710 Hadoop master servers, then the Dell PowerEdge R710 Hadoop slave servers.

Power supplies
  Total number: 2 / 2
  Vendor and model number: Dell N870P-S0 / Dell N870P-S0
  Wattage of each (W):
Cooling fans
  Total number: 5 / 5
  Vendor and model number: Dell RK 385-A00 / Dell RK 385-A00
  Dimensions (h x w) of each:
  Volts:
  Amps:
General
  Number of processor packages: 2 / 2
  Number of cores per processor: 6 / 6
  Number of hardware threads per core: 2 / 2
CPU
  Vendor: Intel / Intel
  Name: Xeon / Xeon
  Model number: X5670 / X5670
  Stepping:
  Socket type: FCLGA1366 / FCLGA1366
  Core frequency (GHz):
  Bus frequency (MHz): 3,200 / 3,200
  L1 cache (KB):
  L2 cache (KB):
  L3 cache (KB): 12,288 / 12,288
Platform
  Vendor and model number: Dell PowerEdge R710 / Dell PowerEdge R710
  Motherboard model number: 00NH4P / 00NH4P
  Motherboard chipset: Intel 5520 / Intel 5520
  BIOS name and version:
  BIOS settings: Defaults + Power Setting to Maximum Performance / Defaults + Power Setting to Maximum Performance
Memory module(s)
  Total RAM in system (GB):
  Vendor and model number: Hynix HMT31GR7BFR4A-H9 / Samsung M393B1K70BH1-CH9
  Type: PC / PC
  Speed (MHz): 1,333 / 1,333
  Speed running in the system (MHz): 1,333 / 1,333
  Timing/Latency (tCL-tRCD-tRP-tRASmin):
  Size (GB): 8 / 8
  Number of RAM module(s): 12 / 6
  Chip organization: Double-sided / Double-sided
  Rank: Dual / Dual
Hard disk
  Vendor and model number: Seagate ST SS / Western Digital WD300BKHG-18A29V0
  Number of disks in system: 6 / 8
  Size (GB):
  Buffer size (MB):
  RPM: 15K / 10K
  Type: SAS / SAS
Disk controller
  Vendor and model: Dell PERC 6/i / Dell PERC 6/i
  Controller cache (MB):
  Controller driver (module): megaraid_sas / megaraid_sas
  Controller driver version: rh / rh1
  Controller firmware:
  RAID configuration: 3 RAID1 volumes of 2 disks each / 1 RAID1 volume of 2 disks; 6 un-RAIDed disks
Operating system
  Name: Red Hat Enterprise Linux 6.5 / Red Hat Enterprise Linux 6.5
  File system: ext4 / ext4
  Kernel: el6.x86_64 / el6.x86_64
  Language: English / English
Graphics
  Vendor and model number: Matrox MGA G200eW WPCM450 / Matrox MGA G200eW WPCM450
  Graphics memory (MB): 8 / 8
Ethernet
  Vendor and model number: 4 x Broadcom 5709C NetXtreme II 1GigE / 4 x Broadcom 5709C NetXtreme II 1GigE
  Type: PCI Express / PCI Express
  Driver (module): bnx2 / bnx2
  Driver version:
Optical drive(s)
  Vendor and model number: TEAC DVD-ROM DV28SV / TEAC DVD-ROM DV28SV
  Type: SATA / SATA
USB ports
  Number: 4 external / 4 external
  Type:

Figure 6: System configuration information for the test systems.
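For readers reproducing this configuration, most of the details in Figure 6 can be read from a running Red Hat Enterprise Linux system with standard utilities. The commands below are a general-purpose sketch, not the exact inventory procedure used for this report.

# Gather hardware and OS details comparable to Figure 6 (run as root).
lscpu                                   # processor model, sockets, cores, threads, cache sizes
free -g                                 # installed memory in GB
lsblk -d -o NAME,MODEL,SIZE,ROTA        # disk models and sizes
dmidecode -t bios -t memory | less      # BIOS version and DIMM details
cat /etc/redhat-release; uname -r       # OS release and kernel version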
APPENDIX B - HOW WE TESTED
This section captures the installation and configuration of a six-node Hadoop cluster that has two master nodes and four data nodes. We followed closely the Red Hat reference architecture for running HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK.4 One master node (named master1) ran the HDFS NameNode service. The second master node (named master2) was a backup NameNode that ran the YARN resource manager and the MapReduce job-history manager. The four data nodes (named data1, data2, data3, and data4) ran the HDFS DataNode service and the YARN distributed NodeManager.

4 "Exploring the next generation of Big Data solutions with Hadoop 2: Deploying Hortonworks Data Platform 2.0 on Red Hat Enterprise Linux 6 with OpenJDK", version 1.2.

Figure 7: Schematic of the six-node Hadoop cluster used in our tests. The light blue lines denote the network connections for the Hadoop-cluster network. Similarly, the grey lines denote the server-management network.

Figure 8 shows the hardware we used in our testing. We used six Dell PowerEdge R710 servers: two master nodes and four data nodes. Each server had six or eight 300GB SAS disks, a PowerEdge RAID Controller (PERC), two Intel Xeon X5670 processors, one four-port GbE NIC, and 48GB (data nodes) or 96GB (master nodes) of RAM. The resources required by the two master nodes differed from those of the data nodes, so we discuss them separately below.
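Once the cluster is deployed (the steps follow), the role assignments above can be confirmed from any one host. The loop below is a minimal sketch that assumes passwordless SSH as root between the nodes; it is not part of the original procedure.

# List the Hadoop daemons running on each node.
for host in master1 master2 data1 data2 data3 data4; do
    echo "== $host =="
    ssh $host "ps -eo user,args | grep -E 'NameNode|DataNode|ResourceManager|NodeManager|JobHistoryServer' | grep -v grep"
done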
Figure 8: The physical hardware used for the six-node Hadoop cluster in our tests.

Configuring the master nodes
Each master node had 96GB of RAM and six disks. The operating system was installed on a mirrored pair of disks (RAID 1). The remaining four disks were configured as two mirrored pairs, because the Hadoop metadata stored therein must be protected from loss.

Configuring the data nodes
Each data node had 48GB of RAM and eight disks. The operating system was installed on a mirrored pair of disks (RAID 1). We presented the remaining six disks to the system as JBODs (without RAID) to increase I/O by providing the maximum number of spindles for the configuration.

Configuring networking
We used two subnets: one for cluster traffic and one for access to the servers. Consequently, we used two network ports on each server. In our tests, the cluster subnet was a /24 network and the management subnet was a separate /24 network.

Installing Red Hat Enterprise Linux 6
The first step in setting up the six-node Hortonworks Data Platform 2.0 cluster is to install the Red Hat Enterprise Linux 6.5 operating system on the servers.
For each server, master and slave, install the Red Hat Enterprise Linux 6.5 operating system. Prior to starting the installation, you should have a designated list of host names and IP addresses. For definiteness, this section instructs how to install Red Hat Enterprise Linux 6.5 from the full installation DVD, but you could also install the operating system via PXE, kickstart file, or satellite server.
1. Insert the Red Hat Enterprise Linux 6.5 installation DVD into the server's DVD drive.
2. On the Welcome to Red Hat Enterprise Linux 6.5 screen, press Enter to boot the installer.
3. On the Welcome to Red Hat Enterprise Linux for x86_64 screen, select SKIP, and press Enter.
4. On the Red Hat Enterprise Linux 6 screen, click Next.
5. On the installation-language screen, select your language (this guide uses English), and click Next.
6. On the keyboard-selection screen, select the correct format, and click Next.
7. On the Storage Devices screen, select Basic Storage Devices, and click Next.
8. If a pop-up window appears with a Storage Device Warning, click Yes, discard any data.
9. On the Name This Computer screen, enter the server's name, and click Configure Network.
10. On the Network Connections pop-up screen, select your management interface (e.g., eth0), and click Edit.
11. On the Editing eth0 pop-up screen, check the Connect automatically box, and select the IPv4 Settings tab. Change Method to Manual. Click Add, and enter the IP address, Netmask, and Gateway. Enter the IP address of your DNS server. Click Apply to close the pop-up screen.
12. Back on the Network Connections pop-up screen, select the cluster-interconnect interface (e.g., eth1), and click Edit.
13. On the Editing eth1 pop-up screen, check the Connect automatically box, and select the IPv4 Settings tab. Change Method to Manual. Click Add, and enter the IP address, Netmask, and Gateway. Enter the IP address of your DNS server. Click Apply to close the pop-up screen.
14. On the Network Connections pop-up screen, click Close, and click Next.
15. On the Time Zone screen, select your time zone, and click Next.
16. On the administrator's password page, enter the root password, and click Next.
17. On the Installation Type screen, keep the default installation type, check the Review and modify partitioning layout box, and click Next.
18. On the Device assignment screen, select the server's OS disk from the list in the left column, and click the upper arrow to designate the disk as the target device. Click Next.
19. On the Review and Modify Partitioning screen, delete the /home logical volume and assign the resulting free space to the root partition. When finished, click Next.
20. On the Format Warnings pop-up screen, click Format.
21. On the Writing storage configuration to disk screen, click Write changes to disk.
22. On the Boot loader selection screen, keep the defaults, and click Next.
23. On the Red Hat Enterprise Linux software installation screen, select Basic Server, and click Next.
24. When the installation completes, click Reboot.

Configuring the hosts
This section shows how to perform post-installation configuration steps for all hosts at once by running a sequence of bash commands from one of the servers rather than configuring them one at a time. The post-installation configurations include:
- Configuring NTP
- Configuring the file systems
- Disabling unneeded services
- Configuring the tuned package
- Installing OpenJDK
- Adding the IP address and hostnames to /etc/hosts

Configuring NTP
1. The servers' clocks must be synchronized to a common time source in order for the cluster to function properly. Modify /etc/ntp.conf to add the IP address of your local NTP server, as in the following line:
server <NTP-server-IP> iburst
2. Start the NTP daemon on each server; for example:
chkconfig ntpd on
service ntpd start

Configuring the file systems
Partition, format, and mount the data disks on each server. The servers' operating-system and application filesystem is on disk /dev/sda. The Hadoop data disks on the master servers are /dev/sdb and /dev/sdc, and on the slave servers are /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf, and /dev/sdg.
1. On each node, partition and format the first data disk (sdb), and mount its filesystem on directory /grid01.
parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary "1-1"
mkfs.ext4 /dev/sdb1
mkdir /grid01
mount /dev/sdb1 /grid01
echo "/dev/sdb1 /grid01 ext4 defaults 0 0" >> /etc/fstab
2. On the master nodes, repeat step 1 for the second data disk (sdc), mounting it on /grid02 (fstab entry: /dev/sdc1 /grid02 ext4 defaults 0 0).
3. On each slave node, repeat step 1 for the remaining data disks (sdc, sdd, sde, sdf, and sdg), mounting them on /grid02 through /grid06.

Disabling unneeded services
Disable unneeded services on each server.
for i in autofs cups nfslock portreserve postfix rpcbind rpcgssd iptables ip6tables; do
  chkconfig $i off
  service $i stop
done

Configuring the tuned package
On each server, install and configure the tuned package to use the enterprise-storage profile.
yum install tuned
tuned-adm profile enterprise-storage

Installing OpenJDK
To install HDP 2.0, we installed the OpenJDK platform. Instructions for installing the Oracle JDK for its performance test are in the section Changing the Java platform from OpenJDK to Oracle JDK.
1. On each server, install OpenJDK 7.
yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
2. On each server, link the OpenJDK directory to /usr/java/default, the HDP default for JAVA_HOME.
mkdir /usr/java
ln -s /usr/lib/jvm/java-1.7.0-openjdk*.x86_64 /usr/java/default

Adding the IP address and hostnames to /etc/hosts
We used the hosts file rather than a DNS server to provide name resolution. On each server, add the following to the end of the file /etc/hosts.
# cluster network
master1    # name node
master2    # secondary name node and YARN master
data1      # data node
data2      # data node
data3      # data node
data4      # data node
# management network
master1m   # name node
master2m   # secondary name node and YARN master
data1m     # data node
data2m     # data node
data3m     # data node
data4m     # data node

Installing Hortonworks Data Platform 2.0 on the servers
For further detail on installing and configuring Hortonworks Data Platform, see "Installing HDP Manually" (chapters 2-4).5 While it is possible to install Hortonworks Data Platform in other ways, the method we used has the advantage that the RPM installation creates the Hadoop user accounts and groups along with system configuration files controlling system resources.

5 docs.hortonworks.com/hdpdocuments/hdp2/hdp /bk_installing_manually_book/content/index.html

Installing Hadoop software
1. Download the HDP yum repo file from Hortonworks and copy it to each of the servers.
wget
2. Install the Hadoop software (HDFS, YARN, MapReduce, Mahout, compression and SSL libraries) on each server.
yum install \
  hadoop hadoop-hdfs hadoop-libhdfs hadoop-client \
  hadoop-yarn hadoop-mapreduce mahout \
  openssl snappy snappy-devel lzo lzo-devel hadoop-lzo hadoop-lzo-native
ln -sf /usr/lib64/libsnappy.so /usr/lib/hadoop/lib/native/.

Server modifications for HDFS, YARN, and MapReduce
1. If necessary, delete the previous Hadoop data and log directories by running these commands on each server.
rm -rf /grid0?/hadoop/
rm -rf /var/log/hadoop/hdfs
rm -rf /var/log/hadoop/yarn
rm -rf /var/log/hadoop/mapred
rm -rf /var/run/hadoop/hdfs
rm -rf /var/run/hadoop/yarn
rm -rf /var/run/hadoop/mapred
2. On each server, create the first set of HDFS metadata directories under /grid01.
mkdir -p /grid01/hadoop/hdfs/nn
mkdir -p /grid01/hadoop/hdfs/snn
chown -R hdfs:hadoop /grid01/hadoop/hdfs/nn
chown -R hdfs:hadoop /grid01/hadoop/hdfs/snn
chmod -R 755 /grid01/hadoop/hdfs/nn
chmod -R 755 /grid01/hadoop/hdfs/snn
3. On each slave server, create the HDFS and YARN data directories under /grid01.
mkdir -p /grid01/hadoop/hdfs/dn
mkdir -p /grid01/hadoop/yarn/local
mkdir -p /grid01/hadoop/yarn/logs
chown -R hdfs:hadoop /grid01/hadoop/hdfs/dn
chown -R yarn:hadoop /grid01/hadoop/yarn/local
chown -R yarn:hadoop /grid01/hadoop/yarn/logs
chmod -R 750 /grid01/hadoop/hdfs/dn
chmod -R 750 /grid01/hadoop/yarn/local
chmod -R 750 /grid01/hadoop/yarn/logs
4. On each master server, repeat step 2 for the second data directory, /grid02.
5. On each slave server, repeat steps 2 and 3 for the remaining data directories, /grid02, /grid03, /grid04, /grid05, and /grid06.
6. On the second master server, create the directories for the YARN resource manager.
mkdir -p /grid01/hadoop/yarn/local
mkdir -p /grid01/hadoop/yarn/logs
mkdir -p /grid02/hadoop/yarn/local
mkdir -p /grid02/hadoop/yarn/logs
chown -R yarn:hadoop /grid0?/hadoop/yarn/lo*
chmod -R 750 /grid0?/hadoop/yarn/lo*
7. On all servers, add the yarn and mapred users to the hdfs group.
usermod -G hdfs yarn
usermod -G hdfs mapred
8. On each server, correct the file ownership and permissions for a YARN health-check command.
chown -R root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
chmod -R 6050 /usr/lib/hadoop-yarn/bin/container-executor

Configuring Hadoop
The Hadoop components HDFS, YARN, and MapReduce require configuration to specify the nodes and their roles as well as the location of each node's HDFS files. Hortonworks provides a sample set of configuration files that you can readily modify for your cluster. You can always find the up-to-date Internet location of the tar archive for these helper files in the Hortonworks HDP documentation; see Hortonworks Data Platform: Installing HDP Manually, section Download Companion Files.6 We used version of the helper files.

6 docs.hortonworks.com/hdpdocuments/hdp2/hdp /bk_installing_manually_book/content/rpm-chap1-9.html

1. Download the helper files from Hortonworks.
wget hdp_manual_install_rpm_helper_files tar.gz
2. Extract the files.
tar zxf hdp_manual_install_rpm_helper_files tar.gz
3. On each server, delete the contents of the Hadoop core configuration directory.
rm /etc/hadoop/conf/*
4. On one server, copy the contents of the core Hadoop configuration files to /etc/hadoop/conf.
cd hdp_manual_install_rpm_helper_files /configuration_files/core_hadoop
cp * /etc/hadoop/conf
cd /etc/hadoop/conf
5. In the file core-site.xml, replace every occurrence of the token TODO-NAMENODE-HOSTNAME:PORT with master1:
6. In the file hdfs-site.xml, perform this sequence of changes:
a. Replace every occurrence of the token TODO-NAMENODE-HOSTNAME with master1.
b. Replace every occurrence of the token TODO-NAMENODE-SECONDARYNAMENODE with master2.
c. Replace every occurrence of the token TODO-DFS-DATA-DIR with file:///grid01/hadoop/hdfs/dn,file:///grid02/hadoop/hdfs/dn,file:///grid03/hadoop/hdfs/dn,file:///grid04/hadoop/hdfs/dn,file:///grid05/hadoop/hdfs/dn,file:///grid06/hadoop/hdfs/dn
d. Replace every occurrence of the token TODO-DFS-NAME-DIR with /grid01/hadoop/hdfs/nn,/grid02/hadoop/hdfs/nn
7. In the file yarn-site.xml, replace every occurrence of the token TODO-RMNODE-HOSTNAME with master2.
8. In the file mapred-site.xml, replace every occurrence of the token TODO-JOBHISTORYNODE-HOSTNAME with master2.
9. In the files container-executor.cfg and yarn-site.xml, replace every occurrence of the token TODO-YARN-LOCAL-DIR with /grid01/hadoop/yarn/local,/grid02/hadoop/yarn/local.
10. In the file container-executor.cfg, replace every occurrence of the token TODO-YARN-LOG-DIR with /grid01/hadoop/yarn/logs,/grid02/hadoop/yarn/logs.
11. In the file yarn-site.xml, replace every occurrence of the token TODO-YARN-LOCAL-LOG-DIR with /grid01/hadoop/yarn/logs,/grid02/hadoop/yarn/logs.
12. Copy the updated Hadoop configuration (HDFS, YARN, and MapReduce) to the remaining servers with scp; for example, for a server named NODE:
scp /etc/hadoop/conf/* NODE:/etc/hadoop/conf/
(These replacements can also be scripted with sed; a sketch follows the initialization steps below.)

Starting Hadoop for the first time
Initializing HDFS
1. Log onto the namenode master as user hdfs.
2. Format HDFS.
hdfs namenode -format
3. Start HDFS by running the commands in steps 1 through 4 in the Starting HDFS section.

Initializing YARN and the Job History service
1. With HDFS running, create and configure directories on HDFS for the job-history service.
2. Log onto the second master server as user hdfs, and run the following commands.
hdfs dfs -mkdir -p /mr-history/tmp
hdfs dfs -mkdir -p /mr-history/done
hdfs dfs -mkdir -p /app-logs
hdfs dfs -chmod -R 1777 /mr-history/tmp
hdfs dfs -chmod -R 1777 /mr-history/done
hdfs dfs -chmod -R 1777 /app-logs
hdfs dfs -chown -R mapred:hdfs /mr-history
hdfs dfs -chown yarn /app-logs
3. Start YARN by running the commands in steps 1 through 3 of the Starting YARN section.
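As noted in the Configuring Hadoop section above, the token replacements in steps 5 through 11 can also be scripted with sed rather than edited by hand. The fragment below is a sketch, not the procedure we used: the hostnames match this cluster, only a subset of tokens is shown, and the NameNode port is an assumption (8020 is the common HDFS default) that you should confirm for your deployment.

# Scripted token replacement (sketch); run in /etc/hadoop/conf before copying the files out.
cd /etc/hadoop/conf
NN_PORT=8020    # assumption: default HDFS NameNode RPC port; confirm for your deployment
sed -i "s|TODO-NAMENODE-HOSTNAME:PORT|master1:${NN_PORT}|g" core-site.xml
sed -i "s|TODO-NAMENODE-HOSTNAME|master1|g" hdfs-site.xml
sed -i "s|TODO-RMNODE-HOSTNAME|master2|g" yarn-site.xml
sed -i "s|TODO-JOBHISTORYNODE-HOSTNAME|master2|g" mapred-site.xml
sed -i "s|TODO-DFS-NAME-DIR|/grid01/hadoop/hdfs/nn,/grid02/hadoop/hdfs/nn|g" hdfs-site.xml
# Repeat the same pattern for the remaining tokens listed in steps 6 through 11.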
Starting Hadoop
Hadoop is started by first starting HDFS, pausing for 30 seconds, and then starting YARN. See the next two sections for instructions for starting HDFS and YARN.

Starting HDFS
1. Log onto each of the servers as user hdfs.
2. Run the following command on the namenode master.
/usr/lib/hadoop/sbin/hadoop-daemon.sh start namenode
3. Run the following command on the secondary-namenode master.
/usr/lib/hadoop/sbin/hadoop-daemon.sh start secondarynamenode
4. Run the following command on each slave server.
/usr/lib/hadoop/sbin/hadoop-daemon.sh start datanode

Starting YARN
HDFS must be running before starting YARN. See the previous section for instructions for starting HDFS.
1. Log onto the second master server as user yarn, and run the following commands on the secondary master server.
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh start resourcemanager
2. Log off the second master server and log in again as user mapred to run the following commands.
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/usr/lib/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh start historyserver
3. Log onto the slave servers as user yarn in order to start the local YARN resource node-managers, and run the following commands on each.
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh start nodemanager

Stopping Hadoop
Hadoop is stopped by stopping YARN and then stopping HDFS. See the next two sections for instructions for stopping YARN and HDFS.

Stopping YARN
1. Log onto the slave servers as user yarn, and run the following commands on each.
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh stop nodemanager
2. Log onto the second master server as user mapred to run the following commands.
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/usr/lib/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh stop historyserver
3. Log off the second master server and log in again as user yarn to run the following commands.
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
/usr/lib/hadoop-yarn/sbin/yarn-daemon.sh stop resourcemanager

Stopping HDFS
1. Log onto the servers as user hdfs.
2. On the slave servers, run the following command.
/usr/lib/hadoop/sbin/hadoop-daemon.sh stop datanode
3. On the secondary-namenode master server, run the following command.
/usr/lib/hadoop/sbin/hadoop-daemon.sh stop secondarynamenode
4. On the namenode master server, run the following command.
/usr/lib/hadoop/sbin/hadoop-daemon.sh stop namenode
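After starting HDFS and YARN, it is useful to confirm that all four DataNodes and NodeManagers have registered before launching any benchmark. The commands below are a general sketch using the standard Hadoop client tools; they are not a step from the original procedure.

# As user hdfs: confirm the DataNodes have registered and HDFS has left safe mode.
hdfs dfsadmin -report | grep -i datanodes
hdfs dfsadmin -safemode get
# As user yarn: list the NodeManagers registered with the ResourceManager.
yarn node -list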
Installing HiBench
1. Download the HiBench suite from github.com/intel-hadoop/hibench/tree/yarn.
2. Copy the HiBench ZIP archive to one of the master nodes.
3. On the master node, unzip the HiBench archive under /opt.
cd /opt
unzip /tmp/hibench-yarn.zip
4. Modify the default HiBench configuration for HDP 2 in the file /opt/hibench-yarn/bin/hibench-config.sh: change HADOOP_CONF_DIR= to HADOOP_CONF_DIR=/etc/hadoop/conf, and change HADOOP_EXAMPLES_JAR= to HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar.
5. Change the ownership of the HiBench files to user hdfs.
chown -R hdfs /opt/hibench-yarn

Running the HiBench Sort, TeraSort, Bayes, and K-means tests
The HiBench tests are performed as user hdfs from the namenode. The default dataset sizes were modified to use more of the cluster's resources. The changes to the configurations are as follows:
1. For the Sort test, the dataset size is increased by a factor of 10 to 24,000,000,000 bytes: in the file /opt/hibench-yarn/sort/conf/configure.sh, change the line DATASIZE= to DATASIZE=24000000000.
2. For the Bayes test, the number of document pages in the dataset is increased by a factor of 2.5 to 100,000: in the file /opt/hibench-yarn/bayes/conf/configure.sh, change the line PAGES=40000 to PAGES=100000.
3. For the TeraSort test, in the file /opt/hibench-yarn/terasort/conf/configure.sh, change the line DATASIZE= to DATASIZE= , and change the line NUM_MAPS=96 to NUM_MAPS= .
4. For the K-means test, modify the file /opt/hibench-yarn/kmeans/conf/configure.sh as follows: change the line NUM_OF_CLUSTERS=5 to NUM_OF_CLUSTERS=32, the line NUM_OF_SAMPLES= to NUM_OF_SAMPLES= , the line SAMPLES_PER_INPUTFILE= to SAMPLES_PER_INPUTFILE= , and the line DIMENSIONS=20 to DIMENSIONS=4.

Changing the Java platform from OpenJDK to Oracle JDK
The Hadoop configuration above used OpenJDK and can be used directly for the OpenJDK performance tests. To run performance tests with the Oracle JDK, we removed OpenJDK, installed the Oracle JDK, and rebuilt HDFS.
1. On all servers, stop YARN by following the instructions in the section Stopping YARN.
2. On all servers, stop HDFS by following the instructions in the section Stopping HDFS.
3. On all servers, remove OpenJDK and the JAVA_HOME link.
yum remove java-1.7.0-openjdk java-1.7.0-openjdk-devel
rm /usr/java/default
4. On all servers, install Oracle JDK 7 from the Red Hat Supplementary repository, and link the Oracle JDK directory to /usr/java/default.
yum install java-1.7.0-oracle java-1.7.0-oracle-devel
ln -s /usr/lib/jvm/java-1.7.0-oracle*.x86_64 /usr/java/default
5. Destroy and recreate the Hadoop directories and log files by performing steps 1 through 6 in the section Server modifications for HDFS, YARN, and MapReduce.
6. Reformat HDFS and YARN by following the instructions in the sections Initializing HDFS and Initializing YARN and
the Job History service.
7. Start YARN by following the instructions in the section Starting YARN.

Running one HiBench test
The following bash script, run-hibench.sh, was used to perform complete runs of the HiBench tests. In each case, the script performs the following steps:
1. Before the HiBench data is created, the page cache is cleared on the six servers by running an auxiliary script included at the end of this section. This script must be run as root; the sudo command was used to effect this.
2. The HiBench data is prepared.
3. The page cache is cleared again before the HiBench algorithm is started.
4. The script pauses for 120 seconds.
5. The HiBench algorithm is run.
6. The HiBench statistics are printed.

The bash script run-hibench.sh follows.
#!/bin/bash
# run-hibench.sh
## run the HiBench test specified by the argument
if [ "$#" -ne 1 ]; then
  echo "Usage: $0 <name of test>"; exit 1
fi
TEST=$1
HIBENCH_DIR=/opt/HiBench-yarn
echo "Run of HiBench test $TEST"; date; echo
echo " clearing the cache"
sudo sh /root/scripts/clear-cache.sh
echo; echo "Pausing for 50 seconds"; sleep 50
echo; echo Starting data preparation...; date; echo
$HIBENCH_DIR/$TEST/prepare.sh
echo; date; echo
echo Data prep completed
echo " clearing the cache before the main run"
sudo sh /root/scripts/clear-cache.sh
echo; echo "Pausing for 120 seconds"; sleep 120; echo
echo Starting data-transformation...; date; echo
MAHOUT_HOME=/usr/lib/mahout \
$HIBENCH_DIR/$TEST/bin/run.sh
echo; date; echo
echo Run completed; echo
echo HiBench statistics
head -n 1 $HIBENCH_DIR/hibench.report
tail -n 1 $HIBENCH_DIR/hibench.report
echo Done

The bash script clear-cache.sh, used by run-hibench.sh, follows.
#!/bin/bash
# clear-cache.sh
## clear the page cache on the Hadoop nodes
for i in master1 master2 data1 data2 data3 data4; do
  ssh $i 'sync; echo 3 > /proc/sys/vm/drop_caches; sync'
done
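As a usage illustration (not part of the original scripts), a full pass over the four workloads can be driven by calling run-hibench.sh once per test; the test names below match the HiBench directory names used in the configuration steps above.

# Run each workload once as user hdfs from the namenode, logging the output per test.
for t in sort terasort bayes kmeans; do
    ./run-hibench.sh $t 2>&1 | tee -a /tmp/hibench-$t.log
done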
Detailed Hadoop results for OpenJDK and Oracle JDK on four HiBench workloads
Figure 9 provides the detailed results we obtained for all runs of the HiBench benchmarks, on both the OpenJDK and Oracle JVMs. For each run, we show the input size in KB, the duration of the run in seconds, and the total throughput per node in KB per second. These items show the problem size, its processing time, and the corresponding average data-volume movement per node, respectively. For each benchmark and JDK permutation, we selected a median value across all runs. Those runs are highlighted in yellow.

The variation in HiBench scores for the various tests was similar (about half of one percent), with the exception of the Sort test with Oracle JDK. We found the variation of Sort scores with Oracle JDK was about 10 percent over 12 runs, as compared to 2 percent for Sort with OpenJDK. For this reason, and to ensure a more representative median run, we tested seven runs for Oracle JDK versus five runs for OpenJDK on the Sort test.

Columns: OpenJDK input size (KB), duration (seconds), throughput/node (KBps); Oracle JDK input size (KB), duration (seconds), throughput/node (KBps).

Sort
Run 1: 96,239, , ,239,145 1, ,079.3
Run 2: 96,239, , ,239, ,719.5
Run 3: 96,239,051 1, , ,239,188 1, ,223.2
Run 4: 96,239, , ,239,121 1, ,334.4
Run 5: 96,239,081 1, , ,239, ,004.0
Run 6 (Oracle JDK only): 96,239,020 1, ,169.4
Run 7 (Oracle JDK only): 96,239, ,877.0

TeraSort
Run 1: 195,312,500 2, , ,312,500 2, ,532.0
Run 2: 195,312,500 2, , ,312,500 2, ,781.5
Run 3: 195,312,500 2, , ,312,500 2, ,555.0

Bayes
Run 1: 437,075 1, ,
Run 2: 437,075 1, ,
Run 3: 437,075 1, ,

K-means
Run 1: 50,000,007 2, , ,000,007 2, ,335.4
Run 2: 50,000,007 2, , ,000,007 2, ,298.7
Run 3: 50,000,007 2, , ,000,007 2, ,321.9

Figure 9: Detailed HiBench results for Sort, TeraSort, Bayes Classification, and K-means Clustering.
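For reference, the median selection described above can be reproduced mechanically from a list of run durations. The snippet below is a small sketch; the values shown are placeholders, not results from Figure 9.

# Print the median of a list of run durations in seconds (placeholder values).
durations="981 1012 996 1005 989"
echo $durations | tr ' ' '\n' | sort -n | \
    awk '{ a[NR]=$1 } END { print (NR % 2) ? a[(NR+1)/2] : (a[NR/2]+a[NR/2+1])/2 }'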
ABOUT PRINCIPLED TECHNOLOGIES

Principled Technologies, Inc.
Slater Road, Suite 300
Durham, NC

We provide industry-leading technology assessment and fact-based marketing services. We bring to every assignment extensive experience with and expertise in all aspects of technology testing and analysis, from researching new technologies, to developing new methodologies, to testing with existing and new tools. When the assessment is complete, we know how to present the results to a broad range of target audiences. We provide our clients with the materials they need, from market-focused data to use in their own collateral to custom sales aids, such as test reports, performance assessments, and white papers. Every document reflects the results of our trusted independent analysis.

We provide customized services that focus on our clients' individual requirements. Whether the technology involves hardware, software, Web sites, or services, we offer the experience, expertise, and tools to help our clients assess how it will fare against its competition, its performance, its market readiness, and its quality and reliability.

Our founders, Mark L. Van Name and Bill Catchings, have worked together in technology assessment for over 20 years. As journalists, they published over a thousand articles on a wide array of technology subjects. They created and led the Ziff-Davis Benchmark Operation, which developed such industry-standard benchmarks as Ziff Davis Media's Winstone and WebBench. They founded and led eTesting Labs, and after the acquisition of that company by Lionbridge Technologies were the head and CTO of VeriTest.

Principled Technologies is a registered trademark of Principled Technologies, Inc. All other product names are the trademarks of their respective owners.

Disclaimer of Warranties; Limitation of Liability: PRINCIPLED TECHNOLOGIES, INC. HAS MADE REASONABLE EFFORTS TO ENSURE THE ACCURACY AND VALIDITY OF ITS TESTING, HOWEVER, PRINCIPLED TECHNOLOGIES, INC. SPECIFICALLY DISCLAIMS ANY WARRANTY, EXPRESSED OR IMPLIED, RELATING TO THE TEST RESULTS AND ANALYSIS, THEIR ACCURACY, COMPLETENESS OR QUALITY, INCLUDING ANY IMPLIED WARRANTY OF FITNESS FOR ANY PARTICULAR PURPOSE. ALL PERSONS OR ENTITIES RELYING ON THE RESULTS OF ANY TESTING DO SO AT THEIR OWN RISK, AND AGREE THAT PRINCIPLED TECHNOLOGIES, INC., ITS EMPLOYEES AND ITS SUBCONTRACTORS SHALL HAVE NO LIABILITY WHATSOEVER FROM ANY CLAIM OF LOSS OR DAMAGE ON ACCOUNT OF ANY ALLEGED ERROR OR DEFECT IN ANY TESTING PROCEDURE OR RESULT. IN NO EVENT SHALL PRINCIPLED TECHNOLOGIES, INC. BE LIABLE FOR INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH ITS TESTING, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IN NO EVENT SHALL PRINCIPLED TECHNOLOGIES, INC.'S LIABILITY, INCLUDING FOR DIRECT DAMAGES, EXCEED THE AMOUNTS PAID IN CONNECTION WITH PRINCIPLED TECHNOLOGIES, INC.'S TESTING. CUSTOMER'S SOLE AND EXCLUSIVE REMEDIES ARE AS SET FORTH HEREIN.
FAULT TOLERANCE PERFORMANCE AND SCALABILITY COMPARISON: NEC HARDWARE-BASED FT VS. SOFTWARE-BASED FT Because no enterprise can afford downtime or data loss when a component of one of their servers fails,
Pivotal HD Enterprise
PRODUCT DOCUMENTATION Pivotal HD Enterprise Version 1.1 Stack and Tool Reference Guide Rev: A01 2013 GoPivotal, Inc. Table of Contents 1 Pivotal HD 1.1 Stack - RPM Package 11 1.1 Overview 11 1.2 Accessing
SYMANTEC NETBACKUP 7.6 BENCHMARK COMPARISON: DATA PROTECTION IN A LARGE-SCALE VIRTUAL ENVIRONMENT (PART 1)
SYMANTEC NETBACKUP 7.6 BENCHMARK COMPARISON: DATA PROTECTION IN A LARGE-SCALE VIRTUAL ENVIRONMENT (PART 1) Virtualization technology is changing the way data centers work. Technologies such as VMware vsphere
Gluster Filesystem 3.3 Beta 2 Hadoop Compatible Storage
Gluster Filesystem 3.3 Beta 2 Hadoop Compatible Storage Release: August 2011 Copyright Copyright 2011 Gluster, Inc. This is a preliminary document and may be changed substantially prior to final commercial
Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER
Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER Table of Contents Apache Hadoop Deployment on VMware vsphere Using vsphere Big Data Extensions.... 3 Local
Deploying IBM Lotus Domino on Red Hat Enterprise Linux 5. Version 1.0
Deploying IBM Lotus Domino on Red Hat Enterprise Linux 5 Version 1.0 November 2008 Deploying IBM Lotus Domino on Red Hat Enterprise Linux 5 1801 Varsity Drive Raleigh NC 27606-2072 USA Phone: +1 919 754
VERITAS NETBACKUP 7.6 BENCHMARK COMPARISON: DATA PROTECTION IN A LARGE-SCALE VIRTUAL ENVIRONMENT (PART 3)
VERITAS NETBACKUP 7.6 BENCHMARK COMPARISON: DATA PROTECTION IN A LARGE-SCALE VIRTUAL ENVIRONMENT (PART 3) Virtualization 1 technology is changing the way data centers work. Technologies such as VMware
TOTAL COST OF OWNERSHIP: DELL LATITUDE E6510 VS. APPLE 15-INCH MACBOOK PRO
TOTAL COST OF OWNERSHIP: DELL LATITUDE E6510 VS. APPLE 15-INCH MACBOOK PRO A TCO comparison commissioned by Dell Inc. Table of contents Table of contents... 2 Introduction... 3 Scope of this analysis...
NetIQ Sentinel 7.0.1 Quick Start Guide
NetIQ Sentinel 7.0.1 Quick Start Guide April 2012 Getting Started Use the following information to get Sentinel installed and running quickly. Meeting System Requirements on page 1 Installing Sentinel
PARALLELS SERVER BARE METAL 5.0 README
PARALLELS SERVER BARE METAL 5.0 README 1999-2011 Parallels Holdings, Ltd. and its affiliates. All rights reserved. This document provides the first-priority information on the Parallels Server Bare Metal
ThinkServer RD550 and RD650 Operating System Installation Guide
ThinkServer RD550 and RD650 Operating System Installation Guide Note: Before using this information and the product it supports, be sure to read and understand the Read Me First and Safety, Warranty, and
Installing Hadoop. You need a *nix system (Linux, Mac OS X, ) with a working installation of Java 1.7, either OpenJDK or the Oracle JDK. See, e.g.
Big Data Computing Instructor: Prof. Irene Finocchi Master's Degree in Computer Science Academic Year 2013-2014, spring semester Installing Hadoop Emanuele Fusco ([email protected]) Prerequisites You
ThinkServer RD540 and RD640 Operating System Installation Guide
ThinkServer RD540 and RD640 Operating System Installation Guide Note: Before using this information and the product it supports, be sure to read and understand the Read Me First and Safety, Warranty, and
Deploy and Manage Hadoop with SUSE Manager. A Detailed Technical Guide. Guide. Technical Guide Management. www.suse.com
Deploy and Manage Hadoop with SUSE Manager A Detailed Technical Guide Guide Technical Guide Management Table of Contents page Executive Summary.... 2 Setup... 3 Networking... 4 Step 1 Configure SUSE Manager...6
RSA Security Analytics Virtual Appliance Setup Guide
RSA Security Analytics Virtual Appliance Setup Guide Copyright 2010-2015 RSA, the Security Division of EMC. All rights reserved. Trademarks RSA, the RSA Logo and EMC are either registered trademarks or
COMPLEXITY COMPARISON: CISCO UCS VS. HP VIRTUAL CONNECT
COMPLEXITY COMPARISON: CISCO UCS VS. HP VIRTUAL CONNECT Not all IT architectures are created equal. Whether you are updating your existing infrastructure or building from the ground up, choosing a solution
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
IMPROVING PERFORMANCE WITH IN-MEMORY SQL SERVER 2014 DATABASE FEATURES ON THE DELL POWEREDGE R930
IMPROVING PERFORMANCE WITH IN-MEMORY SQL SERVER 2014 DATABASE FEATURES ON THE DELL POWEREDGE R930 Your fastest path to a truly modern datacenter with maximized database performance is to upgrade to the
