Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms


Intel Cloud Builders Guide: Intel Xeon Processor-based Servers, Apache* Hadoop*
Apache* Hadoop* on the Intel Xeon Processor 5600 Series

Audience and Purpose
This reference architecture is for companies that are looking to build their own cloud computing infrastructure, including both enterprise IT organizations and cloud service or hosting providers. The decision to use a cloud for the delivery of IT services is best made by starting with the knowledge and experience gained from previous work. This reference architecture gathers into one place the essentials of an Apache* Hadoop* cluster build-out, complete with benchmarking using the TeraSort workload, and defines easy-to-follow steps to replicate the deployment in your own data center or lab environment. The installation is based on Intel-powered servers and creates a multi-node, optimized Hadoop environment. The reference architecture contains details on the Hadoop topology, the hardware and software deployed, installation and configuration steps, and tests for real-world use cases that should significantly reduce the learning curve for building and operating your first Hadoop infrastructure.

It is not expected that this paper can be used as-is. For example, adapting to an existing network and identifying specific management requirements are out of scope for this paper. It is therefore expected that the reader will make significant adjustments to the design presented here in order to meet the specific requirements of their own data center or lab environment. This paper assumes that the reader has basic knowledge of computing infrastructure components and services; intermediate knowledge of the Linux* operating system, Python*, and the Hadoop framework, as well as basic system administration skills, is also assumed.

February 2012

Table of Contents
Executive Summary
Hadoop* Overview
Hadoop System Architecture
Operation of a Hadoop Cluster
TeraSort Workload
TeraSort Workflow
Test Methodology
Intel Benchmark Install and Test Tool (Intel BITT)
Intel BITT Features
Configuring the Setup
Running TeraSort
Results
Conclusion

Executive Summary
MapReduce technology is gaining popularity among enterprises for a variety of large-scale, data-intensive jobs. MapReduce based on Apache* Hadoop* is rapidly emerging as the preferred technology for big data processing and management. Enterprises are deploying clusters of standard commodity servers and using business intelligence tools along with Apache Hadoop to obtain high-performing solutions for their large-scale data processing requirements. The motivation to deploy Hadoop comes from the fact that enterprises gather huge unstructured data sets generated by their business processes, and they want to extract the most value from this data to support decision making. Hadoop infrastructure moves compute closer to the data to achieve high processing throughput.

In this paper we created a small commodity server cluster based on an Apache Hadoop distribution and ran a sort benchmark to measure how fast the cluster can process data. This reference architecture explains how to set up the cluster, tune parameters, and run the sort benchmark. It provides a blueprint for building a cluster with Intel Xeon processor-based standard server platforms and the open source Apache Hadoop distribution, and further describes the tuning parameters and the execution of the sort benchmark to measure performance.

Figure 1: Hadoop* stack

Hadoop* Overview
Apache Hadoop is a framework for running applications on large clusters built using standard hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, in which an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System (HDFS) are designed so that node failures are automatically tolerated by the framework.

The Hadoop framework consists of three major components:

Common: Hadoop Common is a set of utilities that support the other Hadoop subprojects. Hadoop Common includes FileSystem, RPC, and serialization libraries.

HDFS: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data, is suitable for applications that have large data sets, and can stream file system data.

MapReduce: MapReduce was first developed by Google to process large datasets. MapReduce has two functions, map and reduce, and a framework for running a large number of instances of these programs on commodity hardware. The map function reads a set of records from an input file, processes these records, and outputs a set of intermediate records. As part of the map function, a split function distributes the intermediate records across many buckets using a hash function. The reduce function then processes the intermediate records. The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master is responsible for scheduling a job's component tasks on the slaves, monitoring them, and re-executing failed tasks. The slaves execute the tasks as directed by the master.
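The map, split, and reduce steps described above can be sketched in a few lines of Python. This is a standalone, single-process simulation for illustration only, not Hadoop itself; the word-count job and function names are our own:

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn, num_buckets=4):
    # Map phase: each input record yields intermediate (key, value) pairs.
    buckets = [defaultdict(list) for _ in range(num_buckets)]
    for record in records:
        for key, value in map_fn(record):
            # Split function: hash-partition intermediate records into buckets.
            buckets[hash(key) % num_buckets][key].append(value)
    # Reduce phase: each bucket's keys are reduced independently
    # (in Hadoop, each bucket would go to a separate reduce task).
    results = {}
    for bucket in buckets:
        for key, values in bucket.items():
            results[key] = reduce_fn(key, values)
    return results

# Example job: word count.
def wc_map(line):
    return [(word, 1) for word in line.split()]

def wc_reduce(word, counts):
    return sum(counts)

lines = ["hadoop moves compute to data", "hadoop sorts data"]
print(run_mapreduce(lines, wc_map, wc_reduce))  # "hadoop" counts 2, "data" counts 2
```

In a real cluster the map calls run in parallel on the nodes holding the input splits, and the buckets are shuffled across the network to the reduce tasks.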

Hadoop* System Architecture
The Hadoop framework works on the principle of "moving compute closer to the data." Figure 2 shows a typical deployment of the Hadoop framework on multiple standard server nodes. Computation occurs on the same node where the data resides, which enables Hadoop to deliver better performance than architectures that move data across the network to the computation. The combination of standard server platforms and Hadoop infrastructure provides a cost-efficient, high-performance platform for data-parallel applications. Each Hadoop cluster has one master node and multiple slave nodes. The master node runs the NameNode and JobTracker functions, coordinating with the slave nodes to complete the jobs fed to the cluster. The slave nodes run the TaskTracker and HDFS to store the data, and execute the map and reduce functions that perform the data computations.

Figure 2: Hadoop* deployment on standard server nodes

Operation of a Hadoop* Cluster
Figure 3 shows the operation of a Hadoop cluster. The client submits a job to the master node, which orchestrates with the slave nodes to complete it. The JobTracker on the master node is responsible for controlling the MapReduce job. The slaves run the TaskTracker, which keeps track of its portion of the MapReduce job and reports status to the JobTracker at frequent intervals. In the event of a task failure, the JobTracker reschedules the task on the same slave node or a different one. HDFS, a location-aware (rack-aware) file system, manages the data in a Hadoop cluster. HDFS replicates the data across nodes in the cluster to attain data reliability; however, HDFS has a single point of failure in the NameNode function: if the NameNode fails, the file system and its data become inaccessible. Since the JobTracker assigns the data to the slave nodes, it is aware of the data location and efficiently schedules tasks where the data resides, decreasing the need to move data between nodes and saving network bandwidth. Once the map function is complete, the data is transferred to a different node to perform the reduce function. The MapReduce framework provides an efficient way to scale the cluster through a modular scale-out strategy: nodes are added one or more at a time, with HDFS and the MapReduce functions supporting the new nodes as they join.

Figure 3: Operation of a Hadoop* cluster
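HDFS's rack-aware replication can be sketched with a small simulation of its default placement policy (first replica on the writer's node, second on a node in a different rack, third on another node in that second rack). This is our own illustrative sketch, not HDFS code, and the two-rack topology is invented for the example:

```python
import random

def place_replicas(topology, writer_node, replication=3):
    """Pick nodes for a block's replicas using a simplified version of
    HDFS's default rack-aware placement policy."""
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    replicas = [writer_node]  # 1st replica: the node writing the block
    remote_racks = [r for r in topology if r != rack_of[writer_node]]
    second = random.choice(topology[random.choice(remote_racks)])
    replicas.append(second)   # 2nd replica: a node on a different rack
    same_rack = [n for n in topology[rack_of[second]] if n != second]
    replicas.append(random.choice(same_rack))  # 3rd: same rack as the 2nd
    return replicas[:replication]

topology = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
print(place_replicas(topology, "node1"))
```

Placing two of the three replicas on a remote rack is what lets the cluster survive the loss of an entire rack, at the cost of one cross-rack copy per block write.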

Cluster hardware setup:
- 17 nodes total in the cluster: one master node and 16 slave nodes.
- Data network: Arista 7124 switch connected to an Intel Ethernet Server Adapter X520-DA2 dual-port 10GbE NIC on every node.
- Management network: each server has an internal Intel dual-port 1GbE NIC connected to a top-of-rack switch used for management tasks.
- Each node has a disk enclosure populated with 2TB SATA II 7.2K RPM hard disk drives, for a total of 24TB of raw storage per enclosure.
- Dual-socket Intel 5520 Chipset platform.
- Two Intel Xeon processors X5680 at 3.33GHz, 12MB cache.
- 48GB 1333MHz DDR3 memory.
- Red Hat Enterprise Linux* 6.0 (RHEL 6.0) (Kernel: el6..x86_64)
- Hadoop* Framework v

Figure 4: Cluster hardware setup

TeraSort Workload
TeraSort is a popular Hadoop benchmarking workload. The 1TB dataset size is not a hard limit, since TeraSort allows the user to sort a dataset of any size by changing various parameters. The TeraSort benchmark exercises both the HDFS and MapReduce functions of a Hadoop cluster. TeraSort is part of the Hadoop framework and ships with the standard Apache Hadoop installation package, and it is widely used to benchmark and tune large Hadoop clusters with hundreds of nodes. TeraSort works in two steps:

TeraGen: Generates random data based on the dataset size set by the user. This dataset is used as input data for the sort benchmark.

TeraSort: Sorts the input data generated by TeraGen and stores the output data on HDFS.

An optional third step, called TeraValidate, allows validation of the sorted data; this paper does not discuss this optional step.

TeraSort Workflow
Figure 5 shows the workflow of the TeraSort workload tested on our cluster. The flow chart depicts the start of the workload at one control node, with the master node kick-starting the job and 16 slave nodes dividing the 8192 map tasks among themselves. Once the map phase is complete, the cluster starts the reduce phase with 243 tasks. When the reduce phase completes, the output data is stored on the file system.

Test Methodology
To run the workload we used the Intel Benchmark Install and Test Tool (Intel BITT). The workload was scripted to kick-start the job on the cluster, run TeraGen to generate the test data, and run the TeraSort task to sort the generated data. The script also kicks off a series of counters on the slave nodes to gather performance metrics on each node. Key hardware metrics such as processor utilization, network bandwidth consumption, memory utilization, and disk bandwidth consumption are captured on each node at 30-second intervals.
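The two TeraSort steps can be mimicked in miniature with a toy Python sketch (not the Hadoop implementation): it generates fixed-length 100-byte records, each with a 10-byte random key and a 90-byte payload as TeraGen does, sorts them by key, and validates the ordering the way TeraValidate would:

```python
import os

RECORD_LEN, KEY_LEN = 100, 10  # one record: 10-byte key + 90-byte payload

def teragen(num_records):
    # Generate fixed-length records with random keys (toy stand-in for TeraGen).
    return [os.urandom(KEY_LEN) + b"x" * (RECORD_LEN - KEY_LEN)
            for _ in range(num_records)]

def terasort(records):
    # Sort records by their leading 10-byte key (toy stand-in for TeraSort).
    return sorted(records, key=lambda r: r[:KEY_LEN])

def teravalidate(records):
    # Check the output is globally ordered by key (toy stand-in for TeraValidate).
    keys = [r[:KEY_LEN] for r in records]
    return keys == sorted(keys)

data = teragen(1000)
print(teravalidate(terasort(data)))  # prints True
```

On the real cluster the same record format applies, but the sort is distributed: map tasks partition the key space into ranges and the reduce tasks each sort one range, so that concatenating the reduce outputs yields a globally sorted result.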
Once the job is complete, the counters are stopped on all slave nodes and the log files containing performance data are copied to the master node to calculate the utilization of the cluster. This data is plotted into graphs using gnuplot and presented for further analysis. We also recorded the time taken to complete the job, as reported by the Hadoop management user interface; the lower the time measurement, the better the performance.

Figure 5: TeraSort workflow
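The per-node counter logs can be reduced to cluster-level utilization numbers with a short post-processing script such as the following. This is our own illustration of that step, assuming each node's log has been parsed into a list of CPU-utilization samples taken at 30-second intervals:

```python
def summarize(node_samples):
    """node_samples: {node_name: [cpu_util_percent, ...]}, sampled every 30s.
    Returns per-node averages and the overall cluster average."""
    per_node = {node: sum(s) / len(s) for node, s in node_samples.items()}
    cluster = sum(per_node.values()) / len(per_node)
    return per_node, cluster

samples = {"node1": [80.0, 90.0], "node2": [60.0, 70.0]}
per_node, cluster = summarize(samples)
print(per_node, cluster)  # node1 averages 85.0, node2 65.0, cluster 75.0
```

The same reduction applies to the network, memory, and disk counters; plotting the raw 30-second samples rather than the averages is what reveals phase behavior (map-heavy CPU early, shuffle-heavy network in the middle, disk-heavy writes at the end).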

Intel Benchmark Install and Test Tool
The Intel Benchmark Install and Test Tool (Intel BITT) provides tools to install, configure, run, and analyze benchmark programs on small test clusters. The installcli tool is used to install tar files on a cluster. moncli is used to monitor the performance of cluster nodes and provides options to start monitoring, stop monitoring, and generate CPU, disk I/O, memory, and network performance plots for individual nodes and for the cluster. hadoopcli provides an automated Hadoop test environment. Intel BITT templates enable configurable plot generation, and Intel BITT command scripts enable configurable control of monitoring actions. Benchmark configuration is implemented using XML files; configurable properties include the installation location, monitoring directories, monitoring sampling duration, the list of cluster nodes, and the list of tar files that need to be installed. Intel BITT is implemented in Python* and uses gnuplot to generate performance plots. Intel BITT currently runs on Linux*.

Intel BITT Features
The Intel Benchmark Install and Test Tool provides the following tools:

installcli: Installs a specified list of tar files to a specified list of nodes.
moncli: Monitors performance metrics locally and/or remotely; it can be used to monitor the performance of a cluster. The tool currently supports the sar and iostat monitoring tools.
hadoopcli: Installs, configures, and tests Hadoop clusters.

Intel BITT is implemented in an object-oriented fashion and can be extended to support other performance monitoring tools, such as vmstat and mpstat, if needed. The toolkit includes the following building blocks:

XML parser: Parses XML properties including name, value, and description fields. The install and monitor configuration is defined using XML properties; tool-specific options are passed through command-line options.
Log file parser: Log files in the form of tables with rows and columns are parsed, and a CSV file is generated for each column. The column items on each row are separated by whitespace, and the column header names are used to create the CSV file names.
Plot generator: gnuplot plots the contents of the CSV files using templates. The templates define the list of CSV files used as inputs to generate the plots, as well as the labels and titles of the plots.

On top of these building blocks sit the sar, iostat, VTune(TM), and Emon monitoring tools; installcli, which is used to install Intel BITT; moncli, which is used to monitor local or cluster nodes; and hadoopcli, which is implemented using the building blocks defined above and is used to create and test Hadoop clusters.

Configuring the Setup
We installed RHEL 6.0 on all 17 nodes with the default configuration and configured passphraseless SSH access between the nodes so they can communicate without a password prompt for every transaction.

1. Install the Intel BITT tar file:
cd
mkdir bitt
cp bitt-1.0.tar bitt
cd bitt
tar -xvf bitt-1.0.tar
cd bitt-1.0

The following is the list of subdirectories under the Intel BITT home:
cmd
conf
samples
scripts
templates
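The log-file-parser building block described above can be sketched as follows. This is our own minimal re-implementation of the described behavior (whitespace-separated table in, one CSV per column out, named after the column headers), not Intel BITT source code:

```python
import csv
import io

def table_to_csvs(log_text):
    """Split a whitespace-separated table into one CSV per column,
    with CSV names taken from the column headers."""
    rows = [line.split() for line in log_text.strip().splitlines()]
    headers, data = rows[0], rows[1:]
    csvs = {}
    for i, name in enumerate(headers):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow([name])          # header row
        for row in data:
            writer.writerow([row[i]])    # one value per sample interval
        csvs[name + ".csv"] = buf.getvalue()
    return csvs

log = "time cpu mem\n00:00 12 40\n00:30 57 43\n"
out = table_to_csvs(log)
print(sorted(out))  # ['cpu.csv', 'mem.csv', 'time.csv']
```

One file per column is a convenient shape for gnuplot, since each plot template can then simply name the CSV files it wants as data series.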

2. Create a release directory under the Intel BITT home and copy tar files into it:
mkdir -p bitt/bitt-1.0/release
cp bitt-1.0.tar bitt/bitt-1.0/release

If you are planning to test Hadoop, download and copy the Hadoop tar file to the release directory as well:
cp hadoop tar.gz ~/bitt/bitt-1.0/release

3. Download the JDK and create a tar file from the installed JDK tree. For example:
mkdir jdk
cp jdk-6u23-linux-x64.bin jdk
cd jdk
chmod +x jdk-6u23-linux-x64.bin
./jdk-6u23-linux-x64.bin
rm jdk-6u23-linux-x64.bin
tar -cvf ~/bitt/bitt-1.0/release/jdk1.6.0_23.tar .

4. Download gnuplot and create a tar file from the installed gnuplot tree. For example:
mkdir myinstall
cp gnuplot rc1.tar myinstall
cd myinstall/
tar -xvf gnuplot rc1.tar
mkdir -p install/gnuplot
cd gnuplot rc1
./configure --prefix=/home/<user>/myinstall/install/gnuplot
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/gnuplot tar .

5. Download Python and create a tar file from the installed Python tree for your platform. For example:
mkdir myinstall
cp Python tgz myinstall
cd myinstall/
tar -xvf Python tgz
mkdir -p install/Python
cd Python
./configure --prefix=/home/<user>/myinstall/install/Python
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/Python tar .

6. Run TeraSort by running terasort.sh. You will need to update the corresponding configuration files as described below:
cd ~/bitt/bitt-1.0/conf
Install gnuplot on your client system.
Install Python on your client system.
Make sure python3 and gnuplot are on your PATH on the client system.
cd ~/bitt/bitt-1.0/scripts
./terasort.sh

7. Configuration file edits. All configuration files are found under ~/bitt/bitt-1.0/conf.

a. hadoopnodelist: Configuration file that contains the cluster nodes. Any addition or removal of nodes from the cluster must be registered here to be recognized by the load generator tool.
node1.domain.com
node2.domain.com
node3.domain.com
node4.domain.com
...
node17.domain.com

b. hadooptarlist: Configuration file listing the tar files of the executables to be installed.
../release/bitt-1.0.tar.gz
../release/python-3.2.tar.gz
../release/jdk1.6.0_25.tar.gz
../release/hadoop tar.gz
../release/gnuplot tar.gz

c. hadoop-env.sh: Main Hadoop environment configuration file.

# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"

d. hadoopcloudconf.xml: Custom XML configuration file used to define key parameters for how the test is executed and where the data is stored.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>cloudtemplateloc</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf</value>
    <description>cloud conf template file location</description>
  </property>
  <property>
    <name>cloudtemplatevars</name>
    <value>all</value>
    <description>the list of template variables to copy</description>
  </property>
  <property>
    <name>jobtrackerport</name>
    <value>8021</value>
    <description>jobtracker port</description>
  </property>
  <property>
    <name>namenodeport</name>
    <value>8020</value>
    <description>namenode port</description>
  </property>
  <property>
    <name>cloudconfdir</name>
    <value>/tmp/hadoopconf</value>
    <description>generated cloud conf file</description>
  </property>
  <property>
    <name>cloudtmpdir</name>
    <value>hadoop-${user.name}</value>
    <description>cloud tmp dir</description>
  </property>
  <property>
    <name>cloudinstalldir</name>
    <value>/usr/local/hadoop/install</value>
    <description>cloud install dir</description>
  </property>
  <property>
    <name>cloudnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopnodelist</value>
    <description>cluster nodes</description>
  </property>
  <property>
    <name>monnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopmonnodelist</value>
    <description>cluster monitor nodes</description>
  </property>
  <property>
    <name>cloudtarlist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadooptarlist</value>
    <description>cluster tar files</description>
  </property>
  <property>
    <name>moninterval</name>
    <value>30</value>
    <description>sampling interval</description>
  </property>
  <property>
    <name>moncount</name>
    <value>0</value>
    <description>number of samples</description>
  </property>
  <property>
    <name>monresults</name>
    <value>/tmp/monhadres</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>monsummary</name>
    <value>/tmp/monhadsum</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>mondir</name>
    <value>/tmp/monhadloc</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>gnucmd</name>
    <value>/usr/local/hadoop/install/gnuplot-4.4.3/bin/gnuplot</value>
    <description>none</description>
  </property>
</configuration>

e. hdfs-site-template.xml: Hadoop configuration file where the HDFS parameters are set. The optimized values we used to run the test are set in this file.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>655360</value>
    <description>Number of files Hadoop serves at one time</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data,/mnt/disk5/hdfs/data,/mnt/disk6/hdfs/data,/mnt/disk7/hdfs/data,/mnt/disk8/hdfs/data,/mnt/disk9/hdfs/data,/mnt/disk10/hdfs/data,/mnt/disk11/hdfs/data,/mnt/disk12/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value> </value>
    <description>The default block size for new files.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description> </description>
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
    <description> </description>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
    <description> </description>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>220</value>
    <description> </description>
  </property>
</configuration>

f. mapred-site-template.xml: Hadoop configuration file that defines key MapReduce parameters. The values used in our testing are set in this file.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.3</value>
    <description>Added per SSG recommendation</description>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.9</value>
    <description>Added per SSG recommendation</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>64</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Assume 10 nodes: 10*2-2.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/disk1/hdfs/mapred,/mnt/disk2/hdfs/mapred,/mnt/disk3/hdfs/mapred,/mnt/disk4/hdfs/mapred,/mnt/disk5/hdfs/mapred,/mnt/disk6/hdfs/mapred,/mnt/disk7/hdfs/mapred,/mnt/disk8/hdfs/mapred,/mnt/disk9/hdfs/mapred,/mnt/disk10/hdfs/mapred,/mnt/disk11/hdfs/mapred,/mnt/disk12/hdfs/mapred</value>
    <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m -Djava.net.preferIPv4Stack=true</value>
    <description>Java opts for the task tracker child processes. The symbol @taskid@, if present, is replaced by the current TaskID; any other occurrences of '@' go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.</description>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
    <description>Should the job outputs be compressed?</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>false</value>
    <description>Should the outputs of the maps be compressed before being sent across the network? Uses SequenceFile compression.</description>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the job outputs are compressed, how should they be compressed?</description>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the map outputs are compressed, how should they be compressed?</description>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.min.split.size</name>
    <value>65536</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.copy.backoff</name>
    <value>5</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.shuffle.merge.percent</name>
    <value>0.7</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.90</value>
    <description> </description>
  </property>
</configuration>

g. hadoop-terasort.xml: Intel BITT configuration file from which the parameters are read before the test runs. Parameters in this configuration file override values in the other configuration files mentioned above, which makes it easy to change parameter values between test runs without editing the individual configuration files.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>8192</value>
    <description>Total map task number</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>243</value>
    <description>Total reduce task number</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of copies to replicate</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description>Compress map output</description>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description>none</description>
  </property>
  <property>
    <name>datasetsizesmall</name>
    <value> </value>
    <description>Total record number for about 1TB of data; one record is 100 bytes</description>
  </property>
  <property>
    <name>datasetsize</name>
    <value> </value>
    <description>Total record number for about 1TB of data; one record is 100 bytes</description>
  </property>
  <property>
    <name>datasetname</name>
    <value>tera</value>
    <description>none</description>
  </property>
  <property>
    <name>outputdataname</name>
    <value>tera-sort2</value>
    <description>none</description>
  </property>
  <property>
    <name>jarfile</name>
    <value>hadoop examples.jar</value>
    <description>none</description>
  </property>
</configuration>

Running TeraSort
TeraSort can be started by running terasort.sh. The script runs the various commands involved in starting the test, starting the performance counters, ending the test, and gathering performance counter data for analysis. Below is the list of commands executed when the script runs, with a brief explanation of what each command does.

#!/usr/bin/env bash
###########################################################
# Intel Benchmark Install and Test Tool (BITT) Use Cases
# Typical sequence for hadoop terasort benchmark:
###########################################################
echo "START: terasort benchmark..."
date

# Stop any currently running test on the cluster.
../scripts/hadoopcli -a stop -c ../conf/hadoopcloudconf.xml

# Kill Java* processes on the nodes.
./runkill.sh

# Install a fresh copy of the executables on the slave nodes.
../scripts/hadoopcli -a install -c ../conf/hadoopcloudconf.xml

# Format the HDFS to store the data.
../scripts/hadoopcli -a format -c ../conf/hadoopcloudconf.xml

# Start Java processes on all slave nodes.
../scripts/hadoopcli -a start -c ../conf/hadoopcloudconf.xml

# 2-minute delay to let the processes start on the slave nodes.
sleep 120

# Generate 1TB of data which will be used for sorting.
../scripts/hadoopcli -a data -c ../conf/hadoopcloudconf.xml

# Create monitoring directories.
../scripts/moncli -r clean -c ../conf/hadoopcloudconf.xml

# Start the iostat utility to monitor disk usage on the slave nodes.
../scripts/moncli -m iostat -a run -c ../conf/hadoopcloudconf.xml -s run_iostat.sh

# Start the sar utility on all the slave nodes to monitor CPU, network, and memory utilization.
../scripts/moncli -m sar -a run -c ../conf/hadoopcloudconf.xml -s run_sar2.sh

# Start the sort activity on the 1TB of data generated in the earlier step.
../scripts/hadoopcli -a run -c ../conf/hadoopcloudconf.xml

# Stop the sar monitoring utility.
../scripts/moncli -m sar -a kill -s run_sar_kill.sh -c ../conf/hadoopcloudconf.xml

# Stop the iostat utility.
../scripts/moncli -m iostat -a kill -s run_iostat_kill.sh -c ../conf/hadoopcloudconf.xml

# Convert iostat-generated data to CSV file format.
../scripts/moncli -m iostat -a csv -c ../conf/hadoopcloudconf.xml

# Convert data generated by the sar utility to CSV format.
../scripts/moncli -m sar -a csv -c ../conf/hadoopcloudconf.xml -s run_sar_gen.sh

# Use gnuplot to generate an image containing a graph of the iostat data.
../scripts/moncli -m iostat -a plot -t iostat -c ../conf/hadoopcloudconf.xml

# Use gnuplot to generate an image containing the CPU graph from the sar data.
../scripts/moncli -m sar -a plot -t cpu -c ../conf/hadoopcloudconf.xml

# Use gnuplot to generate an image containing the memory graph from the sar data.
../scripts/moncli -m sar -a plot -t mem -c ../conf/hadoopcloudconf.xml

# Use gnuplot to generate an image containing the network graph from the sar data.
../scripts/moncli -m sar -a plot -t nw -c ../conf/hadoopcloudconf.xml

# Archive log files on all the slave nodes.
../scripts/moncli -r tar -c ../conf/hadoopcloudconf.xml


More information

COSBench: A benchmark Tool for Cloud Object Storage Services. Jiangang.Duan@intel.com 2012.10

COSBench: A benchmark Tool for Cloud Object Storage Services. Jiangang.Duan@intel.com 2012.10 COSBench: A benchmark Tool for Cloud Object Storage Services Jiangang.Duan@intel.com 2012.10 Updated June 2012 Self introduction COSBench Introduction Agenda Case Study to evaluate OpenStack* swift performance

More information

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Intel Data Direct I/O Technology (Intel DDIO): A Primer > Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs)

Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs) WHITE PAPER Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs) July 2014 951 SanDisk Drive, Milpitas, CA 95035 2014 SanDIsk Corporation. All rights reserved www.sandisk.com Table of Contents

More information

Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study

Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study The adoption of cloud computing creates many challenges and opportunities in big data management and storage. To

More information

Intel Media SDK Library Distribution and Dispatching Process

Intel Media SDK Library Distribution and Dispatching Process Intel Media SDK Library Distribution and Dispatching Process Overview Dispatching Procedure Software Libraries Platform-Specific Libraries Legal Information Overview This document describes the Intel Media

More information

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,

More information

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse

More information

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

HSearch Installation

HSearch Installation To configure HSearch you need to install Hadoop, Hbase, Zookeeper, HSearch and Tomcat. 1. Add the machines ip address in the /etc/hosts to access all the servers using name as shown below. 2. Allow all

More information

Big Business, Big Data, Industrialized Workload

Big Business, Big Data, Industrialized Workload Big Business, Big Data, Industrialized Workload Big Data Big Data 4 Billion 600TB London - NYC 1 Billion by 2020 100 Million Giga Bytes Copyright 3/20/2014 BMC Software, Inc 2 Copyright 3/20/2014 BMC Software,

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive

More information

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability

More information

Intel Platform and Big Data: Making big data work for you.

Intel Platform and Big Data: Making big data work for you. Intel Platform and Big Data: Making big data work for you. 1 From data comes insight New technologies are enabling enterprises to transform opportunity into reality by turning big data into actionable

More information

1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation

1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation 1. GridGain In-Memory Accelerator For Hadoop GridGain's In-Memory Accelerator For Hadoop edition is based on the industry's first high-performance dual-mode in-memory file system that is 100% compatible

More information

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters CONNECT - Lab Guide Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters Hardware, software and configuration steps needed to deploy Apache Hadoop 2.4.1 with the Emulex family

More information

Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms

Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms Solution Brief Intel Xeon Processors Lanner Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms Internet usage continues to rapidly expand and evolve, and with it network

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Accelerating Business Intelligence with Large-Scale System Memory

Accelerating Business Intelligence with Large-Scale System Memory Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness

More information

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0*

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0* How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0* Technical Brief v1.0 December 2011 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms

Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms Enomaly Elastic Computing Platform, * Service Provider Edition Executive Summary Intel Cloud Builder Guide

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Intel Service Assurance Administrator. Product Overview

Intel Service Assurance Administrator. Product Overview Intel Service Assurance Administrator Product Overview Running Enterprise Workloads in the Cloud Enterprise IT wants to Start a private cloud initiative to service internal enterprise customers Find an

More information

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage Sam Fineberg, HP Storage SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations

More information

CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings

CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings WHITE PAPER CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings August 2014 951 SanDisk Drive, Milpitas, CA 95035 2014 SanDIsk Corporation. All rights reserved www.sandisk.com Table

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

RDMA for Apache Hadoop 0.9.9 User Guide

RDMA for Apache Hadoop 0.9.9 User Guide 0.9.9 User Guide HIGH-PERFORMANCE BIG DATA TEAM http://hibd.cse.ohio-state.edu NETWORK-BASED COMPUTING LABORATORY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING THE OHIO STATE UNIVERSITY Copyright (c)

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library

More information

Intel System Event Log (SEL) Viewer Utility

Intel System Event Log (SEL) Viewer Utility Intel System Event Log (SEL) Viewer Utility User Guide Document No. E12461-003 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS FOR THE GENERAL PURPOSE OF SUPPORTING

More information

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com

More information

Intel RAID RS25 Series Performance

Intel RAID RS25 Series Performance PERFORMANCE BRIEF Intel RAID RS25 Series Intel RAID RS25 Series Performance including Intel RAID Controllers RS25DB080 & PERFORMANCE SUMMARY Measured IOPS surpass 200,000 IOPS When used with Intel RAID

More information

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC Alan Gara Intel Fellow Exascale Chief Architect Legal Disclaimer Today s presentations contain forward-looking

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Intel Open Network Platform Release 2.1: Driving Network Transformation

Intel Open Network Platform Release 2.1: Driving Network Transformation data sheet Intel Open Network Platform Release 2.1: Driving Network Transformation This new release of the Intel Open Network Platform () introduces added functionality, enhanced performance, and greater

More information

Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux

Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux Order Number: D58855-002 Disclaimer Information in this document is provided in connection with

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Accelerating Business Intelligence with Large-Scale System Memory

Accelerating Business Intelligence with Large-Scale System Memory Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness

More information

Hadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation

Hadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation Hadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation Agenda Overview HAM and HAL Hadoop* Ecosystem with Lustre * Benchmark results Conclusion and future work

More information

MapReduce Evaluator: User Guide

MapReduce Evaluator: User Guide University of A Coruña Computer Architecture Group MapReduce Evaluator: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño December 9, 2014 Contents 1 Overview

More information

Accelerate Big Data Analysis with Intel Technologies

Accelerate Big Data Analysis with Intel Technologies White Paper Intel Xeon processor E7 v2 Big Data Analysis Accelerate Big Data Analysis with Intel Technologies Executive Summary It s not very often that a disruptive technology changes the way enterprises

More information

Configuring RAID for Optimal Performance

Configuring RAID for Optimal Performance Configuring RAID for Optimal Performance Intel RAID Controller SRCSASJV Intel RAID Controller SRCSASRB Intel RAID Controller SRCSASBB8I Intel RAID Controller SRCSASLS4I Intel RAID Controller SRCSATAWB

More information

HADOOP PERFORMANCE TUNING

HADOOP PERFORMANCE TUNING PERFORMANCE TUNING Abstract This paper explains tuning of Hadoop configuration parameters which directly affects Map-Reduce job performance under various conditions, to achieve maximum performance. The

More information

Performance measurement of a Hadoop Cluster

Performance measurement of a Hadoop Cluster Performance measurement of a Hadoop Cluster Technical white paper Created: February 8, 2012 Last Modified: February 23 2012 Contents Introduction... 1 The Big Data Puzzle... 1 Apache Hadoop and MapReduce...

More information

研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊. Version 0.1

研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊. Version 0.1 102 年 度 國 科 會 雲 端 計 算 與 資 訊 安 全 技 術 研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊 Version 0.1 總 計 畫 名 稱 : 行 動 雲 端 環 境 動 態 群 組 服 務 研 究 與 創 新 應 用 子 計 畫 一 : 行 動 雲 端 群 組 服 務 架 構 與 動 態 群 組 管 理 (NSC 102-2218-E-259-003) 計

More information

Intel System Event Log (SEL) Viewer Utility. User Guide SELViewer Version 10.0 /11.0 December 2012 Document number: G88216-001

Intel System Event Log (SEL) Viewer Utility. User Guide SELViewer Version 10.0 /11.0 December 2012 Document number: G88216-001 Intel System Event Log (SEL) Viewer Utility User Guide SELViewer Version 10.0 /11.0 December 2012 Document number: G88216-001 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Intel System Event Log (SEL) Viewer Utility

Intel System Event Log (SEL) Viewer Utility Intel System Event Log (SEL) Viewer Utility User Guide Document No. E12461-007 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS FOR THE GENERAL PURPOSE OF SUPPORTING

More information

Intel Solid-State Drive Data Center Tool User Guide Version 1.1

Intel Solid-State Drive Data Center Tool User Guide Version 1.1 Intel Solid-State Drive Data Center Tool User Guide Version 1.1 Order Number: 327191-002 October 2012 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR

More information

Intel X38 Express Chipset Memory Technology and Configuration Guide

Intel X38 Express Chipset Memory Technology and Configuration Guide Intel X38 Express Chipset Memory Technology and Configuration Guide White Paper January 2008 Document Number: 318469-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

TP1: Getting Started with Hadoop

TP1: Getting Started with Hadoop TP1: Getting Started with Hadoop Alexandru Costan MapReduce has emerged as a leading programming model for data-intensive computing. It was originally proposed by Google to simplify development of web

More information

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1. Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V Technical Brief v1.0 September 2012 2 Intel Ethernet and Configuring SR-IOV on Windows*

More information

Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013

Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 Hadoop Hardware : Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3 About us Joep Rottinghuis Software Engineer @ Twitter Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay

More information

MapReduce, Hadoop and Amazon AWS

MapReduce, Hadoop and Amazon AWS MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables

More information

Intel System Event Log (SEL) Viewer Utility

Intel System Event Log (SEL) Viewer Utility Intel System Event Log (SEL) Viewer Utility User Guide Document No. E12461-005 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS FOR THE GENERAL PURPOSE OF SUPPORTING

More information

Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide

Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide OPTIMIZATION AND TUNING GUIDE Intel Distribution for Apache Hadoop* Software Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide Configuring and managing your Hadoop* environment

More information

Hadoop on the Gordon Data Intensive Cluster

Hadoop on the Gordon Data Intensive Cluster Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,

More information

Setting up Hadoop with MongoDB on Windows 7 64-bit

Setting up Hadoop with MongoDB on Windows 7 64-bit SGT WHITE PAPER Setting up Hadoop with MongoDB on Windows 7 64-bit HCCP Big Data Lab 2015 SGT, Inc. All Rights Reserved 7701 Greenbelt Road, Suite 400, Greenbelt, MD 20770 Tel: (301) 614-8600 Fax: (301)

More information

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1 How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1 Technical Brief v1.0 February 2013 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

HDFS. Hadoop Distributed File System

HDFS. Hadoop Distributed File System HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files

More information

Intel Internet of Things (IoT) Developer Kit

Intel Internet of Things (IoT) Developer Kit Intel Internet of Things (IoT) Developer Kit IoT Cloud-Based Analytics User Guide September 2014 IoT Cloud-Based Analytics User Guide Introduction Table of Contents 1.0 Introduction... 4 1.1. Revision

More information

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop

More information

Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications

Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce

More information

How To Use Hadoop

How To Use Hadoop Hadoop in Action Justin Quan March 15, 2011 Poll What s to come Overview of Hadoop for the uninitiated How does Hadoop work? How do I use Hadoop? How do I get started? Final Thoughts Key Take Aways Hadoop

More information

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Whitepaper December 2012 Anita Banerjee Contents Introduction... 3 Sorenson Squeeze... 4 Intel QSV H.264... 5 Power Performance...

More information

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides. MapReduce for data intensive computing Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

More information

Intelligent Business Operations

Intelligent Business Operations White Paper Intel Xeon Processor E5 Family Data Center Efficiency Financial Services Intelligent Business Operations Best Practices in Cash Supply Chain Management Executive Summary The purpose of any

More information

Business white paper. HP Process Automation. Version 7.0. Server performance

Business white paper. HP Process Automation. Version 7.0. Server performance Business white paper HP Process Automation Version 7.0 Server performance Table of contents 3 Summary of results 4 Benchmark profile 5 Benchmark environmant 6 Performance metrics 6 Process throughput 6

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Accelerating High-Speed Networking with Intel I/O Acceleration Technology White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing

More information

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing White Paper Intel Xeon Processor E5 Family Big Data Analytics Cloud Computing Solutions Big Data Technologies for Ultra-High-Speed Data Transfer and Processing Using Technologies from Aspera and Intel

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

Intel Solid-State Drives Increase Productivity of Product Design and Simulation

Intel Solid-State Drives Increase Productivity of Product Design and Simulation WHITE PAPER Intel Solid-State Drives Increase Productivity of Product Design and Simulation Intel Solid-State Drives Increase Productivity of Product Design and Simulation A study of how Intel Solid-State

More information

IMPLEMENTING GREEN IT

IMPLEMENTING GREEN IT Saint Petersburg State University of Information Technologies, Mechanics and Optics Department of Telecommunication Systems IMPLEMENTING GREEN IT APPROACH FOR TRANSFERRING BIG DATA OVER PARALLEL DATA LINK

More information

System Event Log (SEL) Viewer User Guide

System Event Log (SEL) Viewer User Guide System Event Log (SEL) Viewer User Guide For Extensible Firmware Interface (EFI) and Microsoft Preinstallation Environment Part Number: E12461-001 Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN

More information

iscsi Quick-Connect Guide for Red Hat Linux

iscsi Quick-Connect Guide for Red Hat Linux iscsi Quick-Connect Guide for Red Hat Linux A supplement for Network Administrators The Intel Networking Division Revision 1.0 March 2013 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH

More information

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide White Paper August 2007 Document Number: 316971-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Running Kmeans Mapreduce code on Amazon AWS

Running Kmeans Mapreduce code on Amazon AWS Running Kmeans Mapreduce code on Amazon AWS Pseudo Code Input: Dataset D, Number of clusters k Output: Data points with cluster memberships Step 1: for iteration = 1 to MaxIterations do Step 2: Mapper:

More information

HP reference configuration for entry-level SAS Grid Manager solutions

HP reference configuration for entry-level SAS Grid Manager solutions HP reference configuration for entry-level SAS Grid Manager solutions Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput Technical white paper Table of contents Executive summary... 2

More information

How To Install Hadoop 1.2.1.1 From Apa Hadoop 1.3.2 To 1.4.2 (Hadoop)

How To Install Hadoop 1.2.1.1 From Apa Hadoop 1.3.2 To 1.4.2 (Hadoop) Contents Download and install Java JDK... 1 Download the Hadoop tar ball... 1 Update $HOME/.bashrc... 3 Configuration of Hadoop in Pseudo Distributed Mode... 4 Format the newly created cluster to create

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information