Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms
Apache* Hadoop*
Intel Xeon Processor 5600 Series

Audience and Purpose

This reference architecture is for companies that are looking to build their own cloud computing infrastructure, including enterprise IT organizations as well as cloud service providers and cloud hosting providers. The decision to use a cloud for the delivery of IT services is best made by starting from the knowledge and experience gained in previous work. This reference architecture gathers into one place the essentials of an Apache* Hadoop* cluster build-out, complete with benchmarking using the TeraSort workload. The paper defines easy-to-follow steps to replicate the deployment in your data center lab environment. The installation is based on Intel-powered servers and creates a multi-node, optimized Hadoop environment. The reference architecture contains details on the Hadoop topology, the hardware and software deployed, the installation and configuration steps, and tests for real-world use cases, all of which should significantly reduce the learning curve for building and operating your first Hadoop infrastructure.

It is not expected that this paper can be used as-is. For example, adapting to an existing network and identifying specific management requirements are out of scope for this paper. It is therefore expected that the reader will make significant adjustments to the design presented here in order to meet the specific requirements of their own data center or lab environment. This paper also assumes that the reader has basic knowledge of computing infrastructure components and services, along with intermediate knowledge of the Linux* operating system, Python*, the Hadoop framework, and basic system administration skills.

February 2012
Table of Contents
Executive Summary
Hadoop* Overview
Hadoop System Architecture
Operation of a Hadoop Cluster
TeraSort Workload
TeraSort Workflow
Test Methodology
Intel Benchmark Install and Test Tool (Intel BITT)
Intel BITT Features
Configuring the Setup
Running TeraSort
Results
Conclusion
Executive Summary

MapReduce technology is gaining popularity among enterprises for a variety of large-scale, data-intensive jobs. MapReduce based on Apache* Hadoop* is rapidly emerging as a preferred technology for big data processing and management. Enterprises are deploying clusters of standard commodity servers and using business intelligence tools along with Apache Hadoop to build high-performing solutions for their large-scale data processing requirements. The motivation to deploy Hadoop comes from the fact that enterprises are gathering huge unstructured data sets generated by their business processes, and they are looking to exploit this data to get the most value out of it for their decision-making processes. The Hadoop infrastructure moves data closer to compute to achieve high processing throughput.

In this paper we created a small commodity server cluster based on an Apache Hadoop distribution and ran the sort benchmark to measure how fast the cluster can process data. This reference architecture gives an understanding of how to set up the cluster, tune parameters, and run the sort benchmark. It provides a blueprint for building a cluster with Intel Xeon processor-based standard server platforms and the open source Apache Hadoop distribution, and further describes the parameters for tuning and executing the sort benchmark to measure performance.

Figure 1: Hadoop* stack

Hadoop* Overview

Apache Hadoop is a framework for running applications on large clusters built using standard hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System (HDFS) are designed so that node failures are automatically tolerated by the framework. The Hadoop framework consists of three major components:

Common: Hadoop Common is a set of utilities that support the other Hadoop subprojects. Hadoop Common includes FileSystem, RPC, and serialization libraries.

HDFS: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, is suitable for applications that have large data sets, and can stream file system data.

MapReduce: MapReduce was first developed by Google to process large datasets. MapReduce has two functions, map and reduce, and a framework for running a large number of instances of these programs on commodity hardware. The map function reads a set of records from an input file, processes these records, and outputs a set of intermediate records. As part of the map function, a split function distributes the intermediate records across many buckets using a hash function. The reduce function then processes the intermediate records. The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing failed tasks. The slaves execute the tasks as directed by the master.
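The map, split, and reduce steps described above can be sketched in a few lines of Python. This is a toy, single-process illustration of the paradigm (word counting), not the Hadoop API:

```python
from collections import defaultdict

def map_fn(record):
    # Emit intermediate (key, value) records for one input record.
    for word in record.split():
        yield (word, 1)

def partition(key, num_buckets):
    # Split function: distribute intermediate keys across buckets via a hash.
    return hash(key) % num_buckets

def reduce_fn(key, values):
    # Aggregate all intermediate values collected for one key.
    return (key, sum(values))

def mapreduce(records, num_buckets=4):
    buckets = [defaultdict(list) for _ in range(num_buckets)]
    for record in records:                      # map phase
        for key, value in map_fn(record):
            buckets[partition(key, num_buckets)][key].append(value)
    results = {}
    for bucket in buckets:                      # reduce phase
        for key, values in bucket.items():
            k, v = reduce_fn(key, values)
            results[k] = v
    return results

print(mapreduce(["big data", "big cluster"]))   # word counts across both records
```

In Hadoop, each bucket of intermediate records would live on a different node, and the framework would rerun any map or reduce fragment whose node failed.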
Hadoop* System Architecture

The Hadoop framework works on the principle of "moving compute closer to the data." Figure 2 shows a typical deployment of the Hadoop framework on multiple standard server nodes. Computation occurs on the same node where the data resides, which enables Hadoop to deliver better performance than fetching data over the network. The combination of standard server platforms and the Hadoop infrastructure provides a cost-efficient, high-performance platform for data-parallel applications.

Each Hadoop cluster has one Master node and multiple Slave nodes. The Master node runs the NameNode and JobTracker functions, coordinating with the Slave nodes to complete the job fed to the cluster. The Slave nodes run the TaskTracker and HDFS to store the data, and host the Map and Reduce functions which perform the data computations.

Figure 2: Hadoop* deployment on standard server nodes
Operation of a Hadoop* Cluster

Figure 3 shows the operation of a Hadoop cluster. The client submits the job to the Master node, which acts as an orchestrator, working with the Slave nodes to complete the job. The JobTracker on the Master node is responsible for controlling the MapReduce job. The slaves run the TaskTracker, which keeps track of the MapReduce tasks and reports job status to the JobTracker at frequent intervals. In the event of a task failure, the JobTracker reschedules the task on the same slave node or a different slave node.

HDFS is a location-aware, or rack-aware, file system which primarily manages the data in a Hadoop cluster. HDFS replicates the data on various nodes in the cluster to attain data reliability; however, HDFS has a single point of failure in the NameNode function. If the NameNode fails, the file system and data become inaccessible. Since the JobTracker assigns the data to slave nodes, it is aware of the data location and efficiently schedules each task where its data resides, decreasing the need to move data from one node to another and saving network bandwidth. Once the map function is complete, the data is transferred to a different node to perform the reduce function. The MapReduce framework provides an efficient way to scale the cluster through a modular scale-out strategy: nodes are added one or more at a time, with HDFS and the MapReduce functions supporting the new nodes as they join.

Figure 3: Operation of a Hadoop* cluster
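HDFS's default rack-aware placement for three replicas is commonly described as: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The following is a simplified Python sketch of that policy over a toy topology, not Hadoop's actual implementation:

```python
import random

def place_replicas(writer, topology):
    """Sketch of HDFS's default rack-aware placement for 3 replicas.
    topology maps rack name -> list of nodes; writer is (rack, node)."""
    writer_rack, writer_node = writer
    replicas = [writer_node]                        # 1st: on the writer's node
    remote_racks = [r for r in topology if r != writer_rack]
    remote_rack = random.choice(remote_racks)
    second = random.choice(topology[remote_rack])   # 2nd: node in a different rack
    replicas.append(second)
    remaining = [n for n in topology[remote_rack] if n != second]
    replicas.append(random.choice(remaining))       # 3rd: another node, same remote rack
    return replicas

topology = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5", "n6"]}
print(place_replicas(("rack1", "n1"), topology))
```

Spreading the second and third replicas across a remote rack is what lets the cluster survive the loss of an entire rack while still keeping two replicas close together for cheap re-replication.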
Cluster hardware setup:
- Total of 17 nodes in the cluster: one Master node and 16 Slave nodes.
- Data network: Arista 7124 switch connected to an Intel Ethernet Server Adapter X520-DA2 dual-port 10GbE NIC on every node.
- Management network: each server has an internal private Intel dual-port 1GbE NIC connected to a top-of-rack switch used for management tasks.
- Storage: each node has a disk enclosure populated with SATA II 7.2K, 2TB hard disk drives, for a total of 24TB of raw storage per disk enclosure.
- Dual-socket Intel 5520 Chipset platform.
- Two Intel Xeon processor X5680 at 3.33GHz, 12MB cache.
- 48GB 1333MHz DDR3 memory.
- Red Hat Enterprise Linux* 6.0 (RHEL 6.0) (Kernel: el6..x86_64).
- Hadoop* Framework v

Figure 4: Cluster hardware setup
TeraSort Workload

TeraSort is a popular Hadoop benchmarking workload. Although it is commonly run against a 1TB dataset, 1TB is not a hard limit: TeraSort allows the user to sort a dataset of any size by changing various parameters. The TeraSort benchmark exercises the HDFS and MapReduce functions of a Hadoop cluster. TeraSort ships with the Hadoop framework and is part of the standard Apache Hadoop installation package. It is widely used to benchmark and tune large Hadoop clusters with hundreds of nodes. TeraSort works in two steps:

TeraGen: Generates random data based on the dataset size set by the user. This dataset is used as input data for the sort benchmark.

TeraSort: Sorts the input data generated by TeraGen and stores the output data on HDFS.

An optional third step, called TeraValidate, validates the sorted data. This paper does not discuss this optional third step.

TeraSort Workflow

Figure 5 shows the workflow of the TeraSort workload tested on our cluster. The flow chart depicts the start of the workload at one control node, with one master node kick-starting the job and 16 slave nodes dividing the work into 8192 map tasks. Once the map phase is complete, the cluster starts the reduce phase with 243 tasks. When the reduce phase is complete, the output data is stored on the file system.

Test Methodology

To run the workload we used the Intel Benchmark Install and Test Tool (Intel BITT). The workload was scripted to kick-start the job on the cluster, run TeraGen to generate the test data, and run the TeraSort task to sort the generated data. The script also kicks off a series of counters on the slave nodes to gather performance metrics on each node. Key hardware metrics such as processor utilization, network bandwidth consumption, memory utilization, and disk bandwidth consumption are captured on each node at 30-second intervals.
Once the job is complete, the counters are stopped on all slave nodes and the log files containing performance data are copied to the master node for calculating the utilization of the cluster. This data is plotted into graphs using gnuplot and presented for further analysis. We also noted the time taken to complete the job, as reported by the Hadoop management user interface; the lower the time measurement, the better the performance.

Figure 5: TeraSort workflow
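TeraGen and TeraSort, described above, operate on fixed-size 100-byte records, each beginning with a 10-byte key that the sort orders by. The following is a toy, single-process Python sketch of the two steps, an illustration only and not the actual Hadoop jobs:

```python
import os

RECORD_LEN, KEY_LEN = 100, 10  # TeraSort records: 10-byte key + payload

def teragen(num_records):
    # Toy stand-in for TeraGen: random fixed-size records with random keys.
    return [os.urandom(KEY_LEN) + b"x" * (RECORD_LEN - KEY_LEN)
            for _ in range(num_records)]

def terasort(records):
    # Toy stand-in for TeraSort: order records by their 10-byte key.
    return sorted(records, key=lambda r: r[:KEY_LEN])

data = teragen(1000)
out = terasort(data)
# Every adjacent pair of output records must be in key order.
assert all(a[:KEY_LEN] <= b[:KEY_LEN] for a, b in zip(out, out[1:]))
```

On a real cluster, the map tasks partition records into key ranges so that each reducer writes one globally ordered slice of the output.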
Intel Benchmark Install and Test Tool

The Intel Benchmark Install and Test Tool (Intel BITT) provides tools to install, configure, run, and analyze benchmark programs on small test clusters. The installcli tool is used to install tar files on a cluster. moncli is used to monitor the performance of the cluster nodes and provides options to start monitoring, stop monitoring, and generate CPU, disk I/O, memory, and network performance plots for individual nodes and for the cluster. hadoopcli provides an automated Hadoop test environment. Intel BITT templates enable configurable plot generation, and Intel BITT command scripts enable configurable control of monitoring actions. Benchmark configuration is implemented using XML files; configurable properties include the installation location, monitoring directories, monitoring sampling duration, the list of cluster nodes, and the list of tar files to install. Intel BITT is implemented in Python* and uses gnuplot to generate performance plots. Intel BITT currently runs on Linux*.

Intel BITT Features

The Intel Benchmark Install and Test Tool provides the following tools:

installcli: Installs a specified list of tar files to a specified list of nodes.

moncli: Monitors performance metrics locally and/or remotely; it can be used to monitor the performance of a cluster. The tool currently supports the sar and iostat monitoring tools.

hadoopcli: Installs, configures, and tests Hadoop clusters.

Intel BITT is implemented in an object-oriented fashion and can be extended to support other performance monitoring tools, such as vmstat and mpstat, if needed. The toolkit includes the following building blocks:

XML parser: Parses the XML properties, including the name, value, and description fields. The install and monitor configuration is defined using XML properties. Tool-specific options are passed through command line options.
Log file parser: Log files in the form of tables containing rows and columns are parsed, and a CSV file is generated for each column. The column items on each row are separated by whitespace, and the column header names are used to create the CSV file names.

Plot generator: gnuplot is used to plot the contents of the CSV files using templates. The templates define the list of CSV files used as inputs to generate the plots, as well as the labels and titles of the plots.

Additional building blocks include:
- sar monitoring tool support
- iostat monitoring tool support
- VTune(TM) monitoring tool support
- Emon monitoring tool support

installcli is used to install Intel BITT, moncli is used to monitor local or cluster nodes, and hadoopcli, which is implemented using the building blocks defined above, is used to create and test Hadoop clusters.

Configuring the Setup

We installed RHEL 6.0 on all 17 nodes with the default configuration and configured passphraseless SSH access between the nodes, enabling them to communicate without having to log in with a password every time there is a transaction between them.

1. Install the Intel BITT tar file:

cd
mkdir bitt
cp bitt-1.0.tar bitt
cd bitt
tar -xvf bitt-1.0.tar
cd bitt-1.0

The following is the list of subdirectories under the Intel BITT home:
cmd
conf
samples
scripts
templates
2. Create a release directory under the Intel BITT home to copy tar files into:

mkdir -p bitt/bitt-1.0/release
cp bitt-1.0.tar bitt/bitt-1.0/release

If you are planning to test Hadoop, also download and copy the Hadoop tar file to the release directory:

cp hadoop tar.gz ~/bitt/bitt-1.0/release

3. Download the JDK and create a tar file from the installed JDK tree. For example:

mkdir jdk
cp jdk-6u23-linux-x64.bin jdk
cd jdk
chmod +x jdk-6u23-linux-x64.bin
./jdk-6u23-linux-x64.bin
rm jdk-6u23-linux-x64.bin
tar -cvf ~/bitt/bitt-1.0/release/jdk1.6.0_23.tar

4. Download gnuplot and create a tar file from the installed gnuplot tree. For example:

mkdir myinstall
cp gnuplot rc1.tar myinstall
cd myinstall/
tar -xvf gnuplot rc1.tar
mkdir -p install/gnuplot
cd gnuplot rc1
./configure --prefix=/home/<user>/myinstall/install/gnuplot
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/gnuplot tar .

5. Download Python and create a tar file from the installed Python tree for your platform. For example:

mkdir myinstall
cp Python tgz myinstall
cd myinstall/
tar -xvf Python tgz
mkdir -p install/Python
cd Python
./configure --prefix=/home/<user>/myinstall/install/Python
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/Python tar .
6. Run TeraSort by running terasort.sh. You will need to update the corresponding configuration files as described in step 7 below. Install gnuplot and Python on your client system, and make sure python3 and gnuplot are on your PATH. Then:

cd ~/bitt/bitt-1.0/scripts
./terasort.sh

7. Configuration file edits. All configuration files are found under ~/bitt/bitt-1.0/conf.

a. hadoopnodelist: Configuration file that lists the cluster nodes. Any node added to or removed from the cluster must be registered here to be recognized by the load generator tool.

node1.domain.com
node2.domain.com
node3.domain.com
node4.domain.com
..
node17.domain.com

b. hadooptarlist: Configuration file that lists the tar files to be installed.

../release/bitt-1.0.tar.gz
../release/python-3.2.tar.gz
../release/jdk1.6.0_25.tar.gz
../release/hadoop tar.gz
../release/gnuplot tar.gz
c. hadoop-env.sh: Main Hadoop environment configuration file.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
d. hadoopcloudconf.xml: Custom XML configuration file used to define key parameters for how the test is executed and where the data is stored.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>cloudtemplateloc</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf</value>
    <description>cloud conf template file location</description>
  </property>
  <property>
    <name>cloudtemplatevars</name>
    <value>all</value>
    <description>the list of template variables to copy</description>
  </property>
  <property>
    <name>jobtrackerport</name>
    <value>8021</value>
    <description>jobtracker port</description>
  </property>
  <property>
    <name>namenodeport</name>
    <value>8020</value>
    <description>namenode port</description>
  </property>
  <property>
    <name>cloudconfdir</name>
    <value>/tmp/hadoopconf</value>
    <description>generated cloud conf file</description>
  </property>
  <property>
    <name>cloudtmpdir</name>
    <value>hadoop-${user.name}</value>
    <description>cloud tmp dir</description>
  </property>
  <property>
    <name>cloudinstalldir</name>
    <value>/usr/local/hadoop/install</value>
    <description>cloud install dir</description>
  </property>
  <property>
    <name>cloudnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopnodelist</value>
    <description>cluster nodes</description>
  </property>
  <property>
    <name>monnodelist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadoopmonnodelist</value>
    <description>cluster monitor nodes</description>
  </property>
  <property>
    <name>cloudtarlist</name>
    <value>/home/hadoop/bitt/bitt-1.0/conf/hadooptarlist</value>
    <description>cluster tar files</description>
  </property>
  <property>
    <name>moninterval</name>
    <value>30</value>
    <description>sampling duration</description>
  </property>
  <property>
    <name>moncount</name>
    <value>0</value>
    <description>number of samples</description>
  </property>
  <property>
    <name>monresults</name>
    <value>/tmp/monhadres</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>monsummary</name>
    <value>/tmp/monhadsum</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>mondir</name>
    <value>/tmp/monhadloc</value>
    <description>cloud monitor log files location</description>
  </property>
  <property>
    <name>gnucmd</name>
    <value>/usr/local/hadoop/install/gnuplot-4.4.3/bin/gnuplot</value>
    <description>none</description>
  </property>
</configuration>
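A property file in this shape can be read with a short parser of the kind the Intel BITT "XML parser" building block describes. The real BITT parser is not shown in this paper; the following is a minimal Python sketch assuming Hadoop-style `<property>` wrappers around each name/value/description triple:

```python
import xml.etree.ElementTree as ET

def parse_properties(xml_text):
    """Parse <name>/<value> pairs out of a <configuration> document
    into a {name: value} dict (sketch; the real Intel BITT parser
    may behave differently)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    return props

conf = """<configuration>
  <property>
    <name>moninterval</name>
    <value>30</value>
    <description>sampling duration</description>
  </property>
</configuration>"""
print(parse_properties(conf))  # → {'moninterval': '30'}
```

Keeping descriptions inside the same triple means a parser can also surface documentation for each property, which is how Hadoop's own configuration files are organized.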
e. hdfs-site-template.xml: Hadoop configuration file where the HDFS parameters are set, including the tuned values we used to run the test.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>655360</value>
    <description>Number of files Hadoop serves at one time</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data,/mnt/disk5/hdfs/data,/mnt/disk6/hdfs/data,/mnt/disk7/hdfs/data,/mnt/disk8/hdfs/data,/mnt/disk9/hdfs/data,/mnt/disk10/hdfs/data,/mnt/disk11/hdfs/data,/mnt/disk12/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value> </value>
    <description>The default block size for new files.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description> </description>
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
    <description> </description>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
    <description> </description>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>220</value>
    <description> </description>
  </property>
</configuration>
f. mapred-site-template.xml: Hadoop configuration file which defines key MapReduce parameters, including the values used in our testing.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.3</value>
    <description>Added per SSG recommendation</description>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.9</value>
    <description>Added per SSG recommendation</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>64</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Assume 10 nodes, 10*2-2.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/disk1/hdfs/mapred,/mnt/disk2/hdfs/mapred,/mnt/disk3/hdfs/mapred,/mnt/disk4/hdfs/mapred,/mnt/disk5/hdfs/mapred,/mnt/disk6/hdfs/mapred,/mnt/disk7/hdfs/mapred,/mnt/disk8/hdfs/mapred,/mnt/disk9/hdfs/mapred,/mnt/disk10/hdfs/mapred,/mnt/disk11/hdfs/mapred,/mnt/disk12/hdfs/mapred</value>
    <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m -Djava.net.preferIPv4Stack=true</value>
    <description>Java opts for the task tracker child processes. The following symbol, if present, will be replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.</description>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
    <description>Should the job outputs be compressed?</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>false</value>
    <description>Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression.</description>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the job outputs are compressed, how should they be compressed?</description>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
    <description>If the map outputs are compressed, how should they be compressed?</description>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.min.split.size</name>
    <value>65536</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.reduce.copy.backoff</name>
    <value>5</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.shuffle.merge.percent</name>
    <value>0.7</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description> </description>
  </property>
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.90</value>
    <description> </description>
  </property>
</configuration>
g. hadoop-terasort.xml: Intel BITT configuration file from which the parameters are read before the test runs. Parameters in this configuration file override values in the other configuration files mentioned above, which makes it easy to change parameter values between test runs without editing the individual configuration files.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>8192</value>
    <description>Total map task number</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>243</value>
    <description>Total reduce task number</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of copies to replicate</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description>Compress map output</description>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.66</value>
    <description>none</description>
  </property>
  <property>
    <name>datasetsizesmall</name>
    <value> </value>
    <description>Total record number, about 1TB of data. 1 record has 100 bytes</description>
  </property>
  <property>
    <name>datasetsize</name>
    <value> </value>
    <description>Total record number, about 1TB of data. 1 record has 100 bytes</description>
  </property>
  <property>
    <name>datasetname</name>
    <value>tera</value>
    <description>none</description>
  </property>
  <property>
    <name>outputdataname</name>
    <value>tera-sort2</value>
    <description>none</description>
  </property>
  <property>
    <name>jarfile</name>
    <value>hadoop examples.jar</value>
    <description>none</description>
  </property>
</configuration>
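As the descriptions above note, the dataset size is expressed as a record count and each record is 100 bytes. The exact values were elided in the source, but the arithmetic for a 1TB run is straightforward; the following quick Python check also relates the dataset to the 8192 map tasks used in our workflow:

```python
RECORD_BYTES = 100            # one TeraSort record
TARGET_BYTES = 10**12         # a 1TB run

# datasetsize is a record count, not a byte count
records = TARGET_BYTES // RECORD_BYTES
print(records)                # → 10000000000 (10 billion records)

# With 8192 map tasks over 1TB, each map task handles roughly:
bytes_per_map = TARGET_BYTES / 8192
print(round(bytes_per_map / 2**20))  # → 116 (MiB of input per map task)
```

Sizing by record count is why changing a single property rescales the whole benchmark: TeraGen, the map split, and the reduce partitioning all derive from it.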
Running TeraSort

TeraSort can be started by running terasort.sh. The script runs the various commands involved in starting the test, starting the performance counters, ending the test, and gathering performance counter data for analysis. Below is the list of commands executed by the script, each with a brief explanation of what it does.

#!/usr/bin/env bash
###########################################################
# Intel Benchmark Install and Test Tool (BITT) Use Cases
# Typical sequence for hadoop terasort benchmark:
###########################################################
echo "START: terasort benchmark..."
date

# Stop any currently running test on the cluster.
../scripts/hadoopcli -a stop -c ../conf/hadoopcloudconf.xml

# Kill Java* processes on the nodes.
./runkill.sh

# Install a fresh copy of the executables on the slave nodes.
../scripts/hadoopcli -a install -c ../conf/hadoopcloudconf.xml

# Format the HDFS to store the data.
../scripts/hadoopcli -a format -c ../conf/hadoopcloudconf.xml

# Start Java processes on all slave nodes.
../scripts/hadoopcli -a start -c ../conf/hadoopcloudconf.xml

# 2 minute delay to let the processes start on the slave nodes.
sleep 120

# Generate 1TB of data which will be used for sorting.
../scripts/hadoopcli -a data -c ../conf/hadoopcloudconf.xml

# Create monitoring directories.
../scripts/moncli -r clean -c ../conf/hadoopcloudconf.xml

# Start the iostat utility to monitor disk usage on the slave nodes.
../scripts/moncli -m iostat -a run -c ../conf/hadoopcloudconf.xml -s run_iostat.sh

# Start the sar utility on all slave nodes to monitor CPU, network, and memory utilization.
../scripts/moncli -m sar -a run -c ../conf/hadoopcloudconf.xml -s run_sar2.sh

# Start the sort activity on the 1TB of data generated in the earlier step.
../scripts/hadoopcli -a run -c ../conf/hadoopcloudconf.xml

# Stop the sar monitoring utility.
../scripts/moncli -m sar -a kill -s run_sar_kill.sh -c ../conf/hadoopcloudconf.xml

# Stop the iostat utility.
../scripts/moncli -m iostat -a kill -s run_iostat_kill.sh -c ../conf/hadoopcloudconf.xml

# Convert iostat-generated data to CSV file format.
../scripts/moncli -m iostat -a csv -c ../conf/hadoopcloudconf.xml

# Convert data generated by the sar utility to CSV format.
../scripts/moncli -m sar -a csv -c ../conf/hadoopcloudconf.xml -s run_sar_gen.sh

# Use gnuplot to generate an image containing a graph of the iostat data.
../scripts/moncli -m iostat -a plot -t iostat -c ../conf/hadoopcloudconf.xml

# Use gnuplot to generate an image containing the CPU graph from sar data.
../scripts/moncli -m sar -a plot -t cpu -c ../conf/hadoopcloudconf.xml

# Use gnuplot to generate an image containing the memory graph from sar data.
../scripts/moncli -m sar -a plot -t mem -c ../conf/hadoopcloudconf.xml

# Use gnuplot to generate an image containing the network graph from sar data.
../scripts/moncli -m sar -a plot -t nw -c ../conf/hadoopcloudconf.xml

# Archive log files on all the slave nodes.
../scripts/moncli -r tar -c ../conf/hadoopcloudconf.xml
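The CSV-conversion steps above turn the tabular sar/iostat logs into one CSV file per column, as described in the Intel BITT "Log file parser" building block. The following is a minimal Python sketch of that transformation; the file names are hypothetical and the real tool's behavior may differ:

```python
import csv

def table_to_csvs(log_path):
    """Split a whitespace-separated table (e.g. iostat output) into one
    CSV file per column, named after that column's header (sketch of
    the Intel BITT log file parser)."""
    with open(log_path) as f:
        rows = [line.split() for line in f if line.strip()]
    headers, data = rows[0], rows[1:]
    for col, name in enumerate(headers):
        with open(f"{name}.csv", "w", newline="") as out:
            writer = csv.writer(out)
            for row in data:
                writer.writerow([row[col]])

# Tiny demonstration with a made-up two-device iostat-style table.
with open("iostat.log", "w") as f:
    f.write("device rps wps\nsda 10 20\nsdb 30 40\n")
table_to_csvs("iostat.log")
print(open("rps.csv").read().split())  # → ['10', '30']
```

Per-column CSV files are convenient here because each gnuplot template can then name exactly the columns (CPU, memory, network, disk) it wants to plot.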
More informationIncreasing Hadoop Performance with SanDisk Solid State Drives (SSDs)
WHITE PAPER Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs) July 2014 951 SanDisk Drive, Milpitas, CA 95035 2014 SanDIsk Corporation. All rights reserved www.sandisk.com Table of Contents
More informationIntel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study
Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study The adoption of cloud computing creates many challenges and opportunities in big data management and storage. To
More informationIntel Media SDK Library Distribution and Dispatching Process
Intel Media SDK Library Distribution and Dispatching Process Overview Dispatching Procedure Software Libraries Platform-Specific Libraries Legal Information Overview This document describes the Intel Media
More informationArchitecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
More informationReal-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software
Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationApache Hadoop new way for the company to store and analyze big data
Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File
More informationHSearch Installation
To configure HSearch you need to install Hadoop, Hbase, Zookeeper, HSearch and Tomcat. 1. Add the machines ip address in the /etc/hosts to access all the servers using name as shown below. 2. Allow all
More informationBig Business, Big Data, Industrialized Workload
Big Business, Big Data, Industrialized Workload Big Data Big Data 4 Billion 600TB London - NYC 1 Billion by 2020 100 Million Giga Bytes Copyright 3/20/2014 BMC Software, Inc 2 Copyright 3/20/2014 BMC Software,
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationThree Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture
White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive
More informationWeekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay
Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability
More informationIntel Platform and Big Data: Making big data work for you.
Intel Platform and Big Data: Making big data work for you. 1 From data comes insight New technologies are enabling enterprises to transform opportunity into reality by turning big data into actionable
More information1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation
1. GridGain In-Memory Accelerator For Hadoop GridGain's In-Memory Accelerator For Hadoop edition is based on the industry's first high-performance dual-mode in-memory file system that is 100% compatible
More informationDeploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters
CONNECT - Lab Guide Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters Hardware, software and configuration steps needed to deploy Apache Hadoop 2.4.1 with the Emulex family
More informationIntel Network Builders: Lanner and Intel Building the Best Network Security Platforms
Solution Brief Intel Xeon Processors Lanner Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms Internet usage continues to rapidly expand and evolve, and with it network
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationAccelerating Business Intelligence with Large-Scale System Memory
Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness
More informationHow to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0*
How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0* Technical Brief v1.0 December 2011 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED
More informationIntel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms
Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms Enomaly Elastic Computing Platform, * Service Provider Edition Executive Summary Intel Cloud Builder Guide
More informationApache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
More informationIntel Service Assurance Administrator. Product Overview
Intel Service Assurance Administrator Product Overview Running Enterprise Workloads in the Cloud Enterprise IT wants to Start a private cloud initiative to service internal enterprise customers Find an
More informationUnstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012
Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume
More informationChapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationBig Data Storage Options for Hadoop Sam Fineberg, HP Storage
Sam Fineberg, HP Storage SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations
More informationCloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings
WHITE PAPER CloudSpeed SATA SSDs Support Faster Hadoop Performance and TCO Savings August 2014 951 SanDisk Drive, Milpitas, CA 95035 2014 SanDIsk Corporation. All rights reserved www.sandisk.com Table
More informationWelcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
More informationRDMA for Apache Hadoop 0.9.9 User Guide
0.9.9 User Guide HIGH-PERFORMANCE BIG DATA TEAM http://hibd.cse.ohio-state.edu NETWORK-BASED COMPUTING LABORATORY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING THE OHIO STATE UNIVERSITY Copyright (c)
More informationMeasuring Cache and Memory Latency and CPU to Memory Bandwidth
White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary
More informationIntel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual
Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library
More informationIntel System Event Log (SEL) Viewer Utility
Intel System Event Log (SEL) Viewer Utility User Guide Document No. E12461-003 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS FOR THE GENERAL PURPOSE OF SUPPORTING
More informationHADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com
More informationIntel RAID RS25 Series Performance
PERFORMANCE BRIEF Intel RAID RS25 Series Intel RAID RS25 Series Performance including Intel RAID Controllers RS25DB080 & PERFORMANCE SUMMARY Measured IOPS surpass 200,000 IOPS When used with Intel RAID
More informationNew Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC
New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC Alan Gara Intel Fellow Exascale Chief Architect Legal Disclaimer Today s presentations contain forward-looking
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationIntel Open Network Platform Release 2.1: Driving Network Transformation
data sheet Intel Open Network Platform Release 2.1: Driving Network Transformation This new release of the Intel Open Network Platform () introduces added functionality, enhanced performance, and greater
More informationIntel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux
Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux Order Number: D58855-002 Disclaimer Information in this document is provided in connection with
More informationAccelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
More informationAccelerating Business Intelligence with Large-Scale System Memory
Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness
More informationHadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation
Hadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation Agenda Overview HAM and HAL Hadoop* Ecosystem with Lustre * Benchmark results Conclusion and future work
More informationMapReduce Evaluator: User Guide
University of A Coruña Computer Architecture Group MapReduce Evaluator: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño December 9, 2014 Contents 1 Overview
More informationAccelerate Big Data Analysis with Intel Technologies
White Paper Intel Xeon processor E7 v2 Big Data Analysis Accelerate Big Data Analysis with Intel Technologies Executive Summary It s not very often that a disruptive technology changes the way enterprises
More informationConfiguring RAID for Optimal Performance
Configuring RAID for Optimal Performance Intel RAID Controller SRCSASJV Intel RAID Controller SRCSASRB Intel RAID Controller SRCSASBB8I Intel RAID Controller SRCSASLS4I Intel RAID Controller SRCSATAWB
More informationHADOOP PERFORMANCE TUNING
PERFORMANCE TUNING Abstract This paper explains tuning of Hadoop configuration parameters which directly affects Map-Reduce job performance under various conditions, to achieve maximum performance. The
More informationPerformance measurement of a Hadoop Cluster
Performance measurement of a Hadoop Cluster Technical white paper Created: February 8, 2012 Last Modified: February 23 2012 Contents Introduction... 1 The Big Data Puzzle... 1 Apache Hadoop and MapReduce...
More information研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊. Version 0.1
102 年 度 國 科 會 雲 端 計 算 與 資 訊 安 全 技 術 研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊 Version 0.1 總 計 畫 名 稱 : 行 動 雲 端 環 境 動 態 群 組 服 務 研 究 與 創 新 應 用 子 計 畫 一 : 行 動 雲 端 群 組 服 務 架 構 與 動 態 群 組 管 理 (NSC 102-2218-E-259-003) 計
More informationIntel System Event Log (SEL) Viewer Utility. User Guide SELViewer Version 10.0 /11.0 December 2012 Document number: G88216-001
Intel System Event Log (SEL) Viewer Utility User Guide SELViewer Version 10.0 /11.0 December 2012 Document number: G88216-001 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH
More informationMaximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
More informationIntel System Event Log (SEL) Viewer Utility
Intel System Event Log (SEL) Viewer Utility User Guide Document No. E12461-007 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS FOR THE GENERAL PURPOSE OF SUPPORTING
More informationIntel Solid-State Drive Data Center Tool User Guide Version 1.1
Intel Solid-State Drive Data Center Tool User Guide Version 1.1 Order Number: 327191-002 October 2012 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR
More informationIntel X38 Express Chipset Memory Technology and Configuration Guide
Intel X38 Express Chipset Memory Technology and Configuration Guide White Paper January 2008 Document Number: 318469-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationTP1: Getting Started with Hadoop
TP1: Getting Started with Hadoop Alexandru Costan MapReduce has emerged as a leading programming model for data-intensive computing. It was originally proposed by Google to simplify development of web
More informationIntel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.
Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V Technical Brief v1.0 September 2012 2 Intel Ethernet and Configuring SR-IOV on Windows*
More informationHadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013
Hadoop Hardware : Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3 About us Joep Rottinghuis Software Engineer @ Twitter Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay
More informationMapReduce, Hadoop and Amazon AWS
MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables
More informationIntel System Event Log (SEL) Viewer Utility
Intel System Event Log (SEL) Viewer Utility User Guide Document No. E12461-005 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS FOR THE GENERAL PURPOSE OF SUPPORTING
More informationIntel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide
OPTIMIZATION AND TUNING GUIDE Intel Distribution for Apache Hadoop* Software Intel Distribution for Apache Hadoop* Software: Optimization and Tuning Guide Configuring and managing your Hadoop* environment
More informationHadoop on the Gordon Data Intensive Cluster
Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,
More informationSetting up Hadoop with MongoDB on Windows 7 64-bit
SGT WHITE PAPER Setting up Hadoop with MongoDB on Windows 7 64-bit HCCP Big Data Lab 2015 SGT, Inc. All Rights Reserved 7701 Greenbelt Road, Suite 400, Greenbelt, MD 20770 Tel: (301) 614-8600 Fax: (301)
More informationHow to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1
How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1 Technical Brief v1.0 February 2013 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationHDFS. Hadoop Distributed File System
HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files
More informationIntel Internet of Things (IoT) Developer Kit
Intel Internet of Things (IoT) Developer Kit IoT Cloud-Based Analytics User Guide September 2014 IoT Cloud-Based Analytics User Guide Introduction Table of Contents 1.0 Introduction... 4 1.1. Revision
More informationLecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015
Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop
More informationPerformance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications
Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce
More informationHow To Use Hadoop
Hadoop in Action Justin Quan March 15, 2011 Poll What s to come Overview of Hadoop for the uninitiated How does Hadoop work? How do I use Hadoop? How do I get started? Final Thoughts Key Take Aways Hadoop
More informationPower Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze
Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Whitepaper December 2012 Anita Banerjee Contents Introduction... 3 Sorenson Squeeze... 4 Intel QSV H.264... 5 Power Performance...
More informationJeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
More informationIntelligent Business Operations
White Paper Intel Xeon Processor E5 Family Data Center Efficiency Financial Services Intelligent Business Operations Best Practices in Cash Supply Chain Management Executive Summary The purpose of any
More informationBusiness white paper. HP Process Automation. Version 7.0. Server performance
Business white paper HP Process Automation Version 7.0 Server performance Table of contents 3 Summary of results 4 Benchmark profile 5 Benchmark environmant 6 Performance metrics 6 Process throughput 6
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationAccelerating High-Speed Networking with Intel I/O Acceleration Technology
White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing
More informationBig Data Technologies for Ultra-High-Speed Data Transfer and Processing
White Paper Intel Xeon Processor E5 Family Big Data Analytics Cloud Computing Solutions Big Data Technologies for Ultra-High-Speed Data Transfer and Processing Using Technologies from Aspera and Intel
More informationTutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
More informationIntel Solid-State Drives Increase Productivity of Product Design and Simulation
WHITE PAPER Intel Solid-State Drives Increase Productivity of Product Design and Simulation Intel Solid-State Drives Increase Productivity of Product Design and Simulation A study of how Intel Solid-State
More informationIMPLEMENTING GREEN IT
Saint Petersburg State University of Information Technologies, Mechanics and Optics Department of Telecommunication Systems IMPLEMENTING GREEN IT APPROACH FOR TRANSFERRING BIG DATA OVER PARALLEL DATA LINK
More informationSystem Event Log (SEL) Viewer User Guide
System Event Log (SEL) Viewer User Guide For Extensible Firmware Interface (EFI) and Microsoft Preinstallation Environment Part Number: E12461-001 Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN
More informationiscsi Quick-Connect Guide for Red Hat Linux
iscsi Quick-Connect Guide for Red Hat Linux A supplement for Network Administrators The Intel Networking Division Revision 1.0 March 2013 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH
More informationIntel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide
Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide White Paper August 2007 Document Number: 316971-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION
More informationRunning Kmeans Mapreduce code on Amazon AWS
Running Kmeans Mapreduce code on Amazon AWS Pseudo Code Input: Dataset D, Number of clusters k Output: Data points with cluster memberships Step 1: for iteration = 1 to MaxIterations do Step 2: Mapper:
More informationHP reference configuration for entry-level SAS Grid Manager solutions
HP reference configuration for entry-level SAS Grid Manager solutions Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput Technical white paper Table of contents Executive summary... 2
More informationHow To Install Hadoop 1.2.1.1 From Apa Hadoop 1.3.2 To 1.4.2 (Hadoop)
Contents Download and install Java JDK... 1 Download the Hadoop tar ball... 1 Update $HOME/.bashrc... 3 Configuration of Hadoop in Pseudo Distributed Mode... 4 Format the newly created cluster to create
More informationData-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion
More information