Applying performance models to understand data-intensive computing efficiency

Elie Krevat, Tomer Shiran, Eric Anderson, Joseph Tucek, Jay J. Wylie, Gregory R. Ganger
Carnegie Mellon University and HP Labs
CMU-PDL, May 2010
Parallel Data Laboratory, Carnegie Mellon University, Pittsburgh, PA

Abstract: New programming frameworks for scale-out parallel analysis, such as MapReduce and Hadoop, have become a cornerstone for exploiting large datasets. However, there has been little analysis of how these systems perform relative to the capabilities of the hardware on which they run. This paper describes a simple analytical model that predicts the optimal performance of a parallel dataflow system. The model exposes the inefficiency of popular scale-out systems, which take 3-13× longer to complete jobs than the hardware should allow, even in well-tuned systems used to achieve record-breaking benchmark results. To validate the sanity of our model, we present small-scale experiments with Hadoop and a simplified dataflow processing tool called Parallel DataSeries. Parallel DataSeries achieves performance close to the analytic optimal, showing that the model is realistic and that large improvements in the efficiency of parallel analytics are possible.

Acknowledgements: We thank the members and companies of the PDL Consortium (including APC, Data Domain, EMC, Facebook, Google, Hewlett-Packard Labs, Hitachi, IBM, Intel, LSI, Microsoft Research, NEC Laboratories, NetApp, Oracle, Seagate, Sun, Symantec, VMware, and Yahoo! Labs) for their interest, insights, feedback, and support. This research was sponsored in part by an HP Innovation Research Award and by CyLab at Carnegie Mellon University under grant DAAD from the Army Research Office. Elie Krevat is supported in part by an NDSEG Fellowship, which is sponsored by the Department of Defense.
Keywords: data-intensive computing, cloud computing, analytical modeling, Hadoop, MapReduce, performance and efficiency
1 Introduction

Data-intensive scalable computing (DISC) refers to a rapidly growing style of computing characterized by its reliance on huge and growing datasets [7]. Driven by the desire and capability to extract insight from such datasets, data-intensive computing is quickly emerging as a major activity of many organizations. With massive amounts of data arising from such diverse sources as telescope imagery, medical records, online transaction records, and web pages, many researchers are discovering that statistical models extracted from data collections promise major advances in science, health care, business efficiencies, and information access. Indeed, statistical approaches are quickly bypassing expertise-based approaches in terms of efficacy and robustness.

To assist programmers with data-intensive computing, new programming frameworks (e.g., MapReduce [9], Hadoop [1], and Dryad [13]) have been developed. They provide abstractions for specifying data-parallel computations, and they also provide environments for automating the execution of data-parallel programs on large clusters of commodity machines. The map-reduce programming model, in particular, has received a great deal of attention, and several implementations are publicly available [1, 20]. These frameworks can scale jobs to thousands of computers. However, they currently focus on scalability without concern for efficiency. Worse, anecdotal experiences indicate that they fall far short of fully utilizing hardware resources, effectively wasting large fractions of the computers over which jobs are scaled. If these inefficiencies are real, the same work could (theoretically) be completed at much lower costs.

An ideal approach would provide maximum scalability for a given computation without wasting resources such as the CPU or disk. Given the widespread use and scale of data-intensive computing, it is important that we move toward such an ideal. An important first step is understanding the degree, characteristics, and causes of inefficiency. Unfortunately, little help is currently available. This paper begins to fill the void with a simple model of ideal map-reduce job runtimes and the evaluation of systems relative to it. The model's input parameters describe basic characteristics of the job (e.g., amount of input data, degree of filtering in the map and reduce phases), of the hardware (e.g., per-node disk and network throughputs), and of the framework configuration (e.g., replication factor). The output is the ideal job runtime. An ideal run is hardware-efficient, meaning that the realized throughput matches the maximum throughput for the bottleneck hardware resource, given its usage (i.e., the amount of data moved over it). Our model can expose how close or far (currently) a given system is from this ideal. Such throughput will not occur, for example, if the framework does not provide sufficient parallelism to keep the bottleneck resource fully utilized, or if it makes poor use of a particular resource (e.g., inflating network traffic). In addition, our model can be used to quantify resources wasted due to imbalance: in an unbalanced system, one resource (e.g., network, disk, or CPU) is under-provisioned relative to others and acts as a bottleneck. The other resources are wasted to the extent that they are over-provisioned and active.

To illustrate these issues, we applied the model to a number of benchmark results (e.g., for the TeraSort and PetaSort benchmarks) touted in the industry. These presumably well-tuned systems achieve runtimes that are 3-13× longer than the ideal model suggests should be possible.
We also report on our own experiments with Hadoop, confirming and partially explaining sources of inefficiency. To confirm that the model's ideal is achievable, we present results from an efficient parallel dataflow system called Parallel DataSeries (PDS). PDS lacks many features of the other frameworks, but its careful engineering and stripped-down feature set demonstrate that near-ideal hardware efficiency (within 20%) is possible. In addition to validating the model, PDS provides an interesting foundation for subsequent analyses of the incremental costs associated with features, such as distributed file system functionality, dynamic task distribution, fault tolerance, and task replication.

Data-parallel computation is here to stay, as is scale-out performance. However, we hope that the low efficiency indicated by our model is not. By gaining a better understanding of computational bottlenecks,
and understanding the limits of what is achievable, we hope that our work will lead to improvements in commonly used DISC frameworks.

2 Dataflow parallelism and map-reduce computing

Today's data-intensive computing derives much from earlier work on parallel databases. Broadly speaking, data is read from input files, processed, and stored in output files. The dataflow is organized as a pipeline in which the output of one operator is the input of the following operator. DeWitt and Gray [10] describe two forms of parallelism in such dataflow systems: partitioned parallelism and pipelined parallelism. Partitioned parallelism is achieved by partitioning the data and splitting one operator into many running on different processors. Pipelined parallelism is achieved by streaming the output of one operator into the input of another, so that the two operators can work in series on different data at the same time.

Google's MapReduce [9] (we refer to the programming model as map-reduce and to Google's implementation as MapReduce) offers a simple programming model that facilitates development of scalable parallel applications that process a vast amount of data. Programmers specify a map function that generates values and associated keys from each input data item and a reduce function that describes how all data matching each key should be combined. The runtime system handles details of scheduling, load balancing, and error recovery. Hadoop [1] is an open-source implementation of the map-reduce model.
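To make the programming model concrete, the following sketch expresses a grep-like job as a map function and a reduce function, with a tiny single-process driver standing in for the framework. It is illustrative only: the function names and the driver are ours and do not correspond to the Hadoop or MapReduce APIs.

```python
# Minimal, single-process illustration of the map-reduce programming model.
# This is not Hadoop code; the driver below stands in for the framework's
# scheduling, shuffling, and fault-tolerance machinery.
from collections import defaultdict

def map_fn(line):
    # A grep-like map: emit (key, value) pairs only for matching records,
    # so e_M (map output / map input) is close to zero.
    if "ERROR" in line:
        yield ("ERROR", line)

def reduce_fn(key, values):
    # A simple aggregation: count the matching records for each key.
    yield (key, len(values))

def run_job(records):
    # "Shuffle": group all mapped values by key, as the framework would
    # after partitioning map output across reduce nodes.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce each key group.
    output = []
    for key, values in groups.items():
        output.extend(reduce_fn(key, values))
    return output

if __name__ == "__main__":
    sample = ["ok: request served", "ERROR: disk timeout", "ERROR: retry"]
    print(run_job(sample))  # [('ERROR', 2)]
```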
Figure 1: A map-reduce dataflow.

Figure 1 illustrates the pipeline of a map-reduce computation involving three nodes (computers). The computation is divided into two phases, labeled Phase 1 and Phase 2.

Phase 1: Phase 1 begins with the reading of the input data from disk and ends with the sort operator. It includes the map operators and the exchange of data over the network. The first write operator in Phase 1 stores the output of the map operator. This backup write operator is optional, but used by default in the Google and Hadoop implementations of map-reduce, serving to increase the system's ability to cope with failures or other events that may occur later.

Phase 2: Phase 2 begins with the sort operator and ends with the writing of the output data to disk. In systems that replicate data across multiple nodes, such as the GFS [11] and HDFS [3] distributed file systems used with MapReduce and Hadoop, respectively, the output data must be sent to all other nodes that will store the data on their local disks.

Parallelism: In Figure 1, partitioned parallelism takes place on the vertical axis; the input data is split between three nodes, and each operator is, in fact, split into three sub-operators that each run on a different node. Pipelined parallelism takes place on the horizontal axis; each operator within a phase processes data units (e.g., records) as it receives them, rather than waiting for them all to arrive, and passes data units to the next operator as appropriate. The only breaks in pipelined parallelism occur at the boundary between phases. As shown, this boundary is the sort operator. The sort operator can only produce its first output record after it has received all of its input records, since the last input record received might be the first in sorted order.

Quantity of data flow: Figure 1 also illustrates how the amount of data flowing through the system changes throughout the computation. The amount of input data per node is d_i, and the amount of output data per node is d_o. The amount of data per node produced by the map operator and consumed by the reduce operator is d_m. In most applications, the amount of data flowing through the system either remains the same or decreases (i.e., d_i >= d_m >= d_o). In general, the mapper will implement some form of select, filtering out rows, and the reducer will perform aggregation. This reduction in data across the stages can play a key role in the overall performance of the computation. Indeed, Google's MapReduce includes combiner functions to move some of the aggregation work to the map operators and, hence, reduce the amount of data involved in the network exchange [9]. Many map-reduce workloads resemble a grep-like computation, in which the map operator decreases the amount of data (d_i > d_m and d_m = d_o). In others, such as in a sort, neither the map nor the reduce function decreases the amount of data (d_i = d_m = d_o).

2.1 Related work

Concerns about the performance of map-reduce style systems emerged from the parallel databases community, where similar data processing tasks have been tackled by commercially available systems. In particular, Stonebraker et al. compare Hadoop to a variety of DBMSs and find that Hadoop can be up to 36× slower than a commercial parallel DBMS [25]. In previous work [5], two of the authors of this paper pointed out that many parallel systems (especially map-reduce systems, but also other parallel systems) have focused almost exclusively on absolute throughput and high-end scalability. This focus, as the authors quantify by back-of-the-envelope comparisons, has been to the detriment of other worthwhile metrics.

In perhaps the most relevant prior work, Wang et al. use simulation to evaluate how certain design decisions (e.g., network layout and data locality) will affect the performance of Hadoop jobs [27]. Specifically, their MRPerf simulator instantiates fake jobs, which impose fixed times (e.g., job startup) and input-size dependent times (cycles/byte of compute) for the Hadoop parameters under study. The fake jobs generate network traffic (simulated with ns-2) and disk I/O (also simulated). Using execution characteristics accurately measured from small instances of Hadoop jobs, MRPerf accurately predicts (to within 5-12%) the performance of larger clusters. Although simulation techniques like MRPerf are useful for exploring different designs, by relying on measurements of actual behavior (e.g., of Hadoop) such simulations will also emulate any inefficiencies particular to the specific implementation simulated.

3 Performance model

This section presents a model for the runtime of a map-reduce job on a hardware-efficient system.
It includes the model's assumptions, parameters, and equations, along with a description of common workloads.

Assumptions: For a large class of data-intensive workloads, which we assume for our model, computation time is negligible in comparison to I/O speeds. Among others, this assumption holds for grep- and sort-like jobs, such as those described by Dean and Ghemawat [9] as being representative of most MapReduce jobs at Google, but may not hold in other settings. For workloads fitting the assumption, pipelined
parallelism can allow non-I/O operations to execute entirely in parallel with I/O operations, such that overall throughput for each phase will be determined by the I/O resource (network or storage) with the lowest effective throughput. For modeling purposes, we also do not consider specific network topologies or technologies, and we assume that the network core is over-provisioned enough that the internal network topology does not impact the speeds of inter-node data transfers. From our experience, unlimited backplane bandwidth without any performance degradation is probably impractical, although it was not an issue for our experiments and we currently have no evidence for it causing issues on the other large clusters which we analyze in Section 8. The model assumes that input data is evenly distributed across all participating nodes in the cluster, that nodes are homogeneous, and that each node retrieves its initial input from local storage. Most map-reduce systems are designed to fit these assumptions. The model also accounts for output data replication, assuming the common strategy of storing the first replica on the local disks and sending the others over the network to other nodes. Finally, another important assumption is that a single job has full access to the cluster at a time, with no competing jobs or other activities. Production map-reduce clusters may be shared by more than one simultaneous job, but understanding a single job's performance is a useful starting point.

Deriving the model from I/O operations: Table 1 identifies the I/O operations in each map-reduce phase for two variants of the sort operator. When the data fits in memory, a fast in-memory sort can be used. When it does not fit, an external sort is used, which involves sorting each batch of data in memory, writing it out to disk, and then reading and merging the sorted batches into one sorted stream. The (1 - 1/n) d_m network term appears in the equations, where n is the number of nodes, because in a well-balanced system each node partitions and transfers that fraction of its mapped data over the network, keeping 1/n of the data for itself.

Table 1: I/O operations in a map-reduce job. The first disk write in Phase 1 is an optional backup to protect against failures.

  d_m < memory (in-memory sort):
    Phase 1: Disk read (input): d_i; Disk write (backup): d_m; Network: (1 - 1/n) d_m
    Phase 2: Network: (r - 1) d_o; Disk write (output): r d_o

  d_m >= memory (external sort):
    Phase 1: Disk read (input): d_i; Disk write (backup): d_m; Network: (1 - 1/n) d_m; Disk write (sort): d_m
    Phase 2: Disk read (sort): d_m; Network: (r - 1) d_o; Disk write (output): r d_o

Table 2 lists the I/O speed and workload property parameters of the model. They include amounts of data flowing through the system, which can be expressed either in absolute terms (d_i, d_m, and d_o) or in terms of the ratios of the map and reduce operators' output and input (e_M and e_R, respectively).

Table 2: Modeling parameters that include I/O speeds and workload properties.

  n                     The number of nodes in the cluster.
  D_w                   The aggregate disk write throughput of a single node. A node with four disks, where each disk provides 65 MB/s writes, would have D_w = 260 MB/s.
  D_r                   The aggregate disk read throughput of a single node.
  N                     The network throughput of a single node.
  r                     The replication factor used for the job's output data. If no replication is used, r = 1.
  i                     The total amount of input data for a given computation.
  d_i = i/n             The amount of input data per node, for a given computation.
  d_m = (i/n) e_M       The amount of data per node after the map operator, for a given computation.
  d_o = (i/n) e_M e_R   The amount of output data per node, for a given computation.
  e_M = d_m / d_i       The ratio between the map operator's output and its input.
  e_R = d_o / d_m       The ratio between the reduce operator's output and its input.

Table 3 gives the model equations for the execution time of a map-reduce job in each of four scenarios, representing the cross-product of the Phase 1 backup write option (yes or no) and the sort type (in-memory or external). In each case, the per-byte time to complete each phase (map and reduce) is determined, summed, and multiplied by the number of input bytes per node (i/n). The per-byte value for each phase is the larger (max) of that phase's per-byte disk time and per-byte network time.

Table 3: Model equations for the execution time of a map-reduce computation on a parallel dataflow system.

  In-memory sort (d_m < memory), without backup write:
    t = (i/n) [ max{ 1/D_r, (1 - 1/n) e_M / N } + max{ r e_M e_R / D_w, (r - 1) e_M e_R / N } ]

  In-memory sort (d_m < memory), with backup write:
    t = (i/n) [ max{ 1/D_r + e_M/D_w, (1 - 1/n) e_M / N } + max{ r e_M e_R / D_w, (r - 1) e_M e_R / N } ]

  External sort (d_m >= memory), without backup write:
    t = (i/n) [ max{ 1/D_r + e_M/D_w, (1 - 1/n) e_M / N } + max{ e_M/D_r + r e_M e_R / D_w, (r - 1) e_M e_R / N } ]

  External sort (d_m >= memory), with backup write:
    t = (i/n) [ max{ 1/D_r + 2 e_M/D_w, (1 - 1/n) e_M / N } + max{ e_M/D_r + r e_M e_R / D_w, (r - 1) e_M e_R / N } ]

Using the last row (external sort, with backup write) as an example, the map phase includes three disk transfers and one network transfer: reading each input byte (1/D_r), writing the e_M map output bytes to disk (the backup write; e_M/D_w), writing e_M bytes as part of the external sort (e_M/D_w), and sending (1 - 1/n) of the e_M map output bytes over the network to other reduce nodes ((1 - 1/n) e_M / N). The corresponding reduce phase includes two disk transfers and one network transfer: reading sorted batches (e_M/D_r), writing the e_M e_R reduce output bytes produced locally plus the (r - 1) e_M e_R bytes replicated from other nodes (r e_M e_R / D_w), and sending the e_M e_R bytes produced locally to (r - 1) other nodes ((r - 1) e_M e_R / N). Putting all of this together produces the equation shown.
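The four cases of Table 3 can be collapsed into one small function. The sketch below is our own transcription of the equations (it is not code from the paper); data amounts are in bytes and throughputs in bytes per second, so the result is in seconds.

```python
def mapreduce_optimal_time(i, n, Dr, Dw, N, r=1, eM=1.0, eR=1.0,
                           backup_write=False, external_sort=False):
    """Best-case map-reduce job runtime from the Table 3 equations.

    i: total input bytes; n: nodes; Dr/Dw: per-node disk read/write
    throughput; N: per-node network throughput; r: output replication
    factor; eM, eR: map and reduce output/input ratios.
    """
    di = i / n  # input bytes per node

    # Phase 1 (map): disk = input read, plus an optional backup write of
    # the map output, plus a sort spill when an external sort is needed.
    phase1_disk = 1.0 / Dr
    if backup_write:
        phase1_disk += eM / Dw
    if external_sort:
        phase1_disk += eM / Dw
    phase1_net = (1.0 - 1.0 / n) * eM / N

    # Phase 2 (reduce): disk = re-reading sorted runs (external sort only)
    # plus writing r copies of the output; network = sending r-1 replicas.
    phase2_disk = r * eM * eR / Dw
    if external_sort:
        phase2_disk += eM / Dr
    phase2_net = (r - 1) * eM * eR / N

    per_byte = max(phase1_disk, phase1_net) + max(phase2_disk, phase2_net)
    return di * per_byte
```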
Applying the model to common workloads: Many workloads benefit from a parallel dataflow system because they run on massive datasets, either extracting and processing a small amount of interesting data or shuffling data from one representation to another. We focus on parallel sort and grep in analyzing systems and validating our model, which Dean and Ghemawat [9] indicate are representative of most programs written by users of Google's MapReduce. For a grep-like job that selects a very small fraction of the input data, e_M ≈ 0 and e_R = 1, meaning that only a negligible amount of data is (optionally) written to the backup files, sent over the network, and written to the output files. Thus, the best-case runtime is determined by the initial input disk reads:

  t_grep = i / (n D_r)    (1)

A sort workload maintains the same amount of data in both the map and reduce phases, so e_M = e_R = 1. If the amount of data per node is small enough to accommodate an in-memory sort and not warrant a Phase 1 backup, the top equation of Table 3 is used, simplifying to:

  t_sort = (i/n) [ max{ 1/D_r, (1 - 1/n)/N } + max{ r/D_w, (r - 1)/N } ]    (2)

Determining input parameters for the model: Appropriate parameter values are a crucial aspect of model accuracy, whether using the model to evaluate how well a production system is performing or to determine what should be expected from a hypothetical system. The n and r parameters are system configuration choices that can be applied directly in the model for both production and hypothetical systems. The amounts of data flowing through the various operators (d_i, d_m, or d_o) depend upon the characteristics of the map and reduce operators and of the data itself. For a production system, they can be measured and then plugged into a model that evaluates the performance of a given workload run on that system. For a hypothetical system, or if actual system measurements are not available, estimates must be used, such as d_i = d_m = d_o for sort or d_m = d_o = 0 for grep. The determination of which equation to use, based on the backup write option and sort type choices, is also largely dependent on the workload characteristics, but in combination with system characteristics. Specifically, the sort type choice depends on the relationship between d_m and the amount of main memory available for the sort operator. The backup write option is a softer choice, worthy of further study, involving the time to do a backup write (d_m/D_w), the total execution time of the job, and the likelihood of a node failure during the job's execution. Both Hadoop and Google's MapReduce always do the backup write, at least to the local file system cache.

The appropriate values for I/O speed depend on what is being evaluated. For both production and hypothetical systems, specification values for the hardware can be used (for example, 1 Gbps for the network and the maximum streaming bandwidth specified for the given disks). This approach is appropriate for evaluating the efficiency of the entire software stack, from the operating system up. However, if the focus is on the programming framework, using raw hardware specifications can indicate greater inefficiency than is actually present. In particular, some efficiency is generally lost in the underlying operating system's conversion of raw disk and network resources into higher level abstractions, such as file systems and network sockets. To focus attention on programming framework inefficiencies, one should use measurements of the disk and network bandwidths available to applications using the abstractions. As shown in our experiments, such measured values are lower than specified values and often have non-trivial characteristics, such as dependence on file system age or network communication patterns.

4 Existing data-intensive computing systems are far from optimal

Our model indicates that, though they may scale beautifully, popular data-intensive computing systems leave a lot to be desired in terms of efficiency. Figure 2 compares optimal times, as predicted by the model, to reported measurements of a few benchmark landmarks touted in the literature, presumably on well-tuned instances of the programming frameworks utilized. These results indicate that far more machines and disks are often employed than would be needed if the systems were hardware-efficient. The remainder of this section describes the systems and benchmarks represented in Figure 2.

Hadoop TeraSort: In April 2009, Hadoop set a new record [18] for sorting 1 TB of data in the Sort Benchmark [17] format. The setup had the following parameters: i = 1 TB, r = 1, n = 1460, D_r = D_w = 4 disks × 65 MB/s/disk = 260 MB/s, N = 110 MB/s, d_m = i/n = 685 MB.
With only 685 MB per node, the data can be sorted by the individual nodes in memory. A Phase 1 backup write is not needed, given the short runtime. Equation 2 gives a best-case runtime of 8.86 seconds. After fine-tuning the system for this specific benchmark, Yahoo! achieved 62 seconds, 7× slower. An optimal system using the same hardware would achieve better throughput with 209 nodes (instead of 1460).
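Plugging the Hadoop TeraSort parameters into the model sketch from Section 3 (the mapreduce_optimal_time function is ours, not part of any published tool) reproduces the 8.86-second figure:

```python
MB = 1e6  # the model works in bytes and bytes/second

t = mapreduce_optimal_time(i=1e12, n=1460, Dr=260 * MB, Dw=260 * MB,
                           N=110 * MB, r=1, eM=1.0, eR=1.0,
                           backup_write=False, external_sort=False)
print(round(t, 2))  # ~8.86 seconds, versus the reported 62 seconds
```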
Figure 2: Published benchmarks of popular parallel dataflow systems. Each bar represents the reported throughput relative to the ideal throughput indicated by our performance model, parameterized according to a cluster's hardware.

MapReduce TeraSort: In November 2008, Google reported TeraSort results for 1000 nodes with 12 disks per node [8]. The following parameters were used: i = 1 TB, r = 1, n = 1000, D_r = D_w = 780 MB/s, N = 110 MB/s, d_m = i/n = 1000 MB. Equation 2 gives a best-case runtime of 10.4 seconds. Google achieved 68 seconds, over 6× slower. An optimal system using the same hardware would achieve better throughput with 153 nodes (instead of 1000).

MapReduce PetaSort: Google's PetaSort experiment [8] is similar to TeraSort, with three differences: 1) an external sort is required with a larger amount of data per node (d_m = 250 GB), 2) output was stored on GFS with three-way replication, and 3) a Phase 1 backup write is justified by the longer runtimes. In fact, Google ran the experiment multiple times, and at least one disk failed during each execution. The setup is described as follows: i = 1 PB, r = 3, n = 4000, D_r = D_w = 780 MB/s, N = 110 MB/s, d_m = i/n = 250 GB. The bottom cell of Table 3 gives a best-case runtime of 6818 seconds. Google achieved 21,720 seconds, approximately 3.2× slower. An optimal system using the same hardware would achieve better throughput with 1256 nodes (instead of 4000). Also, according to our model, for the purpose of sort-like computations, Google's nodes are over-provisioned with disks. In an optimal system, the network would be the bottleneck even if each node had only 6 disks instead of 12.

Hadoop PetaSort: Yahoo!'s PetaSort experiment [18] is similar to Google's, with one difference: the output was stored on HDFS with two-way replication. The setup is described as follows: i = 1 PB, r = 2, n = 3658, D_r = D_w = 4 × 65 MB/s = 260 MB/s, N = 110 MB/s, d_m = i/n = 273 GB. The bottom cell of Table 3 gives a best-case runtime of 6308 seconds. Yahoo! achieved 58,500 seconds, about 9.3× slower. An optimal system using the same hardware would achieve better throughput with 400 nodes (instead of 3658).

MapReduce Grep: The original MapReduce paper [9] described a distributed grep computation that was executed on MapReduce. The setup is described as follows: i = 1 TB, n = 1800, D_r = 2 × 40 MB/s = 80 MB/s, N = 110 MB/s, d_m = 9.2 MB, e_M ≈ 0, e_R = 1. The paper does not specify the throughput of the disks, so we used 40 MB/s, conservatively estimated based on disks of the timeframe (2004). Equation 1 gives a best-case runtime of 6.94 seconds. Google achieved 150 seconds including startup overhead, or 90 seconds without that overhead, still about 13× slower. An optimal system using the same hardware would achieve better throughput with 139 nodes (instead of 1800). The 60-second startup time experienced by MapReduce on a cluster of 1800 nodes would also have been much shorter on a cluster of 139 nodes.
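The other Figure 2 data points follow in the same way; for example, the sketch below reproduces the Google PetaSort and MapReduce grep best-case times, again using our mapreduce_optimal_time transcription of Table 3 and Equation 1.

```python
MB, TB, PB = 1e6, 1e12, 1e15

# Google PetaSort: external sort, Phase 1 backup write, three-way replication.
t_petasort = mapreduce_optimal_time(i=1 * PB, n=4000, Dr=780 * MB, Dw=780 * MB,
                                    N=110 * MB, r=3, backup_write=True,
                                    external_sort=True)
print(round(t_petasort))  # ~6818 s, versus the reported 21,720 s

# MapReduce grep: eM ~ 0, so only the input disk reads matter (Equation 1).
t_grep = (1 * TB) / (1800 * 80 * MB)
print(round(t_grep, 2))   # ~6.94 s, versus the reported 90-150 s
```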
5 Exploring the efficiency of data-intensive computing

The model indicates that there is substantial inefficiency in popular data-intensive computing systems. The remainder of the paper reports and analyzes results of experiments exploring such inefficiency. This section describes our cluster and quantifies efficiency lost to OS functionality. Section 6 confirms the Hadoop inefficiency indicated in the benchmark analyses, and Section 7 uses a stripped-down framework to validate that the model's optimal runtimes can be approached. Section 8 discusses these results and ties together our observations of the sources of inefficiency with opportunities for future work in this area.

Experimental cluster: Our experiments used 1-25 nodes of a cluster. Each node is configured with two quad-core Intel Xeon E5430 processors, four 1 TB Seagate Barracuda ES.2 SATA drives, 16 GB of RAM, and a Gigabit Ethernet link to a Force10 switch. The I/O speeds indicated by the hardware specifications are N = 1 Gbps and D_r = D_w = 108 MB/s (for the outer-most disk zone). All machines run the Linux Xen kernel, but none of our experiments were run in virtual machines; they were all run directly on domain zero. The kernel's default TCP implementation (TCP NewReno using up to 1500-byte packets) was used. Except where otherwise noted, the XFS file system was used to manage a single one of the disks for every node in our experiments.

Disk bandwidth for applications: For sufficiently large or sequential disk transfers, seek times have a negligible effect on performance; raw disk bandwidth approaches the maximum transfer rate to/from the disk media, which is dictated by the disk's rotation speed and data-per-track values [21]. For modern disks, sufficiently large is on the order of 8 MB [26]. Most applications do not access the raw disk, instead accessing the disk indirectly via a file system. Using the raw disk, we observe 108 MB/s, which is in line with the specifications for our disks. Nearly the same bandwidth (within 1%) can be achieved for large sequential file reads on ext3 and XFS file systems. For writes, our measurements indicate more interesting behavior. Using the dd utility with the sync option, a 64 MB block size, and input from the /dev/zero pseudo-device, we observe steady-state write bandwidths of 84 MB/s and 102 MB/s on ext3 and XFS, respectively. When writing an amount of data less than or close to the file system cache size, the reported bandwidth is up to another 10% lower, since the file system does not start writing the data to disk immediately; that is, disk writing is not occurring during the early portion of the utility runtime. This difference between read and write bandwidths causes us to use two values (D_r and D_w) in the model; our original model used one value for both. The difference is not due to the underlying disks, which have the same media transfer rate for both reads and writes. Rather, it is caused by file system decisions regarding coalescing and ordering of write-backs, including the need to update metadata. XFS and ext3 both maintain a write-ahead log for data consistency, which also induces some overhead on new data writes. ext3's relatively higher write penalty is likely caused by its block allocator, which allocates one 4 KB block at a time, in contrast to XFS's variable-length extent-based allocator. (The ext4 file system addresses some of these shortcomings, improving the design and performance of ext3 by adding, among other things, multi-block allocations [16].)

The 108 MB/s value, and the dd measurements discussed above, are for the first disk zone. Modern disks have multiple zones, each with a different data-per-track value and, thus, media transfer rate [22].
When measuring an XFS filesystem on a partition covering the entire disk, read speeds remained consistent at 108 MB/s, but write speeds fluctuated across a range of values, with an average of 97 MB/s over 10 runs. In reporting optimal values for experiments with our cluster, we use 108 MB/s and 97 MB/s for the disk read and write speeds, respectively.

Network bandwidth for applications: Although a full-duplex 1 Gbps Ethernet link could theoretically transfer 125 MB/s in each direction, maximum achievable data transfer bandwidths are lower due to unavoidable protocol overheads. Using the iperf tool with the maximum kernel-allowed 256 KB TCP window size, we measured sustained bandwidths between two machines that are
in line with the expected best-case data bandwidth. However, we observed lower bandwidths with more nodes in the all-to-all pattern used in map-reduce jobs. For example, in 5-16 node all-to-all network transfers, we observed lower aggregate node-to-node bandwidths over any one link. These lower values are caused by NewReno's known slow convergence on using full link bandwidths on high-speed networks [14]. Such bandwidth reductions under some communication patterns may make the use of a single network bandwidth (N) inappropriate for some environments. For evaluating data-intensive computing on our cluster, we use a conservative value of N = 110 MB/s. We also ran experiments using the newer CUBIC [12] congestion control algorithm, which is the default on Linux and is tuned to support high-bandwidth links. It achieved higher throughput (up to 115 MB/s per node with 10 nodes), but exhibited significant unfairness between flows, yielding skews in completion times of up to 86% of the total time. CUBIC's unfairness and stability issues are known and are prompting continuing research toward better algorithms [14].

6 Experiences with Hadoop

We experimented with Hadoop on our cluster to confirm and better understand the inefficiency exposed by our analysis of reported benchmark results.

Tuning Hadoop's settings: Default Hadoop settings fail to use most nodes in a cluster, using only two (total) map tasks and one reduce task. Even increasing those values to use four map and reduce tasks per node, a better number for our cluster, with no replication, still results in lower-than-expected performance. We improved the Hadoop sort performance by an additional 2× by adjusting a number of configuration settings as suggested by Hadoop cluster setup documentation and other sources [2, 24, 19]. Table 4 describes our changes, which include reducing the replication level, increasing block sizes, increasing the numbers of map and reduce tasks per node, and increasing heap and buffer sizes. Interestingly, we found that speculative execution did not improve performance for our cluster. Occasional map task failures and lagging nodes can and do occur, especially when running over more nodes. However, they are less common for our smaller cluster size (one namenode and 1-25 slave nodes), and surprisingly they had little effect on the overall performance when they did occur. When using speculative execution, it is generally advised to set the number of total reduce tasks to 95-99% of the cluster's reduce capacity to allow for a node to fail and still finish execution in a single wave. Since failures are less of an issue for our experiments, we optimized for the failure-free case and chose enough map and reduce tasks for each job to fill every machine at 100% capacity.

Table 4: Hadoop configuration settings used in our experiments.

  Hadoop Setting                 Default    Tuned      Effect
  Replication level              3          1          Set to 1 to avoid extra disk writes.
  HDFS block size                64 MB      128 MB     Larger blocks make large file reads and writes faster, amortizing the overhead of starting each map task.
  Speculative execution          true       false      Failures are uncommon on small clusters; avoid extra work.
  Maximum map tasks per node     2          4          Our nodes can handle more map tasks in parallel.
  Maximum reduce tasks per node  1          4          Our nodes can handle more reduce tasks in parallel.
  Map tasks                      2          4n         For a cluster of n nodes, maximize the map tasks per node.
  Reduce tasks                   1          4n         For a cluster of n nodes, maximize the reduce tasks per node.
  Java VM heap size              200 MB     1 GB       Increase the Java VM heap size for each child task.
  Daemon heap size               1 GB       2 GB       Increase the heap size for Hadoop daemons.
  Sort buffer memory             100 MB     600 MB     Use more buffer memory when sorting files.
  Sort streams factor            -          increased  Merge more streams at once when sorting files.

Sort measurements and comparison to the model: Figure 3 shows sort results for different numbers of nodes using our tuned Hadoop configuration. Each measurement sorts 4 GB of data per node (up to 100 GB total over 25 nodes). Random 100-byte input records were generated with the TeraGen program, spread across active nodes via HDFS, and sorted with the standard TeraSort Hadoop program. Before every sort, the buffer cache was flushed with sync to prevent previously cached writes from interfering with the measurement. Additionally, the buffer cache was dropped from the kernel to force disk read operations for the input data. The sorted output is written to the file system, but not synced to disk before completion is reported; thus, the reported results are a conservative reflection of actual Hadoop sort execution times. The results confirm that Hadoop scales well, since the average runtime only increases 6% (14 seconds) from 1 node up to 25 nodes (as the workload increases in proportion).
For comparison, we also include the optimal sort times in Figure 3, calculated from our performance model. The model's optimal values reveal a large constant inefficiency for the tuned Hadoop setup: each sort requires 3× the optimal runtime to complete, even without syncing the output data to disk. The 6% higher total runtime at 25 nodes is due to skew in the completion times of the nodes; this skew is the source of the roughly 9% additional inefficiency at 25 nodes. The inefficiency due to OS abstractions is already accounted for, as discussed in Section 5.
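As a rough cross-check of the Figure 3 optimal curves, the model sketch from Section 3 can be evaluated with the measured per-node speeds from Section 5 (D_r = 108 MB/s, D_w = 97 MB/s, N = 110 MB/s). These are our own back-of-the-envelope numbers; the paper's exact Figure 3 values may differ slightly.

```python
MB, GB = 1e6, 1e9

# 25-node Hadoop sort: 4 GB per node, no replication, in-memory sort.
common = dict(i=25 * 4 * GB, n=25, Dr=108 * MB, Dw=97 * MB, N=110 * MB, r=1)
t_no_backup = mapreduce_optimal_time(backup_write=False, **common)
t_backup = mapreduce_optimal_time(backup_write=True, **common)

# Roughly 78 s without the backup write and 120 s with it; the ~40 s gap
# is close to the 39-second difference discussed in the text.
print(round(t_no_backup), round(t_backup))
```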
One potential explanation for part of the inefficiency is that Hadoop uses a backup write for the map output, even though the runtimes are short enough to make it of questionable merit. As shown by the dotted line in Figure 3a, using the model equation with a backup write would yield an optimal runtime that is 39 seconds longer. This would explain approximately 25% of the inefficiency. However, as with the sort output, the backup write is sent to the file system but not synced to disk; with 4 GB of map output per node and 16 GB of memory per node, most of the backup write data may not actually be written to disk during the map phase. It is unclear what fraction of the potential 25% is actually explained by Hadoop's use of a backup write.

Another possible source of inefficiency could be unbalanced distribution of the input data or the reduce data. However, we found that the input data is spread almost evenly across the cluster. Also, the difference between the ideal split of data and what is actually sent to each reduce node is less than 3%. Therefore, the random input generation along with TeraSort's sampling and splitting algorithms is partitioning work evenly, and the workload distribution is not to blame for the loss of efficiency.

Another potential source of inefficiency could be poor scheduling and task assignment by Hadoop. However, Hadoop actually did a good job at scheduling map tasks to run on the nodes that store the data, allowing local disk access (rather than network transfers) for over 95% of the input data. The fact that this value was below 100% is due to skew of completion times, where some nodes finish processing their local tasks a little faster than others and take over some of the load from the slower nodes.

We do not yet have a full explanation for Hadoop's inefficiency. Although we have not been able to verify this in the complex Hadoop code, some of the inefficiency appears to be caused by insufficiently pipelined parallelism between operators, causing serialization of activities (e.g., input read, CPU processing, and network write) that should ideally proceed in parallel. Part of the inefficiency is commonly attributed to CPU overhead induced by Hadoop's Java-based implementation. Of course, Hadoop may also not be using I/O resources at full efficiency. More diagnosis of Hadoop's inefficiency is a topic for continuing research.
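For readers who want to attempt a similar tuning, the dictionary below records how the Table 4 settings might map onto Hadoop 0.20-era configuration properties. The property names are our best guesses from the Hadoop documentation of that period, not values taken from the paper, and the cluster-size-dependent entries assume 25 worker nodes.

```python
# Approximate mapping of Table 4 onto Hadoop 0.20-era property names
# (our assumption, not from the paper).
n_nodes = 25  # worker nodes in the cluster

tuned_hadoop_settings = {
    "dfs.replication": 1,                          # avoid extra disk writes
    "dfs.block.size": 128 * 1024 * 1024,           # 128 MB HDFS blocks
    "mapred.map.tasks.speculative.execution": False,
    "mapred.reduce.tasks.speculative.execution": False,
    "mapred.tasktracker.map.tasks.maximum": 4,     # map slots per node
    "mapred.tasktracker.reduce.tasks.maximum": 4,  # reduce slots per node
    "mapred.map.tasks": 4 * n_nodes,               # fill every map slot
    "mapred.reduce.tasks": 4 * n_nodes,            # fill every reduce slot
    "mapred.child.java.opts": "-Xmx1024m",         # 1 GB heap per child task
    "io.sort.mb": 600,                             # sort buffer memory
}
# The daemon heap size (HADOOP_HEAPSIZE) and the sort streams factor
# (io.sort.factor) would be set separately in hadoop-env.sh and the job config.
```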
Figure 3: Measured and optimal sort runtimes for a tuned Hadoop cluster. Performance is about 3 times slower than optimal, and 2 times slower than an optimal sort that includes an extra backup write for the map output, which is currently Hadoop's behavior. Hadoop scales well with 4 GB per node up to 25 nodes, but it is inefficient. The measured runtime, optimal calculation, and optimal-with-backup-write calculation are shown in (a); the breakdown of runtime into map and reduce phases is shown in (b).

7 Verifying the model with Parallel DataSeries

The Hadoop results above clearly diverge from the predicted optimal. The large extent to which they diverge, however, brings the accuracy of the model into question. To validate our model, we present Parallel DataSeries (PDS), a data analysis tool that attempts to closely approach the maximum possible throughput.

PDS design: Parallel DataSeries builds on DataSeries, an efficient and flexible data format and runtime library optimized for analyzing structured data [4]. DataSeries files are stored as a sequence of extents, where each extent is a series of records. The records themselves are typed, following a schema defined for each extent. Data is analyzed at the record level, but I/O is performed at the much larger extent level. DataSeries supports passing records in a pipeline fashion through a series of modules. PDS extends DataSeries with modules that support parallelism over multiple cores (intra-node parallelism) and multiple nodes (inter-node parallelism), to support parallel flows across modules as depicted in Figure 4.

Sort evaluation: We built a parallel sort module in PDS that implements a dataflow pattern similar to map-reduce. In Phase 1, data is partitioned and shuffled across the network. As soon as a node receives all data from the shuffle, it exits Phase 1 and begins Phase 2 with a local sort. To generate input data for experiments, we used Gensort, which is the sort benchmark [17] input generator on which TeraGen is based. The Gensort input set is separated into partitions, one for each node. PDS doesn't currently utilize a distributed filesystem, so we manually partition the input, with 40 million records (4 GB) at each node. We converted the Gensort data to DataSeries format without compression, which expands the input by 4%. We measured PDS to see how closely it performed to the optimal predicted performance on the same cluster used for the Hadoop experiments. Figure 5 presents the equivalent sort task as run for Hadoop. We repeated all experiments 10 times, starting from a cold cache and syncing all data to disk before terminating the measurement. As with the earlier Hadoop measurements, time is broken down into each phase. Furthermore, average per-node times are included for the actual sort, as well as a stragglers category that represents the average wait time of a node from the time it completes all its work until the last node involved in the parallel sort also finishes.
Figure 4: Parallel DataSeries is a carefully-tuned parallel runtime library for structured data analysis. Incoming data is queued and passed in a pipeline through a number of modules in parallel.

PDS performed well, within 12-24% of optimal. About 4% of that is the aforementioned input expansion. The sort time takes a little over 2 seconds, which accounts for another 3% of the overhead. Much of this CPU time could be overlapped with I/O (PDS doesn't currently do so), and it is sufficiently small to justify excluding CPU time from the model. These two factors explain most of the 12% overhead of the single-node case, leaving a small amount of natural coordination and runtime overhead in the framework. As the parallel sort is scaled to 25 nodes, besides the additional coordination overhead from code structures that enable partitioning and parallelism, the remaining divergence can be mostly explained by two factors: 1) straggler nodes, and 2) network slowdown effects from many competing transfers. Stragglers (broken out in Figure 5b) can be the result of generally slow (i.e., bad) nodes, skew in network transfers, or variance in disk write times. The up to 5% observed straggler overhead is reasonable. The network slowdown effects were identified in Section 5 using iperf measurements, and are mostly responsible for the slight time increase starting around 4 nodes. However, even if the effective network goodput were 100 MB/s instead of the 110 MB/s used with the model, that would eliminate only 4% of the additional overhead for our PDS results compared to the predicted optimal time.

As more nodes are added at scale, the straggler effects and network slowdowns become more pronounced. When we originally ran these experiments and inspected the results of the 25-node case, we noticed that 6 of the nodes consistently finished later and were processing about 10% more work than the other 19. It turned out that our data partitioner was using only the first byte of the key to split up the space into 256 bins, so it partitioned the data unevenly for clusters that were not a power of 2. After designing a fairer partitioner that used more bytes of the key, and applying it to the 25-node parallel sort, we were able to bring down the overhead from 30% to 24%.

To see how both the model and PDS react to the network as a bottleneck, we configured our network switches to negotiate 100 Mbps Ethernet. Just as the 1/N term in the model predicts increasingly longer sort times, which converge in scale as more nodes participate, Figure 6 demonstrates that our actual results with PDS match that pattern very well. The PDS sort results vary between 12-27% slower than optimal. For clusters of size 16 and 25, 5% of the time is spent waiting for stragglers. The slow speed of the network amplifies the effects of skew; we observed a few nodes finishing their second phase before the most delayed nodes had received all of their data from the first phase.

8 Discussion

The experiments with PDS demonstrate that our model is not wildly optimistic: it is possible to get close to the optimal runtime. Thus, the inefficiencies indicated for our Hadoop cluster and the published benchmark results are real.
Figure 5: Using Parallel DataSeries to sort up to 100 GB, it is possible to approach within 12-24% of the optimal sort times as predicted by our performance model. PDS scales well for an in-memory sort with 4 GB per node up to 25 nodes in (a), although there is a small time increase starting around 4 nodes due to network effects. Also shown for the 25-node case is the performance of our older, unbalanced partitioner, which had an additional 6% performance overhead from optimal. A breakdown of time in (b) shows that the time increases at scale are mostly in the first phase of a map-reduce dataflow, which includes the network data shuffle, and in the time nodes spend waiting for stragglers due to effects of skew.

We do not have complete explanations for the 3-13× longer runtimes of current data-intensive computing frameworks, but we have identified a number of contributors. One class of inefficiencies comes from duplication of work or unnecessary use of a bottleneck resource. For example, Hadoop and Google's MapReduce always write Phase 1 map output to the file system, whether or not a backup write is warranted, and then read it from the file system when sending it to the reducer node. This file system activity, which may translate into disk I/O, is unnecessary for completing the job and inappropriate for shorter jobs.

One significant effect faced by map-reduce systems is that a job only completes when the last node finishes its work. For our cluster, we analyzed the penalty induced by such stragglers, finding that it grows to 4% of the runtime for Hadoop over 25 nodes. Thus, it is not the source of most of the inefficiency at that scale. For much larger systems, such as those used for the benchmark results, this straggler effect is expected to be much more significant; it is possible that this effect explains much of the difference between our measured 3× higher-than-optimal runtimes and the published 7× higher-than-optimal runtime of the Hadoop record-setting TeraSort benchmark. The straggler effect is also why Google's MapReduce and Hadoop dynamically distribute map and reduce tasks among nodes. Support for speculative execution also can help mitigate this effect, although fault tolerance is its primary value. If the straggler effect really is the cause of poor end-to-end performance at scale, then it motivates changes to these new data-parallel systems to examine and adapt the load balancing techniques used in works like River [6] or Flux [23].

It is tempting to blame lack of sufficient bisection bandwidth in the network topology for much of the inefficiency at scale. This would exhibit itself as over-estimation of each node's true network bandwidth, assuming uniform communication patterns, since the model does not account for such a bottleneck. However, this is not an issue for the measured Hadoop results on our small-scale cluster, because all nodes are attached across two switches with sufficient backplane bandwidth.
Figure 6: With 100 Mbps Ethernet as the bottleneck resource, a 100 GB sort benchmark on Parallel DataSeries matches up well with the model's prediction and stays within 12-27% of optimal. As more data is sent over the network with larger cluster sizes in (a), both the model and PDS predict longer sort times that eventually converge. A breakdown of time in (b) shows that the predicted and actual time increases occur during the first map-reduce phase, which includes the network data shuffle.

The network topology was not disclosed for most of the published benchmarks, but for many we don't believe bisection bandwidth was an issue. For example, MapReduce grep involves minimal data exchange because e_M ≈ 0. Also, for Hadoop PetaSort, Yahoo! used 91 racks, each with 40 nodes, one switch, and an 8 Gbps connection to a core switch (via 8 trunked 1 Gbps Ethernet links). For this experiment, the average bandwidth per node was 4.7 MB/s. Thus, the average bandwidth per uplink was only 1.48 Gbps in each direction, well below 8 Gbps. Other benchmarks may have involved a bisection bandwidth limitation, but such an imbalance would have meant that far more machines were used (per rack and overall) than were appropriate for the job, resulting in significant wasted resources.

Naturally, deep instrumentation and analysis of Hadoop will provide more insight into its inefficiency. Also, PDS in particular provides a promising starting point for understanding the sources of inefficiency. For example, replacing the current manual data distribution with a distributed file system is necessary for any useful system. Adding that feature to PDS, which is known to be efficient, would allow one to quantify its incremental cost. The same approach can be taken with other features, such as dynamic task distribution and fault tolerance.

9 Conclusion

Data-intensive computing is an increasingly popular style of computing that is being served by scalable, but inefficient, systems. A simple model of optimal map-reduce job runtimes shows that popular map-reduce systems take 3-13× longer to execute jobs than their hardware resources should allow. With Parallel DataSeries, our simplified dataflow processing tool, we demonstrated that the model's runtimes can be approached, validating the model and confirming the inefficiency of Hadoop and Google's MapReduce. Our model and results highlight and begin to explain the inefficiency of existing systems, providing insight into areas for continued improvements.
References

[1] Apache Hadoop.
[2] Hadoop cluster setup documentation.
[3] HDFS design documentation.
[4] Eric Anderson, Martin Arlitt, Charles B. Morrey III, and Alistair Veitch, DataSeries: an efficient, flexible data format for structured serial data, SIGOPS Operating Systems Review, no. 1.
[5] Eric Anderson and Joseph Tucek, Efficiency Matters!, HotStorage '09: Proceedings of the Workshop on Hot Topics in Storage and File Systems, 2009.
[6] Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, David Patterson, and Kathy Yelick, Cluster I/O with River: making the fast case common, IOPADS '99: Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems (New York, NY, USA), ACM, 1999.
[7] Randal E. Bryant, Data-Intensive Supercomputing: The case for DISC, Tech. report, Carnegie Mellon University.
[8] Grzegorz Czajkowski, Sorting 1PB with MapReduce, October 2008.
[9] Jeffrey Dean and Sanjay Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, no. 1.
[10] David DeWitt and Jim Gray, Parallel database systems: the future of high performance database systems, Communications of the ACM, no. 6.
[11] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google file system, SOSP '03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (New York, NY, USA), ACM, 2003.
[12] Sangtae Ha, Injong Rhee, and Lisong Xu, CUBIC: a new TCP-friendly high-speed TCP variant, SIGOPS Operating Systems Review, no. 5.
[13] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (New York, NY, USA), ACM, 2007.
[14] Vishnu Konda and Jasleen Kaur, RAPID: Shrinking the Congestion-control Timescale, INFOCOM '09, April 2009.
[15] Michael A. Kozuch, Michael P. Ryan, Richard Gass, Steve W. Schlosser, James Cipar, Elie Krevat, Michael Stroucken, Julio López, and Gregory R. Ganger, Tashi: Location-aware Cluster Management, ACDC '09: First Workshop on Automated Control for Datacenters and Clouds, June 2009.
[16] Aneesh Kumar K.V, Mingming Cao, Jose R. Santos, and Andreas Dilger, Ext4 block and inode allocator improvements, Proceedings of the Linux Symposium, 2008.
[17] Chris Nyberg and Mehul Shah, Sort Benchmark.
[18] Owen O'Malley and Arun C. Murthy, Winning a 60 Second Dash with a Yellow Elephant, April 2009.
[19] Intel White Paper, Optimizing Hadoop Deployments, October 2009.
[20] Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis, Evaluating MapReduce for Multi-core and Multiprocessor Systems, HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture (Washington, DC, USA), IEEE Computer Society, 2007.
[21] Chris Ruemmler and John Wilkes, An introduction to disk drive modeling, IEEE Computer.
[22] Jiri Schindler, John Linwood Griffin, Christopher R. Lumb, and Gregory R. Ganger, Track-aligned Extents: Matching Access Patterns to Disk Drive Characteristics, in Proceedings of the 1st USENIX Symposium on File and Storage Technologies, 2002.
[23] Mehul A. Shah, Joseph M. Hellerstein, Sirish Chandrasekaran, and Michael J. Franklin, Flux: An Adaptive Partitioning Operator for Continuous Query Systems, International Conference on Data Engineering.
[24] Sanjay Sharma, Advanced Hadoop Tuning and Optimisation, December 2009.
[25] Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin, MapReduce and Parallel DBMSs: Friends or Foes?, Communications of the ACM, 2010.
[26] Matthew Wachs, Michael Abd-El-Malek, Eno Thereska, and Gregory R. Ganger, Argon: Performance insulation for shared storage servers, in Proceedings of the 5th USENIX Conference on File and Storage Technologies, USENIX Association, 2007.
[27] Guanying Wang, Ali R. Butt, Prashant Pandey, and Karan Gupta, A Simulation Approach to Evaluating Design Decisions in MapReduce Setups, 17th IEEE/ACM MASCOTS, September 2009.
Baa Service Master Data Maagemet Module Procedure UP069A US Documetiformatio Documet Documet code : UP069A US Documet group : User Documetatio Documet title : Master Data Maagemet Applicatio/Package :
INVESTMENT PERFORMANCE COUNCIL (IPC)
INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks
Soving Recurrence Relations
Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree
PUBLIC RELATIONS PROJECT 2016
PUBLIC RELATIONS PROJECT 2016 The purpose of the Public Relatios Project is to provide a opportuity for the chapter members to demostrate the kowledge ad skills eeded i plaig, orgaizig, implemetig ad evaluatig
CS100: Introduction to Computer Science
I-class Exercise: CS100: Itroductio to Computer Sciece What is a flip-flop? What are the properties of flip-flops? Draw a simple flip-flop circuit? Lecture 3: Data Storage -- Mass storage & represetig
Confidence Intervals for One Mean
Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a
COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS
COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat
In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008
I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces
Incremental calculation of weighted mean and variance
Icremetal calculatio of weighted mea ad variace Toy Fich [email protected] [email protected] Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically
Enhancing Oracle Business Intelligence with cubus EV How users of Oracle BI on Essbase cubes can benefit from cubus outperform EV Analytics (cubus EV)
Ehacig Oracle Busiess Itelligece with cubus EV How users of Oracle BI o Essbase cubes ca beefit from cubus outperform EV Aalytics (cubus EV) CONTENT 01 cubus EV as a ehacemet to Oracle BI o Essbase 02
ODBC. Getting Started With Sage Timberline Office ODBC
ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.
Automatic Tuning for FOREX Trading System Using Fuzzy Time Series
utomatic Tuig for FOREX Tradig System Usig Fuzzy Time Series Kraimo Maeesilp ad Pitihate Soorasa bstract Efficiecy of the automatic currecy tradig system is time depedet due to usig fixed parameters which
A Balanced Scorecard
A Balaced Scorecard with VISION A Visio Iteratioal White Paper Visio Iteratioal A/S Aarhusgade 88, DK-2100 Copehage, Demark Phoe +45 35430086 Fax +45 35434646 www.balaced-scorecard.com 1 1. Itroductio
Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks
Maual Security+ Domai 1 Network Security Every etwork is uique, ad architecturally defied physically by its equipmet ad coectios, ad logically through the applicatios, services, ad idustries it serves.
DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2
Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,
Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis
Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.
Evaluating Model for B2C E- commerce Enterprise Development Based on DEA
, pp.180-184 http://dx.doi.org/10.14257/astl.2014.53.39 Evaluatig Model for B2C E- commerce Eterprise Developmet Based o DEA Weli Geg, Jig Ta Computer ad iformatio egieerig Istitute, Harbi Uiversity of
Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows:
Subettig Subettig is used to subdivide a sigle class of etwork i to multiple smaller etworks. Example: Your orgaizatio has a Class B IP address of 166.144.0.0 Before you implemet subettig, the Network
Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling
Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria
Wells Fargo Insurance Services Claim Consulting Capabilities
Wells Fargo Isurace Services Claim Cosultig Capabilities Claim Cosultig Claims are a uwelcome part of America busiess. I a recet survey coducted by Fulbright & Jaworski L.L.P., large U.S. compaies face
Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments
Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please
Domain 1: Identifying Cause of and Resolving Desktop Application Issues Identifying and Resolving New Software Installation Issues
Maual Widows 7 Eterprise Desktop Support Techicia (70-685) 1-800-418-6789 Domai 1: Idetifyig Cause of ad Resolvig Desktop Applicatio Issues Idetifyig ad Resolvig New Software Istallatio Issues This sectio
CHAPTER 3 DIGITAL CODING OF SIGNALS
CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity
Center, Spread, and Shape in Inference: Claims, Caveats, and Insights
Ceter, Spread, ad Shape i Iferece: Claims, Caveats, ad Isights Dr. Nacy Pfeig (Uiversity of Pittsburgh) AMATYC November 2008 Prelimiary Activities 1. I would like to produce a iterval estimate for the
SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES
SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,
facing today s challenges As an accountancy practice, managing relationships with our clients has to be at the heart of everything we do.
CCH CRM cliet relatios facig today s challeges As a accoutacy practice, maagig relatioships with our cliets has to be at the heart of everythig we do. That s why our CRM system ca t be a bolt-o extra it
One Goal. 18-Months. Unlimited Opportunities.
18 fast-track 18-Moth BACHELOR S DEGREE completio PROGRAMS Oe Goal. 18-Moths. Ulimited Opportuities. www.ortheaster.edu/cps Fast-Track Your Bachelor s Degree ad Career Goals Complete your bachelor s degree
Professional Networking
Professioal Networkig 1. Lear from people who ve bee where you are. Oe of your best resources for etworkig is alumi from your school. They ve take the classes you have take, they have bee o the job market
Research Method (I) --Knowledge on Sampling (Simple Random Sampling)
Research Method (I) --Kowledge o Samplig (Simple Radom Samplig) 1. Itroductio to samplig 1.1 Defiitio of samplig Samplig ca be defied as selectig part of the elemets i a populatio. It results i the fact
Department of Computer Science, University of Otago
Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly
Chapter 5 Unit 1. IET 350 Engineering Economics. Learning Objectives Chapter 5. Learning Objectives Unit 1. Annual Amount and Gradient Functions
Chapter 5 Uit Aual Amout ad Gradiet Fuctios IET 350 Egieerig Ecoomics Learig Objectives Chapter 5 Upo completio of this chapter you should uderstad: Calculatig future values from aual amouts. Calculatig
Chapter 7 Methods of Finding Estimators
Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of
where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return
EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The
Agenda. Outsourcing and Globalization in Software Development. Outsourcing. Outsourcing here to stay. Outsourcing Alternatives
Outsourcig ad Globalizatio i Software Developmet Jacques Crocker UW CSE Alumi 2003 [email protected] Ageda Itroductio The Outsourcig Pheomeo Leadig Offshore Projects Maagig Customers Offshore Developmet
France caters to innovative companies and offers the best research tax credit in Europe
1/5 The Frech Govermet has three objectives : > improve Frace s fiscal competitiveess > cosolidate R&D activities > make Frace a attractive coutry for iovatio Tax icetives have become a key elemet of public
Determining the sample size
Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors
Digital Enterprise Unit. White Paper. Web Analytics Measurement for Responsive Websites
Digital Eterprise Uit White Paper Web Aalytics Measuremet for Resposive Websites About the Authors Vishal Machewad Vishal Machewad has over 13 years of experiece i sales ad marketig, havig worked as a
Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand [email protected]
SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial
client communication
CCH Portal cliet commuicatio facig today s challeges Like most accoutacy practices, we ow use email for most cliet commuicatio. It s quick ad easy, but we do worry about the security of sesitive data.
ANALYTICS. Insights that drive your business
ANALYTICS Isights that drive your busiess Eterprises are trasformig their busiesses by supplemetig their databases with real ad up-to-date customer data. Aalytics, as a catalyst, refies raw data ad aligs
The Forgotten Middle. research readiness results. Executive Summary
The Forgotte Middle Esurig that All Studets Are o Target for College ad Career Readiess before High School Executive Summary Today, college readiess also meas career readiess. While ot every high school
Hypergeometric Distributions
7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you
Lecture 2: Karger s Min Cut Algorithm
priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.
Reliability Analysis in HPC clusters
Reliability Aalysis i HPC clusters Narasimha Raju, Gottumukkala, Yuda Liu, Chokchai Box Leagsuksu 1, Raja Nassar, Stephe Scott 2 College of Egieerig & Sciece, Louisiaa ech Uiversity Oak Ridge Natioal Lab
A Secure Implementation of Java Inner Classes
A Secure Implemetatio of Java Ier Classes By Aasua Bhowmik ad William Pugh Departmet of Computer Sciece Uiversity of Marylad More ifo at: http://www.cs.umd.edu/~pugh/java Motivatio ad Overview Preset implemetatio
Pre-Suit Collection Strategies
Pre-Suit Collectio Strategies Writte by Charles PT Phoeix How to Decide Whether to Pursue Collectio Calculatig the Value of Collectio As with ay busiess litigatio, all factors associated with the process
A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design
A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 [email protected] Abstract:
Domain 1 - Describe Cisco VoIP Implementations
Maual ONT (642-8) 1-800-418-6789 Domai 1 - Describe Cisco VoIP Implemetatios Advatages of VoIP Over Traditioal Switches Voice over IP etworks have may advatages over traditioal circuit switched voice etworks.
Effective Data Deduplication Implementation
White Paper Effective Data Deduplicatio Implemetatio Eterprises with IT ifrastructure are lookig at reducig their carbo foot prit ad ifrastructure maagemet cost by slimmig dow their data ceters. I cotrast,
Agency Relationship Optimizer
Decideware Developmet Agecy Relatioship Optimizer The Leadig Software Solutio for Cliet-Agecy Relatioship Maagemet supplier performace experts scorecards.deploymet.service decide ware Sa Fracisco Sydey
Flood Emergency Response Plan
Flood Emergecy Respose Pla This reprit is made available for iformatioal purposes oly i support of the isurace relatioship betwee FM Global ad its cliets. This iformatio does ot chage or supplemet policy
Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT
Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee
Neolane Reporting. Neolane v6.1
Neolae Reportig Neolae v6.1 This documet, ad the software it describes, are provided subject to a Licese Agreemet ad may ot be used or copied outside of the provisios of the Licese Agreemet. No part of
BaanERP 5.0c. EDI User Guide
BaaERP 5.0c A publicatio of: Baa Developmet B.V. P.O.Box 143 3770 AC Bareveld The Netherlads Prited i the Netherlads Baa Developmet B.V. 1999. All rights reserved. The iformatio i this documet is subject
Optimal Adaptive Bandwidth Monitoring for QoS Based Retrieval
1 Optimal Adaptive Badwidth Moitorig for QoS Based Retrieval Yizhe Yu, Iree Cheg ad Aup Basu (Seior Member) Departmet of Computig Sciece Uiversity of Alberta Edmoto, AB, T6G E8, CAADA {yizhe, aup, li}@cs.ualberta.ca
CCH CRM Books Online Software Fee Protection Consultancy Advice Lines CPD Books Online Software Fee Protection Consultancy Advice Lines CPD
Books Olie Software Fee Fee Protectio Cosultacy Advice Advice Lies Lies CPD CPD facig today s challeges As a accoutacy practice, maagig relatioships with our cliets has to be at the heart of everythig
HCL Dynamic Spiking Protocol
ELI LILLY AND COMPANY TIPPECANOE LABORATORIES LAFAYETTE, IN Revisio 2.0 TABLE OF CONTENTS REVISION HISTORY... 2. REVISION.0... 2.2 REVISION 2.0... 2 2 OVERVIEW... 3 3 DEFINITIONS... 5 4 EQUIPMENT... 7
RISK TRANSFER FOR DESIGN-BUILD TEAMS
WILLIS CONSTRUCTION PRACTICE I-BEAM Jauary 2010 www.willis.com RISK TRANSFER FOR DESIGN-BUILD TEAMS Desig-builD work is icreasig each quarter. cosequetly, we are fieldig more iquiries from cliets regardig
Sequences and Series
CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their
Measures of Spread and Boxplots Discrete Math, Section 9.4
Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,
1 Computing the Standard Deviation of Sample Means
Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.
MTO-MTS Production Systems in Supply Chains
NSF GRANT #0092854 NSF PROGRAM NAME: MES/OR MTO-MTS Productio Systems i Supply Chais Philip M. Kamisky Uiversity of Califoria, Berkeley Our Kaya Uiversity of Califoria, Berkeley Abstract: Icreasig cost
QUADRO tech. PST Flightdeck. Put your PST Migration on autopilot
QUADRO tech PST Flightdeck Put your PST Migratio o autopilot Put your PST Migratio o Autopilot A moder aircraft hardly remids its pilots of the early days of air traffic. It is desiged to eable flyig as
A GUIDE TO BUILDING SMART BUSINESS CREDIT
A GUIDE TO BUILDING SMART BUSINESS CREDIT Establishig busiess credit ca be the key to growig your compay DID YOU KNOW? Busiess Credit ca help grow your busiess Soud paymet practices are key to a solid
STUDENTS PARTICIPATION IN ONLINE LEARNING IN BUSINESS COURSES AT UNIVERSITAS TERBUKA, INDONESIA. Maya Maria, Universitas Terbuka, Indonesia
STUDENTS PARTICIPATION IN ONLINE LEARNING IN BUSINESS COURSES AT UNIVERSITAS TERBUKA, INDONESIA Maya Maria, Uiversitas Terbuka, Idoesia Co-author: Amiuddi Zuhairi, Uiversitas Terbuka, Idoesia Kuria Edah
Configuring Additional Active Directory Server Roles
Maual Upgradig your MCSE o Server 2003 to Server 2008 (70-649) 1-800-418-6789 Cofigurig Additioal Active Directory Server Roles Active Directory Lightweight Directory Services Backgroud ad Cofiguratio
Data Center Ethernet Facilitation of Enterprise Clustering. David Flynn, Linux Networx Orlando, Florida March 16, 2004
Data Ceter Etheret Facilitatio of Eterprise Clusterig David Fly, Liux Networx Orlado, Florida March 16, 2004 1 2 Liux Networx builds COTS based clusters 3 Clusters Offer Improved Performace Scalability
Lesson 17 Pearson s Correlation Coefficient
Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig
Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals
Overview Estimatig the Value of a Parameter Usig Cofidece Itervals We apply the results about the sample mea the problem of estimatio Estimatio is the process of usig sample data estimate the value of
Systems Design Project: Indoor Location of Wireless Devices
Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: [email protected] Supervised
Quadrat Sampling in Population Ecology
Quadrat Samplig i Populatio Ecology Backgroud Estimatig the abudace of orgaisms. Ecology is ofte referred to as the "study of distributio ad abudace". This beig true, we would ofte like to kow how may
Estimating Probability Distributions by Observing Betting Practices
5th Iteratioal Symposium o Imprecise Probability: Theories ad Applicatios, Prague, Czech Republic, 007 Estimatig Probability Distributios by Observig Bettig Practices Dr C Lych Natioal Uiversity of Irelad,
FOUNDATIONS OF MATHEMATICS AND PRE-CALCULUS GRADE 10
FOUNDATIONS OF MATHEMATICS AND PRE-CALCULUS GRADE 10 [C] Commuicatio Measuremet A1. Solve problems that ivolve liear measuremet, usig: SI ad imperial uits of measure estimatio strategies measuremet strategies.
Data Analysis and Statistical Behaviors of Stock Market Fluctuations
44 JOURNAL OF COMPUTERS, VOL. 3, NO. 0, OCTOBER 2008 Data Aalysis ad Statistical Behaviors of Stock Market Fluctuatios Ju Wag Departmet of Mathematics, Beijig Jiaotog Uiversity, Beijig 00044, Chia Email:
