Towards Zero-Overhead Static and Adaptive Indexing in Hadoop


Stefan Richter, Jorge-Arnulfo Quiané-Ruiz, Stefan Schuh, Jens Dittrich

S. Richter, S. Schuh, J. Dittrich: Information Systems Group, Saarland University
J.-A. Quiané-Ruiz: Qatar Computing Research Institute, Qatar Foundation

Abstract

Hadoop MapReduce has evolved to an important industry standard for massive parallel data processing and has become widely adopted for a variety of use cases. Recent works have shown that indexes can improve the performance of selective MapReduce jobs dramatically. However, one major weakness of existing approaches is their high index creation cost. We present HAIL (Hadoop Aggressive Indexing Library), a novel indexing approach for HDFS and Hadoop MapReduce. HAIL creates different clustered indexes over terabytes of data with minimal, often invisible costs and it dramatically improves runtimes of several classes of MapReduce jobs. HAIL features two different indexing pipelines, static indexing and adaptive indexing. HAIL static indexing efficiently indexes datasets while uploading them to HDFS. Thereby, HAIL leverages the default replication of Hadoop and enhances it with logical replication. This allows HAIL to create multiple clustered indexes for a dataset, e.g. one for each physical replica. Still, in terms of upload time, HAIL matches or even improves over the performance of standard HDFS. Additionally, HAIL adaptive indexing allows for automatic, incremental indexing at job runtime with minimal runtime overhead. For example, HAIL adaptive indexing can completely index a dataset as a byproduct of only four MapReduce jobs while incurring an overhead as low as 11% for the very first of those jobs only. In our experiments, we show that HAIL improves job runtimes by up to 68x over Hadoop. This article is an extended version of the VLDB 2012 paper "Only Aggressive Elephants are Fast Elephants" (PVLDB, 5(11), 2012).

1 Introduction

MapReduce has become the de facto standard for large scale data processing in many enterprises.
It is used for developing novel solutions on massive datasets such as web analytics, relational data analytics, machine learning, data mining, and real-time analytics [23]. In particular, log processing emerges as an important type of data analysis commonly done with MapReduce [5,36,18]. In fact, Facebook and Twitter use Hadoop MapReduce (the most popular MapReduce open source implementation) to analyze the huge amounts of web logs generated every day by their users [43,22,35]. Over the last years, a lot of research works have focused on improving the performance of Hadoop MapReduce [12,26,32,34]. When improving the performance of MapReduce, it is important to consider that it was initially developed for large aggregation tasks that scan through huge amounts of data. However, nowadays Hadoop is often also used for selective queries that aim to find only a few relevant records for further consideration (footnote 1). For selective queries, Hadoop still scans through the complete dataset. This resembles the search for a needle in a haystack.

For this reason, several researchers have particularly focused on supporting efficient index access in Hadoop [45,15,35,33]. Some of these works have improved the performance of selective MapReduce jobs by orders of magnitude. However, all these indexing approaches have three main weaknesses. First, they require a high upfront cost for index creation. This translates to long waiting times for users until they can actually start to run queries. Second, they can only support one physical sort order (and hence one clustered index) per dataset. This becomes a serious problem if the workload demands indexes for several attributes. Third, they require users to have a good knowledge of the workload in order to choose the indexes to create. This is not always possible, e.g. if the data is analyzed in an exploratory way or queries are submitted by customers.

Footnote 1: A simple example of such a use case would be distributed grep.

1.1 Motivation

Let us see through the eyes of a data analyst, say Bob, who wants to analyze a large web log. The web log contains different fields that may serve as filter conditions for Bob like visitDate, adRevenue, sourceIP and so on. Assume Bob is interested in all sourceIPs with a visitDate from 2011. Thus, Bob writes a MapReduce program to filter out exactly those records and discard all others. Bob is using Hadoop, which will scan the entire input dataset from disk to filter out the qualifying records. This takes a while. After inspecting the result set Bob detects a series of strange requests from one particular sourceIP. Therefore, he decides to modify his MapReduce job to show all requests from the entire input dataset having that sourceIP. Bob is using Hadoop. This takes a while. Eventually, Bob decides to modify his MapReduce job again to only return log records having a particular adRevenue. Yes, this again takes a while.

In summary, Bob uses a sequence of different filter conditions, each one triggering a new MapReduce job. He is not exactly sure what he is looking for. The whole endeavor feels like going shopping without a shopping list. This example illustrates an exploratory usage (and a major use case) of Hadoop MapReduce [5,18,38]. But this use case has one major problem: slow query runtimes. The time to execute a MapReduce job based on a scan may be very high: it is dominated by the I/O for reading all input data [39,33]. While waiting for his MapReduce job to complete, Bob has enough time to pick up a coffee (or two), and this happens every time Bob modifies the MapReduce job. This will likely kill his productivity and make his boss unhappy.

Now, assume the fortunate case that Bob remembers a sentence from one of his professors saying "full-table-scans are bad; indexes are good" (footnote 2). Thus, he reads all the recent VLDB papers (including [33,12,26,32]) and finds a paper that shows how to create a so-called trojan index [15].
A trojan index is an index that may be used with Hadoop MapReduce and yet does not modify the underlying Hadoop MapReduce and HDFS engines.

Zero-Overhead indexing. Bob finds the trojan index idea interesting and hence decides to create a trojan index on sourceIP before running his MapReduce jobs. However, using trojan indexes raises two other problems:

(1.) Expensive index creation. The time to create the trojan index on sourceIP (or any other attribute) is even much longer than running a scan-based MapReduce job. Thus, if Bob's MapReduce jobs use that index only a few times, the index creation costs will never be amortized. So, why would Bob create such an expensive index in the first place?

(2.) Which attribute to index? Even if Bob amortizes index creation costs, the trojan index on sourceIP will only help for that particular attribute. So, which attribute should Bob use to create the index?

Bob is wondering how to create several indexes at very low cost to solve those problems.

Footnote 2: The professor is aware that for some situations the opposite is true.

Per-Replica indexing. One day in autumn 2011, Bob reads about another idea [34] where some researchers looked at ways to improve vertical partitioning in Hadoop. The researchers in that work realized that HDFS keeps three (or more) physical copies of all data for fault-tolerance. Therefore, they decided to change HDFS to store each physical copy in a different data layout (row, column, PAX, or any other column grouping layout). As all data layout transformation is done per HDFS data block, the failover properties of HDFS and Hadoop MapReduce were not affected. At the same time, I/O times improved. Bob thinks that this looks very promising, because he could possibly exploit this concept to create different clustered indexes almost invisible to the user. This is because he could create one clustered index per data block replica when uploading data to HDFS. This would already help him a lot in several query workloads. However, Bob quickly figures out that there are cases where this idea still has some annoying limitations.
Even if Bob could create one clustered index per data replica at low cost, he would still have to determine which attributes to index when uploading his data to HDFS. Afterwards, he could not easily revise his decision or introduce additional indexes without uploading the dataset again. Unfortunately, it sometimes happens that Bob and his colleagues navigate through datasets according to the properties and correlations of the data. In such cases, Bob and his colleagues typically: (1.) do not know the data access patterns in advance; (2.) have different interests and hence cannot agree upon common selection criteria at data upload time; (3.) even if they agree which attributes to index at data upload time, they might end up filtering records according to values on different attributes. Therefore, using any traditional indexing technique [19,1,2,8,11,45,35,15,33] would be problematic, because they cannot adapt well to unknown or changing query workloads.

Adaptive indexing. When searching for a solution to his problem with static indexing, Bob stumbles across a new approach called adaptive indexing [28], where the general idea is to create indexes as a side-effect of query processing. This is similar to the idea of soft indexes [37], where the system piggybacks the index creation for a given attribute on a single incoming query. However, in contrast to soft indexes, adaptive indexing aims at creating indexes incrementally (i.e., piggybacking on several incoming queries) in order to avoid high upfront index creation times. Thus, Bob is excited about the adaptive indexing idea since this could be the missing piece to solve his remaining concern. However, Bob quickly notices that he cannot simply apply existing adaptive indexing works [17,28,29,21,3,24] in MapReduce systems for several reasons:

(1.) Global index convergence. These techniques aim at converging to a global index for an entire attribute, which requires sorting the attribute globally. Therefore, these techniques perform many data movements across the entire dataset. Doing this in MapReduce would hurt fault-tolerance as well as the performance of MapReduce jobs. This is because the system would have to move data across data blocks in sync with all their three physical data block replicas. We do not plan to create global indexes, but focus on creating partial indexes that in total cover the whole dataset. A small back-of-the-envelope calculation shows that the possible gains of a global index are negligible in comparison to the overhead of the MapReduce framework. For instance, if a dataset is uniformly distributed over a cluster and occupies 160 HDFS blocks on each datanode (like the dataset in our experiments in Section 9) and we do not have a global index, then we need to perform 160 index accesses on each datanode. Since all datanodes can access their blocks in parallel to each other, we assume that the overhead is determined by the highest overhead per datanode. Overall, our approach requires at most 318 additional random reads in HDFS per datanode in this scenario, which in turn cost roughly 15ms each. In total, this amounts to 318 x 15ms = 4.77s overhead compared to a global index stored in HDFS. However, even empty MapReduce jobs, which neither read any data nor compute a single map function, run for more than 10s.

(2.) High I/O costs. Even if Bob applied existing adaptive indexing techniques inside data blocks, these techniques would end up in many costly I/O operations to move data on disk. This is because these techniques consider main-memory systems and thus do not factor in the I/O cost for reading/writing data from/to disk. Only one of these works [21] proposes an adaptive merging technique for disk-based systems. However, applying this technique inside an HDFS block would not make sense in MapReduce since HDFS blocks are typically loaded entirely into main memory anyway when processing map tasks.
One may think about applying adaptive merging across HDFS blocks, but this would again hurt fault-tolerance and the performance of MapReduce jobs as described above.

(3.) Unclustered index. These works focus on creating unclustered indexes in the first place and hence are only beneficial for highly selective queries. One of these works [29] introduced lazy tuple reorganisation in order to converge to clustered indexes. However, this technique needs several thousand queries to converge and its application in a disk-based system would again introduce a huge number of expensive I/O operations.

(4.) Centralized approach. Existing adaptive indexing approaches were mainly designed for single-node DBMSs. Therefore, applying these works in distributed parallel systems, like Hadoop MapReduce, would not fully exploit the existing parallelism to distribute the indexing effort across several computing nodes.

Despite all these open problems, Bob is very enthusiastic to combine the above interesting ideas on indexing into a new system to revolutionize the way his company can use Hadoop. And this is where the story begins.

1.2 Research Questions and Challenges

This article addresses the following research questions:

Zero-Overhead indexing. Current indexing approaches in Hadoop involve a significant upfront cost for index creation. How can we make indexing in Hadoop so effective that it is basically invisible for the user? How can we minimize the I/O costs for indexing or eventually reduce them to zero? How can we fully utilize the available CPU resources and parallelism of large clusters for indexing?

Per-Replica indexing. Hadoop uses data replication for failover. How can we exploit this replication to support different sort orders and indexes? Which changes to the HDFS upload pipeline need to be done to make this efficient? What happens to the involved checksum mechanism of HDFS? How can we teach the HDFS namenode to distinguish the different replicas and keep track of the different indexes?

Job execution. How can we change Hadoop MapReduce to utilize different sort orders and indexes at query time?
How can we change Hadoop MapReduce to schedule tasks to replicas having the appropriate index? How can we schedule map tasks to efficiently process indexed and non-indexed data blocks without affecting failover? How much do we need to change existing MapReduce jobs? How will Hadoop MapReduce change from the user's perspective?

Zero-Overhead Adaptive indexing. How can we adaptively and automatically create additional useful indexes online at minimal costs per job? How to index big data incrementally in a distributed, disk-based system like Hadoop as a byproduct of job execution? How to minimize the impact of indexing on individual job execution times? How to efficiently interleave data processing with indexing? How to distribute the indexing effort efficiently by considering data locality and index placement across computing nodes? How to create several clustered indexes at query time? How to support a different number of replicas per data block?

1.3 Contributions

We propose HAIL (Hadoop Aggressive Indexing Library), a static and adaptive indexing approach for MapReduce systems. The main goal of HAIL is to minimize both (i) the index creation time when uploading data and (ii) the impact of concurrent index creation on job execution times. In summary, we make the following main contributions to tackle the questions and challenges mentioned above:

(1.) Zero-Overhead indexing. We show how to effectively piggy-back sorting and index creation on the existing HDFS upload pipeline. This way no additional MapReduce job is required to create those indexes and also no additional read of the data is required at all. In fact, the HAIL upload pipeline is so effective when compared to HDFS that the additional overhead for sorting and index creation is hardly noticeable in the overall process. Therefore, we offer a win-win situation over Hadoop MapReduce and even over Hadoop++ [15]. We give an overview of HAIL and its benefits in Section 2.

(2.) Per-Replica indexing. We show how to exploit the default replication of Hadoop to support different sort orders and indexes for each block replica (Section 3). Hence, for a default replication factor of three, up to three different sort orders and clustered indexes are available for processing MapReduce jobs. Thus, the likelihood to find a suitable index increases and hence the runtime for a workload improves. Our approach benefits from the fact that Hadoop is only used for appends: there are no updates. Thus, once a block is full, it will never be changed again.

(3.) Job Execution. We show how to effectively change the Hadoop MapReduce pipeline to exploit existing indexes (Section 4). Our goal is to do this without changing the code of the MapReduce framework. Therefore, we introduce optional annotations for MapReduce jobs that allow users to enrich their queries with explicit specifications of their selections and projections. HAIL takes care of performing MapReduce jobs using normal data block replicas or pseudo data block replicas (or even both). In addition, we propose a new task scheduling, called HAIL Scheduling, to fully exploit statically and adaptively indexed data blocks (Section 7). The goal of HAIL Scheduling is twofold: (i) to reduce the scheduling overhead when executing a MapReduce job, and (ii) to balance the indexing effort across computing nodes to limit the impact of adaptive indexing.

(4.) Zero-Overhead Adaptive indexing. We show how to effectively piggyback adaptive index creation on the existing MapReduce job execution pipeline (Section 5).
The idea is to combine adaptive indexing and zero-overhead indexing to solve the problem of missing indexes for evolving or unpredictable workloads. In other words, when HAIL executes a MapReduce job with a filter condition on an unindexed attribute, HAIL creates that missing index for a certain fraction of the HDFS blocks in parallel. We additionally propose a set of adaptive indexing strategies that make HAIL aware of the performance and the selectivity of MapReduce jobs (Section 6). We present lazy and eager adaptive indexing, two techniques that allow HAIL to quickly adapt to changes in users' workloads at a low indexing overhead. We then show how HAIL can decide which data blocks to index based on the selectivities of MapReduce jobs.

(5.) Exhaustive validation. We present an extensive experimental comparison of HAIL with Hadoop and Hadoop++ [15] (Section 9). We use seven different clusters including physical and virtual EC2 clusters of up to 100 nodes. A series of experiments shows the superiority of HAIL over both Hadoop and Hadoop++. Another series of scalability experiments with different datasets also demonstrates the superiority of using adaptive indexing in HAIL. In particular, our experimental results demonstrate that HAIL: (i) creates clustered indexes at upload time almost for free; (ii) quickly adapts to query workloads with a negligible indexing overhead; and (iii) has a small overhead over Hadoop only for the very first job when creating indexes adaptively: all the following jobs are faster in HAIL.
Notice that this article presents an extended version of the initial HAIL system [16] with the following significant added value: we enrich HAIL with the adaptive indexing pipeline, which allows HAIL to adapt to changes in query workloads in an automatic, incremental, and dynamic way (all of contribution Zero-Overhead Adaptive indexing); we extend the HAIL task scheduling in order to balance the index effort at job execution time and exploit pseudo data blocks (half of contribution Job execution); we run a large number of new experiments to validate our adaptive indexing techniques as well as the extended HAIL task scheduling (one third of contribution Exhaustive validation).

2 Overview

In the following, we give an overview of HAIL by contrasting it with normal HDFS and Hadoop MapReduce. Thereby, we introduce the two indexing pipelines of HAIL. First, static indexing allows us to create several clustered indexes at upload time. Second, HAIL adaptive indexing creates additional indexes as a byproduct of actual job execution, which enables HAIL to adapt to unexpected workloads. For a more detailed contrast to related work see Section 8. For now, let's consider again our motivating example: How can Bob analyze his log file with Hadoop and HAIL?

2.1 Hadoop and HDFS

In HDFS and Hadoop MapReduce, Bob starts by uploading his log file to HDFS using the HDFS client. HDFS then partitions the file into logical HDFS blocks using a constant block size (the HDFS default is 64MB). Each HDFS block is then physically stored three times (assuming the default replication factor). Each physical copy of a block is called a replica. Each replica will sit on a different datanode. Therefore, at least two datanode failures may be survived by HDFS. Note that HDFS keeps information on the different replicas for an HDFS block in a central namenode directory.

After uploading his log file to HDFS, Bob may run an actual MapReduce job. Bob invokes Hadoop MapReduce through a Hadoop MapReduce JobClient, which sends his MapReduce job to a central node termed JobTracker. The MapReduce job consists of several tasks. A task is executed on a subset of the input file, typically an HDFS block (footnote 3). The JobTracker assigns each task to a different TaskTracker, which typically runs on the same machine as an HDFS datanode. Each datanode will then read its subset of the input file, i.e., a set of HDFS blocks, and feed that data into the MapReduce processing pipeline which usually consists of a Map, Shuffle, and a Reduce Phase (see [13,15,14] for a detailed description). As soon as all results have been written to HDFS, the JobClient informs Bob that the results are available. Notice that the execution time of the MapReduce job is heavily influenced by the size of the input dataset, because Hadoop MapReduce reads the input dataset entirely in order to perform any incoming MapReduce job.

2.2 HAIL

In HAIL, Bob analyzes his log file as follows. He starts by uploading his log file to HAIL using the HAIL client. In contrast to the HDFS client, the HAIL client analyzes the input data for each HDFS block, converts each HDFS block directly to a binary columnar layout that resembles PAX [3], and sends it to three datanodes. Then, all datanodes sort the data contained in that HDFS block in parallel using a different sort order. The required sort orders can be manually specified by Bob in a configuration file or computed by a physical design algorithm. For each HDFS block, all sorting and index creation happens in main memory. This is feasible as the HDFS block size is typically between 64MB (default) and 1GB. This easily fits in the main memory of most machines. In addition, in HAIL, each datanode creates a different clustered index for each HDFS block replica and stores it with the sorted data. This process is called the HAIL static indexing pipeline.

After uploading his log file to HAIL, Bob runs his MapReduce jobs, which can now immediately exploit the indexes that were created by HAIL statically (i.e., at upload time). As before, Bob invokes Hadoop MapReduce through a JobClient which sends his MapReduce jobs to the JobTracker.
However, his MapReduce jobs are slightly modified so that the system can decide to eventually use available indexes on the data block replicas. For example, assume that a data block has three replicas with clustered indexes on visitDate, adRevenue, and sourceIP. In case that Bob has a MapReduce job filtering on visitDate, HAIL uses the replicas having the clustered index on visitDate. If Bob is filtering on sourceIP, HAIL uses the replicas having the clustered index on sourceIP and so on. To provide failover and load balancing, HAIL may fall back to standard Hadoop scanning for some of the blocks. However, even factoring this in, Bob's queries run much faster on average, if indexes on the right attributes exist.

Footnote 3: Actually it is a split. The difference does not matter here. We will get back to this in Section 4.2.

In case that Bob submits jobs that filter on unindexed attributes (e.g., on duration), HAIL again falls back to a standard full scan by choosing any arbitrary replica, just like Hadoop. However, in contrast to Hadoop, HAIL can index HDFS blocks in parallel to job execution. If another job filters again on the duration field, the new job can already benefit from the previously indexed blocks. So, HAIL takes incoming jobs, which have a selection predicate on currently unindexed attributes, as hints for valuable additional clustered indexes. Consequently, the set of available indexes in HAIL evolves with changing workloads. We call this process the HAIL adaptive indexing pipeline.

2.3 HAIL Benefits

(1.) HAIL often improves both upload and query times. The upload is dramatically faster than Hadoop++ and often faster (or only slightly slower) than with the standard Hadoop even though we (i) convert the input file into binary PAX, (ii) create a series of different sort orders, and (iii) create multiple clustered indexes. From the user side, this provides a win-win situation: there is no noticeable punishment for upload. For querying, users can only win: if our indexes cannot help, we will fall back to standard Hadoop scanning; if the indexes can help, query runtimes will improve. Why do we not have high costs at upload time? We basically exploit the unused CPU ticks that are not used by standard HDFS.
As the standard HDFS upload pipeline is I/O-bound, the effort for our sorting and index creation in the HAIL upload pipeline is hardly noticeable. In addition, since we parse data to binary while uploading, we often benefit from a smaller dataset triggering less network and disk I/O.

(2.) Even if we did not create the right indexes at upload time, HAIL can create indexes adaptively at job execution time without incurring high overhead. Why don't we see a high overhead? We do not need to additionally load the block data to main memory, since we piggyback on the reading of the map tasks. Furthermore, HAIL creates indexes incrementally over several job executions using different adaptive indexing strategies.

(3.) We do not change the failover properties of Hadoop. Why is failover not affected? All data stays on the same logical HDFS block. We just change the physical representation of each replica of an HDFS block. Therefore, from each physical replica we may recover the logical HDFS block.

(4.) HAIL works with existing MapReduce jobs incurring only minimal changes to those jobs. Why does this work? We allow Bob to annotate his existing jobs with selections and projections. Those annotations are then considered by HAIL to pick the right index. Like that, for Bob the changes to his MapReduce jobs are minimal.
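The replica choice described in Section 2.2 (use the replica whose clustered index matches the filter attribute, otherwise fall back to a full scan on any replica) can be illustrated with a minimal sketch. This is not HAIL's scheduler: the names `Replica`, `choose_replica`, and the `overloaded` parameter are hypothetical and only mirror the example with indexes on visitDate, adRevenue, and sourceIP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    datanode: str
    index_attr: Optional[str]  # attribute of the clustered index, None if unindexed


def choose_replica(replicas, filter_attr, overloaded=frozenset()):
    """Prefer a replica indexed on filter_attr; otherwise fall back to a full scan.

    Hypothetical helper: 'overloaded' models datanodes that should be avoided
    for load balancing or failover.
    """
    available = [r for r in replicas if r.datanode not in overloaded]
    if not available:
        raise RuntimeError("no replica available")
    for r in available:
        if filter_attr is not None and r.index_attr == filter_attr:
            return r, "index scan"
    # No matching index reachable: any replica works, since all replicas
    # hold the same logical block.
    return available[0], "full scan"


replicas = [
    Replica("DN1", "visitDate"),
    Replica("DN2", "adRevenue"),
    Replica("DN3", "sourceIP"),
]
print(choose_replica(replicas, "sourceIP"))                       # indexed replica on DN3
print(choose_replica(replicas, "duration"))                       # unindexed attribute
print(choose_replica(replicas, "visitDate", overloaded={"DN1"}))  # fallback under load
```

The third call shows the failover case from benefit (3.): when the datanode holding the matching index is unavailable, the job still runs on another replica, only without the index.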

[Figure 1: The HAIL static indexing pipeline as part of uploading data to HDFS. The figure shows Bob, the HAIL client (CL), the datanodes DN1 to DN3, and the HDFS namenode with its block directory and HAIL replica directory. Labeled steps include preprocess, convert, get location, reassemble, build, forward, check, acknowledge, append, and register.]

3 HAIL Zero-Overhead Static Indexing

We create static indexes in HAIL while uploading data. One of the main challenges is to support different sort orders and clustered indexes per replica as well as to build those indexes efficiently without much impact on upload times. Figure 1 shows the data flow when Bob uploads a file to HAIL. Let's first explore the details of the static indexing pipeline.

3.1 Data Layout

In HDFS, for each block, the client contacts the namenode to obtain the list of datanodes that should store the block replicas. Then, the client sends the original block to the first datanode, which forwards this to the second datanode and so on. In the end, each datanode stores a byte-identical copy of the original block data.

In HAIL, the HAIL client preprocesses the file based on its content to consider end of lines (step 1 in Figure 1). We parse the contents into rows by searching for end of line symbols and never split a row between two blocks. This is in contrast to standard HDFS which splits a file into HDFS blocks after a constant number of bytes. For each block the HAIL client parses each row according to the schema specified by the user (footnote 4). If HAIL encounters a row that does not match the given schema (i.e., a bad record), it separates this record into a special part of the data block. HAIL then converts all HDFS blocks to a binary columnar layout that resembles PAX (step 2). This allows us to index and access individual attributes more efficiently. The HAIL client also collects metadata information from each HDFS block (such as the data schema) and creates a block header (Block Metadata) for each HDFS block (step 2).
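The client-side preprocessing above can be sketched roughly as follows. This is an illustration under stated assumptions, not the HAIL client: the helper names `split_into_blocks` and `to_columnar`, the comma-separated input, and the tiny block size are all hypothetical, and plain Python lists stand in for the binary PAX-like layout.

```python
def split_into_blocks(text, max_block_bytes):
    """Cut the input at end-of-line symbols so that no row spans two blocks."""
    blocks, current, size = [], [], 0
    for row in text.splitlines():
        row_bytes = len(row.encode("utf-8")) + 1  # +1 for the end-of-line symbol
        if current and size + row_bytes > max_block_bytes:
            blocks.append(current)
            current, size = [], 0
        current.append(row)
        size += row_bytes
    if current:
        blocks.append(current)
    return blocks


def to_columnar(rows, schema):
    """Parse rows by schema into one list per attribute; keep bad records apart.

    schema is a list of (attribute name, parser) pairs. Assumption: fields are
    comma-separated and a parser raises ValueError on malformed input.
    """
    columns = {name: [] for name, _ in schema}
    bad_records = []
    for row in rows:
        fields = row.split(",")
        try:
            if len(fields) != len(schema):
                raise ValueError("wrong number of fields")
            values = [parse(f) for (_, parse), f in zip(schema, fields)]
        except ValueError:
            bad_records.append(row)  # stored in a special part of the block
            continue
        for (name, _), value in zip(schema, values):
            columns[name].append(value)
    return columns, bad_records


schema = [("sourceIP", str), ("visitDate", str), ("adRevenue", float)]
lines = [
    "10.0.0.1,2011-03-01,0.5",
    "not a record",
    "10.0.0.2,2011-04-02,1.25",
    "10.0.0.3,2011-05-05,abc",
]
blocks = split_into_blocks("\n".join(lines), max_block_bytes=40)
columns, bad = to_columnar(lines, schema)
print(len(blocks), columns["adRevenue"], bad)
```

Storing each attribute contiguously is what later makes it cheap to sort on, index, and read a single attribute without touching the others.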
We could naively piggy-back on this existing HDFS upload pipeline by first storing the original block data as done in Hadoop and then converting it to binary PAX layout in a second step. However, we would have to re-read and then re-write each block, which would trigger one extra write and read for each replica, e.g., for an input file of 1GB we would have to pay 6GB extra I/O on the cluster. This would lead to very long upload times. In contrast, HAIL does not have to pay any of that extra I/O. However, to achieve this dramatic improvement, we have to make nontrivial changes in the standard Hadoop upload pipeline.

Footnote 4: Alternatively, HAIL can also suggest an appropriate schema to users through schema analysis.

3.2 Static Indexing in the Upload Pipeline

To understand the implementation of static indexing in the HAIL upload pipeline, we first have to analyze the normal HDFS upload pipeline in more detail. In HDFS, while uploading a block, the data is further partitioned into chunks of constant size 512B. Chunks are collected into packets. A packet is a sequence of chunks plus a checksum for each of the chunks. In addition some metadata is kept. In total a packet has a size of up to 64KB. Immediately before sending the data over the network, each HDFS block is converted to a sequence of packets. On disk, HDFS keeps, for each replica, a separate file containing checksums for all of its chunks. Hence, for each replica two files are created on local disk: one file with the actual data and one file with its checksums. These checksums are reused by HDFS whenever data is sent over the network.

The HDFS client (CL) sends the first packet of the block to the first datanode (DN1) in the upload pipeline. DN1 splits the packet into two parts: the first contains the actual chunk data, the second contains the checksums for those chunks. Then DN1 flushes the chunk data to a file on local disk. The checksums are flushed to an extra file. In parallel DN1 forwards the packet to DN2, which splits and flushes the data like DN1 and in turn forwards the packet to DN3, which splits and flushes the data as well. Yet, only DN3 verifies the checksum for each chunk. If the recomputed checksums for each chunk of a packet match the received checksums, DN3 acknowledges the packet back to DN2, which acknowledges back to DN1. Finally, DN1 acknowledges back to CL. Each datanode also appends its ID to the ACK. Like that only one of the datanodes (the last in the chain, here DN3 as the replication factor is three) has to verify the checksums. DN2 believes DN3, DN1 believes DN2, and CL believes DN1. If any CL or DNi receives ACKs in the wrong order, the upload is considered failed. The idea of sending multiple packets from CL is to hide the roundtrip latencies of the individual packets. Creating this chain of ACKs also has the benefit that CL only receives a single ACK for each packet and not three. Notice that HDFS provides this checksum mechanism on top of the existing TCP/IP checksum mechanism (which has weaker correctness guarantees than HDFS).

In HAIL, in order to reuse as much of the existing HDFS pipeline as possible and yet to make this efficient, we need to perform the following changes. As before, the HAIL client (CL) gets the list of datanodes to use for this block from the HDFS namenode (step 3). But rather than sending the original input, CL creates the PAX block, cuts it into packets (step 4), and sends it to DN1 (step 5). Whenever a datanode DN1 to DN3 receives a packet, it flushes neither its data nor its checksums to disk. Still, DN1 and DN2 immediately forward the packet to the next datanode as before (step 8). DN3 will verify the checksum of the chunks for the received PAX block (step 9) and acknowledge the packet back to DN2 (step 10). This means the semantics of an ACK for a packet of a block are changed from "packet received, validated, and flushed" to "packet received and validated". We flush neither the chunks nor their checksums to disk as we first have to sort the entire block according to the desired sort key. On each datanode, we assemble the block from all packets in main memory (step 6). This is realistic in practice, since main memories tend to be >1GB for any modern server. Typically, the size of a block is between 64MB (default) and 1GB. This means that for the default size we could keep about 15 blocks in main memory at the same time.
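The chunk and packet structure described above can be sketched as follows, under stated assumptions. The 512B chunk size and the 64KB packet limit come from the text; the use of a 4-byte CRC32 per chunk via `zlib.crc32`, the omission of packet metadata, and the function names `to_packets`, `verify`, and `reassemble` are assumptions made for illustration and are not taken from the HDFS code base.

```python
import zlib

CHUNK_SIZE = 512              # bytes per chunk, as stated in the text
MAX_PACKET_SIZE = 64 * 1024   # upper bound on the packet size, as stated in the text
CHECKSUM_SIZE = 4             # assumption: one 32-bit CRC per chunk


def to_packets(block: bytes):
    """Split a block into chunks and group (chunk, checksum) pairs into packets."""
    chunks = [block[i:i + CHUNK_SIZE] for i in range(0, len(block), CHUNK_SIZE)]
    per_packet = MAX_PACKET_SIZE // (CHUNK_SIZE + CHECKSUM_SIZE)
    return [
        [(c, zlib.crc32(c)) for c in chunks[i:i + per_packet]]
        for i in range(0, len(chunks), per_packet)
    ]


def verify(packet) -> bool:
    """What the last datanode in the chain does: recompute and compare checksums."""
    return all(zlib.crc32(chunk) == checksum for chunk, checksum in packet)


def reassemble(packets) -> bytes:
    """What each HAIL datanode does in main memory before sorting the block."""
    return b"".join(chunk for packet in packets for chunk, _ in packet)


block = bytes(i % 251 for i in range(200_000))
packets = to_packets(block)
print(len(packets), all(verify(p) for p in packets), reassemble(packets) == block)
```

In this sketch an ACK would be sent once `verify` succeeds, which corresponds to the changed HAIL semantics "packet received and validated": nothing has been flushed to disk at that point.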
In parallel to forwarding and reassembling packets, each datanode sorts the data, creates indexes, and forms a HAIL Block (step 7; see Section 3.4). As part of this process, each datanode also adds Index Metadata information to each HAIL block in order to specify the index it created for this block. Each datanode (e.g., DN1) typically sorts the data inside a block in a different sort order. It is worth noting that having different sort orders across replicas does not impact fault-tolerance as all data is reorganized inside the same block only, i.e., data is not reorganized across blocks. Hence, all replicas of the same HDFS block logically contain the same records with just a different order and therefore can still act as logical replacements for each other. Additionally, this property helps HAIL to preserve the load balancing capabilities of Hadoop. For example, when a datanode containing the replica with matching sort order for a certain job is overloaded, HAIL might choose to read from a different replica on another datanode, just like normal Hadoop. To avoid overloading datanodes in the first place, HAIL employs a round robin strategy for assigning sort orders to physical replicas on top of the replica placement of HDFS. This means that while HDFS already cares about distributing HDFS block replicas across the cluster, HAIL cares about distributing the sort orders (and hence the indexes) across those replicas.

As soon as a datanode has completed sorting and creating its index, it will recompute checksums for each chunk of a block. Notice that checksums will differ on each replica, as different sort orders and indexes are used. Hence, each datanode has to compute its own checksums. Then, each datanode flushes the chunks and newly computed checksums to two separate files on local disk as before. For DN3, once all chunks and checksums have been flushed to disk, DN3 will acknowledge the last packet of the block back to DN2 (step 10). After that DN3 will inform the HDFS namenode about its new replica including its HAIL block size, the created indexes, and the sort order (step 11; see Section 3.3). Datanodes DN2 and DN1 append their ID to each ACK (step 12). Then they forward each ACK back in the chain (step 13).
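The per-replica sort orders and the round robin assignment described above can be sketched as follows. The text does not spell out the exact round robin rule, so rotating the attribute list by the block number is an assumption, as are the names `assign_sort_orders` and `build_replica` and the use of a single CRC32 over a text serialization in place of per-chunk checksums.

```python
import zlib

SORT_ATTRS = ["visitDate", "adRevenue", "sourceIP"]  # example attributes from the text


def assign_sort_orders(block_no, datanodes, sort_attrs=SORT_ATTRS):
    """Hypothetical round robin rule: rotate the attribute list by the block number."""
    return {
        dn: sort_attrs[(block_no + i) % len(sort_attrs)]
        for i, dn in enumerate(datanodes)
    }


def build_replica(rows, sort_attr):
    """Sort the rows of one block by sort_attr and checksum the resulting bytes."""
    ordered = sorted(rows, key=lambda r: r[sort_attr])
    payload = "\n".join(",".join(str(r[a]) for a in SORT_ATTRS) for r in ordered)
    return ordered, zlib.crc32(payload.encode("utf-8"))


rows = [
    {"visitDate": "2011-01-01", "adRevenue": 2.0, "sourceIP": "10.0.0.3"},
    {"visitDate": "2011-02-01", "adRevenue": 3.0, "sourceIP": "10.0.0.1"},
    {"visitDate": "2011-03-01", "adRevenue": 1.0, "sourceIP": "10.0.0.2"},
]
orders = assign_sort_orders(0, ["DN1", "DN2", "DN3"])
built = {dn: build_replica(rows, attr) for dn, attr in orders.items()}
for dn, (ordered, checksum) in built.items():
    print(dn, orders[dn], [r["sourceIP"] for r in ordered], checksum)
```

Each replica holds the same three records in a different physical order, which is why any of them can replace the others after a failure, and why each datanode has to compute its own checksums.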
DN2 and DN1 will forward the last ACK of the block only if all chunks and checksums have been flushed to their disks. After that, DN2 and DN1 individually inform the HDFS namenode (step 14). The HAIL client also verifies that all ACKs arrive in order (step 15). Notice that it is important to change the HDFS namenode in order to keep track of the different sort orders. We discuss these changes in Section 3.3.

3.3 HDFS Namenode Extensions

In HDFS, the central namenode keeps a directory Dir_block of blocks, i.e., a mapping blockID → Set Of DataNodes. This directory is required by any operation retrieving blocks from HDFS. Hadoop MapReduce exploits Dir_block for scheduling. In Hadoop MapReduce, whenever a split needs to be assigned to a worker in the map phase, the scheduler looks up Dir_block in the HDFS namenode to retrieve the list of datanodes having a replica of the contained HDFS block. Then, the Hadoop MapReduce scheduler will try to schedule map tasks on those datanodes if possible. Unfortunately, the HDFS namenode does not differentiate the replicas w.r.t. their physical layouts. HDFS was simply not designed for this. Thus, from the point of view of the namenode all replicas are byte-equivalent and have the same size.

In HAIL, we need to allow Hadoop MapReduce to change the scheduling process so that map tasks are scheduled close to replicas having a suitable index; otherwise Hadoop MapReduce would pick indexes randomly. Hence, we have to enrich the HDFS namenode to keep additional information about the available indexes. We do this by keeping an additional directory Dir_rep mapping (blockID, datanode) → HAILBlockReplicaInfo. An instance of HAILBlockReplicaInfo contains detailed information about the types of available indexes for a replica, i.e., indexing key, index type, size, start offsets, etc. As before, Hadoop MapReduce looks up Dir_block to retrieve the list of datanodes having a replica for a given block. However, in addition, HAIL looks up the main memory Dir_rep to obtain the detailed HAILBlockReplicaInfo for each replica, i.e., one main memory lookup for each replica. HAILBlockReplicaInfo is then exploited by HAIL to change the scheduling strategy of Hadoop (we will discuss this in detail in Section 4).

3.4 An Index Structure for Zero-Overhead Indexing

In this section, we briefly discuss our choice of an appropriate index structure for indexing at minimal costs in HAIL and give some details on our concrete implementation.

Why Clustered Indexes? An interesting question is why we focus on clustered indexes. For indexing with minimal overhead, we require an index structure that is cheap to create in main memory, cheap to write to disk, and cheap to query from disk. We tried a number of indexes in the beginning of the project, including coarse-granular indexes and unclustered indexes. After some experimentation we quickly discovered that sorting and index creation in main memory is so fast that techniques like partial or coarse-granular sorting do not pay off for HAIL. Whether you pay three or two seconds for sorting and indexing per block during upload is hardly noticeable in the overall upload process of HDFS. In addition, a major problem with unclustered indexes is that they are only competitive for very selective queries, as they may trigger considerable random I/O for non-selective index traversals. In contrast, clustered indexes do not have that problem. Whatever the selectivity, we will read the clustered index and scan the qualifying blocks. Hence, even for very low selectivities the only overhead over a scan is the initial index node traversal, which is negligible. Moreover, as unclustered indexes are dense by definition, they require considerably more additional space on disk and require more write I/O than a sparse clustered index.
Thus, using unclustered indexes would severely affect upload times. Yet, an interesting direction for future work would be to extend HAIL to support additional indexes that might boost performance, such as bitmap indexes and inverted lists.

4 HAIL Job Execution

We now focus on general job execution in HAIL. First, we present from Bob's perspective how he can enhance MapReduce jobs to benefit from HAIL static indexing (Section 4.1). We will explain how Bob can write his MapReduce jobs (almost) as before and run them exactly as when using Hadoop MapReduce. After that, we analyze from the system's perspective the standard Hadoop MapReduce pipeline and then compare how HAIL executes jobs (Section 4.2). We will see that HAIL requires only small changes in the Hadoop MapReduce framework, which makes HAIL easy to integrate into newer Hadoop versions (Section 4.3). Figure 2 shows the query pipeline when Bob runs a MapReduce job on HAIL. Finally, we briefly discuss the case of selections on unindexed attributes, i.e., when a job requests a static index that was not created, as motivation for HAIL adaptive indexing (Section 4.4).

4.1 Bob's Perspective

In Hadoop MapReduce, Bob writes a MapReduce job, which includes a job configuration class, a map function, and a reduce function. In HAIL, the MapReduce job remains the same (see steps 1 and 2 in Figure 2), but with three tiny changes:

(1) Bob specifies the HailInputFormat (which uses a HailRecordReader internally) in the main class of the MapReduce job. By doing this, Bob enables his MapReduce job to read HAIL Blocks (see Section 3.2).

(2) Bob annotates his map function to specify the selection predicate and the projected attributes required by his MapReduce job (footnote 5). For example, assume that Bob wants to write a MapReduce job that performs the following SQL query (example from the Introduction):

SELECT sourceIP FROM UserVisits WHERE visitDate BETWEEN AND

To execute this query in HAIL, Bob adds to his map function a HailQuery annotation as follows:

@HailQuery(filter="@3 between( , 2-1-1)", projection={@1})
void map(Text key, Text v) { ... }

Where the @3 in the filter value and the @1 in the projection value denote the attribute position in the UserVisits records.
In this example the third attribute is visitDate and the first attribute is sourceIP. By annotating his map function as mentioned above, Bob indicates that he wants to receive in the map function only the projected attribute values of those tuples qualifying the specified selection predicate. In case Bob does not specify filter predicates, HAIL will perform a full scan like standard Hadoop. At query time, if the HailQuery annotation is set, HAIL checks (using the Index Metadata of a data block) whether an index exists on the filter attribute. Using such an index allows us to speed up the job execution. HAIL also uses the Block Metadata to determine the schema of a data block. This allows HAIL to read only the attributes specified in the filter and projection parameters.

(3) Bob uses a HailRecord object as input value in the map function. This allows Bob to directly read the projected attributes without splitting the record into attributes as he would do in standard Hadoop MapReduce. For example, using standard Hadoop MapReduce Bob would write the following map function to perform the above SQL query:

Map Function for Hadoop MapReduce (pseudo-code):
void map(Text key, Text v) {
  String[] attr = v.toString().split(",");
  if (DateUtils.isBetween(attr[2], " ", "2-1-1"))
    output(attr[0], null);
}

Using HAIL Bob writes the following map function:

Map Function for HAIL:
void map(Text key, HailRecord v) {
  output(v.getInt(1), null);
}

Notice that Bob now does not have to filter out the incoming records, because this is automatically handled by HAIL via the HailQuery annotation (as mentioned earlier). This annotation is illustrated in Figure 2.

Footnote 5: Alternatively, HAIL allows Bob to specify the selection predicate and the projected attributes in the job configuration class.

Fig. 2 The HAIL query pipeline. [Figure: Bob's perspective (1 write job, 2 run job) and the system's perspective (split phase, scheduler, map phase: 3 send splits[], 4 choose computing node, 5 allocate map task, 6 read block_i, 7 store), with the HAIL annotation on the map function.]

4.2 System Perspective

In Hadoop MapReduce, when Bob submits a MapReduce job, a JobClient instance is created. The main goal of the JobClient is to copy all the resources needed to run the MapReduce job (e.g. metadata and job class files). But also, the JobClient fetches all the block metadata (BlockLocation[]) of the input dataset. Then, the JobClient logically breaks the input into smaller pieces called input splits (split phase in Figure 2) as defined in the InputFormat. By default, the JobClient computes input splits such that each input split maps to a distinct HDFS block.
An input split defines the input of a map task, while an HDFS block is a horizontal partition of a dataset stored in HDFS (see Section 3.1 for details on how HDFS stores datasets). For scheduling purposes, the JobClient retrieves for each input split all datanode locations having a replica of that HDFS block. This is done by calling getHosts() of each BlockLocation. For instance, in Figure 2, datanodes DN3, DN5, and DN7 are the split locations for split_42 since block_42 is stored on these datanodes. After this split phase, the JobClient submits the job to the JobTracker with the set of input splits to process (step 3). Among other operations, the JobTracker creates a map task for each input split. Then, for each map task, the JobTracker decides on which computing node to schedule the map task, using the split locations (step 4). This decision is based on data-locality and availability [13]. After this, the JobTracker allocates the map task to the TaskTracker (which performs map and reduce tasks) running on that computing node (step 5). Only then can the map task start processing its input split. The map task uses a RecordReader UDF in order to read its input data block_i from the closest datanode (step 6). Interestingly, it is the local HDFS client running on the node where the map task is running that decides from which datanode a map task will read its input, and not the Hadoop MapReduce scheduler. This is done when the RecordReader asks for the input stream pointing to block_i. It is worth noticing that the HDFS client chooses a datanode from the set of all datanodes storing a replica of block_42 (via the getHosts() method) rather than from the locations given by the input split. This means that a map task might eventually end up reading its input data from a remote node even though it is available locally. Once the input stream is opened, the RecordReader breaks block_42 into records and makes a call to the map function for each record. Assuming that the MapReduce job consists of a map phase only, the map task then writes its output back to HDFS (step 7). See [15,44,14] for more details on the MapReduce execution pipeline.
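The locality-based placement decision described above can be sketched in a few lines. This is a deliberately simplified illustration, not the Hadoop scheduler: the class LocalitySchedulingSketch and the method chooseNode are hypothetical, and aspects such as rack-awareness are ignored. The sketch prefers a node with a free map slot that also stores a replica of the split's block, and falls back to any node with a free slot otherwise.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of data-locality-aware map task placement.
final class LocalitySchedulingSketch {

    /**
     * Picks a computing node for a map task.
     * splitLocations: datanodes storing a replica of the split's block.
     * nodesWithFreeSlot: nodes that can currently accept a map task.
     * Returns a data-local node if one is available, otherwise any node
     * with a free slot, or null if no slot is free.
     */
    static String chooseNode(List<String> splitLocations, Set<String> nodesWithFreeSlot) {
        for (String host : splitLocations) {
            if (nodesWithFreeSlot.contains(host)) {
                return host; // data-local: the task can read its block locally
            }
        }
        // No data-local slot: the task will have to read from a remote datanode.
        return nodesWithFreeSlot.isEmpty() ? null : nodesWithFreeSlot.iterator().next();
    }

    public static void main(String[] args) {
        List<String> split42Locations = Arrays.asList("DN3", "DN5", "DN7");
        Set<String> free = new LinkedHashSet<>(Arrays.asList("DN1", "DN5"));
        System.out.println(chooseNode(split42Locations, free));
    }
}
```

A HAIL-style variant would additionally need to know which of the split locations hold a replica with a suitable index, which is exactly the information that plain split locations do not carry.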
In HAIL, it is crucial to be non-intrusive to the standard Hadoop execution pipeline so that users run MapReduce jobs exactly as before. However, supporting per-replica indexes in an efficient way and without significant changes to the standard execution pipeline is challenging for several reasons. First, the JobClient cannot simply create input splits based only on the default block size, as each HDFS block replica has a different size (because of indexes). Second, the JobTracker can no longer schedule map tasks based on data-locality and node availability only. The JobTracker now has to consider the existing indexes for each HDFS block. Third, the RecordReader has to perform either index access or full scan of HDFS blocks without any interaction with users, e.g. depending on the availability of suitable indexes. Fourth, the HDFS client can no longer open an input stream to a given HDFS block based on data-locality and node availability only: it has to consider index locality and availability as well. HAIL overcomes these issues by mainly providing two UDFs: the HailInputFormat and the HailRecordReader. Notice that by using UDFs we allow HAIL to be easy to integrate into newer versions of Hadoop MapReduce. We discuss these two UDFs in the following.

4.3 HailInputFormat and HailRecordReader

HAILInputFormat implements a different splitting strategy than standard InputFormats. This strategy allows HAIL to reduce the number of map waves per job, i.e., the maximum number of map tasks per map slot required to complete this job. Thereby, the total scheduling overhead of MapReduce jobs is drastically reduced. We discuss the details of the HAIL Splitting strategy in Section 7.

HAILRecordReader is responsible for retrieving the records that satisfy the selection predicate of MapReduce jobs (as illustrated in the MapReduce Pipeline of Figure 2). Those records are then passed to the map function. For example, in Bob's query of Section 4.1, we need to find all records having visitDate between and. To do so, for each data block required by the job, we first try to open an input stream to a block replica having the required index. For this, HAIL instructs the local HDFS Client to use the newly introduced getHostsWithIndex() method of each BlockLocation so as to choose the closest datanode with the desired index. Let us first focus on the case where a suitable, statically created index is available so that HAIL can open an input stream to an indexed replica.
Once that input stream has been opened, we use the information about selection predicates and attribute projections from the HailQuery annotation or from the job configuration file. When performing an index-scan, we read the index entirely into main memory (typically a few KB) to perform an index lookup. This also implies reading the qualifying block parts from disk into main memory and post-filtering records (see Section 3.4). Then, we reconstruct the projected attributes of qualifying tuples from PAX to row layout. In case no projection was specified by users, we reconstruct all attributes. Finally, we make a call to the map function for each qualifying tuple. For bad records (see Section 3.1), HAIL passes them directly to the map function, which in turn has to deal with them (just like in standard Hadoop MapReduce). For this, HAIL passes a record to the map function with a flag indicating whether or not it is a bad record.

4.4 Problem: Missing Static Indexes

Finally, let us now discuss the second case, when Bob submits a job which filters on an unindexed attribute (e.g. on duration). Here, the HailRecordReader must completely scan the required attributes of unindexed blocks, apply the selection predicate, and perform tuple reconstruction. Notice that with static indexing, there is no way for HAIL to overcome the problem of missing indexes efficiently. This means that when the attributes used in the selection predicates of the workload change over time, the only way to adapt the set of available indexes is to upload the data again. However, this has the significant overhead of an additional upload, which goes against the principle of zero-overhead indexing. Thus, HAIL introduces an adaptive indexing technique that offers a much more elegant and efficient solution to this problem. We discuss this technique in the following section.

5 HAIL Zero-Overhead Adaptive Indexing

We now discuss the adaptive indexing pipeline of HAIL. The core idea is to create missing but promising indexes as byproducts of full scans in the map phase of MapReduce jobs. Similar to the static indexing pipeline, our goal is again to come closer towards zero overhead indexing.
Therefore, we adopt two important principles from our static indexing pipeline. First, we piggyback again on a procedure that is naturally reading data from disk to main memory. This allows HAIL to completely save the data read cost for adaptive index creation. Second, as map tasks are usually I/O-bound, HAIL again exploits unused CPU time when computing clustered indexes in parallel to job execution.

In Section 5.1, we start with a general overview of the HAIL adaptive indexing pipeline. In Section 5.2, we focus on the internal components for building and storing clustered indexes incrementally. In Section 5.3, we present how HAIL accesses the indexes created at job runtime in a way that is transparent to the MapReduce job execution pipeline. Finally, in Section 6, we introduce three additional adaptive indexing techniques that make the indexing overhead over MapReduce jobs almost invisible to users.

5.1 HAIL Adaptive Indexing in the Execution Pipeline

For our motivating example, let's assume Bob continues to analyze his logs and notices some suspicious activities, e.g. many user visits with very short duration, indicating spam bot activities. Therefore, Bob suddenly needs different jobs for his analysis that select user visits with short durations. However, recall that unfortunately he did not create a static index on attribute duration at upload time, which would help for these new jobs. In general, as soon as Bob (or one of his colleagues) sends a new job (say job_d) with a selection predicate on an unindexed attribute (e.g. on attribute duration, which we will denote as d in the following), HAIL cannot benefit from index scans anymore. However, HAIL takes these jobs as hints on how to adaptively improve the repertoire of indexes for future jobs. HAIL piggybacks the creation of a clustered index over attribute duration on the execution of job_d. Without any loss of generality, we assume that job_d projects all attributes from its input dataset.

Fig. 3 HAIL adaptive indexing pipeline.

Figure 3 illustrates the general workflow of the HAIL adaptive indexing pipeline. The figure shows in more detail how HAIL processes map tasks of job_d when no suitable index is available (i.e., when performing a full scan). As soon as HAIL schedules a map task to a specific TaskTracker (footnote 6), e.g. TaskTracker 5, the HAILRecordReader of the map task first reads the metadata from the HAILInputSplit (step 1) (footnote 7). With this metadata, the HAILRecordReader checks whether a suitable index is available for its input data block (say block_42). As no index on attribute d is available, the HAILRecordReader simply opens an input stream to the local replica of block_42 stored on DataNode 5. Then, the HAILRecordReader: (i) loads all values of the attributes required by job_d from disk to main memory (step 2); (ii) reconstructs records (as our HDFS blocks are in columnar layout); and (iii) feeds the map function with each record (step 3). Here lies the beauty of HAIL: an HDFS block that is a potential candidate for indexing was completely transferred to main memory as part of the job execution process.
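Steps (i) to (iii) of the full scan can be sketched as follows. This is a minimal illustration under stated assumptions, not the HAILRecordReader implementation: the class FullScanSketch, the representation of a block as a two-dimensional string array (columns[a][r] holding attribute a of record r), and the use of a Consumer as the map function are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a full scan over a block stored in columnar layout.
final class FullScanSketch {

    // Rebuilds row-oriented records from the required columns of a block.
    static List<String[]> reconstructRecords(String[][] columns, int[] requiredAttributes) {
        int numRecords = columns.length == 0 ? 0 : columns[0].length;
        List<String[]> records = new ArrayList<>();
        for (int r = 0; r < numRecords; r++) {
            String[] record = new String[requiredAttributes.length];
            for (int i = 0; i < requiredAttributes.length; i++) {
                record[i] = columns[requiredAttributes[i]][r];
            }
            records.add(record);
        }
        return records;
    }

    // Feeds every reconstructed record to the map function; returns the record count.
    static int scanBlock(String[][] columns, int[] requiredAttributes,
                         Consumer<String[]> mapFunction) {
        List<String[]> records = reconstructRecords(columns, requiredAttributes);
        for (String[] record : records) {
            mapFunction.accept(record);
        }
        return records.size();
    }

    public static void main(String[] args) {
        // Three attributes, two records; values are made up for the example.
        String[][] block = {
            {"10.0.0.1", "10.0.0.2"},
            {"a.com", "b.com"},
            {"7", "42"}
        };
        scanBlock(block, new int[] {0, 2},
            record -> System.out.println(String.join(",", record)));
    }
}
```

After such a scan the whole block resides in main memory, which is the point at which an indexer could take it over without any additional read I/O.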
In addition to feeding the entire block_42 to the map function, HAIL can create a clustered index on attribute d to speed up future jobs. For this, the HAILRecordReader passes block_42 to the AdaptiveIndexer as soon as the map function has finished processing this data block (step 4) (footnote 8). The AdaptiveIndexer, in turn, sorts the data in block_42 according to attribute d, aligns other attributes through reordering, and creates a sparse clustered index (step 5). Finally, the AdaptiveIndexer stores this index with a copy of block_42 (sorted on attribute d) as a pseudo data block replica (step 6). Additionally, the AdaptiveIndexer registers the newly created index for block_42 with the HDFS NameNode (step 7). In fact, the implementation of the adaptive indexing pipeline solves some interesting technical challenges. We discuss the pipeline in more detail in the remainder of this section.

Footnote 6: A Hadoop instance responsible for executing map and reduce tasks.
Footnote 7: That was obtained from the HAILInputFormat via getSplits().
Footnote 8: Notice that all map tasks (even from different MapReduce jobs) running on the same node interact with the same AdaptiveIndexer instance. Hence, the AdaptiveIndexer can end up indexing data blocks from different MapReduce jobs at the same time.

5.2 AdaptiveIndexer

Adaptive indexing is an automatic process that is not explicitly requested by users and therefore should not unexpectedly impose significant performance penalties on users' jobs. Piggybacking adaptive indexing on map tasks allows us to completely save the read I/O-cost. However, the indexing effort is shifted to query time. As a result, any additional time involved in indexing will potentially add to the total runtime of MapReduce jobs. Therefore, the first concern of HAIL is: how to make adaptive index creation efficient?

To overcome this issue, the idea of HAIL is to run the mapping and indexing processes in parallel. However, interleaving map task execution with indexing bears the risk of race conditions between map tasks and the AdaptiveIndexer on the data block. In other words, the AdaptiveIndexer might potentially reorder data inside a data block while the map task is still concurrently reading the data block. One might think about copying data blocks before indexing to deal with this issue. Nevertheless, this would entail the additional runtime and memory overhead of copying such memory chunks. For this reason, HAIL does not interleave the mapping and indexing processes on the same data block. Instead, HAIL interleaves the indexing of a given data block (e.g. block_42) with the mapping phase of the succeeding data block (e.g. block_43), i.e., HAIL keeps two HDFS blocks in memory at the same time. For this, HAIL uses a producer-consumer pattern: a map task acts as producer by offering a data block to the AdaptiveIndexer, via a bounded blocking queue, as soon as it finishes processing the data block; in turn, the AdaptiveIndexer is constantly consuming data blocks from this queue. As a result, HAIL can perfectly interleave map tasks with indexing, except for the first and last data block to process in each node. It is worth noting that the queue exposed by the AdaptiveIndexer is allowed to reject data blocks in case a certain limit of enqueued data blocks is exceeded. This prevents the AdaptiveIndexer from running out of memory because of overload. Still, future MapReduce jobs with a selection predicate on the same attribute (i.e., on attribute d) can in turn take care of indexing the rejected data blocks. Once the AdaptiveIndexer pulls a data block from its queue, it processes the data block using two