Performance Analysis of Mixed Distributed Filesystem Workloads Esteban Molina-Estolano, Maya Gokhale, Carlos Maltzahn, John May, John Bent, Scott Brandt
Motivation
- Hadoop-tailored filesystems (e.g. CloudStore) and high-performance computing filesystems (e.g. PVFS) are tailored to considerably different workloads
- Existing investments in HPC systems and Hadoop systems should be usable for both workloads, avoiding dedicated hardware for each type of workload
- Goal: examine the performance of both types of workloads running concurrently on the same filesystem
- Goal: collect I/O traces from concurrent workload runs, for parallel filesystem simulator work
MapReduce-oriented filesystems
- Large-scale batch data processing and analysis
- Single cluster of unreliable commodity machines for both storage and computation
- Data locality is important for performance
- Examples: Google FS, Hadoop DFS, CloudStore
Hadoop DFS architecture
[HDFS architecture diagram]
Lawrence Livermore National Laboratory
High-Performance Computing filesystems
- High-throughput, low-latency workloads
- Architecture: separate compute and storage clusters, high-speed bridge between them
- Typical workload: simulation checkpointing
- Examples: PVFS, Lustre, PanFS
[architecture diagram]
Running each workload on the non-native filesystem
- Two-sided problem: running HPC workloads on a Hadoop filesystem, and Hadoop workloads on an HPC filesystem
- Different interfaces: HPC workloads need a POSIX-like interface and shared writes; Hadoop assumes write-once-read-many semantics
- Different data layout policies
Running HPC workloads on a Hadoop filesystem
- Chosen filesystem: CloudStore
- Downside of Hadoop's HDFS: no support for shared writes (needed for HPC N-1 workloads)
- CloudStore has an HDFS-like architecture, plus shared-write support
Running Hadoop workloads on an HPC filesystem
- Chosen HPC filesystem: PVFS
- PVFS is open-source and easy to configure
- Tantisiriroj et al. at CMU have created a shim to run Hadoop on PVFS
- The shim also adds prefetching and buffering, and exposes data layout
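The prefetch/buffer idea behind such a shim can be sketched as a readahead wrapper. This is a minimal illustration, not the CMU shim's actual code; the class name, parameter names, and readahead size are all invented for this sketch:

```python
# Sketch of readahead buffering: many small reads from the application
# become a few large reads against the underlying filesystem.
# Illustrative only -- not the actual Hadoop-on-PVFS shim.
class ReadaheadFile:
    def __init__(self, path, readahead=4 << 20):  # assumed 4 MB readahead
        self._f = open(path, "rb")
        self._ra = readahead
        self._buf = b""
        self._buf_off = 0  # file offset corresponding to self._buf[0]

    def read(self, offset, size):
        # Serve from the buffer when the requested range is already cached;
        # otherwise refill the buffer with one large read starting at offset.
        in_buf = (self._buf_off <= offset and
                  offset + size <= self._buf_off + len(self._buf))
        if not in_buf:
            self._f.seek(offset)
            self._buf = self._f.read(max(size, self._ra))
            self._buf_off = offset
        start = offset - self._buf_off
        return self._buf[start:start + size]

    def close(self):
        self._f.close()
```

A sequential scan through such a wrapper issues one large backend read per readahead window instead of one per application read, which is the behavior HDFS-style workloads expect.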
The two concurrent workloads
- IOR checkpointing workload: writes large amounts of data to disk from many clients; N-1 and N-N write patterns
- Hadoop MapReduce HTTP attack classifier (TFIDF): using a pre-generated attack model, classify HTTP headers as normal traffic or attack traffic
Tracing infrastructure
- We gather traces to use for our parallel filesystem simulator
- Existing tracing mechanisms (e.g. strace, Pianola, Darshan) don't work well with Java or CloudStore
- Solution: our own tracing mechanisms for IOR and Hadoop
Tracing IOR workloads
- Trace shim intercepts I/O calls and sends them to stdio
[diagram: each intercepted call logged with start time, process, read/write, offset, size, end time]
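The record-and-forward idea can be sketched in Python. This is illustrative only: the real shim intercepts IOR's I/O calls, and the class name and log-field layout here are assumptions modeled on the slide's diagram:

```python
# Sketch of a trace shim: wrap a file object so every read/write is
# logged (timestamp, operation, offset, size, elapsed) to stdout.
# Illustrative only -- not the actual IOR trace shim.
import sys
import time

class TracedFile:
    def __init__(self, path, mode="rb"):
        self._f = open(path, mode)

    def _log(self, op, offset, size, elapsed):
        # One trace record per I/O call, written to stdio.
        print(f"{time.time():.6f} {op} offset={offset} size={size} "
              f"elapsed={elapsed:.6f}", file=sys.stdout)

    def read(self, size):
        off = self._f.tell()
        t0 = time.perf_counter()
        data = self._f.read(size)
        self._log("read", off, len(data), time.perf_counter() - t0)
        return data

    def write(self, data):
        off = self._f.tell()
        t0 = time.perf_counter()
        n = self._f.write(data)
        self._log("write", off, n, time.perf_counter() - t0)
        return n

    def close(self):
        self._f.close()
```

Because the wrapper preserves the file interface, the traced application needs no source changes beyond using the wrapped object.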
Tracing Hadoop MapReduce
- Tracing shim wraps filesystem interfaces: base filesystems modified to wrap Input/OutputStreams
- I/O operations logged to Hadoop per-task logs
[diagram: MapReduce data flow through the wrapped filesystem interfaces]
Experimental Setup
- System: 19 nodes, 2-core 2.4 GHz Xeon, 120 GB disks
- IOR baselines: N-1 strided workload and N-N workload, 64 MB chunks
- TFIDF baseline: classify 7.2 GB of HTTP headers
- Mixed workloads: IOR N-1 with TFIDF, and IOR N-N with TFIDF
- Checkpoint size adjusted so that IOR and TFIDF take the same amount of time
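The checkpoint-size balancing is simple arithmetic: choose a size so that standalone IOR write time matches the TFIDF baseline runtime. The numbers below are invented for illustration, not the paper's measurements:

```python
# Balance the two workloads' durations (assumed example numbers, not
# measurements from these experiments).
tfidf_runtime_s = 600.0        # assumed TFIDF baseline runtime (seconds)
ior_write_bw_mb_s = 80.0       # assumed standalone IOR write bandwidth (MB/s)

# Checkpoint size such that IOR alone runs as long as TFIDF alone.
checkpoint_size_mb = ior_write_bw_mb_s * tfidf_runtime_s
print(checkpoint_size_mb)  # 48000.0 MB, i.e. 48 GB at these assumed rates
```

With equal standalone durations, any runtime gap in the mixed run can be attributed to interference between the workloads rather than to one workload simply outlasting the other.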
Naive performance predictions
- Each workload will perform better on its native filesystem
- Each workload will be slowed down considerably in the mixed experiments
Experimental results
[chart: TFIDF classification throughput (MB/s), standalone and with IOR N-1 / IOR N-N, on CloudStore and PVFS]
Experimental results
[chart: IOR checkpointing write throughput (MB/s) on CloudStore and PVFS, N-1 and N-N, standalone vs. mixed]
Experimental Results
[chart: runtime (seconds) of mixed vs. sequential workloads, for PVFS N-1, PVFS N-N, CloudStore N-1, CloudStore N-N]
Conclusions
- Developed I/O tracing mechanisms for IOR benchmarks and Hadoop MapReduce
- Analyzed performance of mixed MapReduce and HPC benchmarking workloads on PVFS and CloudStore
- TFIDF on PVFS is barely slowed down by IOR; all other mixed workloads are significantly slowed
- If only total elapsed time matters, the mixed workloads are faster than running the workloads sequentially
- Future work: use experimental results to improve our parallel filesystem simulator
Questions?