Cloud computing for fire engineering. Chris Salter Hoare Lea, London, United Kingdom, ChrisSalter@hoarelea.com




Abstract: Fire Dynamics Simulator (FDS) is a computational tool used by research and industrial groups for the validation and verification of fire scenarios and for research purposes. FDS requires a high level of computing power to run these models, which normally demands a large initial investment in hardware. As an alternative, cloud-based resources are available for users to run these simulations. This paper discusses Amazon EC2, its use for modelling fire scenarios with FDS, and the performance of models run on the cloud and locally.

1. Introduction

Cloud computing, in its simplest form, is a term used to describe computing resources that are not owned or operated by an end user but are instead accessed directly over the internet. NIST describes cloud computing as:

"A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." [1]

Cloud computing is being used throughout a number of industries [2]-[6] but is yet to break into the fire engineering sector, other than the work conducted by Ang [7]. He states that for fire engineering, cloud computing can present a cost saving to the end user compared with the purchase and maintenance of an in-house computer cluster or modelling system. However, the performance of the system is not evaluated.

Fire engineering can make use of cloud computing for modelling purposes. Fire Dynamics Simulator (FDS) [8] is considered an industry-standard CFD package for fire engineering modelling, having been extensively validated and verified. FDS is freely available and runs on all modern computers.
The FDS user manual states:

"FDS requires a fast CPU and a substantial amount of random-access memory (RAM) to run efficiently. For minimum specifications, the system should have a 1 GHz CPU, and at least 512 MB RAM." [9]

It goes on to state that the speed of the CPU (Central Processing Unit) affects the speed of the modelling and that the amount of RAM affects the size of the meshes (computational domain) that can be held in memory at any time. It can therefore be assumed that the faster the processor and the more RAM available to the model, the faster a model will run. A fast-running model, whilst perhaps not the most important consideration for research work, is a welcome addition to industry models, where model run times and timescales can affect the entire design process.

Therefore, industry spends considerable money and resources on purchasing and maintaining fast computers to run FDS, as Ang's research demonstrates. The cloud offers a different approach, where companies pay only for the time that they use on an externally held computer system. If the speed of cloud modelling exceeds that of a locally run workstation, it may be worth considering using the cloud without local machines. This work investigates the ability of cloud computing to run FDS models and how this compares to running on locally held resources.

2. Further Instructions

FDS uses meshes to set out the computational domain for the calculations. In the past, all meshes would have been run on a single computer, with a single core. However, the FDS developers have always tried to decrease the run time of models whilst maintaining the accuracy that end users require. Part of this has been the introduction of running FDS using MPI (Message Passing Interface) to offload different meshes onto different CPUs, so that each mesh calculation can run concurrently. Again, in the past, this may have meant the introduction of Beowulf clusters, or even an HPC (High Performance Computing) cluster. With the advent of multicore CPUs, it is possible for a single machine to run multiple meshes of the same model at once, using the same technology as splitting the model over different machines. This gives the end user greater flexibility and ease of setup.

It should be noted that speed-ups are not assured simply by splitting the model into different meshes: mesh decomposition can be a fine art, and depends on fire placement, mesh sizing and even mesh location. Nevertheless, splitting a model into different meshes usually decreases the overall wall-clock time. FDS is not ideally suited to an HPC cluster, due to the way it calculates using meshes: each mesh is assigned to a CPU core for that core to work on.
Using a 128-core cluster is of no use if the FDS model does not use 128 meshes. With the release of FDS 6.1.0 in May 2014, OpenMP support was added to the calculation methods. This splits the calculations over a number of CPU cores, regardless of the number of meshes within the model, and should therefore reduce the overall computing time. This work also looks at the speed of an OpenMP model compared to an MPICH-run model.

Previous work in this area has moved Finite Element Modelling into a cloud-based environment by creating an in-house cloud [10] and by using publicly available resources [5]. These papers show that where multi-core processing can be used and the models in question can run over these multiple cores, the model will show an improvement in modelling speed. Work conducted by Fenn et al. shows that the performance of an Amazon EC2 cloud cluster is slower than a computer run locally by the research group. They state that this performance hit is due to the virtualisation techniques used to create the Amazon EC2 instances [11]. Part of this is considered to be down to the networking between the different instances used to make up the cluster [12]. This is not a factor for FDS models where, unless the model is large enough to require a cluster, a smaller number of cores/meshes will suffice. This work looks into whether cloud computing resources can be used to run FDS models and, if so, what sort of performance can be expected.

3. Cloud Computing Service

There are a number of cloud computing providers that offer computing services to users. Yet not all cloud services are the same; a range of platforms and services exists. These are broadly broken down into Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). For example, SaaS offers a specific piece of software to the end user, such as Google Documents or Office Online. PaaS offers a computing platform to the end user, essentially some computing power; examples include Windows Azure and Google App Engine. Lastly, IaaS offers the end user a complete desktop (the computing infrastructure) to achieve what they want; examples include Amazon Elastic Compute Cloud (EC2) and Rackspace servers. To run FDS in the cloud, until a service is offered as a SaaS or PaaS system, an IaaS system has to be used.

In this work, Amazon EC2 has been used. This was chosen on the basis that it is a well-documented service, with prior work having been undertaken on it in different research sectors, allowing comparisons to be drawn between this work and others. Amazon's EC2 service allows a user to create and set up a computer within 15 minutes. It can run Windows or Linux, and Amazon offers a range of services so that a user can pick one that will complement the FDS model being run. It offers a number of instances (an instance is the equivalent of a computer). These are:

General Purpose - general use instances that have a mixed amount of compute power and RAM.
Compute Optimised - instance tiers ideally suited to computationally heavy applications, such as engineering and science applications.
Memory Optimised - instances with a higher RAM limit than some of the other instances.
GPU Optimised - instances containing high-power graphics cards for applications that can take advantage of the computational speed of graphics processing units.
Storage Optimised - instances offering solid state drives for applications that require fast reads and writes to disk.

Amazon uses a range of commercially available equipment to make up its data centres, some of which is faster than others. Amazon therefore rates all its instances with what it calls an ECU (Elastic Compute Unit), which allows each individual instance to be compared fairly. An ECU is approximately equivalent to a 1.0-1.2 GHz 2007 AMD Opteron CPU. These ECUs are split over a number of virtual CPUs (vCPUs). The vCPUs correlate to the number of cores of a processor, and the ECUs correspond to the speed of the individual cores of the machine. A higher ECU rating should increase the speed of FDS modelling (as it represents a faster processor), and if the model can be split into a number of different meshes or use OpenMP, more vCPUs should speed up the model as well (as more cores are being provided). Likewise, each tier has a different amount of RAM, and FDS can be RAM intensive (a rule of thumb appears to be 1 GB of RAM used for every 1 million cells within a model). These two factors combined mean that, to choose the correct instance, every aspect of the model needs to be considered. For example, if the instances are investigated in more detail, as shown in Table 1, then some of the General Purpose instances may be considered insufficient for some FDS models.
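The 1 GB of RAM per 1 million cells rule of thumb above can be turned into a quick sizing check. The sketch below is a minimal illustration, not part of the original method: the rule itself is only an approximation, and only a subset of the Table 1 instances is included.

```python
# Rough FDS instance sizing based on the ~1 GB RAM per 1 million cells rule of thumb.
# Instance data (vCPU count, RAM in GiB) is taken from Table 1; the rule is approximate.
INSTANCES = {
    "m3.medium": (1, 3.75),
    "m3.xlarge": (4, 15),
    "c3.xlarge": (4, 7.5),
    "c3.8xlarge": (32, 60),
    "r3.2xlarge": (8, 61),
}

def ram_needed_gb(total_cells: int) -> float:
    """Estimate RAM demand: roughly 1 GB per million cells."""
    return total_cells / 1_000_000

def suitable_instances(total_cells: int):
    """Return (alphabetically sorted) instance names whose RAM covers the estimate."""
    need = ram_needed_gb(total_cells)
    return sorted(name for name, (_, ram) in INSTANCES.items() if ram >= need)

# A 30 million cell model needs roughly 30 GB, ruling out the smaller instances.
print(suitable_instances(30_000_000))
```

On this rule a large model quickly excludes the smaller General Purpose tiers, which is the point made in the text: RAM, not just core count, constrains instance choice.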

Table 1: Amazon Instance Types

  Instance Name     vCPU   ECU    Speed (ECU/vCPU)   RAM (GiB)
  General Purpose
  m3.medium           1      3        3.00              3.75
  m3.large            2      6.5      3.25              7.5
  m3.xlarge           4     13        3.25             15
  m3.2xlarge          8     26        3.25             30
  Compute Optimised
  c3.large            2      7        3.50              3.75
  c3.xlarge           4     14        3.50              7.5
  c3.2xlarge          8     28        3.50             15
  c3.4xlarge         16     55        3.44             30
  c3.8xlarge         32    108        3.38             60
  Memory Optimised
  r3.large            2      6.5      3.25             15.25
  r3.xlarge           4     13        3.25             30.5
  r3.2xlarge          8     26        3.25             61
  r3.4xlarge         16     52        3.25            122
  r3.8xlarge         32    104        3.25            244

There are two General Purpose tiers, the M3 and the T2. The T2 tier is not suitable for FDS modelling, as it offers burst speeds for applications where the CPU would not be 100 percent utilised at all times (which FDS will utilise); it has therefore not been included in the summary in Table 1. Likewise, there is little to be gained by running FDS on a Storage or GPU Optimised instance: FDS cannot make use of the GPU for calculations, and CPU speed and RAM are more important to the modelling process than the write speed of the disk.

However, there is also the cost of the instances to take into account. For example, the m3.xlarge offers 4 vCPUs and 15 GB of RAM, compared to the c3.xlarge, which offers 4 vCPUs but only 7.5 GB of RAM; however, the m3.xlarge costs $0.532 per hour, whereas the c3.xlarge costs only $0.376 per hour. Therefore, even if the c3.xlarge is slower, it may cost less overall to run the model. Regardless of the instance chosen, the instance should be tailored to the model being run to ensure that the cost of the model is kept as low as practical. The costs of the instances are shown in Table 2.

Table 2: Amazon EC2 Instance Costs

  Instance Name   Cost per hour ($)
  r3.8xlarge          3.500
  c3.8xlarge          3.008
  r3.4xlarge          1.944
  c3.4xlarge          1.504
  r3.2xlarge          1.080
  m3.2xlarge          1.064
  c3.2xlarge          0.752
  r3.xlarge           0.600
  m3.xlarge           0.532
  c3.xlarge           0.376
  r3.large            0.300
  m3.large            0.266
  c3.large            0.188
  m3.medium           0.133

As Table 2 shows, the differences in cost between some of the instances are fairly minimal, and it could therefore be considered worthwhile choosing the best that money can afford. The prices in Table 2 are the costs of the instances running Windows. If a Linux instance is used, the cost per hour is reduced (as the cost does not have to include Microsoft licensing); as FDS runs under Linux, this might be the preferred option for some users. Windows was used in this work purely for ease of setup and use.

Amazon's payment method for EC2 instances means that the user pays by the hour, and because it can be hard to estimate how long an FDS model will take to complete, it is also hard to predict the total cost of running a model. Work by Overholt has produced a tool that estimates the run time of a model based on the .out file produced by FDS as it runs, which includes data on the run time of previous time-steps [13]. As billing is per hour, the "race to the finish" principle applies: a cheaper instance may not be cost effective if a more expensive instance can complete its task before the cheaper instance has finished. It is therefore recommended that users pay attention to the differences between the instances and their costs.

FDS is distributed with a number of example scenarios that demonstrate its different capabilities. Some of these files form a set of benchmarks that allow users to record the run time of the model on different equipment and compare the effects. Two files are used here, bench1.fds and scale1.fds. bench1.fds is a small scenario designed to test single-core speed; scale1.fds is a multi-core test with eight meshes. These benchmarks were run on each machine in a number of different configurations.
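The "race to the finish" effect can be checked in a few lines. In this sketch the hourly rates come from Table 2, but the runtimes are hypothetical examples chosen for illustration, not measured values.

```python
import math

# Hourly Windows rates from Table 2; the runtimes below are hypothetical.
RATES = {"m3.xlarge": 0.532, "c3.xlarge": 0.376}

def total_cost(rate_per_hour: float, runtime_hours: float) -> float:
    """Amazon bills each started hour, so round the runtime up before pricing."""
    return rate_per_hour * math.ceil(runtime_hours)

# Suppose the cheaper c3.xlarge needs 10 h and the dearer m3.xlarge only 8 h:
print(round(total_cost(RATES["c3.xlarge"], 10), 3))  # 3.76
print(round(total_cost(RATES["m3.xlarge"], 8), 3))   # 4.256
```

With these assumed runtimes the slower instance still wins on cost; with a large enough speed gap the more expensive instance can finish first and cost less overall, which is why runtime estimation tools such as Overholt's matter for instance selection.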
As discussed previously, version 6.1.0 of FDS enables the use of OpenMP to speed up single-mesh models. This paper runs the models in different configurations to benchmark them on different Amazon EC2 instances and local machines. The user guide suggests that using 4 cores in an OpenMP model increases modelling speed by a factor of about two (two FDS files were tested by the developers, one with 64³ cells and the other with 128³). The number of cores that the model is spread over can be controlled on Windows by specifying the number of threads available with the command:

set OMP_NUM_THREADS=4

By changing this on each run, the number of cores used for the model can be varied to see what effect the number of cores has on the computation time, especially on the machines with a large number of cores.

Alongside the OpenMP run of the scale1.fds model, an MPICH run was undertaken to see how it compares. This should provide some indication as to whether it makes more sense for the modeller to create models with multiple meshes or to rely on OpenMP and use a single mesh (purely from a performance point of view; there may be reasons why a single mesh is required). This model contains eight meshes and would therefore ideally run best on an eight-core system (one mesh per core). However, by running on the machines without eight cores, the effect of OpenMP can be investigated (OpenMP can use as many physical cores as the device has, so an OpenMP model run with six cores can be compared against an MPICH model that uses only six cores for an eight-mesh model).

4. Discussion

The local machines are machines already owned by the company for computer modelling; more details can be seen in Table 3. These machines were benchmarked using the same version of FDS as the Amazon instances. The local machines also had a number of additional programs running that the Amazon instances would not have (such as corporate firewalls and other security software). This might impact performance slightly, but it is considered reasonable (as any cluster or machine may have additional programs running on it). Care was taken to ensure that nothing else was run on the machines whilst the benchmarks were executing, the same as for the online instances.

Table 3: Hoare Lea Local Machines

  Machine Name   CPU Name                Speed (GHz)   Physical Cores   RAM (GB)
  HL-001961      Intel Xeon X5660          2.8               6             24
  HL-003910      Intel Xeon E3-1240 V2     3.4               4             32
  HL-004292      Intel Xeon E5-2670 V2     2.5              10            128

These machines are fairly high-end computers, designed to complete models as quickly as possible for consultancy work, built within a number of limitations. HL-004292 actually has a total of 20 cores (the machine contains two Intel Xeon E5-2670 V2 CPUs). Only the General Purpose instances (m type) and the Compute Optimised instances (c type) were tested, as these differ in speed per vCPU, as shown in Table 1. The Memory Optimised instances were not tested, both for budgetary reasons and because their speed per vCPU is the same as the General Purpose instances, so comparable performance would be expected.
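The OpenMP thread sweep described above can be scripted. The sketch below only constructs the Windows command lines; the executable name "fds" and the input file path are assumptions about the installation, and in a real run each line would be handed to the shell (e.g. via subprocess).

```python
# Build Windows command lines for an OpenMP thread sweep of an FDS benchmark file.
# The executable name "fds" is an assumption about the local install; on a real
# machine each command string would be passed to subprocess.run(..., shell=True).
def sweep_commands(input_file: str, max_threads: int):
    commands = []
    for threads in range(1, max_threads + 1):
        # OMP_NUM_THREADS limits how many cores OpenMP may use (see the text above).
        commands.append(f"set OMP_NUM_THREADS={threads} && fds {input_file}")
    return commands

for cmd in sweep_commands("bench1.fds", 4):
    print(cmd)
```

Timing each command (and repeating the sweep up to the machine's core count) reproduces the kind of data reported in Tables 4 and 5.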
Table 4: bench1.fds Results using OpenMP (times in seconds)

  Machine Name            Cores:    1     2     3     4     5     6     7     8     9    10
  m3.medium                      1120
  m3.large                        490   571
  c3.large                        471   551
  m3.xlarge                       483   367   434   449
  c3.xlarge                       458   349   419   430
  Intel Xeon X5660                586   445   394   368
  Intel Xeon E3-1240 V2           431   331   309   325   339   355
  m3.2xlarge                      562   425   383   359   411   420   458   465
  c3.2xlarge                      457   342   303   285   341   357   369   375
  Intel Xeon E5-2670 V2           509   406   348   283   274   274   266   264   268   271

Figure 1: bench1.fds Results Using OpenMP

This shows that, in general, as more cores are added, the time taken to run the model decreases, something that is validated by the modelling conducted by the FDS developers, as discussed earlier. The developers state that the time taken decreases by about 50% with about four cores for this model. There is a point where the addition of more cores actually increases the time taken to run the model; this can clearly be seen in the case of the m3.large instance, where the addition of a second core actually decreases the speed.

The result for the m3.medium appears to be an outlier. Table 1 indicates that the m3.medium should be 3.00 ECU per core, and therefore slower than the other instances, which are 3.25 ECU per core. The results show that the m3.medium takes almost double the time of the m3.large (and the local machines), rather than the slightly longer time expected. This would initially appear to be down to the lower amount of RAM that the instance has, causing frequent swapping to disk (slowing the calculation). Yet the c3.large instance has the same amount of RAM and should therefore show the same slowdown if RAM were the issue. Based on the results, this instance could potentially be sharing a CPU with another Amazon instance, but without confirmation it can only be considered an outlier compared to the rest of the instances.

Likewise, there was little difference between the single-core results of the other m3 instances; this is to be expected, considering that the number of ECUs per core is the same and so, according to Amazon, the same performance for each instance should be seen. However, the results then differ based on the scaling of the model. The two-core m3.large instance slows down once the calculations are split between the cores and, whilst the m3.xlarge time decreases with two cores, the modelling speed decreases once the third and fourth cores are added, compared to the local machines, which tend to reach a plateau.

It seems that using more than half the number of cores on an Amazon instance will slow the model down. Amazon markets these as multiple cores but does not mention whether they are hyper-threaded cores (the wording is ambiguous). Hyper-threaded CPUs appear as separate processors to the operating system but are in fact virtual CPUs: certain tasks on a computer may require different areas of a CPU to conduct calculations, and hyper-threading tries to make use of the unused areas of each processor, so one physical core appears as two CPUs. Prior research seems to indicate that the Amazon instances do make use of hyper-threading, and this is consistent with the degradation in performance once 50% of the cores are used for modelling purposes. This hyper-threading could also explain the slowdown of the single-core m3.medium instance: if this instance is not backed by a full physical CPU core, that could explain the poor result.
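The hyper-threading effect described above is visible directly in the Table 4 numbers. This sketch computes the speed-up over the single-core run for the c3.2xlarge and shows it peaking at four cores (the likely physical core count) before falling again.

```python
# Wall-clock times (s) for bench1.fds on c3.2xlarge with 1-8 cores, from Table 4.
C3_2XLARGE_TIMES = [457, 342, 303, 285, 341, 357, 369, 375]

def speedups(times):
    """Speed-up of each run relative to the single-core time."""
    return [times[0] / t for t in times]

for cores, factor in enumerate(speedups(C3_2XLARGE_TIMES), start=1):
    print(f"{cores} cores: {factor:.2f}x")
# The peak occurs at 4 cores; using the remaining (apparently hyper-threaded)
# vCPUs makes the model slower again.
```

The same calculation applied to the m3.2xlarge row shows the same turnover at half the advertised vCPU count.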
Table 5: scale1.fds Results using OpenMP (times in seconds)

  Machine Name            Cores:     1      2      3      4      5      6      7      8      9     10
  m3.medium                      28263
  m3.large                       13111  15841
  c3.large                       12587  15078
  m3.xlarge                      13080  10780  12998  13632
  c3.xlarge                      13039  10915  13348  14082
  Intel Xeon X5660               16170  12934  11901  11392
  Intel Xeon E3-1240 V2          10905   8826   8469   8820
  m3.2xlarge                     13971  11410  10576  10135  11702  12095  12671  13251
  c3.2xlarge                     12333  10126   9401   8986  10510  10954  11268  11916
  Intel Xeon E5-2670 V2          12919  10465   9705   9475   9242   9451   9562   9402   9373   9029

The results for scale1.fds show the same trends as the bench1.fds scores. They indicate that adding more cores speeds up the modelling until the point on the EC2 instance where all the physical cores are utilised; where hyper-threaded cores start to be used, performance rapidly drops. Where the number of cores exceeds the size of the model, such as on HL-004292, which has ten physical cores on a single CPU, the continued addition of cores does not translate into a speed increase in the modelling process. Once about four cores are allocated to an OpenMP model, the speed increase is minimal, and in some instances the modelling takes slightly longer (though this can potentially be attributed to differences in the modelling process or standard background processes on the computer affecting the computation speed, as it is not consistent across the models). These results show the same trend as the bench1.fds results, indicating that the EC2 vCPUs include hyper-threaded cores amongst the number of vCPUs offered, which can slow the modelling process down if the model utilises these hyper-threaded cores.

Figure 2: scale1.fds Results Using OpenMP

One issue that immediately became apparent was the setup of MPICH and FDS 6. Both packages are offered as installers for Windows; however, the author was unable to get FDS working with the MPICH installation on the EC2 instances. It was therefore not possible to run the MPICH tests as a comparison to the OpenMP tests on the EC2 instances, and within the time available for testing, the cause of this issue could not be determined. These tests were instead undertaken on the local machines, which had a fully working setup. This supports the observation by Fenn et al. that EC2 instances may require specialist knowledge to set up and run. It should be noted that Amazon allows EC2 instances to be created and then saved, so that an instance with a working modelling setup can be kept for future use, though this incurs a regular charge.
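The MPICH runs on the local machines assign one MPI process per mesh. A typical invocation can be sketched as below; note that "mpiexec" and the FDS executable name are assumptions about the local install, and the exact binary names vary between FDS releases.

```python
# Build an MPICH launch command: one MPI process per mesh.
# "mpiexec" and the executable name "fds" are assumptions about the local setup.
def mpich_command(input_file: str, n_meshes: int) -> str:
    return f"mpiexec -n {n_meshes} fds {input_file}"

# scale1.fds contains eight meshes, so eight processes are requested; on a
# machine with fewer cores, MPICH simply schedules several processes per core.
print(mpich_command("scale1.fds", 8))
```

Requesting more processes than meshes gains nothing, which is the cluster-sizing point made in Section 2.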

Table 6: MPICH Run Times on Local Machines

  Computer                Cores   Time (s)
  Intel Xeon X5660          6     7,468.18
  Intel Xeon E3-1240 V2     4     5,134.76
  Intel Xeon E5-2670 V2    10     1,944.53

These models were run with the maximum number of physical cores that the computer had access to, using MPICH rather than OpenMP. The model was built using eight meshes and could therefore be spread over eight cores; on the machines with fewer than eight cores, some cores were allocated more than one mesh. The results indicate that, for scale1.fds, the models ran quicker using MPICH than they did using OpenMP. For the Xeon E5-2670 there was an 80% reduction in run time; this was less pronounced on the other machines (potentially due to multiple meshes being assigned to the same core), but the results were still quicker than OpenMP (42% quicker on the Xeon E3-1240 and 35% on the Xeon X5660). Therefore, it seems beneficial to split the modelling domain up and run the models over multiple cores with MPICH if at all possible. It is noted that this sample size is small, so this specific model may show results that are not indicative of the modelling process as a whole; further work should be conducted in this area. Nevertheless, MPICH appears to offer a fairly significant speed increase over the OpenMP implementation and is therefore probably the better choice for multi-core processing. A more in-depth study should be undertaken to confirm that this is the case.

5. Conclusion

In conclusion, Amazon EC2 offers a service that allows users to rapidly set up and run models within a cloud-based environment, although this may require some initial technical knowledge and setup. It allows users to run models when local resources are not present or are fully utilised, increasing the user's ability to scale modelling capacity up and down as required.
Whilst previous research has indicated that EC2 clusters are not viable as full-size research clusters, due to a slowdown in the modelling caused by latency between the Amazon instances, this research shows that for a single instance (which should be able to cover most models), Amazon EC2 is a perfectly adequate alternative to a local modelling machine, providing comparable performance. This work reaches the same conclusion as Hill and Humphrey [14]: that smaller-scale clusters/modelling on EC2 is feasible. By moving modelling into the cloud, end users can avoid the initial investment in a single high-powered modelling machine and pay only for the work that is undertaken.

The research shows that the number of cores advertised by Amazon is not the number of physical cores the instance has access to, but the number of virtual CPU cores (Amazon itself calls them vCPUs in its marketing documentation). Users have to take this into account when selecting an instance, and may therefore require an instance costing more per hour. Without taking this into account, the user is likely to see performance degradation and may end up paying more for the instance than necessary. Users should therefore be aware of the limitations of the EC2 instances beforehand.

The work in this paper has not covered the issues that might arise from modelling in the cloud, such as potential privacy issues (the data is stored on systems outside the control of the

end user) and downloading the data (models can generate large results files that may be slow to retrieve over a slow connection). Likewise, the paper has not covered the best method of recording speeds; this work has used example scenarios bundled with the program. Alternative benchmarking files might show differences between the various instances and machines; however, it is considered that the FDS files used give a reasonable benchmark of run time, as the same files were used across all the instances.

References:

[1] P. Mell and T. Grance, "The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology", National Institute of Standards and Technology, 2011.
[2] C. Evangelinos and C. N. Hill, "Cloud Computing for Parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2", in The 1st Workshop on Cloud Computing and its Applications (CCA), 2008.
[3] A. Apostu, E. Rednic, and F. Puican, "Modeling Cloud Architecture in Banking Systems", Procedia Economics and Finance, vol. 3, pp. 543-548, Jan. 2012.
[4] X. Man, S. Usui, S. Jayanti, L. Teo, and T. D. Marusich, "A High Performance Computing Cloud Computing Environment for Machining Simulations", Procedia CIRP, vol. 8, pp. 57-62, Jan. 2013.
[5] J.-P. Ebejer, S. Fulle, G. M. Morris, and P. W. Finn, "The Emerging Role of Cloud Computing in Molecular Modelling", Journal of Molecular Graphics & Modelling, vol. 44, pp. 177-187, Jul. 2013.
[6] J. L. Alvaro and B. Barros, "A New Cloud Computing Architecture for Music Composition", Journal of Network and Computer Applications, vol. 36, no. 1, pp. 429-443, Jan. 2013.
[7] E. Ang, "Cloud Versus In-house Computing for Open Source Fire Modelling", Fire Technology, Nov. 2014.
[8] National Institute of Standards and Technology, Fire Dynamics Simulator, 2013.
[9] K. McGrattan, R. McDermott, S. Hostikka, J. Floyd, C. Weinschenk, and K. Overholt, "Fire Dynamics Simulator (Sixth Edition) User's Guide", NIST Special Publication 1019, 2010.
[10] I. Ari and N. Muhtaroglu, "Design and Implementation of a Cloud Computing Service for Finite Element Analysis", Advances in Engineering Software, vol. 60-61, pp. 122-135, Jun. 2013.
[11] M. Fenn, J. Holmes, and J. Nucciarone, "A Performance and Cost Analysis of the Amazon Elastic Compute Cloud (EC2) Cluster Compute Instance", 2010.
[12] G. Sakellari and G. Loukas, "A Survey of Mathematical Models, Simulation Approaches and Testbeds Used for Research in Cloud Computing", Simulation Modelling Practice and Theory, vol. 39, pp. 92-103, Dec. 2013.
[13] K. Overholt, "FDS Runtime Estimator". [Online]. Available: http://www.koverholt.com/fds-runtime-estimator/. [Accessed: 08-Jul-2014].
[14] Z. Hill and M. Humphrey, "A Quantitative Analysis of High Performance Computing with Amazon's EC2 Infrastructure: The Death of the Local Cluster?", in Proceedings of the 10th IEEE/ACM International Conference on Grid Computing (Grid 2009), Banff, 2009.