Towards Auction-Based HPC Computing in the Cloud *

Computer Technology and Application 3 (2012)

Moussa Taifi, Justin Y. Shi and Abdallah Khreishah
Computer and Information Sciences Department, Temple University, Philadelphia, PA 19122, USA

Received: June 01, 2012 / Accepted: July 03, 2012 / Published: July 25, 2012.

Abstract: Cloud computing is expanding widely in the world of IT infrastructure, due in part to the cost-saving effect of economies of scale. Fair market conditions can in theory provide a healthy environment that reflects the most reasonable costs of computations. While fixed cloud pricing provides an attractive low entry barrier for compute-intensive applications, both the consumer and the supplier of computing resources can obtain higher efficiency for their investments by participating in auction-based exchanges. There are strong incentives for the cloud provider to offer auctioned resources; from the consumer's perspective, however, using these resources is a sparsely discussed challenge. This paper reports a methodology and framework designed to address the challenges of running HPC (high performance computing) applications on auction-based cloud clusters. The authors focus on HPC applications and describe a method for determining bid-aware checkpointing intervals. They extend a theoretical model for determining checkpoint intervals using statistical analysis of pricing histories. The latest developments in the SpotHPC framework are also introduced, which aim at facilitating the managed execution of real MPI applications on auction-based cloud environments. The authors use their model to simulate a set of algorithms with different computing and communication densities. The results show the complex interactions between optimal bidding strategies and parallel application performance.

Key words: Auction-based cloud computing, fault tolerance, cloud HPC (high performance computing).

1. Introduction

The economy of scale offers cloud computing virtually unlimited, cost-effective processing potential. While it is in general difficult to assess the real cost of a computation task, the auction-based provisioning scheme offers a reasonable pricing structure. Theoretically, prices under fair market conditions should reflect the most reasonable costs of computations. The fairness is ensured by the mutual agreements between the sellers and the buyers.

From the consumer's perspective, among all computing applications, HPC (high performance computing) applications are the biggest potential beneficiaries. From the seller's perspective, HPC applications represent the most reliable income stream, since they are the most resource-intensive users. Theoretically, resource usage efficiency is also maximized under the auction-based provisioning schemes.

Traditional HPC applications are typically optimized for hardware features to obtain processing efficiency. Since transient component errors can halt the entire application, it has become increasingly important to create autonomic applications that can automate checkpointing and restarting with little loss of useful work.

Corresponding author: Moussa Taifi, Ph.D. candidate, research fields: dependable cloud computing, high performance computing, auction-based cloud computing, fault tolerance. E-mail: moussa.taifi@temple.edu.
* This paper is an extended version of "SpotMPI: A framework for auction-based HPC computing using Amazon spot instances", published in the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011).
Although existing HPC applications are not designed for volatile computing environments, with an automated checkpoint-restart (CPR) HPC toolkit it is plausible that practical HPC applications could gain additional cost advantages from auction-based resources by dynamically minimizing CPR overheads.

Unlike existing MPI (Message Passing Interface) fault tolerance tools, the authors emphasize dynamically adjusting optimal CPR intervals in order to offset the large number of out-of-bid failures typical of a volatile auction-based computing platform. The authors introduce a formal model and an HPC application toolkit, SpotHPC, to facilitate the practical execution of real MPI applications on volatile auction-based cloud platforms.

In section 2, the background and context of the current research are described. In section 3, the authors establish models for estimating the running times of HPC applications using auction-based cloud resources. The proposed models take into account the time complexities of the HPC application, the overheads of checkpoint-restart, and the publicly available resource bidding history. They seek to unravel the inter-dependencies between the application's computing/communication complexities, the number of required processors, bidding prices and the eventual processing costs. The authors then introduce the SpotHPC toolkit and show how it can automate MPI application processing using volatile resources under the guidance of the formal models. In section 4, the proposed models are applied to recent bidding histories of Amazon EC2 HPC resources. Preliminary results for two HPC application types with different computing and communication complexities are reported. Section 5 concludes the paper and outlines potential future research directions.

2. Background

2.1 Auction-Based Computing: Spot Instances

Amazon is one of the first cloud computing vendors to provide at least two types of cloud instances: on-demand instances and spot instances. An on-demand instance has a fixed price. Once ordered, it provides service according to Amazon's Service Level Agreement (SLA). A spot instance is a type of resource whose availability is controlled by the current bidding price and the auction market.

There are unique characteristics of the auction-based computing platform. First, a stable computing environment can potentially be established by bidding the on-demand prices. Second, lower costs can be gained if the applications can tolerate partial failures; thus, the most fault-resilient implementation will gain the best possible cost effectiveness. Third, given an application, its required processing time and/or budget requirements, as well as the bidding history of the required resources, it is possible to develop an optimized bidding strategy to meet the desired target(s).

There are three special features of Amazon's spot instance pricing policy: a successful bid does not guarantee exclusive resource access for the entire requested duration, since the Amazon engine can terminate access at any time if a higher bid is received; Amazon does not charge a partial hour (a job terminated before reaching the hour boundary) if the termination is caused by out-bidding, whereas the partial hour is charged in full if the user terminates the job; and Amazon only charges the user the highest market price, which may be less than the user's successful bid. (A cost-accounting sketch of these rules is given at the end of this subsection.)

The authors have chosen two types of Amazon EC2 HPC resources for this study. The cc1.4xlarge and the cg1.4xlarge are cluster HPC instances that provide cluster-level performance (23 GB of memory, 8 cores, 10 Gigabit Ethernet). The main difference is the presence of GPUs (graphical processing units, 2 x NVIDIA Tesla Fermi M2050) in the cg1.4xlarge, which provides more power for compute-intensive applications (Table 1).

Fig. 1 records a sample market price history for the cc1.4xlarge instance type from May 5 to May 11. This instance type shows typical user behavior for more legacy HPC applications. The cg1.4xlarge instance type illustrates resources for HPC applications that can benefit from GPU processing. Since many legacy HPC applications are not yet suitable for GPU processing, the cg1.4xlarge pricing history shows fewer fluctuations.
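To make the pricing rules above concrete, the short sketch below accounts for the cost of one spot session under the three stated rules. It is only an illustration: the hour-by-hour market prices, the termination cause, the partial-hour fraction and the function name are all assumed values, not anything published by Amazon or by the authors.

```python
def spot_session_cost(hourly_market_prices, terminated_by_outbid, partial_hour_fraction):
    """Estimate the bill for one spot session under the rules described above.

    hourly_market_prices: market price charged for each completed hour
                          (Amazon charges the market price, not the user's bid).
    terminated_by_outbid:  True if the session ended because of an out-of-bid event.
    partial_hour_fraction: fraction of the final, partial hour that was used."""
    cost = sum(hourly_market_prices)  # completed hours are always charged
    if partial_hour_fraction > 0 and not terminated_by_outbid:
        # A user-terminated partial hour is billed as a full hour; the price for
        # that hour is assumed here to equal the last known market price.
        cost += hourly_market_prices[-1] if hourly_market_prices else 0.0
    # An out-of-bid termination makes the partial hour free of charge.
    return cost

# Illustrative example: 3 completed hours at varying market prices, then out-bid mid-hour.
print(spot_session_cost([0.52, 0.55, 0.61],
                        terminated_by_outbid=True,
                        partial_hour_fraction=0.4))  # -> 1.68
```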

Table 1  Amazon HPC resource types.
cc1.4xlarge: 23 GB memory, 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core, Nehalem architecture), 1690 GB storage, 64-bit platform, 10 Gigabit Ethernet.
cg1.4xlarge: 22 GB memory, 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core, Nehalem architecture), 2 x NVIDIA Tesla Fermi M2050 GPUs, 1690 GB storage, 64-bit platform, 10 Gigabit Ethernet.

Fig. 1  Market prices of the cc1.4xlarge instance in May.

2.2 HPC in the Cloud

Although HPC applications are the biggest potential beneficiaries of cloud computing, except for a few simple applications there are still many practical concerns. Most mathematical libraries rely on optimized numerical codes that exploit common hardware features for extreme efficiency; some of these hardware features, such as the hardware cache, are not mapped into virtual machines, so HPC applications suffer additional performance drawbacks on top of the normal virtualization overhead. Many HPC applications have high inter-processor communication demands, and current virtualized networks have difficulty meeting them. Finally, all existing HPC applications handle only two communication states: success and failure. While success is a reliable state, failure is not, and existing applications treat a timeout as identical to a failure. Consequently, any transient component failure can halt the entire application. Using volatile spot instances for these applications is a serious challenge.

Initially, low-end cloud services provided little guarantee on the deliverable performance for HPC applications. Recently, high-end cloud resources have been developed specifically for HPC applications. These improvements have demonstrated hopeful features [1-3] and show the diminishing overhead of virtual machine monitors such as Xen [4]. Due to the severity of declining MTBFs, fault tolerance for MPI applications has also progressed. These developments inspired the design and development of SpotHPC.

2.3 Checkpoint-Restart (CPR) MPI Applications

Much research has been done in the past to provide seamless fault tolerance for MPI applications. FT-MPI [5] uses interactive process fault tolerance. Starfish [6] supports a number of CPR protocols and LA-MPI [7] provides message-level failure tolerance. Egida [8] experimented with a message-logging grammar specification as a means for fault tolerance. Cocheck [9] extends the Condor [10] scheduler to provide a set of fault tolerance methods that can be used by MPI applications. The authors choose OpenMPI's coordinated CPR because of its cluster-wide checkpoint advantage, since more fine-grained strategies will not work in this highly volatile environment [11-13].

The challenge faced by applications willing to exploit the volatile nature of spot instances is of a different kind than on regular clusters. The behavior of spot instances can be analyzed as a fail-stop mechanism: the Amazon engine uses the market information and the user's bid to terminate an instance with no prior notice [14]. This is called an out-of-bid failure. These out-of-bid failures require applications to adapt to runs that are frequently interrupted. While there are many different fault tolerance libraries for MPI, the authors choose Open MPI's coordinated CPR mechanism [15]. The high volatility of the spot instance platform makes many fine-grained checkpoint strategies impractical. These include local checkpoints [11], multilevel checkpoints [12] and using pairs or groups of nodes to provide redundancy [13].

The OpenMPI coordinated CPR is an all-or-nothing, single-task CPR interface that, although it incurs higher overheads, is guaranteed to work correctly regardless of the number of processors.

Other single-task CPR efforts involving spot instances include map-reduce applications [16-17]. A map-reduce application does not require inter-task communication, so parallel processing can be controlled externally to the individual tasks. Therefore, spot instances can be used as accelerators via a simple job monitor that tracks and restarts dead jobs automatically [18].

Other noticeable efforts studying spot instances include Refs. [19-20]. These are based on simulating the behavior of a single instance under different bids, and they outlined the inherent tradeoff between completion time and budget. Ref. [19] proposes a decision model and describes a simulator that is able to determine, under a set of conditions, the total expected time of a single application. Another study, Ref. [20], discussed a set of checkpoint strategies that can maximize the use of spot instances while minimizing costs. Resource allocation strategies are also identified in Refs. [21-22]. This line of work uses monetary and runtime estimation techniques to simulate the runtime of grid-based applications on such volatile infrastructure, and provides a heuristic study of the effects and benefits of generic fault tolerance techniques such as checkpointing, task duplication and migration. Additionally, the research carried out in Ref. [23] explores the spot instance markets and captures the long-term behavior of spot instance pricing, including the distributions of prices and inter-price times, as well as the difficulties of fitting analytical distributions to a young market where only sparse variations exist so far. In addition, the research in Ref. [24] points out the existence of different epochs in the pricing behavior and the implications for using the spot price data. That report contributes to the understanding of the usefulness of the publicly available data and of how decision models need to be aware of changes in pricing policies and the supplier's announcements concerning new prices and new pricing regions.

While these research efforts have clarified some of the challenges of spot instances, only a few, such as Ref. [25], have touched specifically on HPC applications and on how the nature of the application, being high performance or high throughput, impacts the fault tolerance strategies used to mitigate interruptions due to market fluctuations. While describing strategies for single applications is crucial to the understanding of spot resources and auction-based computing in general, this knowledge is not fully usable in the context of HPC computing, and many issues need examination when applications are meant to scale to higher orders of magnitude. To the best of the authors' knowledge, there has been no direct evaluation of practical MPI applications on spot instances. The volatile auction-based computing platform challenges established HPC programming practices.

2.4 Evaluating MPI Applications Using Auction-Based Platforms

For HPC applications using a large number of processors, the CPR overhead is the biggest cost factor. Without CPR optimization, MPI applications are unlikely to gain practical acceptance on volatile auction-based platforms.
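While the paper relies on SpotHPC to drive this process, the coordinated CPR mechanism chosen in section 2.3 can also be exercised from a small wrapper script. The sketch below is a minimal illustration, assuming an Open MPI build with BLCR-backed checkpoint/restart support, in which jobs are launched with `mpirun -am ft-enable-cr` and the `ompi-checkpoint`/`ompi-restart` commands are available; the snapshot-handle parsing and the fixed interval are illustrative assumptions, and in SpotHPC the interval would come from the bid-aware model of section 3.

```python
import subprocess
import time

CHECKPOINT_INTERVAL = 600  # seconds; illustrative, would come from the bid-aware model

def run_with_coordinated_cpr(mpi_cmd, interval=CHECKPOINT_INTERVAL):
    """Launch an MPI job with Open MPI's CR framework and checkpoint it periodically.

    Assumes an Open MPI + BLCR installation providing ompi-checkpoint/ompi-restart."""
    # Enable the checkpoint/restart parameter set when launching the job.
    job = subprocess.Popen(["mpirun", "-am", "ft-enable-cr"] + mpi_cmd)
    last_handle = None
    while job.poll() is None:  # job still running
        time.sleep(interval)
        if job.poll() is not None:
            break
        # Take a coordinated, cluster-wide checkpoint of the running mpirun job.
        out = subprocess.run(["ompi-checkpoint", "-v", str(job.pid)],
                             capture_output=True, text=True)
        if out.returncode == 0 and out.stdout.strip():
            # The console output format is not guaranteed; keeping the last line
            # as the global snapshot handle is an assumption for illustration.
            last_handle = out.stdout.strip().splitlines()[-1]
    return job.returncode, last_handle

def restart_from(handle):
    """Restart a previously checkpointed job from its snapshot handle."""
    return subprocess.call(["ompi-restart", handle])
```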
The authors report a theoretical model based on application resource time complexities [26] and optimal CPR models [27-28]. In addition, they describe a toolkit named SpotHPC that can support autonomic MPI applications on spot instance clusters. This toolkit can monitor spot instances and bidding prices, automate checkpointing at bidding-price (and history) adjusted optimal intervals, and automatically restart applications after out-of-bid failures.

3. Theoretical Model

The auction prices vary dynamically depending on the supply and demand in the Amazon marketplace.

There are no guidelines from Amazon as to how the prices are set. Unlike other projects (e.g., Ref. [29]) that use autoregressive models to maximize the profit of fictitious cloud providers, this paper focuses on the intrinsic characteristics of the user's application and the bidding history. The authors are interested in the inherent dependencies between these characteristics and their impact on the optimal CPR interval, the largest cost factor for MPI applications running on a volatile platform.

3.1 Bid-Aware Optimal CPR Interval

The authors assume that the time between consecutive out-of-bid failures is exponentially distributed with rate lambda_b. This allows out-of-bid failures to be modeled in the same way as traditional failures, but at a rate that depends on the bid price b. Thus, the authors can extend previous work on optimal CPR intervals for distributed memory applications. This paper builds on the original CPR interval work of Ref. [28], which is extended by Ref. [27] and later adapted to MPI by Ref. [30]; the discussion uses the same symbols (Table 2).

Similar to Ref. [30], the authors obtain the expected application running time with checkpoints and failures, under the important assumptions that an out-of-bid failure occurs at most once per checkpoint interval and that all failures are independent. This leads to the first-order optimal CPR interval of Refs. [27-28, 30], tau_opt = sqrt(2*delta/lambda_b), where delta is the time needed to create and store a checkpoint.

A crucial difference between stable clusters and spot instance clusters is that an out-of-bid failure forces an application downtime that is absent for ordinary component failures: the restart cannot begin until the average downtime per out-of-bid failure, D, has elapsed. D can be obtained using the price history and the current bid. Adding this downtime to the recovery cost of each out-of-bid failure yields the expected running time using spot instances, Eq. (1).

Another difference between traditional clusters and spot-based clusters is that the failure rate is not fixed for all instances. Instead, the failure rate is a function of the bid proposed by the user and the market price. This situation requires statistical tools to determine the failure rate of an instance given a bid price. The authors obtain the price history and determine the empirical cumulative distribution of failures given a specific bid price; various fits of this distribution are then obtained. Fig. 2 shows that, as the bid price increases, the CDF of the availability of the instance under that bid price also increases. This measure of price stability is used as a measure of failure-free runtime: the higher the bid price, the longer the availability time of an instance at that bid price. Since the proposed model relies on an exponential distribution, the authors use the corresponding exponential fit's parameters and simulate the runtime of various applications using this exponential failure rate.

3.2 SpotHPC Framework

Due to the lack of toolkits that deal explicitly with auction-based HPC, the authors developed a framework to run distributed applications such as MPI on virtual clusters composed of failure-prone spot instances. SpotHPC is composed of four components: cluster orchestration, a checkpoint/restart service, a checkpoint forecaster and a monitoring service (Fig. 3). These modules are initially installed on an on-demand instance that is free of out-of-bid failures. The cluster monitoring service pulls the status and bidding prices of all instances continuously.
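The bid-aware interval computation of section 3.1 can be illustrated with a short sketch. The price-history format, the numeric values and the helper names below are illustrative assumptions; the sketch simply measures how long the market price stays at or below a given bid, fits an exponential rate to those durations, and applies the first-order interval formula above.

```python
import math

def failure_free_durations(price_history, bid):
    """price_history: list of (timestamp_seconds, market_price) sorted by time.
    Returns the lengths (seconds) of the periods during which the market price
    stayed at or below `bid`, i.e. the observed failure-free run lengths."""
    durations, start = [], None
    for (t, price) in price_history:
        if price <= bid and start is None:
            start = t                     # a failure-free period begins
        elif price > bid and start is not None:
            durations.append(t - start)   # an out-of-bid event ends the period
            start = None
    return durations

def out_of_bid_rate(price_history, bid):
    """Exponential-fit failure rate lambda_b: for an exponential distribution the
    maximum-likelihood rate is 1 / mean(observed durations)."""
    d = failure_free_durations(price_history, bid)
    return 1.0 / (sum(d) / len(d)) if d else 0.0

def optimal_cpr_interval(delta, lam):
    """First-order optimal checkpoint interval (Young [28]): sqrt(2*delta/lambda)."""
    return float("inf") if lam == 0 else math.sqrt(2.0 * delta / lam)

# Example with hypothetical numbers: a 120 s checkpoint cost at a bid of $0.60/hour.
# history = ...  # e.g. pulled from the EC2 spot price history API
# lam = out_of_bid_rate(history, bid=0.60)
# tau = optimal_cpr_interval(delta=120.0, lam=lam)
```

The exponential fit mirrors the distributional assumption made in section 3.1; other fits could be substituted without changing the role of the interval formula.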
The interactive bidding price and the dynamic price history are used by the CPR calculator to generate the next optimal CPR interval. A composite timing model (next section) is responsible for estimating the total processing times.

The checkpoint service saves the state of the MPI application in the user's EBS (elastic block storage) volume at dynamically adjusted intervals. Any out-of-bid failure will cause the application to halt. Upon a future winning bid, the cluster orchestration rebuilds the virtual cluster to match the pre-failure configuration and the application is automatically restarted from the last checkpoint using the Open MPI restart library and the stored checkpoint.

Fig. 2  CDF of price probability per bid price for cc1.4xlarge instances.
Fig. 3  SpotHPC architecture design.
Fig. 4  Clustering workflow using HPCFY.

The current design of this framework includes the OpenMPI coordinated CPR library [15] for the checkpoint service, based on the BLCR project [31]. The cluster orchestration is managed by the HPCFY library (Fig. 4) [32]. The OpenMPI and BLCR libraries facilitate the execution of automatic CPR at optimal intervals. The orchestration service, HPCFY, facilitates the creation and management of HPC clusters using virtual cloud resources such as Amazon EC2 instances. The HPC user is assumed to have an on-demand instance on which the SpotHPC packages are installed and from which they deploy their own auction-based clusters. This process starts by requesting a set of VMs from the cloud controller, specifying a virtual machine image and type as well as the number of instances and the bid price to be used. The cloud controller allocates the requested number of spot VMs when the market price is favorable to the bid price. Subsequently, one of the nodes is selected as the head node of the virtual cluster. The user then sends the cluster configuration to the puppet master, which deploys it to the rest of the nodes. The worker nodes send back their statuses and receive any new modifications at regular intervals. Once the cluster has stabilized/converged and the configuration is uniform across the nodes, the user is able to launch parallel and distributed applications from the head node. The project is openly available and can be easily downloaded and extended. Currently, it supports popular HPC packages such as MPI and Hadoop/MapReduce with the large-scale data mining package Mahout. It also provides automatic configuration of distributed user accounts, security settings and cluster monitoring using the Ganglia project [33]. Following the launch of the application, the monitoring system keeps track of the running application and performs checkpoints and restarts until the application completes successfully.
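The request-and-wait step of this workflow maps naturally onto the EC2 spot APIs. The sketch below is not HPCFY's actual interface (which is not reproduced here); it is a minimal illustration using the AWS boto3 SDK, with the region, AMI ID, instance type, count and bid price as placeholder values.

```python
import time
import boto3  # AWS SDK for Python

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

def request_spot_cluster(ami_id, instance_type, count, bid_price):
    """Request `count` spot instances at `bid_price` and wait until they are fulfilled.
    Returns the instance IDs once the market price allows the allocation."""
    resp = ec2.request_spot_instances(
        SpotPrice=str(bid_price),
        InstanceCount=count,
        LaunchSpecification={"ImageId": ami_id, "InstanceType": instance_type},
    )
    req_ids = [r["SpotInstanceRequestId"] for r in resp["SpotInstanceRequests"]]

    while True:
        desc = ec2.describe_spot_instance_requests(SpotInstanceRequestIds=req_ids)
        requests = desc["SpotInstanceRequests"]
        # A spot request is fulfilled once it carries an InstanceId.
        if all("InstanceId" in r for r in requests):
            return [r["InstanceId"] for r in requests]
        time.sleep(30)  # poll until the market price becomes favorable to the bid

# Example with placeholder values: a 16-node cluster of cluster-compute instances.
# nodes = request_spot_cluster("ami-xxxxxxxx", "cc1.4xlarge", 16, 0.60)
# One node would then be chosen as the head node and the configuration pushed
# to the workers, as described above.
```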

4. Computational Results

4.1 Steady State Timing Model

To evaluate the expected running time, an estimate of the failure-free processing time is needed. The authors use the steady state timing model [26] to determine the required running time based on major component usage complexities. Table 2 shows the symbols used in the timing models.

Table 2  Definition of symbols and variables for modeling the runtime.
tau: interval of application-wide checkpoints
lambda_b: expected rate of out-of-bid failures corresponding to bid b
delta: time needed to create and store a checkpoint
R: time needed to read and recover a checkpoint
D: average out-of-bid downtime
T_s: estimated time needed to run the application with no checkpoints and no failures
T_seg: expected running time between checkpoints
E[T]: expected total running time
T_obs: total observed time
P: number of processing units
w: instrumented processor capacity in number of computational steps per second
beta: instrumented network capacity in bytes per second
n: problem size
k: number of iterations
T_p: parallel processing time

The general problem of assessing the processing time of a parallel application is difficult; there are too many hard-to-quantify factors. However, a steady state timing model can capture the intrinsic dependencies between the major time-consuming elements, such as computing, communication and input/output, by using the instrumented capacities w and beta. The idea is to eliminate the non-essential constant factors, so that contrasting timing models can reveal non-trivial parallel processing insights [34]. In this paper, the authors choose to study two typical algorithm classes (Table 3) for the use of spot instance computing. Timing models in general can be applied to all deterministic algorithm classes [26].

Table 3  Algorithm classes A1 and A2 (compute and communication complexities, sample application, timing model).
A1: molecular force simulation, with linear communication complexity.
A2: linear solvers, with higher communication complexity.

4.2 Evaluation of Checkpointing Overheads on Amazon EC2 Cluster Instances

A central goal of checkpointing libraries is to decrease the overhead incurred when saving the state of a parallel application. To quantify the effect of checkpointing on real machines, the authors conducted an experiment on Amazon EC2 using cluster instances of type cc2.8xlarge and the MPI-based NAS benchmark. The cc2.8xlarge instances and the benchmark used are described in Table 4.

Table 4  Checkpoint overhead experimental setup.
AMI: cc2.8xlarge, 60.5 GB memory, 88 EC2 Compute Units, 3370 GB storage, 64-bit platform, 10 Gigabit Ethernet.
Benchmark: NAS-NPB 2.3 SP, scalar-pentadiagonal kernel, non-linear PDE solver.

The experiment ran on 4 cc2.8xlarge nodes with 1 MPI task on each node, and consisted of running classes C and D of the SP NAS benchmark while observing the effect of a set of checkpoint frequencies. The goal of this experiment is to show the importance of choosing a correct checkpoint frequency.

Fig. 5  Impact of checkpoint frequency on the runtime.
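Before looking at the measurements, it helps to see why the frequency matters even without failures. The following sketch computes the failure-free slowdown implied by the simple overhead model used in this section (a checkpoint of cost delta taken once per interval); the runtime and cost values are illustrative assumptions, not the measured numbers.

```python
def checkpoint_slowdown(failure_free_runtime, interval, checkpoint_cost):
    """Failure-free slowdown factor when a checkpoint costing `checkpoint_cost`
    seconds is taken every `interval` seconds: (T_s + n_ckpt * delta) / T_s."""
    n_checkpoints = failure_free_runtime // interval
    return (failure_free_runtime + n_checkpoints * checkpoint_cost) / failure_free_runtime

# Illustrative values only: a 2-hour run with a 90-second checkpoint cost.
T_s, delta = 7200.0, 90.0
for interval in (60.0, 300.0, 900.0, 3600.0):
    print(f"interval {interval:6.0f} s -> slowdown {checkpoint_slowdown(T_s, interval, delta):.2f}x")
```

With these assumed numbers, checkpointing every minute yields a 2.5x slowdown while checkpointing hourly costs only a few percent, which matches the qualitative trend reported below.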

Fig. 5 shows the slowdown incurred by two application sizes compared to the failure-free runtimes. In both cases, increasing the checkpoint frequency increases the slowdown of the application. Although no failures occurred in this scenario, over-protective checkpointing strategies can lead to slowdowns of 2 to 3 times the failure-free runtimes. On the other hand, because failures do occur in practice, there should be a balance between the out-of-bid failure rate of the cluster nodes and the checkpoint frequency. To this effect, the authors use the model developed in section 3 to determine the optimal checkpoint interval/frequency based on an application's characteristics and the corresponding bidding strategy.

4.3 Evaluation of Bid-Aware Optimal CPR Interval

The bid-aware CPR interval is validated against non-optimal intervals. Fig. 6 visualizes the behavior of the speedup under different CPR intervals and shows the clear advantage of bid-aware optimal CPR intervals, which avoid longer completion times and higher total costs. The authors also notice that as the bid increases, the advantage of the optimal CPR interval decreases, because at higher bids frequent checkpointing is not needed as much.

Fig. 6  A1 speedup using 100 spot instances and different CPR intervals.

4.4 Bidding Price and Application Processing Time

The authors are also interested in understanding, given the price history, how a new bid would affect the total processing time. Once this is done, the authors can derive a number of other important metrics, such as speedup, efficiency, total cost, speedup per dollar, and efficiency per dollar, when deploying different numbers of processing units. In the following calculations, the authors assume that: the application uses the bid-aware optimal CPR intervals; the HPC application is run at the optimal granularity and optimal degree of parallelism, which allows the synchronization overhead to be set to zero; and the Amazon resources deliver the advertised capabilities.

The steady state timing models in Table 3 can then be plugged directly into Eq. (1), yielding Eqs. (2)-(3) for A1 and A2, respectively. Eqs. (2)-(3) capture the intrinsic dependencies between critical factors, such as the bidding price, the price history, the number of spot instances and the overall processing time. To minimize errors, program instrumentation is conducted to obtain the ranges of w and beta. Table 5 shows the value ranges used in the calculations. Results are reported in Figs. 7-8.

First, it is observed that HPC applications can indeed gain practical feasibility using spot instances under optimized CPR intervals. As indicated by Amdahl's law [35], the effect of diminishing returns is also clearly visible when the number of spot instances increases for the same algorithms. For A1 (with linear communication complexity), speedup and efficiency drop significantly when the number of spot instances exceeds 200. For A2 (with higher communication complexity), speedup and efficiency drop much earlier.
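The metric sweep described in section 4.4 can be sketched as a small simulation loop. Everything below is illustrative: the expected-runtime function is only a generic first-order stand-in for Eq. (1), the timing model and failure-rate fit are placeholder lambdas (the paper's Eqs. (2)-(3) are not reproduced here), and the bid grid, instance counts and cost accounting are assumed values.

```python
import math

def expected_runtime(T_s, lam, delta=120.0, restart=120.0, downtime=300.0):
    """Generic first-order stand-in for Eq. (1): failure-free time plus checkpoint
    overhead plus the expected cost of out-of-bid failures (recovery, downtime,
    and roughly half an interval of lost work per failure)."""
    tau = math.sqrt(2.0 * delta / lam) if lam > 0 else float("inf")
    overhead = T_s * (delta / tau) if tau != float("inf") else 0.0
    failures = lam * (T_s + overhead)
    return T_s + overhead + failures * (restart + downtime + tau / 2.0)

def sweep(T_serial, timing_model, rate_for_bid, bids, instance_counts):
    """Derive speedup, cost and speedup-per-dollar for each (bid, P) pair."""
    rows = []
    for bid in bids:
        lam = rate_for_bid(bid)      # from the price-history fit (section 3.1)
        for P in instance_counts:
            T_s = timing_model(P)    # failure-free parallel time on P instances
            T = expected_runtime(T_s, lam)
            cost = P * bid * T / 3600.0  # upper bound: Amazon actually charges the market price
            speedup = T_serial / T
            rows.append((bid, P, speedup, cost, speedup / cost))
    return rows

# Illustrative usage with placeholder inputs (not the paper's Eqs. (2)-(3)):
# timing_model = lambda P: 1e6 / P + 50.0 * math.log2(P)
# rate_for_bid = lambda bid: 1.0 / (bid * 2e5)
# table = sweep(1e6, timing_model, rate_for_bid,
#               bids=[0.5, 0.7, 0.9], instance_counts=[200, 400, 600, 800, 1000])
```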

Table 5  Critical parameters.
Number of processing units P: 200 to 1,000 instances.
Processor capacity w: measured algorithmic steps per second on cc1.4xlarge.
Network capacity beta: 250 MBps (measured).
Problem size n: 10^4 to 10^5.
Number of iterations k: 10^3 to 10^6.

Fig. 7  A1 using 200 to 1,000 spot instances with N = 100,000 to 1,000,000 iterations.
Fig. 8  A2 using 200 to 1,000 spot instances for ns = 10,000 and 1,000 iterations.

The authors also notice that, for A2, the bidding prices have a much bigger impact on speedup than for A1. The added dimension of bidding price reveals the cost effectiveness of different configurations. Although higher bids can deliver better performance, the cost effectiveness actually decreases (see the speedup-per-dollar charts). Therefore, users can use these figures to optimize for budget, for a processing deadline, or for anything in between. The non-trivial insight is the high price sensitivity of algorithms with high communication complexities. The cost effectiveness is also difficult to visualize without the proposed tools. These results provide the basis for selecting the best number of processors (spot instances) and the most promising bidding price for a given objective.

5. Conclusions

Finding the optimal bidding strategy for any application is a difficult problem. For specific applications, the proposed approach gives reasonable predictions that can guide the choice of a promising bidding strategy based on the intrinsic dependencies of critical factors. The timing model, along with the bid-aware CPR model, provides an effective tool for determining the optimal bid as well as the optimal number of processing units needed to complete a specific application.

This research paves the way for more specialized pricing models for cloud providers by giving more insight into the return on investment. For example, since the speedup gain slows down once the number of processors reaches a certain level, it makes sense to offer lower prices as volume discounts that are sensitive to the communication complexities. The new pricing models may change user behavior, which in turn would also affect the providers, eventually reaching an equilibrium in which resource utilization is maximized. Other innovative ideas are also possible. For example, self-healing applications [36] could enjoy much better cost advantages by setting bidding ranges to organize defensive rings that protect the users' core interests while maintaining the lowest cost structures.

Spot instances give the provider much freedom in dispatching resources to meet dynamic user needs. This freedom allows for high computational efficiency and fair revenue/cost generation. It also challenges the HPC community to develop highly efficient and more flexible programming methods that can automatically exploit cheaper resources on the fly.

Acknowledgment

This research is supported in part by a National Science Foundation CNS grant and by educational resource grants from Amazon.com.

References
[1] L. Youseff, R. Wolski, B. Gorda, C. Krintz, Evaluating the performance impact of Xen on MPI and process execution for HPC systems, in: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing, 2006, p. 1.
[2] C. Vecchiola, S. Pandey, R. Buyya, High-performance cloud computing: A view of scientific applications, in: Proceedings of the 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN), 2009.
[3] A. Iosup, S. Ostermann, N. Yigitbasi, R. Prodan, T. Fahringer, D. Epema, Performance analysis of cloud computing services for many-tasks scientific computing, IEEE Transactions on Parallel and Distributed Systems 22 (6) (2011).
[4] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, Xen and the art of virtualization, in: Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003.
[5] G.E. Fagg, J. Dongarra, FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world, in: Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2000.
[6] A. Agbaria, R. Friedman, Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations, in: Proceedings of the 8th International Symposium on High Performance Distributed Computing, 1999.
[7] R. Graham, S. Choi, D. Daniel, N. Desai, R. Minnich, C. Rasmussen, L. Risinger, M. Sukalski, A network-failure-tolerant message-passing system for terascale clusters, International Journal of Parallel Programming 31 (4) (2003).
[8] S. Rao, L. Alvisi, H. Vin, Egida: An extensible toolkit for low-overhead fault-tolerance, in: 29th Annual International Symposium on Fault-Tolerant Computing, Digest of Papers, IEEE, 1999.
[9] G. Stellner, CoCheck: Checkpointing and process migration for MPI, in: Proceedings of the 10th International Parallel Processing Symposium (IPPS '96), IEEE Computer Society, 1996.
[10] M. Litzkow, T. Tannenbaum, J. Basney, M. Livny, Checkpoint and migration of Unix processes in the Condor distributed processing system, Technical Report.
[11] J. Hursey, J.M. Squyres, T.I. Mattox, A. Lumsdaine, The design and implementation of checkpoint/restart process fault tolerance for Open MPI, in: Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007.
[12] A. Moody, G. Bronevetsky, K. Mohror, B.R. de Supinski, Design, modeling, and evaluation of a scalable multi-level checkpointing system, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2010.
[13] G. Zheng, L. Shi, L. Kale, FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI, in: Proceedings of the 2004 IEEE International Conference on Cluster Computing, 2004.
[14] Amazon HPC Cluster Instances, 2011, available online.
[15] J. Hursey, Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems, Ph.D. Dissertation, Indiana University, Bloomington, IN, USA, July 2010.
[16] J. Dean, S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM 51 (1) (2008).
[17] D. Borthakur, The Hadoop Distributed File System: Architecture and Design, 2007, available online.
[18] N. Chohan, C. Castillo, M. Spreitzer, M. Steinder, A. Tantawi, C. Krintz, See Spot Run: Using spot instances for MapReduce workflows, in: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, USENIX Association, 2010, p. 7.
[19] A. Andrzejak, D. Kondo, S. Yi, Decision model for cloud computing under SLA constraints, in: Proceedings of the IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2010.
[20] S. Yi, D. Kondo, A. Andrzejak, Reducing costs of spot instances via checkpointing in the Amazon Elastic Compute Cloud, in: 2010 IEEE 3rd International Conference on Cloud Computing, 2010.
[21] W. Voorsluys, R. Buyya, Reliable provisioning of spot instances for compute-intensive applications, arXiv preprint, 2011.
[22] W. Voorsluys, S. Garg, R. Buyya, Provisioning spot market cloud resources to create cost-effective virtual clusters, in: Proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing, 2011.
[23] B. Javadi, R. Buyya, Comprehensive statistical analysis and modeling of spot instances in public cloud environments, Technical Report CLOUDS-TR, The University of Melbourne.
[24] O. Ben-Yehuda, M. Ben-Yehuda, A. Schuster, D. Tsafrir, Deconstructing Amazon EC2 spot instance pricing, Technical Report CS, Technion-Israel Institute of Technology.
[25] M. Taifi, J. Shi, A. Khreishah, SpotMPI: A framework for auction-based HPC computing using Amazon spot instances, in: Proceedings of ICA3PP, 2011.
[26] J. Shi, Program scalability analysis, in: International Conference on Distributed and Parallel Processing, Georgetown University, Washington, D.C., October.
[27] J. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems 22 (3) (2006).
[28] J. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM 17 (9) (1974).
[29] Q. Zhang, E. Gurses, R. Boutaba, J. Xiao, Dynamic resource allocation for spot markets in clouds, in: Proceedings of the 11th USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, 2011.
[30] W. Gropp, E. Lusk, Fault tolerance in MPI programs, Special issue of the International Journal of High Performance Computing Applications 18 (2002).
[31] P. Hargrove, J. Duell, Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters, Journal of Physics: Conference Series 46 (2006).
[32] M. Taifi, HPCFY: Virtual HPC cluster orchestration library, available online.
[33] M. Massie, B. Chun, D. Culler, The Ganglia distributed monitoring system: Design, implementation, and experience, Parallel Computing 30 (7) (2004).
[34] K. Blathras, D. Szyld, Y. Shi, Timing models and local stopping criteria for asynchronous iterative algorithms, Journal of Parallel and Distributed Computing 58 (3) (1999).
[35] G. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, in: Proceedings of the April 18-20, 1967, Spring Joint Computer Conference (AFIPS '67), ACM.
[36] J. Shi, M. Taifi, A. Khreishah, J. Wu, Sustainable GPU computing at scale, in: 14th IEEE International Conference on Computational Science and Engineering, 2011.


Experimental Investigation Decentralized IaaS Cloud Architecture Open Stack with CDT Experimental Investigation Decentralized IaaS Cloud Architecture Open Stack with CDT S. Gobinath, S. Saravanan PG Scholar, CSE Dept, M.Kumarasamy College of Engineering, Karur, India 1 Assistant Professor,

More information

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India 1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India Call for Papers Colossal Data Analysis and Networking has emerged as a de facto

More information

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams Neptune A Domain Specific Language for Deploying HPC Software on Cloud Platforms Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams ScienceCloud 2011 @ San Jose, CA June 8, 2011 Cloud Computing Three

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Efficient Load Balancing using VM Migration by QEMU-KVM

Efficient Load Balancing using VM Migration by QEMU-KVM International Journal of Computer Science and Telecommunications [Volume 5, Issue 8, August 2014] 49 ISSN 2047-3338 Efficient Load Balancing using VM Migration by QEMU-KVM Sharang Telkikar 1, Shreyas Talele

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 36 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 36 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 36 An Efficient Approach for Load Balancing in Cloud Environment Balasundaram Ananthakrishnan Abstract Cloud computing

More information

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department

More information

Experimental Study of Bidding Strategies for Scientific Workflows using AWS Spot Instances

Experimental Study of Bidding Strategies for Scientific Workflows using AWS Spot Instances Experimental Study of Bidding Strategies for Scientific Workflows using AWS Spot Instances Hao Wu, Shangping Ren Illinois Institute of Technology 10 w 31 St. Chicago, IL, 60616 hwu28,ren@iit.edu Steven

More information

Dynamic Load Balancing of Virtual Machines using QEMU-KVM

Dynamic Load Balancing of Virtual Machines using QEMU-KVM Dynamic Load Balancing of Virtual Machines using QEMU-KVM Akshay Chandak Krishnakant Jaju Technology, College of Engineering, Pune. Maharashtra, India. Akshay Kanfade Pushkar Lohiya Technology, College

More information

An On-Line Algorithm for Checkpoint Placement

An On-Line Algorithm for Checkpoint Placement An On-Line Algorithm for Checkpoint Placement Avi Ziv IBM Israel, Science and Technology Center MATAM - Advanced Technology Center Haifa 3905, Israel avi@haifa.vnat.ibm.com Jehoshua Bruck California Institute

More information

Converting A High Performance Application to an Elastic Cloud Application

Converting A High Performance Application to an Elastic Cloud Application Converting A High Performance Application to an Elastic Cloud Application Dinesh Rajan, Anthony Canino, Jesus A Izaguirre, and Douglas Thain Department of Computer Science and Engineering University of

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Migration of Virtual Machines for Better Performance in Cloud Computing Environment

Migration of Virtual Machines for Better Performance in Cloud Computing Environment Migration of Virtual Machines for Better Performance in Cloud Computing Environment J.Sreekanth 1, B.Santhosh Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,

More information

Optimal Service Pricing for a Cloud Cache

Optimal Service Pricing for a Cloud Cache Optimal Service Pricing for a Cloud Cache K.SRAVANTHI Department of Computer Science & Engineering (M.Tech.) Sindura College of Engineering and Technology Ramagundam,Telangana G.LAKSHMI Asst. Professor,

More information

Building Cost-Effective Storage Clouds A Metrics-based Approach

Building Cost-Effective Storage Clouds A Metrics-based Approach Building Cost-Effective Storage Clouds A Metrics-based Approach Ning Zhang #1, Chander Kant 2 # Computer Sciences Department, University of Wisconsin Madison Madison, WI, USA 1 nzhang@cs.wisc.edu Zmanda

More information

Key words: cloud computing, cluster computing, virtualization, hypervisor, performance evaluation

Key words: cloud computing, cluster computing, virtualization, hypervisor, performance evaluation Hypervisors Performance Evaluation with Help of HPC Challenge Benchmarks Reza Bakhshayeshi; bakhshayeshi.reza@gmail.com Mohammad Kazem Akbari; akbarif@aut.ac.ir Morteza Sargolzaei Javan; msjavan@aut.ac.ir

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Ensuring Reliability and High Availability in Cloud by Employing a Fault Tolerance Enabled Load Balancing Algorithm G.Gayathri [1], N.Prabakaran [2] Department of Computer

More information

On-Demand Supercomputing Multiplies the Possibilities

On-Demand Supercomputing Multiplies the Possibilities Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Image courtesy of Wolfram Research, Inc. On-Demand Supercomputing Multiplies the Possibilities Microsoft Windows Compute Cluster Server

More information

Performance Isolation of a Misbehaving Virtual Machine with Xen, VMware and Solaris Containers

Performance Isolation of a Misbehaving Virtual Machine with Xen, VMware and Solaris Containers Performance Isolation of a Misbehaving Virtual Machine with Xen, VMware and Solaris Containers Todd Deshane, Demetrios Dimatos, Gary Hamilton, Madhujith Hapuarachchi, Wenjin Hu, Michael McCabe, Jeanna

More information

AMAZING: An Optimal Bidding Strategy for Amazon EC2 Cloud Spot Instance

AMAZING: An Optimal Bidding Strategy for Amazon EC2 Cloud Spot Instance : An Optimal Bidding Strategy for Amazon EC2 Cloud Spot Instance ShaoJie Tang, Jing Yuan, Xiang-Yang Li Department of Computer Science, Illinois Institute of Technology, Chicago, IL 666 Department of Computer

More information

Efficient Data Replication Scheme based on Hadoop Distributed File System

Efficient Data Replication Scheme based on Hadoop Distributed File System , pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance

More information

SR-IOV: Performance Benefits for Virtualized Interconnects!

SR-IOV: Performance Benefits for Virtualized Interconnects! SR-IOV: Performance Benefits for Virtualized Interconnects! Glenn K. Lockwood! Mahidhar Tatineni! Rick Wagner!! July 15, XSEDE14, Atlanta! Background! High Performance Computing (HPC) reaching beyond traditional

More information

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida Amazon Web Services Primer William Strickland COP 6938 Fall 2012 University of Central Florida AWS Overview Amazon Web Services (AWS) is a collection of varying remote computing provided by Amazon.com.

More information

An Efficient Hybrid P2P MMOG Cloud Architecture for Dynamic Load Management. Ginhung Wang, Kuochen Wang

An Efficient Hybrid P2P MMOG Cloud Architecture for Dynamic Load Management. Ginhung Wang, Kuochen Wang 1 An Efficient Hybrid MMOG Cloud Architecture for Dynamic Load Management Ginhung Wang, Kuochen Wang Abstract- In recent years, massively multiplayer online games (MMOGs) become more and more popular.

More information

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Journal of Information & Computational Science 9: 5 (2012) 1273 1280 Available at http://www.joics.com VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Yuan

More information

A Chromium Based Viewer for CUMULVS

A Chromium Based Viewer for CUMULVS A Chromium Based Viewer for CUMULVS Submitted to PDPTA 06 Dan Bennett Corresponding Author Department of Mathematics and Computer Science Edinboro University of PA Edinboro, Pennsylvania 16444 Phone: (814)

More information

Building Platform as a Service for Scientific Applications

Building Platform as a Service for Scientific Applications Building Platform as a Service for Scientific Applications Moustafa AbdelBaky moustafa@cac.rutgers.edu Rutgers Discovery Informa=cs Ins=tute (RDI 2 ) The NSF Cloud and Autonomic Compu=ng Center Department

More information

Analysis and Modeling of MapReduce s Performance on Hadoop YARN

Analysis and Modeling of MapReduce s Performance on Hadoop YARN Analysis and Modeling of MapReduce s Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University tang_j3@denison.edu Dr. Thomas C. Bressoud Dept. of Mathematics and

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Statistical Modeling of Spot Instance Prices in Public Cloud Environments

Statistical Modeling of Spot Instance Prices in Public Cloud Environments 2011 Fourth IEEE International Conference on Utility and Cloud Computing Statistical Modeling of Spot Instance Prices in Public Cloud Environments Bahman Javadi, Ruppa K. Thulasiram, and Rajkumar Buyya

More information

REM-Rocks: A Runtime Environment Migration Scheme for Rocks based Linux HPC Clusters

REM-Rocks: A Runtime Environment Migration Scheme for Rocks based Linux HPC Clusters REM-Rocks: A Runtime Environment Migration Scheme for Rocks based Linux HPC Clusters Tong Liu, Saeed Iqbal, Yung-Chin Fang, Onur Celebioglu, Victor Masheyakhi and Reza Rooholamini Dell Inc. {Tong_Liu,

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

Cloud Computing and E-Commerce

Cloud Computing and E-Commerce Cloud Computing and E-Commerce Cloud Computing turns Computing Power into a Virtual Good for E-Commerrce is Implementation Partner of 4FriendsOnly.com Internet Technologies AG VirtualGoods, Koblenz, September

More information

Efficient Cloud Management for Parallel Data Processing In Private Cloud

Efficient Cloud Management for Parallel Data Processing In Private Cloud 2012 International Conference on Information and Network Technology (ICINT 2012) IPCSIT vol. 37 (2012) (2012) IACSIT Press, Singapore Efficient Cloud Management for Parallel Data Processing In Private

More information