The Case for Resource Sharing in Scientific Workflow Executions
Ricardo Oda¹, Daniel Cordeiro¹, Rafael Ferreira da Silva², Ewa Deelman², Kelly R. Braghetto¹

¹Instituto de Matemática e Estatística, Universidade de São Paulo (USP), São Paulo, SP, Brazil
²Information Sciences Institute, University of Southern California (USC), Marina del Rey, CA, USA

{odaric,danielc,kellyrb}@ime.usp.br, {rafsilva,deelman}@isi.edu

Abstract. Scientific workflows have become mainstream for conducting large-scale scientific research. The execution of these applications can be very costly in terms of computational resources. Therefore, optimizing their resource utilization and efficiency is highly desirable, even in computational environments where the processing resources are plentiful, such as clouds. In this work, we study the case of exploiting shared multiprocessors within a single virtual machine. Using a public cloud provider and real-world applications, we show that the use of dedicated processors can lead to sub-optimal performance of scientific workflows. This is a first step towards the creation of a self-aware resource management system in line with state-of-the-art multitenant platforms.

1. Introduction

Modern scientific experiments are automated by means of workflow systems, which have received increased attention from the research community since their first appearance in the 1980s. Scientific workflows have become mainstream for conducting large-scale scientific research [Taylor et al. 2007]. Workflows can be seen as process networks that are typically used as data analysis pipelines or for comparing observed and predicted data, and that can include a wide range of components, e.g., for querying databases, for data transformation and data mining steps, for execution of simulation codes on high-performance computers, etc. [Ludäscher et al. 2006].

According to [Callaghan et al. 2011], scientific workflow applications may include both high-performance computing (HPC) and high-throughput computing (HTC) tasks. While HPC couples high-speed networks with high-performance computers to provide specialized collections of resources to applications, HTC is characterized by the use of many computing resources over long periods of time to accomplish a computational task. In the past decade, HPC and HTC techniques have been used to probe vast amounts of raw scientific data. The processing of such data is very costly in terms of computational resources. Therefore, it is highly desirable that scientific applications maximize resource utilization and thereby efficiency, even in computational environments where the processing resources are plentiful, such as multiprocessor computers, grids, and clouds.

Scientific workflow management systems (SWfMSs) are a class of tools created to ease the development, management, and execution of complex workflows. SWfMSs
allow scientists to execute their HPC and HTC applications (expressed as workflows) seamlessly on a range of distributed platforms. In order to support a large variety of scientific applications, SWfMSs must make assumptions about the tasks they will execute and the underlying execution environment. Notably, most SWfMSs assume that each task requires a fully dedicated processor for execution.

The one-task-per-processor assumption is mainly justified by software requirements related to the execution on clusters and grids. In these platforms, users often have exclusive access to the computational resources. However, with the widespread use of virtual machines (VMs) on multitenant platforms, of which cloud computing is the most prominent example, similar assumptions may no longer be reasonable: a single physical machine can multiplex the execution of several VMs without the user being aware of this.

In this work, we study the case of exploiting shared multiprocessors within a single VM. Using a public cloud computing provider and well-studied scientific applications, we show that the use of dedicated processors can lead to sub-optimal execution performance of scientific workflows. This work is a first step towards the creation of a multipurpose, self-aware resource management system in line with state-of-the-art multitenant platforms.

2. Scientific Workflows

Scientific workflows allow researchers to express multi-step computational tasks, for example: retrieve data from an instrument or a database, reformat the data, run analyses, and post-process the results. A scientific workflow is basically composed of data processing tasks and the connections existing among these tasks. A workflow model is an abstract representation of a workflow that defines the tasks that must be performed and their (partial) execution order. A task is an atomic unit of work in the workflow model. The execution order of the tasks can be defined in terms of the dependency relationships that may exist among them, or as a function of their data transfers (i.e., data flow). Therefore, a workflow model is often represented as a graph in a formal framework or in a workflow specification language. A workflow instance is a specific execution of a workflow, i.e., an instantiation of a workflow model with its own input data and configuration parameters.

In this work, a scientific workflow is modeled as a Directed Acyclic Graph (DAG), where nodes represent individual computational tasks and edges represent data and control dependencies between tasks. Figure 2 shows examples of workflows modeled as DAGs. Despite their simplicity, DAGs enable the representation of the most common data flow patterns used to model scientific workflows, such as pipeline, data distribution (data parallelism), and data aggregation (data synchronization), all of them illustrated in the workflow model of Figure 2a. A minimal sketch of this DAG representation is given below.
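To make the DAG model concrete, the following minimal sketch (in Python; the task names are illustrative and not tied to any particular SWfMS) encodes a workflow that combines the three patterns: a task fans out to two pipeline branches, and a final task aggregates their results. A topological sort then yields an execution order that respects every dependency.

    # Sketch of a workflow DAG: each task maps to the set of tasks it
    # depends on. "split" fans out to two branches (data distribution),
    # each branch is a short pipeline, and "merge" joins the results
    # (data aggregation). Task names are illustrative only.
    from graphlib import TopologicalSorter

    dag = {
        "split":   set(),             # workflow entry: no dependencies
        "filter1": {"split"},         # pipeline stage, branch 1
        "filter2": {"split"},         # pipeline stage, branch 2
        "map1":    {"filter1"},
        "map2":    {"filter2"},
        "merge":   {"map1", "map2"},  # synchronizes both branches
    }

    # TopologicalSorter releases a task only after all of its parents,
    # i.e., it produces an order compatible with the partial order.
    print(list(TopologicalSorter(dag).static_order()))
    # e.g., ['split', 'filter1', 'filter2', 'map1', 'map2', 'merge']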
2.1. Execution on Cloud Computing Platforms

Several works [Hoffa et al. 2008, Juve et al. 2009, Deelman et al. 2008] have shown that clouds are a viable computing platform for the execution of scientific applications. They can provide a large amount of on-demand, elastic resources at lower costs than those involved in acquiring and maintaining a high-performance cluster or supercomputer. Public cloud computing platforms offer virtualized resources that can be provisioned and released at any time. As a result, one can adapt the set of computational resources to the specific resource requirements of workflow instances, even when the requirements vary during the instances' execution [Deelman et al. 2012].

Due to virtualization, scheduling techniques similar to the classic ones used to execute scientific workflows in grids can be extended to clouds. These techniques are designed to optimize specific performance goals, such as total completion time (makespan) and average completion time of tasks. In addition, there is a clear tradeoff between performance and cost: performance can be improved by provisioning additional resources, at the expense of higher costs. Finding a solution that satisfies both criteria (a multi-objective problem) is known to be NP-hard, thus several heuristics have been developed to tackle this problem [Malawski et al. 2012, Fard et al. 2012]. In this context, optimizing the use of computational resources (e.g., by using resource sharing) is a feasible approach to reduce costs with minimal impact on the overall workflow execution performance.

2.2. Scientific Workflow Management Systems (SWfMSs)

A scientific workflow management system (SWfMS) is a software tool that supports the modeling, instantiation, execution, monitoring, and analysis of scientific workflows. Several SWfMSs [Wolstencroft et al. 2013, Ludäscher et al. 2006, Giardine et al. 2005, Deelman et al. 2015] have been developed to allow scientists to: (i) describe their applications as high-level abstractions decoupled from the specifics of workflow execution; and (ii) execute these applications seamlessly on possibly complex, multi-site distributed platforms that can accommodate large-scale executions. Taverna [Wolstencroft et al. 2013], Kepler [Ludäscher et al. 2006], Galaxy [Giardine et al. 2005], and Pegasus [Deelman et al. 2015] are examples of open-source, general-purpose SWfMSs that have been largely used by the scientific community.

Although these systems implement different automation approaches, most of them share common architectural components: (i) the Execution Environment represents the target execution infrastructure; (ii) the Task Manager abstracts the complexity of the physical platform and provides mechanisms to execute workflow tasks in a scalable, efficient, and robust manner; (iii) the Workflow Engine manages and monitors the state of the workflow execution; and (iv) the Workflow Planner maps high-level workflow descriptions onto distributed resources. In this work, we have particular interest in the resource manager (execution environment), which controls the software and hardware components used in the workflow execution.

2.3. The Pegasus SWfMS

We used the Pegasus SWfMS [Deelman et al. 2015] as a case study and a driver for this work. Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level abstract workflow descriptions onto distributed resources. It manages data on behalf of the user, infers the required data transfers, registers data into catalogs, and captures performance information, while maintaining a common user interface for workflow submission. In Pegasus, workflows are described abstractly as DAGs, where nodes represent individual computational tasks and edges represent data and control dependencies between tasks, without any information regarding physical resources or physical locations of data and executables.
The abstract workflow description is represented as a DAX (DAG in XML), describing all tasks, their dependencies, their required inputs, their expected outputs, and their invocation arguments.

Figure 1 shows an overview of the Pegasus architecture. The Workflow Mapper component of Pegasus is in charge of planning the workflow execution. During execution, Pegasus translates the abstract workflow into an executable workflow, determining the executables, data, and computational resources required for the execution. Pegasus maps executables to their installation paths or to a repository of stageable binaries defined in a Transformation Catalog. Workflow execution with Pegasus includes data management, monitoring, and failure handling, and is managed by DAGMan [Frey 2002]. Individual workflow tasks are managed by a task scheduler (HTCondor [Thain et al. 2005]), which supervises their execution on local and remote resources. HTCondor supports a number of different execution layouts. The mapping of abstract workflow tasks to executable workflow jobs depends on the underlying execution infrastructure. In this work, we target execution platforms where jobs are executed in a non-shared file system environment.

[Figure 1. Overview of the Pegasus SWfMS architecture: an abstract workflow goes through workflow planning (Workflow Mapper with Replica, Transformation, and Site Catalogs) to an executable workflow, run either locally (shell-based engine, serial order) or remotely (HTCondor DAGMan engine and scheduler, HTCondor pool, PegasusLite, Pegasus Cluster, PMC) on local, cloud, HTC, or HPC execution environments.]

3. Experimental Settings

Most SWfMSs assume that activities must be executed in isolation, one per available CPU or core. The multitenant architecture of cloud computing platforms does not provide the same level of isolation expected by HPC applications: the same physical machine can host more than one virtual machine without the user's knowledge. We argue that on those platforms a moderate use of shared multiprocessors can actually improve the performance and decrease the cost of the execution. In the remainder of the text, we evaluate the execution of two real-world scientific applications on a public cloud computing platform.

3.1. Resource Management

Within Pegasus, HTCondor manages the available resources through slots of execution, in which tasks can be scheduled to perform their computation. Typically, HTCondor creates one slot per CPU core. For example, in a platform composed of eight single-core execution nodes, the resource manager will advertise eight slots of execution when using the default configuration. For this experiment, we configured HTCondor to create multiple slots from a single core. As a result, tasks can be mapped onto slots within the same core, i.e., task executions will share the same CPU core. Memory and disk partitions can also be configured among the slots; however, the analysis of the performance impact of disk and memory requirements is out of the scope of this work. Thus, we rely on the default configuration, which distributes them equally.
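For illustration, a worker configuration along these lines is sketched below. The knobs follow HTCondor's static-slot configuration style (NUM_CPUS overrides the detected CPU count, and one static slot is created per reported CPU); this is an assumed example, not the verbatim configuration used in the experiments.

    # Assumed HTCondor worker configuration: make a single-core node
    # advertise 4 execution slots, so up to 4 tasks share the core.
    # Memory and disk are split evenly across the slots by default.
    NUM_CPUS = 4     # report 4 CPUs to the matchmaker despite 1 core
    NUM_SLOTS = 4    # advertise one static slot per reported CPU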
3.2. Scientific Applications

Montage (Figure 2a): A workflow created by the NASA Infrared Processing and Analysis Center as an open-source toolkit to generate custom mosaics of astronomical images [Jacob et al. 2009]. In the workflow, the geometry of the output mosaic is calculated from the input images. The inputs are then re-projected to have the same spatial scale and rotation, the levels of background emission in all images are equalized, and the re-projected, corrected images are co-added to form the output mosaic. The size of the workflow depends on the number of input images of the mosaic.

Epigenomics (Figure 2b): A workflow that automates the execution of genome sequencing operations, developed at the USC Epigenome Center [epi 2015] to map the epigenetic state of human cells on a genome-wide scale. The workflow processes multiple sets of genome sequences in parallel. These sequences are split into subsets (chunks) that are subsequently filtered to remove noisy and contaminating sequences, reformatted, and then mapped to a reference genome. The mapped sequences are finally merged and indexed for later analysis. The size of the workflow depends on the chunking factor used on the input data.

[Figure 2. Examples of instances of the workflows used in the experiments: (a) Montage; (b) Epigenomics.]

3.3. Experiment Conditions

Workflows were executed on the Google Compute Engine (GCE), the Infrastructure as a Service (IaaS) component of the Google Cloud Platform [goo 2015]. For each workflow run, we deployed an HTCondor pool of VMs by using PRECIP [Azarnoosh et al. 2013], an experiment management tool that automates the provisioning of cloud resources. In this experiment, we used the n1-standard-1 instance type. In the n1 series, a virtual CPU is implemented as a single hardware hyper-thread on a 2.6GHz Intel Xeon E5 (Sandy Bridge), 2.5GHz Intel Xeon E5 v2 (Ivy Bridge), or 2.3GHz Intel Xeon E5 v3 (Haswell).
An n1-standard-1 machine is composed of one virtual CPU of 2.75 GCEUs (Google Compute Engine Units) and 3.75GB of memory. The image used to instantiate the VMs runs Ubuntu 15.04 and uses Pegasus as the SWfMS and HTCondor as the resource manager. We deployed 9 instances of the n1-standard-1 machine, where one was used as a central node (submit node) and the remaining instances as execution nodes.

Runs of each workflow instance were performed with different datasets or input parameter options. For the Montage workflow, a single dataset was used, and the degree input parameter was set to 0.1, 0.5, 1.0, 2.0, and 4.0 degrees, which determines the workflow's size (in terms of number of tasks). Table 1 shows the number of tasks per type for the Montage workflow. Two different input datasets (TAQ and HEP) were used for the Epigenomics workflow, for which the number of tasks per task type is shown in Table 2.

[Table 1. Number of tasks per task type for the Montage workflow for different values of the degree parameter.]

[Table 2. Number of tasks per task type (fastqsplit, filtercontams, sol2sanger, fast2bfq, map, mapmerge, and pileup) for the Epigenomics workflow for the TAQ and HEP datasets.]

In each execution of a workflow instance, each core of the execution nodes was partitioned into 1, 2, 4, or 8 slots. Therefore, each workflow run had 8 cores and between 8 and 64 total slots (i.e., 8n slots, where n denotes the number of partitions). Table 3 summarizes the experimental settings.

We characterize execution profiles for the scientific workflows described in the previous section by using the Kickstart [Vockler et al. 2006, Juve et al. 2015] profiling tool from Pegasus. Kickstart monitors and records information about the execution of individual workflow tasks. It captures fine-grained profiling data such as process I/O, runtime, memory usage, and CPU utilization. Workflow profiles are then used to support the analyses of the experiment results; a sketch of how such records can be consumed is given below.
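Kickstart wraps each task executable and writes an XML invocation record per task. As an illustration of how such profiles can be consumed, the sketch below collects (task type, runtime, CPU time) tuples from a directory of records; the tag and attribute names are simplifying assumptions modeled on Kickstart's record format, not its exact schema.

    # Sketch: load per-task profiles from Kickstart-style XML records.
    # The fields used here (invocation/@transformation, @duration, and
    # usage/@utime + @stime) are assumptions, not the exact schema.
    import xml.etree.ElementTree as ET
    from pathlib import Path

    def load_profiles(record_dir):
        """Return a list of (task_type, runtime, cpu_time) tuples."""
        profiles = []
        for record in Path(record_dir).glob("*.out"):
            root = ET.parse(record).getroot()        # <invocation ...>
            task_type = root.get("transformation")   # e.g., "map"
            runtime = float(root.get("duration"))    # wall-clock seconds
            usage = root.find(".//usage")            # user + system time
            cpu_time = float(usage.get("utime")) + float(usage.get("stime"))
            profiles.append((task_type, runtime, cpu_time))
        return profiles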
[Table 3. Experimental settings: for each workflow case (Montage at 0.1, 0.5, 1.0, 2.0, and 4.0 degrees; Epigenomics with the TAQ and HEP datasets), runs were performed with 1, 2, 4, and 8 partitions and 8 executions per case; total input sizes range from a few MB for the small Montage cases to several GB for Montage 4.0 and the Epigenomics datasets. The number of partitions denotes the number of slots advertised by the resources.]

4. Results and Discussion

In this section, we evaluate the impact of CPU-core partitioning on the overall performance of the workflow executions. We also investigate the performance gain of individual workflow tasks. Experiment results for 1 partition (i.e., no resource sharing) are used as the baseline for comparison.

Figure 3 shows the average walltime for both workflows. The walltime represents the turnaround time to execute all workflow tasks, which includes the execution of all computing jobs and the extra jobs added by Pegasus for data management (e.g., data staging, cleanup tasks, and directory management).

[Figure 3. Workflow makespan evaluation for different partition sizes (Montage and Epigenomics walltime by partitions).]

For the Montage workflow, the performance gain/loss is negligible for small degree instances (0.1, 0.5, and 1.0) due to the very low number of tasks, and thus very low degree of parallelism (see Table 1 and Figure 2a), to the short duration of its tasks (see Figure 4), and to the overhead inherent to the workflow management system, e.g., the timespan to release the next task [Chen and Deelman 2011]. For higher degrees (2.0 and 4.0), the performance of the workflow execution is improved by up to 9%, as shown in Table 4. Figure 4 displays the average task runtime per task type and degree for the Montage workflow. Although the use of partitioned slots slightly impacts the runtime of individual tasks, the parallelism gain overcomes this loss.

For the Epigenomics workflow, an overall performance gain of up to 6% is observed for partitions of size 2 (the gain computation relative to the baseline is sketched below).
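The gains and losses reported in Table 4 can be read as the relative change in makespan against the single-slot baseline. A minimal sketch of that arithmetic, with made-up walltimes rather than the measured ones, follows.

    # Sketch: performance gain of a partitioned run relative to the
    # 1-slot baseline, in percent (positive = speedup, negative =
    # slowdown). The walltimes below are placeholders, not measurements.
    def gain(baseline_walltime, walltime):
        return 100.0 * (baseline_walltime - walltime) / baseline_walltime

    print(f"{gain(1000.0, 910.0):+.1f}%")  # +9.0% for a hypothetical run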
[Figure 4. Average task runtime per task type and different degree values (0.1, 0.5, 1.0, 2.0, and 4.0) for the Montage workflow.]

Epigenomics execution is mostly dominated by the map task (Figure 5), which processes a significant amount of data and is also CPU-intensive [Juve et al. 2013, Ferreira da Silva et al. 2013]. A very large partitioning size significantly impacts the runtime of map tasks and consequently slows down the workflow execution.
[Figure 5. Average task runtime per task type (fastqsplit, fast2bfq, filtercontams, map, mapmerge, pileup, and sol2sanger) for the Epigenomics workflow (TAQ and HEP) under different partition sizes.]

The slowdown experienced by individual tasks is mainly due to the decrease in CPU utilization. For each experiment case, we measured the average CPU usage rate F for each task type T, defined as follows:

    F(T) = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \frac{\mathrm{CPUtime}_t}{\mathrm{runtime}_t}    (1)

where \mathcal{T} is the set of all tasks of type T within a workflow instance. The higher the value of F, the higher the CPU utilization (at most 1, which means 100%).

Figure 6 shows the average CPU utilization rate for the Montage (top) and Epigenomics (bottom) workflows. Note that for the Montage workflow only measurements for 2.0 and 4.0 degrees are shown, since they yield noticeable performance gains. Overall, the Epigenomics workflow is CPU-intensive, while the Montage workflow is predominantly I/O-intensive [Juve et al. 2013]. This result suggests that resource sharing optimization for Epigenomics is limited to a small number of partitions, as shown in Figure 5. Map tasks present an exponential decrease of CPU utilization as the number of partitions increases, which significantly affects the task runtime and thereby slows down the workflow execution. On the other hand, linear decreases in CPU usage (e.g., for the Montage workflow), at the expense of slightly slowing down task executions, yield fair performance gains.

Table 4 summarizes the experimental results to evaluate the impact of shared multiprocessors (partitions) when running scientific workflows on cloud resources. The table highlights the average workflow makespan (walltime) and the average performance gains (+) or losses (-) for different partition sizes.
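A direct transcription of equation (1) over per-task profiles of the kind sketched in Section 3.3 might look as follows; the (task type, runtime, CPU time) tuple layout is the same assumption made there.

    # Sketch of equation (1): average CPU usage rate F(T) per task type,
    # from (task_type, runtime, cpu_time) tuples. F(T) close to 1.0
    # means tasks of type T keep their CPU fully busy (100%).
    from collections import defaultdict

    def average_cpu_usage(profiles):
        ratios = defaultdict(list)
        for task_type, runtime, cpu_time in profiles:
            ratios[task_type].append(cpu_time / runtime)
        return {t: sum(r) / len(r) for t, r in ratios.items()}

    # Hypothetical records: a CPU-bound map task vs. an I/O-bound stage.
    print(average_cpu_usage([("map", 100.0, 95.0), ("mProject", 100.0, 20.0)]))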
[Figure 6. Average CPU utilization per task type for the Montage (top, 2.0 and 4.0 degrees) and Epigenomics (bottom, TAQ and HEP) workflows, for 1, 2, 4, and 8 partitions.]

[Table 4. Summary of performance gains for different partition sizes, reporting the average walltime and gain/loss for 2, 4, and 8 partitions per workflow case. The baseline is defined as runs with CPU cores partitioned into a single slot, i.e., no resource sharing.]

5. Concluding Remarks

Cloud computing platforms are known for their performance unpredictability. Virtual machines allow cloud providers to make a more clever use of the physical resources, at the expense of hiding details about the physical machines from users. We have studied the indirect impact of this unpredictability on scientific applications, which are generally designed for HPC platforms. Using two real-world scientific workflows and a well-known public cloud provider, we have analyzed the consequences of executing multiple workflow tasks concurrently on the same core.
In summary, shared multiprocessors within a single virtual machine can improve the execution of scientific workflows; however, they should be used sparingly. Experimental results show that larger workflows benefit more from resource sharing. On public clouds, reducing the execution time also implies lower costs due to shorter lease times. As expected, CPU-intensive jobs are likely to suffer performance degradation. Thus, even if the workflow execution time can be improved, ideally the use of resource sharing should be considered on a case-by-case basis for each task.

As future work, we are developing an automatic slot partitioning scheme. Using task profiling, a specialized scheduler can group low-CPU tasks to take advantage of resource sharing, as sketched below. Also, dynamic allocation and configuration of resources can be combined to attend to the specific needs of an ongoing workflow execution.
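As a rough illustration of the idea (not the scheme under development), profiled task types could be binned by their measured CPU usage rate F(T), so that only low-CPU types are mapped onto heavily shared slots; the thresholds below are illustrative guesses, not values from the experiments.

    # Sketch: choose a per-core slot count for a task type from its
    # average CPU usage rate F(T). Thresholds are illustrative only.
    def slots_for(cpu_usage_rate):
        if cpu_usage_rate < 0.25:
            return 8   # mostly idle CPU (I/O-bound): aggressive sharing
        if cpu_usage_rate < 0.60:
            return 2   # moderate CPU use: mild sharing
        return 1       # CPU-bound: keep a dedicated core

    # Hypothetical profiles: an I/O-bound Montage stage vs. the
    # CPU-bound Epigenomics map task.
    for task_type, f in {"mProject": 0.20, "map": 0.95}.items():
        print(task_type, "->", slots_for(f), "slot(s) per core")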
Acknowledgments

This research was funded by CNPq (process #34403/2013-4), NAPSoL-PRP-USP, and CAPES-Procad. We would like to thank Google for the awarded cloud platform credits that made the experiments reported in this text possible. This research was also supported by the National Science Foundation under the SI2-SSI program.

References

(2015). Google Cloud. [Accessed 2015.]

(2015). USC Epigenome Center. [Accessed 2015.]

Azarnoosh, S., Rynge, M., Juve, G., et al. (2013). Introducing PRECIP: An API for managing repeatable experiments in the cloud. In The 2013 IEEE International Conference on Cloud Computing Technology and Science - Volume 02, CLOUDCOM '13.

Callaghan, S., Maechling, P., Small, P., et al. (2011). Metrics for heterogeneous scientific workflows: A case study of an earthquake science application. Int. J. High Perform. Comput. Appl., 25(3).

Chen, W. and Deelman, E. (2011). Workflow overhead analysis and optimizations. In Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, WORKS '11. ACM.

Deelman, E., Juve, G., and Berriman, G. B. (2012). Using clouds for science, is it just kicking the can down the road? In The 2nd International Conference on Cloud Computing and Services Science, CLOSER '12.

Deelman, E., Singh, G., Livny, M., Berriman, B., and Good, J. (2008). The cost of doing science on the cloud: The Montage example. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC '08.

Deelman, E., Vahi, K., Juve, G., et al. (2015). Pegasus, a workflow management system for science automation. Future Generation Computer Systems, 46:17-35.

Fard, H. M., Prodan, R., Barrionuevo, J. J. D., and Fahringer, T. (2012). A multi-objective approach for workflow scheduling in heterogeneous environments. In The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID '12.

Ferreira da Silva, R., Juve, G., Deelman, E., et al. (2013). Toward fine-grained online task characteristics estimation in scientific workflows. In 8th Workshop on Workflows in Support of Large-Scale Science, WORKS '13.

Frey, J. (2002). Condor DAGMan: Handling inter-job dependencies. University of Wisconsin, Dept. of Computer Science, Tech. Rep.

Giardine, B., Riemer, C., Hardison, R. C., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res., 15(10):1451-1455.

Hoffa, C., Mehta, G., Freeman, T., et al. (2008). On the use of cloud computing for scientific workflows. In The IEEE Fourth International Conference on eScience, eScience '08.

Jacob, J. C., Katz, D. S., Berriman, G. B., et al. (2009). Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking. Int. J. Comput. Sci. Eng., 4(2).

Juve, G., Chervenak, A., Deelman, E., et al. (2013). Characterizing and profiling scientific workflows. Future Generation Computer Systems, 29(3):682-692.

Juve, G., Deelman, E., Vahi, K., et al. (2009). Scientific workflow applications on Amazon EC2. In The 5th IEEE International Conference on e-Science, e-Science '09.

Juve, G., Tovar, B., Ferreira da Silva, R., et al. (2015). Practical resource monitoring for robust high throughput computing. In Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, HPCMASPA '15 (to appear).

Ludäscher, B., Altintas, I., Berkley, C., et al. (2006). Scientific workflow management and the Kepler system. Concurr. Comput.: Pract. Exper., 18(10):1039-1065.

Malawski, M., Juve, G., Deelman, E., and Nabrzyski, J. (2012). Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. In The International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12.

Taylor, I., Deelman, E., Gannon, D., and Shields, M. (2007). Workflows for e-Science. Springer.

Thain, D., Tannenbaum, T., and Livny, M. (2005). Distributed computing in practice: The Condor experience. Concurr. Comput.: Pract. Exper., 17(2-4):323-356.

Vockler, J. S., Mehta, G., Zhao, Y., Deelman, E., and Wilde, M. (2006). Kickstarting remote applications. In International Workshop on Grid Computing Environments, GCE '06.

Wolstencroft, K., Haines, R., Fellows, D., et al. (2013). The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557-W561.