Research Statement for Henri Casanova

Advances in networking technology have made it possible to deploy distributed scientific applications on platforms that aggregate large numbers of diverse and distant resources. This distributed computing vision, recently popularized as grid computing, holds the promise of application executions at unprecedented scale, capacity, and performance. My research interests span several theoretical and practical aspects of parallel and distributed computing, with an emphasis on computing on large-scale platforms. Large-scale distributed computing platforms have been in production use for various scientific applications and under various modes of operation, ranging from high-end systems like the TeraGrid to volunteer computing systems like SETI@home. While sound engineering has enabled these various flavors of grid computing, many challenges must still be addressed before grid computing becomes widely available to a large range of applications and users, affording them both efficiency and ease of use. My recent research has focused on three such fundamental challenges: (i) designing efficient and practical application scheduling and resource allocation strategies; (ii) measuring, understanding, modeling, and simulating platforms and applications; and (iii) developing software methodologies and tools that make application deployment both straightforward and efficient.

In my Ph.D. thesis I explored issues of performance prediction and scheduling. I developed stochastic models for capturing application behavior in scenarios that exhibit uncertainty in both data transfer times and computation times [1], which is common on large-scale platforms. During my Ph.D. I also developed NetSolve [2, 3], a software infrastructure for deploying applications on what was to become known as grid platforms. This work led me to co-author a chapter of the first Grid book [4].
I joined UCSD in 1999, and in 2001 I created the Grid Research And Innovation Laboratory (GRAIL), which has had around 10 members, mostly graduate students and postdoctoral researchers. During the last four years at UCSD we have explored the three aforementioned challenges as part of several NSF-funded projects, some of which are highlighted below.

Scheduling and Deploying Large-Scale Parameter Sweep Applications

This work started upon my arrival at UCSD. At the time, grid computing was in its infancy and a critical goal was to enable the first generation of applications. With this in mind, I first focused on the simple yet popular parameter sweep application model: large numbers of independent computations that can be performed in a simple master-worker fashion. This seemingly straightforward model poses deployment-logistics challenges, which I had partially addressed with my work on NetSolve. I proceeded to extend this work to handle data locality, which is often a critical issue for application performance, in addition to the usual load balancing issue. I first developed novel scheduling heuristics for deploying parameter sweep applications with complex data-sharing and data-locality requirements onto systems that consist of multiple compute sites, where each site contains one or more clusters with shared data storage. One of these heuristics, XSufferage, was compared in simulation to heuristics from the literature and showed significant performance improvements [5]. In addition, I explored how this heuristic could be made adaptive, periodically refining its
scheduling decisions to adapt to fluctuations in delivered platform performance. Simulation as well as real-world experiments showed that, thanks to adaptation, XSufferage indeed outperforms competing algorithms even in the presence of performance fluctuations [6]. This work led to a software tool, APST [7, 8], which utilizes existing services to deploy applications and automatically schedules them with XSufferage. This software is currently in production use for several scientific applications.

Divisible Load Scheduling

The above work was in the context of applications with non-trivial data-sharing and locality requirements. Equally important, yet simpler, are applications that consist of a large number of independent, roughly identical tasks that all use distinct input, thus posing no data locality concerns. A recent theoretical framework approximates such low-granularity applications as a load that is continuously divisible. The Divisible Load Scheduling (DLS) problem consists in orchestrating communications and computations in a way that minimizes application execution time, with a key concern for pipelining of communication and computation (in addition to the usual load balancing issue). Effective pipelining can be achieved with scheduling algorithms that use multiple rounds. Together with a Ph.D. student, I first attacked the multi-round DLS problem from the theoretical perspective, making four clear contributions over the state of the art. First, we improved on the only previously proposed multi-round algorithm by extending it to account for network latencies. Second, we developed the first multi-round algorithm that is applicable to heterogeneous platforms. Third, this algorithm automatically computes an optimal number of rounds, which was not done in previous approaches. Fourth, we extended our approach to tolerate uncertainties in data transfer and computation times.
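As background, the classical single-round DLS solution on a heterogeneous star network illustrates the equal-finish-time principle that multi-round algorithms build upon. The sketch below is illustrative only (not one of the algorithms above): it assumes sequential sends from the master, no latencies, and linear communication and computation costs, under which all workers finish simultaneously in an optimal schedule.

```python
def one_round_dls(comm, comp):
    """Single-round divisible load split on a heterogeneous star.

    comm[i]: time to send one load unit to worker i (master sends sequentially)
    comp[i]: time for worker i to process one load unit
    Returns load fractions alpha (summing to 1) such that all workers finish
    at the same time -- the classical optimality condition in this model.
    """
    # Equal finish times yield the recurrence
    #   alpha[i+1] * (comm[i+1] + comp[i+1]) = alpha[i] * comp[i]
    alpha = [1.0]
    for i in range(1, len(comm)):
        alpha.append(alpha[-1] * comp[i - 1] / (comm[i] + comp[i]))
    total = sum(alpha)
    return [a / total for a in alpha]

def finish_times(alpha, comm, comp):
    """Completion time of each worker under sequential distribution."""
    t, out = 0.0, []
    for a, c, w in zip(alpha, comm, comp):
        t += a * c             # master busy sending this worker's chunk
        out.append(t + a * w)  # worker computes right after receiving
    return out
```

Multi-round algorithms refine this idea by dispatching the load in several installments, so that workers start computing earlier and communication overlaps computation.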
This work resulted in a number of publications [9, 10, 11, 12] and provided the theoretical foundations for practical DLS on distributed and heterogeneous platforms. We then implemented our algorithms as part of the aforementioned APST software, which we extended to support divisible loads. We have validated this practical implementation for several applications (e.g., MPEG-4 encoding) on a wide-area testbed [13].

Desktop Grids

The previous two research projects target platforms that, while subject to some fluctuations in the performance delivered by resources, are relatively stable in terms of resource availability (i.e., infrequent downtimes). At the other extreme are so-called desktop grid platforms, which aggregate the idle cycles of large numbers of individually owned desktop PCs. These cost-effective platforms have been popularized by projects such as SETI@home, and several software infrastructures are available today. The main challenge here is the volatility of the resources, which can be reclaimed by their owners without notice. Most applications successfully deployed on these platforms to date consist of large numbers of independent tasks, and the performance metric is the task completion rate over long periods of time, which is ideally suited to volatile resources. In this project we explored the feasibility of running applications that consist of moderate numbers of tasks (e.g., comparable to the number of available desktop resources) with the objective of minimizing application execution time. Enabling such applications on desktop grids would dramatically increase their utility to a wider range of users, but requires techniques for intelligent resource selection
and computation redundancy, so as to mask volatility. First, we conducted measurements of availability on a real desktop grid to (i) analyze and understand the temporal structure and statistical properties of resource availability and (ii) obtain trace data that can be used for simulation. We measured host and CPU availability of a deployment of the Entropia desktop grid software at SDSC, obtaining the first high-quality measurement dataset of the effective power delivered by an enterprise desktop grid [14]. With this trace data as a basis for simulation, we then studied several resource selection strategies that perform resource ranking, resource exclusion, and task replication. We designed these strategies guided by our analysis of the availability measurements (e.g., our dataset shows that CPU clock rate is a reasonable predictor of host performance in a typical desktop grid in spite of host volatility). Overall, we found that a heuristic that performs intelligent resource exclusion based on clock rates and moderate task replication based on time-outs achieves by far the best performance, within a factor of 1.7 of optimal in practice [15].

Measurement, Modeling, and Simulation

A constant theme in my research is the need to understand large-scale, distributed computing platforms, which requires measurement data obtained on real platforms. While measurement methodologies and datasets are commonplace in areas such as network research, they are scarce in the grid computing area. One contribution of my work is the aforementioned measurement dataset collected for a desktop grid [14]. Another is the development of benchmark probes that exercise basic grid computing functionality [16, 17], which we have integrated into a grid monitoring system and used to collect periodic measurements on a production platform [18].
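The desktop grid heuristic described above combines clock-rate-based exclusion with time-out-based replication. The sketch below illustrates the general idea only, not the exact algorithm of [15]; the exclusion fraction and the replication trigger are hypothetical parameters.

```python
def select_hosts(hosts, exclusion_fraction=0.25):
    """Rank hosts by clock rate and exclude the slowest fraction.

    hosts: list of (name, clock_ghz) pairs.
    The 0.25 exclusion fraction is illustrative; clock rate is used as the
    ranking criterion because it was found to be a reasonable predictor of
    delivered host performance despite volatility.
    """
    ranked = sorted(hosts, key=lambda h: h[1], reverse=True)
    keep = max(1, int(len(ranked) * (1 - exclusion_fraction)))
    return ranked[:keep]

def needs_replica(elapsed, expected, timeout_factor=2.0):
    """Simple time-out rule: replicate a task on another host once it has
    run timeout_factor times longer than expected, masking volatility."""
    return elapsed > timeout_factor * expected
```

Excluding slow hosts keeps stragglers from dominating the makespan, while moderate replication bounds the damage from hosts reclaimed mid-task.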
Measurement datasets can be analyzed to extract fundamental properties of the underlying platform, which I have used to reason about application deployment and performance [19]. Further, a collection of such datasets, combined with technology surveys, can be used to develop generators of realistic synthetic platform configurations, which are fundamental for enabling research (e.g., as seen in the internetworking area). We have developed such a generator for grid platforms [20]. With realistic platform models it then becomes possible to instantiate realistic simulations, which are necessary for conducting controlled and repeatable experiments. I have developed the SimGrid [21] simulator, which has gained popularity in the scheduling community, has been used by over 30 other researchers, most of whom have published their results, and whose second version has recently been released [22]. Our work on platform measurement, on generation of realistic synthetic platforms, and on the development of a simulation framework provides the necessary foundations for the scientific study of large-scale computing platforms and applications.

Future Directions

I describe below three broad research directions that I would like to pursue in the upcoming years. Note that my work on platform measurement, characterization, modeling, and simulation will support this future research.

Workflow Applications

My recent scheduling research and development work has targeted applications that consist of independent tasks. Another class of applications that has gained popularity in the last few years is that of scientific workflows, which correspond to multiple components logically interconnected with
dependencies, branching, and iterations. While our APST software supports the execution of workflows, and is used by several workflow applications, we have not yet conducted the scheduling research necessary to achieve high performance. In spite of the large number of efforts focusing on software support for workflows, the area of workflow scheduling on large-scale platforms remains relatively unexplored. A few results in the scheduling literature can be used as a basis for workflow scheduling (e.g., DAG scheduling algorithms), and I will focus on the many additional research advances needed to apply such results in practice.

Job Scheduling

In application scheduling, the goal is to optimize the performance metric of a single application. By contrast, job scheduling consists in optimizing some aggregate metric across applications belonging to different users. Job scheduling is traditionally done on single systems by batch schedulers, which are sophisticated but often optimize metrics that are not user-centric (e.g., resource utilization). As we build large-scale, shared systems, there is an opportunity to take a fresh look at job scheduling. I see three particularly interesting challenges. First is the definition of an aggregate, user-centric performance metric that quantifies both performance and fairness, and that can be optimized in a tractable way. We have recently made a preliminary contribution to this question by proposing a metric that can be optimized for the restricted case in which all applications are divisible loads [23]. Second is the question of resource sharing in the presence of dynamic resource availability, as expected on any large-scale system. This question is largely unexplored for grid computing, and I plan to use our desktop grid research on application scheduling as a basis for job scheduling strategies. Third is the issue of decentralized job scheduling for better scalability and resilience.
Several researchers have proposed simply coupling multiple legacy batch schedulers, but I believe that there is a great opportunity for a more forward-looking approach that develops more fundamental systems principles.

Very Large-Scale Computing

The grid community has yet to address the challenges posed by very large-scale platforms. Virtual Organizations with tens of thousands of individual resources no longer seem to be in the distant future, and even larger systems are emerging with the advent of mobile technology and sensor networks. The traditional scheduling approach, which consists in examining the whole universe of resources, although feasible today on most platforms, will not scale. Consequently, new ways for applications to express their resource requirements are needed, as well as new techniques for scoping the resource universe with a view to conducting approximate but scalable resource selection. Together with several collaborators, I have recently initiated a 5-year project, funded by NSF/ITR, that explores these issues; I am particularly focusing on resource selection and scheduling. The first step is to design a simple resource requirement description language that is amenable to efficient searches in large-scale environments. An interesting question is then to identify the best design point between scalability and quality of the resource selection. While such issues have been partially explored in the peer-to-peer community for file-sharing applications, many new challenges arise in supporting broader applications that have a computational component (e.g., complex application structure, data/computation locality, time sequencing, stability and predictability of performance). This work will build on, but also challenge, the current state of the art in grid resource discovery and selection, with a view to enabling truly large-scale platforms in the 5-10 year range.
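To make the notion of a user-centric metric that captures both performance and fairness (as discussed under Job Scheduling) concrete, a classical candidate is the stretch, or slowdown, of a job: its turnaround time divided by the time it would take on the dedicated platform. Minimizing the maximum stretch promotes fairness, since no user's job may be slowed down disproportionately. The sketch below illustrates this general idea only; [23] develops a metric tailored to divisible loads.

```python
def stretch(release, completion, dedicated_time):
    """Stretch (slowdown) of one job: turnaround time divided by the
    time the job would need on the dedicated, unshared platform."""
    return (completion - release) / dedicated_time

def max_stretch(jobs):
    """Aggregate metric over jobs given as (release, completion,
    dedicated_time) tuples. A schedule minimizing this value trades
    raw throughput for fairness across competing users."""
    return max(stretch(r, c, d) for r, c, d in jobs)
```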
References

[1] H. Casanova, M. Thomason, and J. Dongarra. Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments. Journal of Parallel and Distributed Computing, 58(1):68–91, 1999.

[2] H. Casanova and J. Dongarra. NetSolve: A Network-Enabled Server for Solving Computational Science Problems. In Proceedings of Supercomputing 1996 (SC'96), Nov. 1996.

[3] H. Casanova and J. Dongarra. Using Agent-Based Software for Scientific Computing in the NetSolve System. Parallel Computing, 24:1777–1790, 1998.

[4] J. Dongarra, H. Casanova, C. Johnson, and M. Miller. Application-Specific Tools. In I. Foster and C. Kesselman, editors, Computational Grids: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc., 1999.

[5] H. Casanova, A. Legrand, D. Zagorodnov, and F. Berman. Heuristics for Scheduling Parameter Sweep Applications in Grid Environments. In Proceedings of the 9th Heterogeneous Computing Workshop (HCW'00), pages 349–363, May 2000.

[6] H. Casanova, G. Obertelli, F. Berman, and R. Wolski. The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid. Scientific Programming, 8(3):111–126, 2001. Extended version of a paper in Proceedings of Supercomputing 2000 (SC'00).

[7] H. Bal, H. Casanova, J. Dongarra, and S. Matsuoka. Application-Level Tools. In I. Foster and C. Kesselman, editors, Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc., 2nd edition, 2003.

[8] H. Casanova and F. Berman. Parameter Sweeps on the Grid with APST. In F. Berman, G. Fox, and T. Hey, editors, Grid Computing: Making the Global Infrastructure a Reality. Wiley Publishers, Inc., 2003.

[9] Y. Yang and H. Casanova. UMR: A Multi-Round Algorithm for Scheduling Divisible Workloads. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), April 2003.

[10] O. Beaumont, H. Casanova, A. Legrand, Y. Robert, and Y. Yang.
Scheduling Divisible Loads on Star and Tree Networks: Results and Open Problems. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2004. To appear.

[11] Y. Yang and H. Casanova. Multi-Round Algorithms for Scheduling Divisible Workloads. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2005. To appear.

[12] Y. Yang and H. Casanova. RUMR: Robust Scheduling for Divisible Workloads. In Proceedings of the 12th IEEE Symposium on High-Performance Distributed Computing (HPDC-12), pages 114–125, June 2003.
[13] K. van der Raadt, Y. Yang, and H. Casanova. Practical Divisible Load Scheduling on Grid Platforms with APST-DV. Submitted to the International Parallel and Distributed Processing Symposium (IPDPS'05), 2004.

[14] D. Kondo, M. Taufer, C. L. Brooks, H. Casanova, and A. Chien. Characterizing and Evaluating Desktop Grids: An Empirical Study. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'04), April 2004.

[15] D. Kondo, A. Chien, and H. Casanova. Resource Management for Short-Lived Applications on Enterprise Desktop Grids. In Proceedings of SC'04, November 2004.

[16] A. Snavely, G. Chun, H. Casanova, R. Van der Wijngaart, and M. Frumkin. Benchmarks for Grid Computing: A Review of Ongoing Efforts and Future Directions. SIGMETRICS Performance Evaluation Review, 30(4):27–32, 2003.

[17] G. Chun, H. Dail, H. Casanova, and A. Snavely. Benchmark Probes for Grid Assessment. In Proceedings of the High-Performance Grid Computing Workshop, April 2004.

[18] S. Smallen, M. Murray, C. Mills Olschanowsky, A. Snavely, and H. Casanova. Benchmarking and Measuring Grid Platforms: Software Tools and Results on the TeraGrid. Poster at SC'04, 2004.

[19] H. Casanova. Network Modeling Issues for Grid Application Scheduling. International Journal of Foundations of Computer Science (IJFCS), 2005. To appear.

[20] Y.-S. Kee, H. Casanova, and A. Chien. Realistic Modeling and Synthesis of Resources for Computational Grids. In Proceedings of SC'04, November 2004.

[21] H. Casanova. SimGrid: A Toolkit for the Simulation of Application Scheduling. In Proceedings of the 1st IEEE International Symposium on Cluster Computing and the Grid (CCGrid'01), pages 430–437, May 2001.

[22] A. Legrand, L. Marchal, and H. Casanova. Scheduling Distributed Applications: The SimGrid Simulation Framework. In Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid'03), May 2003.

[23] L. Marchal, Y. Yang, H. Casanova, and Y. Robert.
A Realistic Network/Application Model for Scheduling Divisible Loads on Large-Scale Platforms. Submitted to the International Parallel and Distributed Processing Symposium (IPDPS'05), 2004.