On Modeling CPU Utilization of MapReduce Applications
Nikzad Babaii Rizvandi 1,2, Young Choon Lee 1, Albert Y. Zomaya 1
1 Centre for Distributed and High Performance Computing, School of IT, University of Sydney, Australia
2 National ICT Australia (NICTA), Australian Technology Park
[email protected]

Abstract — In this paper, we present an approach to predict the total CPU utilization, in terms of CPU clock ticks, of applications running on the MapReduce framework. Our approach has two key phases: profiling and modeling. In the profiling phase, an application is run several times with different sets of MapReduce configuration parameters to profile the total CPU clock ticks of the application on a given platform. In the modeling phase, multilinear regression is used to map the sets of MapReduce configuration parameters (number of Mappers, number of Reducers, size of the file system (HDFS) and the size of the input file) to the total CPU clock ticks of the application. The derived model can then be used to predict the total CPU requirements of the same application when using the MapReduce framework on the same platform. Our approach aims to eliminate error-prone manual processes and presents a fully automated solution. Three standard applications (WordCount, Exim Mainlog parsing and TeraSort) are used to evaluate our modeling technique on pseudo-distributed MapReduce platforms. Results show that our automated model generation procedure can effectively characterize the total CPU clock ticks of these applications, with average prediction errors of 3.5%, 4.05% and 2.75%, respectively.

Keywords — CPU utilization, CPU clock tick, MapReduce, Modeling, Prediction, Regression

1. Introduction

Cloud computing has received a lot of attention from both the research community and industry due to the deployment and growth of commercial cloud platforms and the associated services (e.g., Amazon EC2, Microsoft Azure and Google AppEngine) [2].
These cloud services enable customers to change, or dynamically supplement, their own IT infrastructures with a large choice of computational and storage resources that are accessible on demand. On the other side, cloud providers charge customers based on their usage or reservation of datacenter resources (CPU hours, storage capacity, network bandwidth, etc.), which results in a typical dependency between service level agreements (SLAs) and resource availability. Therefore, the accurate prediction of resource usage in such a scenario is important: it helps customers decide how many nodes they should hire from the cloud providers, and for how long [3]. Moreover, such a prediction can be used by cloud providers to guide scheduling and resource management decisions, and by realistic workload generators to evaluate the choice of policies prior to full production deployment.

Recently, businesses have started using MapReduce as a popular computation framework for processing large amounts of data in both public and private clouds. For example, many web-based service providers like Facebook utilize MapReduce for analysing their core business and mining their produced data. Therefore, understanding the performance characteristics of MapReduce-style computations brings significant benefit to application developers in terms of improved application performance and resource utilization. Generally, MapReduce users run a small number of applications for a long time. For example, Facebook uses Hadoop (the Apache implementation of MapReduce in Java) to read log files produced daily and filter database information depending on incoming queries. These tasks are repeated millions of times per day. Another example is Yahoo!, where the majority of jobs (around 80-90%) are based on Hadoop. These jobs include searching among large quantities of data, indexing documents, and returning appropriate information. Just like Facebook's, these applications run many times per day for different purposes.

One major factor that directly influences the performance of MapReduce applications is the tuning of their configuration parameters, and MapReduce users face the issue of how to set these parameters properly [4]. Clearly, profiling and modeling the performance of MapReduce applications and their resource usage under different values of the configuration parameters are of great practical importance for both cloud users and service providers. For example, Amazon can use the results of such modeling to effectively schedule MapReduce applications on the Amazon Elastic MapReduce service.
In this paper, we present a technique to model the resource usage of MapReduce applications in terms of total CPU utilization, measured in CPU clock ticks. We have chosen four major configuration parameters: number of Mappers, number of Reducers, size of the file system (Hadoop Distributed File System, or HDFS), and size of the input file. For a given MapReduce platform, applications are run iteratively with different values of those parameters and the total CPU clock ticks of these applications are gathered. Then, for each application, a linear model is constructed by applying multilinear regression to the set of configuration parameter values (as input) and the obtained total CPU clock ticks of the application (as output). Obviously, the proposed modeling technique can be extended to other configuration parameters, or used for modeling other resources such as storage, network bandwidth and memory. Although our modeling technique can be applied to other applications on different platforms, two issues should be taken into account: first, the model obtained for an application on a specific platform may not be used for predicting the same application on another platform; and second, the model of an application on one platform is not applicable to predicting other applications on the same platform.
2. Related Work

MapReduce, introduced by Google in 2004 [5], is a framework for processing large quantities of data on distributed systems. The computation of this framework consists of a map phase and a reduce phase; hence the name MapReduce (Figure 1). The MapReduce framework can be seen as a practical application of the traditional data-parallel model, single program, multiple data (SPMD). The process of converting an algorithm into independent mappers and reducers makes MapReduce inefficient for algorithms of a sequential nature. Typical MapReduce applications are designed for computing on significantly large quantities of data rather than performing complicated computation on a small amount of data [6]. Due to its simple structure, MapReduce suffers from some serious issues, especially in scheduling, energy efficiency and resource allocation. Therefore, predicting an application's resource requirements before running it on a native system can make a significant contribution to improving MapReduce scheduling and resource allocation.

MapReduce: Early works on analyzing/improving MapReduce performance started around 2005, such as the approach by Zaharia et al. [7], which addressed the problem of improving the performance of Hadoop in heterogeneous environments. Their approach was based on the critical assumptions in Hadoop that cluster nodes are homogeneous and that tasks progress linearly; Hadoop utilizes these assumptions to efficiently schedule tasks and (re)execute stragglers. Their work introduced a new scheduling policy to overcome these assumptions. Besides their work, there are many other approaches to enhance or analyze the performance of different parts of MapReduce frameworks, particularly in scheduling [8], energy efficiency [3, 9-15] and workload optimization [16]. A statistics-driven workload modeling was introduced in [10] to effectively evaluate design decisions in scaling, configuration and scheduling.
The framework in that work was utilized to make appropriate suggestions to improve the energy efficiency of MapReduce. A modeling method was proposed in [9] for finding the total execution time of a MapReduce application. It used Kernel Canonical Correlation Analysis to obtain the correlation between the performance feature vectors extracted from MapReduce job logs, and the map time, reduce time, and total execution time. These features were acknowledged as critical characteristics for establishing any scheduling decisions. Recent works in [17-18] reported a basic model for MapReduce computation utilization. Here, the map and reduce phases were first modeled independently using dynamic linear programming; then, these phases were combined to build a global optimal strategy for MapReduce scheduling and resource allocation. In [1, 19-23], linear regression is applied to model the relationship between the total number of CPU clock ticks/the execution time an application needs and four MapReduce configuration parameters: number of Mappers, number of Reducers, size of the file system and size of the input file.

System profiling and modeling: System profiling is a mechanism used to gather information about both the software and hardware configurations of a system in order to model it properly. In high performance computing systems, profiling system specifications is utilized for modeling different parts of the system. For example, in [24] a combination of performance modeling and prediction was applied to reduce execution times with respect to a predefined upper limit on energy usage. After creating models for both execution time and energy consumption, the key parameters of the models are estimated by executing a program a small number of times and then regressing the estimated parameters. In recent work [25], the idea of profiling and modeling was used for power metering in virtual machines (VMs): the relation between the major hardware components, such as CPU, memory and disk, and energy is modeled as a linear function of their utilizations, and multilinear regression is then used to find the coefficients by profiling thousands of traces.

Figure 1. MapReduce workflow [1]

The work in [26] is probably the closest to this paper. Wood et al. [26] present a linear regression-based model to predict the CPU resource overhead between running an application in a VM and in the native system. The values of a set of 11 variables capturing the real values of CPU, disk and network resources over time for an application in the native system are extracted by the SysStat tool [27]. The CPU usage over time is also captured in the VM using the XenTop and XenMon tools. Then a linear regression model is used to form a linear relation between the VM CPU usage and these 11 variables as:

U_vm(t) = beta_0 + beta_1 x_1(t) + ... + beta_11 x_11(t)

where x_1(t), ..., x_11(t) are the 11 input variables of the native system at time t. In our design, we use SysStat to capture the total CPU utilization (in terms of CPU clock ticks) of an application over time in MapReduce. After calculating the total CPU clock ticks C of the application, we model the relationship between the four MapReduce configuration parameters and C using linear regression as:

C ≈ beta_0 + beta_1 M + beta_2 R + beta_3 H + beta_4 D

where M, R, H and D are the number of Mappers, the number of Reducers, the size of the file system and the size of the input file, respectively. Then, we apply the derived model to the same application but with different MapReduce configuration parameters and use it to predict the CPU resource requirements of the application before it actually runs.
Figure 3. The flow of the MapReduce application (left) and the CPU utilization time series extracted from the actual system (right). This value is then converted to total CPU clock ticks based on the platform's operating frequency.

3. Application Modeling in MapReduce

3.1. Problem definition

In distributed computing systems, MapReduce has been known as a large-scale data processing or CPU-intensive job [4, 28-30], which implies that CPU utilization is the most important part of running an application on MapReduce. Therefore, predicting the CPU capacity that an application needs becomes important both for customers, to hire enough CPU resources from cloud providers, and for cloud providers, to schedule incoming jobs properly. In this paper we study the dependency between MapReduce configuration parameters and the CPU utilization of the system. We expect the CPU utilization of an application in MapReduce to be highly correlated with and proportional to the MapReduce configuration parameters. Therefore, by modeling this dependency, it becomes possible to predict the finish time of an experiment, which has a significant impact on job scheduling. Clearly, there is a strong dependency between the total CPU clock ticks of an application and changes in these configuration parameters. The straightforward benefit of finding a model between the MapReduce configuration parameters and total CPU clock ticks is that one can approximately calculate the best values of these parameters to optimize the total CPU clock ticks of the application. This means that if one is to run a given application in a cloud, modeling makes it possible to approximately determine how many virtual nodes the application needs, and for how long.

3.2. Profiling CPU utilization

For each application, we generate a set of experiments with different values of the four MapReduce configuration parameters on a given platform.
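As a rough sketch, the set of experiments described above can be enumerated as a grid over the four configuration parameters. The parameter value ranges below are hypothetical placeholders, not the ones used in our evaluation:

```python
from itertools import product

# Hypothetical parameter ranges (placeholders, not the paper's actual values)
mappers = [4, 8, 16, 32]      # number of Mappers
reducers = [4, 8, 16, 32]     # number of Reducers
hdfs_sizes_gb = [1, 2, 4]     # size of the file system (HDFS)
input_sizes_gb = [1, 2, 4]    # size of the input file

# One experiment per combination of the four configuration parameters
experiments = [
    {"mappers": m, "reducers": r, "hdfs_gb": h, "input_gb": d}
    for m, r, h, d in product(mappers, reducers, hdfs_sizes_gb, input_sizes_gb)
]

print(len(experiments))  # 4 * 4 * 3 * 3 = 144 experiments
```

Each such experiment is then executed and profiled to produce one training point for the model.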
While running each experiment, the average CPU utilization of the experiment is gathered to build a trace for later use as training data for the model (this data can be gathered easily in Linux using the SysStat monitoring package, which has low overhead). We sample the CPU usage of the experiment in the native system, from the time the mappers start to the time the reducers finish, with a time interval of one second (Figure 3-left). If the platform has an operating frequency of f_platform (in Hz), then the obtained trace is converted to CPU clock ticks by taking the product of the average CPU utilization and f_platform. The total CPU clock ticks of a particular experiment is then the summation of all CPU usage samples over the time period of that experiment (Figure 3-right).

Figure 4. Prune algorithm

Figure 5. Modeling and prediction algorithms

Because of temporal changes, it is expected that several runs of an experiment with the same configuration parameters may result in slightly different total CPU clock ticks. Therefore, utilizing a mechanism to prune unsuitable data from the training dataset should improve the modeling accuracy. In [26], Robust Stepwise Linear Regression was used as a post-processing stage to refine the outcome of the model by giving weights to data points with high error. In this study, we use a straightforward technique to prune the data set, described in Figure 4. As a result, the final value of total CPU clock ticks of an experiment is equal to the final calculated mean in line 8 of that algorithm. The same procedure must be followed for the other experiments.

3.3. Model creation using linear regression

The next step is to create a model for a MapReduce application by characterizing the relationship between the set of MapReduce configuration parameters and the CPU resource utilization metric. Such a linear-regression-based modeling problem involves choosing suitable coefficients for the model in order to approximate the real system response [26, 31]. Consider the linear algebraic equations for K experiments of an application with different sets of the four configuration parameter values:

C(j) = beta_0 + beta_1 M(j) + beta_2 R(j) + beta_3 H(j) + beta_4 D(j) + e(j),   j = 1, ..., K   (2)

where C(j) is the actual value of total CPU clock ticks of the application in the j-th experiment on MapReduce, and S(j) = (M(j), R(j), H(j), D(j)) are the MapReduce configuration parameters, namely the number of mappers (M(j)), the number of reducers (R(j)), the size of the file system (H(j)) and the size of the input file (D(j)) for this experiment. Using the above definition, the approximation problem becomes one of estimating the values of (beta_0, ..., beta_4) to optimize a cost function between the approximated values and the actual values of total CPU clock ticks. An approximated total CPU clock ticks Ĉ(j) of the application for the j-th experiment is then predicted as:

Ĉ(j) = beta_0 + beta_1 M(j) + beta_2 R(j) + beta_3 H(j) + beta_4 D(j)

There are a variety of well-known mathematical methods in the literature to calculate the coefficients (beta_0, ..., beta_4). One widely used in many application domains is Least Square Regression, which calculates the parameters in Eqn. 2 by minimizing the error:

E = Σ_{j=1}^{K} ( C(j) - Ĉ(j) )^2

Figure 6. The modeling justification

Least Square Regression theory states that if X is the K x 5 matrix whose j-th row is (1, M(j), R(j), H(j), D(j)), C = (C(1), ..., C(K))^T, and X^T X is non-singular, then the model minimizing the above error is calculated as [31-32]:

beta = (X^T X)^{-1} X^T C

where ^T denotes the matrix transpose. The coefficient vector beta = (beta_0, ..., beta_4) is the model that approximately describes the relationship between the total CPU clock ticks of an application and the four MapReduce configuration parameters. In other words, the relationship between the total CPU clock ticks of a MapReduce application and the configuration parameters is:

C ≈ beta_0 + beta_1 M + beta_2 R + beta_3 H + beta_4 D

Once a model has been created, it can be applied to CPU resource utilization traces of the same application to estimate what its total CPU requirements will be if the values of the MapReduce configuration parameters change. It should also be noted that the model obtained for one application may differ from that of another application. The modeling and prediction algorithms related to our technique are briefly described in Figure 5.
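As a concrete illustration of the least-squares computation above, the following sketch fits the coefficient vector beta = (X^T X)^{-1} X^T C on synthetic data. The parameter values and coefficients are invented for illustration, not profiled measurements:

```python
import numpy as np

# Synthetic training set: K experiments, each described by the four
# configuration parameters (mappers M, reducers R, HDFS size H, input size D).
# All values below are invented for illustration.
rng = np.random.default_rng(0)
K = 40
params = rng.integers(1, 33, size=(K, 4)).astype(float)

# Hypothetical "true" coefficients used only to generate the targets.
true_beta = np.array([5.0, 2.0, 3.0, 1.5, 4.0])   # (beta_0, ..., beta_4)
X = np.hstack([np.ones((K, 1)), params])           # design matrix: rows (1, M, R, H, D)
C = X @ true_beta                                  # total CPU clock ticks per experiment

# Least-squares estimate: beta = (X^T X)^{-1} X^T C
beta = np.linalg.solve(X.T @ X, X.T @ C)

# Predict the total clock ticks of an unseen configuration (1, M, R, H, D).
new_cfg = np.array([1.0, 16.0, 8.0, 2.0, 4.0])
prediction = new_cfg @ beta   # ≈ 80.0 with the synthetic coefficients above
print(prediction)
```

In practice the rows of X would come from the profiled experiments of Section 3.2, with C measured via SysStat; numpy.linalg.lstsq would also be a numerically safer choice than forming X^T X explicitly.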
TABLE 1. The mean and variance of prediction errors

Application          | Mean prediction error | Variance of prediction errors
WordCount            | 4.37%                 | %
TeraSort             | %                     | %
Exim Mainlog parsing | %                     | %

4. Experimental Validation

In this section, we evaluate the effectiveness of our models with three standard applications and two different hardware platforms.

4.1. Experimental setting

Three widely used text processing applications are used to evaluate the effectiveness of our method. Our method has been implemented and evaluated on a private cloud as shown in Figure 5. The cloud in our evaluation has the following specifications:

Physical H/W: five servers, each an Intel Genuine with a 3.00GHz clock, 1GB memory, 1GB cache and 50GB of shared SAN hard disk. For virtualization, the Xen cloud platform (XCP) has been used on top of the physical H/W. The Xen-API [33] provides functionality to manage virtual machines inside the XCP directly, with bindings in high-level languages such as Java, C# and Python; using those bindings it is possible to measure the performance of all virtual machines in a datacenter and live-migrate them. Virtual nodes (/servers) are implemented on top of the XCP. The number of virtual nodes is chosen from [5, 10, 15, 20, 25], each with a Linux (Debian) image. Each virtual node runs Hadoop, the Apache implementation of MapReduce developed in Java [34]; at the same time, the SysStat package (installed in the image on each node) is executed in the background to monitor/extract the CPU utilization time series of applications (in the native system) [27]. For an experiment with a specific set of MapReduce configuration parameter values, statistics are gathered from the job start to the job completion (arrows in Figure 3-left) with a sampling time interval of one second. All CPU usage samples are then combined to form the CPU utilization time series of the experiment.
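The profiling pipeline just described, one-second utilization samples converted to clock ticks via the platform frequency and summed, can be sketched as follows. The sample values and node frequency are hypothetical:

```python
# Hypothetical one-second samples of average CPU utilization (%) for one
# experiment, from mapper start to reducer finish (not measured data).
utilization_pct = [10.0, 55.0, 80.0, 75.0, 20.0]

f_platform = 3.0e9  # platform operating frequency in Hz (a 3.00 GHz node)

# Each one-second sample contributes (utilization / 100) * f_platform clock
# ticks; the experiment's total CPU clock ticks is the sum over all samples.
ticks_per_sample = [u / 100.0 * f_platform for u in utilization_pct]
total_clock_ticks = sum(ticks_per_sample)

print(total_clock_ticks)  # 7.2e9 for the sample values above
```

Repeating this for every run of every experiment yields the training data on which the pruning step and the regression of Section 3.3 operate.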
For each application there are sets of experiments in which the number of mappers and the number of reducers each take a value from a predefined set and the size of the input data is varied over a predefined set. To capture statistical information, each experiment is also repeated 10 times. Our benchmark applications are WordCount, TeraSort, and Exim Mainlog parsing. These benchmarks are used because of their striking differences and also because other studies [1, 19-20, 35-39] have relied on them.

4.2. Results

As mentioned earlier, there is a strong dependency between a MapReduce application's execution time and the number of Mappers and Reducers. Figure 7 shows the dependency between these two configuration parameters and execution time for the applications. As can be seen from this figure, the applications behave differently as the number of Mappers and Reducers increases. In general, executing the same size of data results in different execution times across applications; in many cases, WordCount's execution time is double that of Exim Mainlog parsing. In addition, although both applications show the minimum execution time for 20 mappers and five reducers, WordCount shows more fluctuation than Exim for other numbers of mappers/reducers. The reason why these numbers of mappers and reducers give the lowest execution time is not clear to us; perhaps moving from one platform to another causes this variation. Therefore, even though the modeling is valid for applications on different platforms, the coefficients of the model may change when migrating from one platform to another. This result also supports our initial claim that the number of mappers and reducers has a significant impact on the total execution time of the application.

To test the accuracy of an application's model, we use it to predict the total CPU clock ticks of several experiments of the application with random values of the four configuration parameters in the predefined range. We then run the experiments on the real system and gather the total CPU utilization in terms of clock ticks to determine the prediction error. For evaluation, the outcomes of these models for three standard MapReduce applications (WordCount, Exim Mainlog parsing and TeraSort) are compared with their actual total CPU clock ticks. Figure 7 shows the prediction accuracies and prediction errors between the actual values of total CPU clock ticks and their predicted values, while Table 1 gives the statistical mean and variance of the prediction errors for the three applications.
We find that the average error between the actual values of total CPU clock ticks and their predicted counterparts is less than 5% for the tested applications. Some of the error comes from model inaccuracy, but it can also be caused by temporal changes in the system that increase the total CPU clock ticks for a short time. The spikes of prediction error in Figure 7 occur at low values of total CPU clock ticks and are most likely caused by background processes running during the execution of the applications. For example, in Hadoop one of the main background processes comes from streaming, used when the mapper and reducer are written in a programming language other than Java. As these background processes constantly consume a certain amount of CPU capacity, their influence becomes significant when the total CPU clock ticks of the MapReduce application are low. Although the obtained model can successfully predict the level of total CPU capacity required by a MapReduce application, it cannot give information about how application performance, such as response time, changes or how CPU utilization varies over time. Finally, our approach can help cloud customers and providers to approximate the total amount of CPU resources that have to be allocated to a MapReduce application in order to prevent significant performance reduction due to CPU resource limitation.
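For reference, the mean and variance statistics of the kind reported in Table 1 can be computed from actual and predicted clock ticks along these lines. The values below are made-up placeholders, not our experimental results:

```python
import statistics

# Hypothetical actual and predicted total CPU clock ticks for a few
# experiments (placeholder values, not measured results).
actual =    [9.0e9, 7.5e9, 1.2e10, 6.0e9]
predicted = [8.7e9, 7.8e9, 1.15e10, 6.3e9]

# Relative prediction error (%) per experiment.
errors_pct = [abs(a - p) / a * 100.0 for a, p in zip(actual, predicted)]

mean_error = statistics.mean(errors_pct)     # mean prediction error
var_error = statistics.pvariance(errors_pct) # variance of prediction errors
print(mean_error, var_error)
```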
Figure 7. The prediction accuracy and error between the actual and predicted total CPU clock ticks for the applications: (a) WordCount, (b) TeraSort, (c) Exim Mainlog parsing (legend: R = 4, 8, 12, 16, 20, 24, 28, 32 reducers).

6. Conclusion
The motivation behind this work is that the accurate modeling and prediction of the resource usage of a MapReduce application, before running it on the actual cluster/cloud, can greatly help both application performance and effective resource utilization. In this paper, we have presented an approach to model/profile the CPU usage of a MapReduce application from the native system and applied a linear regression model to identify the correlation between four major MapReduce configuration parameters and the CPU utilization of the application. Our modeling technique can be used by both users/consumers (e.g., application developers) and service providers in the cloud for effective resource utilization. Evaluation results show that our modeling technique can effectively predict the total computation cost of three standard applications with a median prediction error of less than 5%.

Acknowledgment

Mr. N. Babaii Rizvandi's work is supported by National ICT Australia (NICTA). Professor A. Y. Zomaya's work is supported by an Australian Research Council Grant LP.

References

[1] N. B. Rizvandi, et al., "Preliminary Results on Modeling CPU Utilization of MapReduce Programs," University of Sydney, TR665.
[2] R. Nathuji, et al., "Q-clouds: managing performance interference effects for QoS-aware clouds," presented at the 5th European Conference on Computer Systems, Paris, France.
[3] Y. Chen, et al., "Towards Understanding Cloud Performance Tradeoffs Using Statistical Workload Analysis and Replay," University of California at Berkeley, Technical Report No. UCB/EECS.
[4] S. Babu, "Towards automatic optimization of MapReduce programs," presented at the 1st ACM Symposium on Cloud Computing, Indianapolis, Indiana, USA.
[5] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51.
[6] Hadoop Developer Training. Available:
[7] M. Zaharia, et al., "Improving MapReduce Performance in Heterogeneous Environments," 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008), 18 December.
[8] M. Zaharia, et al., "Job Scheduling for Multi-User MapReduce Clusters," University of California at Berkeley, Technical Report No. UCB/EECS.
[9] J. Leverich and C. Kozyrakis, "On the Energy (In)efficiency of Hadoop Clusters," SIGOPS Oper. Syst. Rev., vol. 44.
[10] Y. Chen, et al., "Statistical Workloads for Energy Efficient MapReduce," University of California at Berkeley, Technical Report No. UCB/EECS.
[11] N. Kamyabpour and D. B. Hoang, "A hierarchy energy driven architecture for wireless sensor networks," presented at the 24th IEEE International Conference on Advanced Information Networking and Applications (AINA-2010), Perth, Australia.
[12] N. Kamyabpour and D. B. Hoang, "A Task Based Sensor-Centric Model for overall Energy Consumption," CoRR.
[13] N. Kamyabpour and D. B. Hoang, "A study on Modeling of Dependency between Configuration Parameters and Overall Energy Consumption in Wireless Sensor Network (WSN)," CoRR.
[14] K. Almiani, et al., "RMC: An Energy-Aware Cross-Layer Data-Gathering Protocol for Wireless Sensor Networks," presented at the 22nd International Conference on Advanced Information Networking and Applications (AINA), Ginowan, Okinawa, Japan.
[15] K. Almiani, et al., "Energy-efficient data gathering with tour length-constrained mobile elements in wireless sensor networks," presented at the 35th Annual IEEE Conference on Local Computer Networks (LCN), Denver, Colorado, USA.
[16] T. Sandholm and K. Lai, "MapReduce optimization using regulated dynamic prioritization," presented at the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, Seattle, WA, USA.
[17] A. Wieder, et al., "Brief Announcement: Modelling MapReduce for Optimal Execution in the Cloud," presented at the 29th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Zurich, Switzerland.
[18] A. Wieder, et al., "Conductor: orchestrating the clouds," presented at the 4th International Workshop on Large Scale Distributed Systems and Middleware, Zurich, Switzerland.
[19] N. B. Rizvandi, et al., "On using Pattern Matching Algorithms in MapReduce Applications," presented at the 9th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Busan, South Korea.
[20] N. B. Rizvandi, et al., "Preliminary Results on Using Matching Algorithms in Map-Reduce Applications," University of Sydney, TR672.
[21] N. B. Rizvandi, et al., "Preliminary Results: Modeling Relation between Total Execution Time of MapReduce Applications and Number of Mappers/Reducers," University of Sydney, 2011.
[22] N. B. Rizvandi, et al., "A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications," CoRR.
[23] N. B. Rizvandi, et al., "On Modeling Dependency between MapReduce Configuration Parameters and Total Execution Time," CoRR.
[24] R. Springer, et al., "Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster," presented at the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, New York, USA.
[25] A. Kansal, et al., "Virtual machine power metering and provisioning," presented at the 1st ACM Symposium on Cloud Computing, Indianapolis, Indiana, USA.
[26] T. Wood, et al., "Profiling and Modeling Resource Usage of Virtualized Applications," presented at the ACM/IFIP/USENIX 9th International Middleware Conference, Leuven, Belgium.
[27] SysStat. Available:
[28] H. Wang, et al., "Distributed Systems Meet Economics: Pricing in the Cloud," presented at the 2nd USENIX Conference on Hot Topics in Cloud Computing, Boston, MA.
[29] K. Kambatla, et al., "Towards Optimizing Hadoop Provisioning in the Cloud," presented at the 2009 Conference on Hot Topics in Cloud Computing, San Diego, California.
[30] S. Groot and M. Kitsuregawa, "Jumbo: Beyond MapReduce for Workload Balancing," presented at the 36th International Conference on Very Large Data Bases, Singapore.
[31] N. B. Rizvandi, et al., "An Accurate FIR Approximation of Ideal Fractional Delay Filter with Complex Coefficients in Hilbert Space," Journal of Circuits, Systems, and Computers, vol. 14.
[32] N. B. Rizvandi, et al., "An accurate FIR approximation of ideal fractional delay in Hilbert space," presented at the Fourth IEEE International Symposium on Signal Processing and Information Technology.
[33] D. Chisnall, The Definitive Guide to the Xen Hypervisor, first ed.: Prentice Hall Press.
[34] Hadoop. Available:
[35] A. Mao, et al., "Optimizing MapReduce for Multicore Architectures," Massachusetts Institute of Technology, 2010.
[36] G. Wang, et al., "Using realistic simulation for performance analysis of MapReduce setups," presented at the 1st ACM Workshop on Large-Scale System and Application Performance, Garching, Germany.
[37] G. Wang, et al., "A Simulation Approach to Evaluating Design Decisions in MapReduce Setups," presented at MASCOTS.
[38] "Optimizing Hadoop Deployments," Intel Corporation, 2009.
[39] N. B. Rizvandi, et al., "MapReduce Implementation of Prestack Kirchhoff Time Migration (PKTM) on Seismic Data," presented at the 12th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Gwangju, Korea.
