Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations



Similar documents
Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Evaluating partitioning of big graphs

Load Balancing. Load Balancing 1 / 24

How To Balance In Cloud Computing

Load Balancing Techniques

Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming)

Cloud Computing is NP-Complete

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

Sanjeev Kumar. contribute

Clustering & Visualization

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

EBERSPÄCHER ELECTRONICS automotive bus systems. solutions for network analysis

Information Processing, Big Data, and the Cloud

Classification On The Clouds Using MapReduce

Load balancing; Termination detection

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

STUDY OF PROJECT SCHEDULING AND RESOURCE ALLOCATION USING ANT COLONY OPTIMIZATION 1


Partitioning and Divide and Conquer Strategies

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

LOAD BALANCING TECHNIQUES

Testing Intelligent Device Communications in a Distributed System

Topological Properties

Research of Railway Wagon Flow Forecast System Based on Hadoop-Hazelcast

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

Laboratory work in AI: First steps in Poker Playing Agents and Opponent Modeling

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

Parallelization: Binary Tree Traversal

Big Data Processing with Google s MapReduce. Alexandru Costan

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

CHAPTER 5 WLDMA: A NEW LOAD BALANCING STRATEGY FOR WAN ENVIRONMENT

Efficiency of Web Based SAX XML Distributed Processing

Final Project Proposal. CSCI.6500 Distributed Computing over the Internet

LDIF - Linked Data Integration Framework

Data Warehousing und Data Mining

Self-organized Multi-agent System for Service Management in the Next Generation Networks

D A T A M I N I N G C L A S S I F I C A T I O N

Techniques for Scaling Components of Web Application

Load balancing; Termination detection

Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies

A Novel Approach for Efficient Load Balancing in Cloud Computing Environment by Using Partitioning

ABSTRACT INTRODUCTION SOFTWARE DEPLOYMENT MODEL. Paper

Chapter 5. Regression Testing of Web-Components

How To Compare Real Time Scheduling To A Scheduled Scheduler On Linux On A Computer System With A Visualization System

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Visualizing an Auto-Generated Topic Map

Master of Science in Computer Science

Survey on Load Rebalancing for Distributed File System in Cloud

Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Figure 1. The cloud scales: Amazon EC2 growth [2].

Performance Comparison of Server Load Distribution with FTP and HTTP

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

What Is Specific in Load Testing?

The Scientific Data Mining Process

LOAD BALANCING IN CLOUD COMPUTING

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.

AUTOMATED TESTING and SPI. Brian Lynch

Sla Aware Load Balancing Algorithm Using Join-Idle Queue for Virtual Machines in Cloud Computing

A REVIEW PAPER ON LOAD BALANCING AMONG VIRTUAL SERVERS IN CLOUD COMPUTING USING CAT SWARM OPTIMIZATION

Simulation Software: Practical guidelines for approaching the selection process

CS Standards Crosswalk: CSTA K-12 Computer Science Standards and Oracle Java Programming (2014)

Business Process Discovery

Computing Concepts with Java Essentials

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008

Fall 2012 Q530. Programming for Cognitive Science

WHITEPAPER. Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk

Introduction to DISC and Hadoop

DECENTRALIZED LOAD BALANCING IN HETEROGENEOUS SYSTEMS USING DIFFUSION APPROACH

Alphacam Art combines Vectric s Aspire artistic design software with the market leading Alphacam manufacturing software.

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

Introduction to Parallel Programming and MapReduce

Spreadsheet Simulation

Efficiency Considerations of PERL and Python in Distributed Processing

Reputation Management Algorithms & Testing. Andrew G. West November 3, 2008

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

Building a Question Classifier for a TREC-Style Question Answering System

Instructional Design Framework CSE: Unit 1 Lesson 1

ACO ALGORITHM FOR LOAD BALANCING IN SIMPLE NETWORK

Parallel Analysis and Visualization on Cray Compute Node Linux

Technology White Paper Capacity Constrained Smart Grid Design

A Comparison of Dynamic Load Balancing Algorithms

An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce.

Load Balance Strategies for DEVS Approximated Parallel and Distributed Discrete-Event Simulations

Project Management System Services

Chapter 13: Binary and Mixed-Integer Programming

Transcription:

Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations Amara Keller, Martin Kelly, Aaron Todd 4 June 2010 Abstract This research has two components, both involving the CASEsim framework. First, we must handle a few practical concerns, such as increasing the framework s usability and implementing the ability to distribute computation. To achieve this, we plan to use XML configuration files for usability and the Java RMI library for distributed computing. Second, we will study load balancing techniques for 0-dimensional (non-spatial) social network simulations, creating heuristics that allow for general load balancing of such simulations in an optimal way. We will explore two approaches. The first is a centralized technique, examining macroscopic properties of the graph such as agent clusters in order to put related agents on the same slave. The second is an individual, slave-level technique that in which individual slaves dynamically pass agents to other slaves in order to equalize the workload. Finally, we will test our heuristics via a variety of simulations using the CASEsim framework. 1 Introduction Multi-agent systems and computer models of society in general are new and promising areas of computer science research because they can simulate dynamic social situations that are more realistic than many other social models. A multi-agent system is broadly divided into an environment and a number of agents within that environment. These agents are anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators [6]. Because the definition of an agent is so broad, multi-agent systems can model many different situations, such as economic, social, and political activity. Of course, creating realistic simulations is not easy and requires collaboration with economics and/or the social sciences. However, multi-agent systems provide the technological framework to make such modeling possible. Most previous work has been in spatial simulations, that is, simulations in which each agent has a location in space at all times. These simulations are promising partially because they model real-world situations (like traffic flow or migration patterns) but also because they lend themselves to parallel and distributed computing. For example, those writing spatial simulations can use data structures like kd-trees to assign slaves to particular regions of space and thereby distribute the work of computing agent behavior to many different slaves. Assuming agents are roughly evenly distributed across space, this approach can work quite well. However, another class of social simulations 0-dimensional simulations are not spatial in nature. Such simulations include social networks like Facebook and any simulation dealing with people s interactions without concern for their physical locations. These simulations, although important and interesting, do not easily lend themselves to parallel and distributed computing because there is no obvious way to partition work among slaves. Since 0-dimensional simulations are varied, a load balancing technique that works well in one simulation may not work in another. Our goal is to create some load balancing heuristics that work reasonably well in all 0-dimensional simulations. There has been little research done on this problem, so it is unexplored territory that will require some creativity and thought in order to get results. 1

2 Related Work Most previous work has involved spatial simulations, especially for the CASEsim framework. The main reason for this is that spatial simulations are much easier to load balance than non-spatial ones because they yield to spatial partitioning algorithms such as those involving k-d trees. Using such methods, one can recursively bisect the space in order to balance the workload among as many slaves as needed. Two successful past simulations (both using CASEsim) are Sugarscape, which models societal resource distribution, and Foreclosure, which models the housing foreclosure crisis in the US [7]. In contrast, load balancing 0-dimensional social simulations is relatively unexplored, and there is little, if any, research literature how to do it. However, there is theoretical support for two different approaches to load balancing social networks. Broadly, these two approaches are centralized and slave-based load balancing. Centralized load balancing is inspired by prior work with recursive bisection[5]. These methods partition a spatial area. This is done because it is assumed agents in a given area communicate mostly with other agents in that area. While social networks are non-spatial, they do have a graph structure that can be partitioned. We plan to adapt the same type of recursive balancing algorithm used with recursive bisection but partition using minimum cuts in the social network graph[5]. In contrast, slave-based load balancing is based on slaves managing workload when they become too overloaded or underloaded. This sort of load balancing has been applied before by Cao [1] but not in the context of 0-dimensional spatial simulations. Thus we plan to adopt Cao s general method in the context of social network load balancing. 3 Problem Definition We have two goals for this research project. The first goal concerns practical components of the CASEsim framework, such as increasing usability and adding the ability to do distributed computation. This work is necessary for research but will not directly help us achieve our research goals. The second is to create a system for load balancing 0-dimensional simulations of social networks. 1. CASEsim Usability The CASEsim framework needs to be made more usable in a number of ways in order to help the other research groups create simulations with ease. There are a large number of ways we can do this, so we will work on implementing enough of these features that the framework is usable, but we do not expect to accomplish all of these goals. First, we need to create an easier to use system of simulation configuration. Currently, configuring a simulation requires writing an iterator in Scala (the base language for CASEsim) and manually declaring each agent through Scala code. This process could be automated somewhat through a configuration file that specifies types of agents and starting parameters for the agents. It would also describe what machines are being used for parallelization and the path for the Scala executable. A second goal is to create a visual client to manipulate simulation configuration once the configuration mechanism is in place. This would make simulation initialization even easier because the user would not need neither programming nor scripting knowledge to use the system. Another, related goal is to create a configuration file for the visual client that would tell the client necessary information for it to initialize a simulation, such as simulation type and parallelization setup. Another goal is to create a standardized system through which simulations can log simulation data while a simulation is running. Currently, logging is available through text and binary mode, 2

but it is not in a standardized format. Having such a format would make automated processing of simulation logs easier, especially if the format is the same as that of the configuration files. Finally, it would be helpful for CASEsim to have an easy way to import Geographic Information System (GIS) data. This data is useful for a variety of simulations, such as traffic flow modeling, and we anticipate that it would help the other research groups with their work as well as extend CASEsim s functionality. 2. Distributed Computing In order to study any sort of load balancing, CASEsim needs to have the capability to distribute its computation amongst multiple machines. The current Scala version of CASEsim does not have this functionality, which makes all load balancing research impossible until it is implemented. 3. Load balancing of 0-dimensional Social Simulations Finally, our primary research goal is in load balancing social simulations. This problem is of research interest for a number of reasons. The main reason is that it is a relatively unexplored topic. Much work has been done on the load balancing of spatial simulations using methods such as recursive bisection of the spatial area. Previous research groups in this program have studied this using kd-trees to facilitate the process. However, this approach only works when agents can be partitioned spatially, which requires that they have a position at all times. In the case of social simulations (such as simulating Facebook or any other social network), agents do not have locations so there is no obvious way to partition agents amongst the slaves. Thus other criteria must be identified and used. We plan to use the graph structure of social networks in order to optimize load balancing. Because agents in social simulations all have connections amongst each other, we can represent these connections with edges and the agents with nodes. An optimal partitioning of agents would minimize inter-slave messaging and maximize intra-slave messaging while keeping the processor load on each slave equal. The issue with social simulations is that agent connections continuously shift, resulting in a very dynamic graph structure. The complex component of this research problem is not only to partition the agents in a good way, but also to maintain a good partitioning as the simulation runs. Our goal is to find a good way to do this. Ideally this method will be applicable to most if not all social simulations. 4 Proposed Solution Since our problem has three primary components, our proposed solution is split into three sections. The first section describes solutions to the usability problem. The second covers how we will add the ability to distribute computation to CASEsim. The third describes our approach to the load balancing of social simulations. 1. Usability To facilitate simple simulation configuration we will give CASEsim the ability to load XML configuration files. XML is a good choice because it is relatively simple to edit and very extensible. If we we add new features to CASEsim, it will be easy to add XML options for them. Additionally, Scala has excellent support for XML parsing, making the implementation of the parser very simple. These files will allow the user to easily configure the simulation setup, data recording, load balancing, and other important aspects of CASEsim. If possible, it would be good to add the ability to specify agent behavior in a configuration file, but this functionality may be beyond the scope of our usability work. Even if agent behavior cannot be set, we do view it as important to allow configuration files to specify the initial layout and state of agents in the simulation. An additional usability task is to add the ability to easily import GIS data into CASEsim. Other research groups have expressed interest in this functionality, so we will work with them to enable this. 3

Since this task could be complex and is not part of our primary research topic, we may not be able to accomplish it. 2. Distributed computing In order to give CASEsim the ability to perform distributed computation, we will use Java s RMI libraries. Since Scala is closely related to Java, all RMI methods can be easily called from within our Scala framework, allowing us to implement distributed functionality without much difficulty. This task is of very high importance because we cannot start researching load balancing until it is complete. 3. Load balancing of social simulations There are several ways we can pursue load balancing of social simulations. The first way is centralized and involves observing macroscopic properties of the graph of a social network. The second operates at the slave level without centralized control, each slave adjusting its load according to how much work it is doing. These two approaches both have the same goal, those of equal processor load on each slave, maximal intra-slave communication and minimal inter-slave communication. To achieve this, our methods attempt to partition the graph into clusters and then distribute these clusters in an optimal way. Ideally, we will find clusters with a large amount of interconnection and few connections to other clusters. Naively, we are simply finding cliques like structures, but since finding cliques of a given size is NP-complete [3], we cannot follow this approach. Instead we will create heuristics to find reasonably ideal clusters. Broadly, we plan to take two approaches to this problem: centralized and slave-based. Centralized approaches Our centralized approach is to adapt the spatial method of recursive bisection partitioning to a non-spatial graph. While prior approaches have used kd-trees to store partitions based on cuts along geometric axes, we will instead partition based on minimum cuts in the graph [5]. Our algorithm will build a one-dimensional tree by recursively finding the minimum cut in the graph, which is easy to find using the Ford-Fulkerson method or by applying the Kernighan-Lin algorithm[2][4]. There are also a large number of other possible approaches to the graph partitioning problem. Our expectation is that this cut-tree will successfully partition the graph into reasonably good clusters. Once the initial distribution has been computed, the simulation is started. As the social network evolves, communications and processing bottlenecks will emerge. To eliminate these, the centralized method employs the same techniques that prior work with kd-trees has used. A bottom-up approach to this would work in the following way. The algorithm recursively moves up the tree starting with the lowest level. At each node the algorithm checks to see if the load values of the children are approximately equal. If not, agents are transferred from one child to the other. If this transfer takes place at a non-leaf node, the algorithm will recursively navigate down the tree to identify the ideal agents to move. To decide what agents to transfer, the algorithm will shift the cut line that defines a node s partition in such a way that the total agent load is reduced but the cut value is still minimized locally. Once the entire tree has been traversed, the simulation should be balanced. There are a number of challenges that need to be overcome in order to successfully use this method. The first is that the initial partitioning of the graph needs to split it into clusters of an appropriate size for assignment to slaves. A potential solution to this problem is to allow non-leaf nodes to be assigned to slaves for computation. Another is that in a dynamic social network, the structure could change enough that this method of shifting cut lines will get stuck at a local optimum. A possible solution to this problem is to allow the system to reinitialize the cut tree. However, reinitializing is computationally expensive, so this approach needs to be considered carefully.another issue is that both communication load and processor load need to be balanced. The planned algorithm attempts communication load balancing by minimizing it 4

locally and expecting that to work well enough. It needs to be explored whether or not that is actually effective. Slave-based approaches Another method for load balancing is inspired by Cao [1], who simulated ant behavior by having overburdened ants search for under-burdened ants to pass work to (and vice versa). The advantage of such an approach is that it does not involve any type of centralized organization. With it, we can also give agents the ability to respond to both network and processor load, directly addressing both balancing issues. The difficult part of the method is that it requires the slaves to have a good heuristic to govern how load should be passed around. The expectation is that if slaves are actively working to minimize inter-slave communication, then slaves will partition the social network reasonably well on their own. Simulation initialization will still need to be worked out, but if this heuristic is good enough, the slaves should find a near optimal solution soon after the simulation begins. Our initial plan for a heuristic is to have each slave store a record of how much work its agents are doing and which slaves they communicate with the most. This communication information, combined with agent friend data, allows a slave to calculate a value representing to what degree its agents are attached to it. Using communication data in addition to just agent-provided data allows the framework to take into account friendships of varying strength without putting any additional burden on the simulation writer. When a slave realizes that it is overworked, it can send its least-attached agents to other slaves. We can also use this procedure to search for underloaded slaves and maintain load balance. We can also include occasional random agent redistribution, which could serve as a way to get the system out of a bad local optimum. Shadowing In both of the above load balancing methods, agents are passed around frequently. It is possible that an optimal partitioning requires that agents be in multiple places at once. Shadowing is the idea of letting this occur and taking advantage of it. An example of when it may be appropriate is when a single agent receives messages from every agent in the simulation but spends very little time processing these messages. A more specific example would be a centralized logger agent that records data and occasionally sends results through an output service. This agent would place very high load on the network but minimal load on the processor. If this agent were shadowed such that it was present on every machine, message passing would be reduced substantially at the cost of minimal processor load. It could be the case that shadowing certain agents reduces network load enough to be beneficial. Shadowing could also allow trial moves, where an agent exists in multiple locations for a brief time and then stays in the more optimal one. The issue with shadowing is that all shadows need to maintain synchronization, which could be difficult to maintain. 5 Proposed Experiment Domain Our primary testing method will be to create a set of test simulations for which we know the optimal distribution. Ideally, we will run these tests on simulations of one to ten million agents. Once we develop a balancing system that achieves this goal, we will experiment with simulations for which we do not know the optimum and attempt to understand why the system behaved as it did. The very first test will be to create a random static social network and then place every agent on one slave. If the balancing works, agents should be rapidly distributed to other slaves. The second test will be to create a static social network in which the agents are grouped into a number of cliques equal to the number of slaves. We can then randomly assign agents to slaves and check to see how closely the slaves store agents within the underlying clique structure after balancing. Another test would be to take this clique network and have it evolve dynamically in a known way. An example of a test in the second category would be to 5

have a master agent and, each turn, have agents break off existing connections and replace them with connections to the master. The correct response to this by the load balancing algorithm is not obvious, so it would be interesting to see how our method handles it. An ideal load balancing algorithm will successfully adapt to the new cliques. Of course, this list of simulations is not exhaustive and more may be created if necessary. 6 Proposed Timeline 1. Initial Tasks Our initial tasks are to continue our literature review and start working on improving the core functionality of CASEsim. 2. Intermediate Tasks Our intermediate tasks are to explore our approaches to the load balancing problem. We should select one and get a basic implementation working. 3. Final Tasks The final task is to test and tweak our selected load balancing algorithm. 4. Wrapup Our wrapup tasks are to finalize the test set and prepare the final paper and presentation. 5. Rough Schedule Week Tasks 1 Literature review, familiarization with MAS 2 Literature review, study CASEsim code 3 Literature review, start work on CASEsim distributed computing and XML configuration 4 Finish basic CASEsim enhancements 5 Start work on load balancing algorithms 6 Select a primary approach and start implementing it 7 Finish implementation and start testing 8 Test and tweak algorithm. 9 Final analysis, paper preparation. 10 Wrapup week. 7 Acknowledgments We would like to thank Dr. Zhang, Dr. Lewis, and the Trinity University computer science department for making this research possible. References [1] Cao, Junwei. Self-Organizing Agents for Grid Load Balancing. Preceedings of the Fifth IEEE/ACM International Workshop on Grid Computing, pp. 388-395. 2004. [2] Ford, L. R; Fulkerson, D. R. Maximal flow through a network. Canadian Journal of Mathematics 1956, 8:399-404 [3] Karp, R.M. Reducibility among combinatorial problems in Complexity of Computer Computations. ed. by R.E. Miller and J.W. Thatcher. pp. 85 103. Plenum Press, 1972. 6

[4] Kernighan, B. W.; Lin, Shen An efficient heuristic procedure for partitioning graphs. Bell Systems Technical Journal. 1970. 49:291-307 [5] Welch, Aaron. Lopez-Lago, Alejandro. A Comparison of Load Balancing Algorithms for Spatially Oriented Multi-Agent Simulation Frameworks Trinity University, 2009. [6] Zhang, Yu. AI and Multi-Agent Systems. Presentation for Trinity REU students, Trinity University. 2010. [7] Zhang, Yu, et. al. A Multi-Agent Simulation for Social Agents. SpringSim. 2008. 7