Using Graph Analysis to Study Networks of Adaptive Agents




Using Graph Analysis to Study Networks of Adaptive Agents

Sherief Abdallah
British University in Dubai, United Arab Emirates
University of Edinburgh, United Kingdom
shario@ieee.org

ABSTRACT

Experimental analysis of networks of cooperative learning agents (to verify certain properties such as the system's stability) has been commonly used due to the complexity of theoretical analysis in such cases. Due to the large number of parameters to analyze, researchers use metrics that summarize the system in a few parameters. Since in cooperative systems the ultimate goal is to optimize some global metric, researchers typically analyze the evolution of the global performance metric over time to verify system properties. For example, if the global metric improves and eventually stabilizes, it is considered a reasonable verification of the system's stability. The global performance metric, however, overlooks an important aspect of the system: the network structure. We show an experimental case study where the convergence of the global performance metric is deceiving, hiding an underlying instability in the system that later leads to a significant drop in performance. To expose such instability, we propose the use of the graph analysis methodology, where the network structure is summarized using network measures. We develop a new network measure that summarizes an agent's interaction with its neighbors and takes the disparity of these interactions into account. The new measure is applied to our case study, clearly exposing the instability that was previously hidden by the global performance metric.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence

General Terms

Experimentation

Keywords

Simulation and Experimental Verification, Agent Networks, Multi-agent Learning, Network Analysis

1. INTRODUCTION

Cite as: Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), 517-524.

Organizing agents in a network is a common approach to achieving scalability in multi-agent systems [1].
Restricting an agent to interact only with its neighbors simplifies the agent's decision-making problem (the problem becomes independent of the overall system size). To cope with an environment that keeps changing or that is not known a priori, learning algorithms have been used to optimize performance in agent networks [2, 3, 1]. Multi-agent learning algorithms allow agents to optimize their decisions based on their interactions with both the environment and their neighbors in the network. Analyzing the dynamics of such a system (consisting of networked adaptive agents) over time is a nontrivial task, due to the large number of system parameters (each agent maintains a set of local parameters controlling its behavior), the concurrency by which these parameters change (agents acting independently), and the delay in the effect/consequence of parameter changes (because of communication delays between agents and the time it takes for learning algorithms to adapt). A researcher studying a large-scale system of adaptive agents needs some metrics to summarize this system (with its large number of parameters) into a few numbers that are more manageable. In cooperative MAS, the natural choice has been the global performance metric(s) the system is trying to optimize (e.g. payoff). Researchers inspected the evolution of some global performance metric as a rough approximation of the internal dynamics of the adaptive agent network [2, 3, 1]. For example, if the global metric improved over time and eventually appeared to stabilize, it was usually considered a reasonable verification of convergence. Examples of global performance metrics include the percentage of the total number of delivered packets in routing problems [4], the average turnaround time of tasks in task allocation problems [5], or the average reward (received by agents) in general [6]. The global performance metric, however, overlooks an important aspect of the system: the network structure.
We present in this paper an experimental case study where the stability of the global performance metric can hide an underlying instability in the system. The hidden instability later leads to a significant drop in the global performance metric itself, but after some period of (fake) stability. We propose the use of graph analysis to study the dynamics of networked adaptive agents. Graph analysis is an interdisciplinary research field that studies interesting graph properties in real-world networks. In recent years graph analysis has received significant attention due to the explosive growth of social networks and the discovery of common patterns that govern a wide range of real-world networks [7, 8].

Figure 1: Example showing the evolution of a node's interactions with its neighbors over time (top) and the corresponding continuous degree of that node over time.

The core of graph analysis is network measures: functions that summarize a graph into simpler numeric values, which are then more manageable to analyze. One can abstract a networked MAS at any point in time as a weighted graph, where an edge's weight reflects the amount of interaction over that edge (e.g. the number of messages exchanged). Figure 4(a) shows an example network of 4 agents. We can then apply network measures that were developed in graph analysis to summarize these weighted graphs into meaningful, more manageable parameters. However, the bulk of the measures previously developed in graph analysis focused on unweighted graphs [9]. The lack of weighted network measures is due, at least in part, to the difficulty of quantifying such weights in real-world networks (e.g. social networks). Even if quantified, the accuracy of such weights is at best debatable. In our case, networks are artificial and weights can be quantified precisely. In fact, weights cannot be ignored, because they provide crucial insight into the learning process going on inside the network of agents. We propose a new network measure, which we call the continuous degree. The new measure takes weights into account to determine how many neighbors are effectively being used by an agent in the network. Figure 1 illustrates how our new measure summarizes the evolution of a node's interactions over time. When the agent initially interacts uniformly with all its neighbors, the continuous degree equals 4, the number of neighbors. At the other extreme, when the agent interacts mostly with one neighbor, the continuous degree equals 1. We prove interesting properties of our proposed measure, including its connection to the traditional node degree and its ability to capture the disparity of interactions among neighbors. We use our measure to expose and explain the instability reported in the case study we mentioned earlier.
In summary, this paper makes the following contributions:

- presents a case study illustrating the risk of relying on global performance metrics to analyze networks of adaptive agents;
- proposes the use of graph analysis to analyze networks of adaptive agents, and develops a new network measure that takes the disparity of interactions among agents into account;
- illustrates how graph analysis can be used to explain the observed behavior in our case study.

The paper is organized as follows. Section 2 describes the case study we will be using throughout the paper. Section 3 presents our proposed measure, the continuous degree. Section 4 revisits the case study, illustrating how the continuous degree exposes the instability in the system. Section 5 analyzes our proposed measure and proves some of its properties. Section 6 discusses the related work. We conclude in Section 7.

2. CASE STUDY: DISTRIBUTED TASK ALLOCATION PROBLEM (DTAP)

We use a simplified version of the distributed task allocation domain (DTAP) [5], where the goal of the system is to assign tasks to agents such that the service time of each task is minimized. For illustration, consider the example scenario depicted in Figure 2. Agent A0 receives task T1, which can be executed by any of the agents A0, A1, A2, A3, and A4. All agents other than agent A4 are overloaded, and therefore the best option for agent A0 is to forward task T1 to agent A2, which in turn forwards the task to its left neighbor (A5), until task T1 reaches agent A4. Although agent A0 does not know that A4 is under-loaded (because agent A0 interacts only with its immediate neighbors), agent A0 should eventually learn (through experience and interaction with its neighbors) that sending task T1 to agent A2 is the best action, without even knowing that agent A4 exists.

Figure 2: Task allocation using a network of agents.

Each time unit, agents make decisions regarding all task requests received during this time unit. For each task, the agent can either execute the task locally or send the task to a neighboring agent.
If an agent decides to execute the task locally, the agent adds the task to its local queue, where tasks are executed on a first-come-first-served basis, with unlimited queue length. The main goal of DTAP is to reduce the total service time, averaged over tasks,

$$ATST = \frac{\sum_{T \in T_\tau} TST(T)}{|T_\tau|}$$

where $T_\tau$ is the set of task requests received during a time period $\tau$ and $TST(T)$ is the total time a task $T$ spends in the system. The TST(T) time consists of the time for routing the task request through the network, the time the task request spends in local queues, and the time of actually executing the task. Agents interact via two types of messages. A REQUEST message ⟨i, j, T⟩ indicates a request sent from agent i to agent j requesting the execution of task T. An UPDATE

message ⟨i, j, T, R⟩ indicates feedback (a reward signal) from agent i to agent j that task T took R time steps to complete (the time steps are computed from the time agent i received T's request). Communication delay is an important property of our version of DTAP. Each agent has a physical location, and the communication delay between two agents is proportional to the Euclidean distance between them (one time unit per distance unit). Due to communication delay, the effect of an action does not appear immediately, because it is communicated via messages, and messages take time to route. Not only is the reward delayed, but so is any change in the system state. A consequence of communication delay is partial observability: an agent cannot observe the full system state (the queues at every other agent, messages on links and in queues, etc.). Although the underlying simulator has different underlying states (tasks at local agents' queues, messages in transit over communication links, etc.), we made agents oblivious to these states. The only feedback an agent gets is its own reward. This simplifies the agent's decision problem and re-emphasizes partial observability: agents collectively learn a joint policy that makes a good compromise over the different unobserved states. The following sections give a brief overview of the multi-agent learning (MAL) algorithms that we evaluate in our case study and discuss the issue of convergence with respect to MAL.

2.1 Multiagent Learning

A large number of multi-agent learning algorithms have been proposed that vary in their underlying assumptions and target domains [10]. The experimental results and the analysis we report here focus on two (gradient-ascent) multi-agent learning algorithms: GIGA-WoLF [11] and WPL [5], but our methodology can be used with other algorithms as well. These two learning algorithms allow agents to learn stochastic policies, which are better suited for partially-observable domains [1].
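As a concrete illustration, the ATST objective above can be computed from per-task timing records. The following is a minimal sketch of ours, not the paper's simulator: the Task fields (routing, queueing, execution time) are illustrative names for the three components of TST(T) described above.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Hypothetical record of one task's lifetime (field names are ours)."""
    routing_time: float    # time REQUEST/forwarding messages spent in transit
    queue_time: float      # time spent waiting in the executing agent's local queue
    execution_time: float  # time actually executing the task

    def tst(self) -> float:
        # TST(T): total time the task spends in the system.
        return self.routing_time + self.queue_time + self.execution_time

def atst(tasks) -> float:
    """ATST over a period: mean TST of the tasks received in that period."""
    tasks = list(tasks)
    return sum(t.tst() for t in tasks) / len(tasks)

# Example: three tasks received during some period tau.
period = [Task(4, 10, 10), Task(2, 0, 10), Task(6, 30, 10)]
print(atst(period))  # (24 + 12 + 46) / 3 = 27.33...
```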
The specifics of WPL and GIGA-WoLF (such as their update equations, the underlying intuition, and their differences and similarities) are neither relevant to the purpose of this paper nor needed to follow our analysis in Section 4. Nevertheless, and for completeness, we give below (very briefly) the policy-update equations for the two algorithms. Further details regarding the two algorithms can be found elsewhere [11, 5].

The term convergence, in the reinforcement learning context, refers to the stability of the learning process (and the underlying model) over time. Similar to single-agent reinforcement learning algorithms (such as Q-learning [12]), the convergence of a multi-agent reinforcement learning (MARL) algorithm is an important property that has received considerable attention [13, 11, 14, 15, 16, 5]. However, proving the convergence of a MARL algorithm via theoretical analysis is significantly more challenging than proving convergence in the single-agent case. The presence of other agents that are also learning renders the environment non-stationary, violating a foundational assumption of single-agent learning. In fact, proving the convergence of a MARL algorithm even in 2-player-2-action single-stage games (arguably the simplest class of multi-agent system domains) has been challenging [13, 14, 5].

An agent i using WPL updates its policy $\pi_i$ according to the following equations:

$$\forall j \in \text{neighbors}(i): \quad \Delta\pi_i^{t+1}(j) = \eta \, \frac{\partial V_i^t(\pi)}{\partial \pi_i^t(j)} \cdot \begin{cases} \pi_i^t(j) & \text{if } \frac{\partial V_i^t(\pi)}{\partial \pi_i^t(j)} < 0 \\ 1 - \pi_i^t(j) & \text{otherwise} \end{cases}$$

$$\pi_i^{t+1} = \text{projection}(\pi_i^t + \Delta\pi_i^{t+1})$$

where $\eta$ is a small learning constant and $V_i(\pi)$ is the expected reward agent i would get if all agents (including agent i) follow the joint policy $\pi$ (agent i's part of $\pi$ is its own policy $\pi_i$). The projection function ensures that after adding the gradient $\Delta\pi_i$ to the policy, the resulting policy is still valid.
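As a rough illustration (ours, not the authors' code), one WPL step for a single agent might look like the following, assuming the per-action value gradient is available as a vector. The simplex projection here is the standard sort-based Euclidean one, which may differ from the projection operator the WPL authors used.

```python
import numpy as np

def projection(pi):
    """Euclidean projection onto the probability simplex (a common choice;
    the paper's exact projection operator may differ)."""
    n = len(pi)
    u = np.sort(pi)[::-1]                 # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(pi - theta, 0.0)

def wpl_update(pi, grad, eta=0.001):
    """One WPL step: scale each action's gradient by pi(j) when the gradient
    is negative and by (1 - pi(j)) otherwise, then re-project."""
    weight = np.where(grad < 0, pi, 1.0 - pi)
    return projection(pi + eta * grad * weight)

pi = np.array([0.25, 0.25, 0.5])
grad = np.array([0.4, -0.2, -0.2])  # hypothetical value gradient
pi = wpl_update(pi, grad)
assert abs(pi.sum() - 1.0) < 1e-9   # still a valid policy
```

Note the asymmetric scaling: actions whose probability is already low move away slowly (factor pi(j)), while actions near probability 1 move toward the gradient slowly (factor 1 - pi(j)), which is what gives WPL its "win or learn fast" flavor.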
An agent i using GIGA-WoLF updates its policy $\pi_i$ according to the following equations:

$$\hat\pi_i^{t+1} = \text{projection}(\pi_i^t + \eta \nabla V_i^t(\pi))$$
$$z_i^{t+1} = \text{projection}(\pi_i^t + \eta \nabla V_i^t(\pi)/3)$$
$$\delta_i^{t+1} = \min\left(1, \frac{\|z_i^{t+1} - z_i^t\|}{\|z_i^{t+1} - \hat\pi_i^{t+1}\|}\right)$$
$$\pi_i^{t+1} = \hat\pi_i^{t+1} + \delta_i^{t+1}\,(z_i^{t+1} - \hat\pi_i^{t+1})$$

The following section evaluates both WPL and GIGA-WoLF in DTAP, using ATST as the metric of evaluation.

2.2 Stability Under the Global Metric

Results. We have evaluated the performance of WPL and GIGA-WoLF using the following settings. 100 agents are organized in a 10x10 grid. The communication delay between two adjacent agents is two time units. Tasks arrive at the 4x4 sub-grid at the center (i.e. the 16 agents at the center) at a rate of 0.5 tasks per agent per time unit. All agents can execute tasks at a rate of 0.1 tasks/time unit (both task arrival and service duration follow an exponential distribution). Figure 3 plots the global performance (measured in terms of ATST) of the two multi-agent learning algorithms in the DTAP domain. Just by looking at the ATST plot, it is relatively safe to conclude that WPL converges quickly, while GIGA-WoLF converges after about 75,000 time steps. This is consistent with previously reported results that were conducted on a much simpler version of DTAP (only 5 agents with no communication delay) [17]. The following section presents a new network measure that we later use to discover that the apparent stability of GIGA-WoLF is actually misleading.

Figure 3: Comparing the average total service time for 200,000 time steps of the DTAP problem for WPL and GIGA-WoLF.
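The GIGA-WoLF equations can be sketched in a few lines. This is our own rough illustration, not the authors' implementation; the simplex projection is again the standard Euclidean one, which may differ from the operator used in the paper.

```python
import numpy as np

def projection(pi):
    """Euclidean projection onto the probability simplex (sort-based)."""
    n = len(pi)
    u = np.sort(pi)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(pi - theta, 0.0)

def giga_wolf_update(pi, z, grad, eta=0.001):
    """One GIGA-WoLF step: a regular gradient step (pi_hat), a slower
    baseline step (z), and a correction pulling pi_hat toward z."""
    pi_hat = projection(pi + eta * grad)
    z_new = projection(pi + eta * grad / 3.0)
    denom = np.linalg.norm(z_new - pi_hat)
    delta = 1.0 if denom == 0.0 else min(1.0, np.linalg.norm(z_new - z) / denom)
    return pi_hat + delta * (z_new - pi_hat), z_new

pi = z = np.array([1 / 3, 1 / 3, 1 / 3])
pi, z = giga_wolf_update(pi, z, np.array([0.3, -0.1, -0.2]))
assert abs(pi.sum() - 1.0) < 1e-9
```

The slower policy z acts as a reference: when the fast step drifts away from it, the correction term damps the update, which is what stabilizes GIGA-WoLF in the small games for which it was analyzed.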

3. THE CONTINUOUS DEGREE (C-DEGREE)

A key measurement that has been used extensively in analyzing networks is the degree of a node. A node's degree is the number of edges incident to that node. Intuitively, a node's degree reflects how connected the node is. This simple measure (along with other network measures) allowed the discovery of patterns common to many real-world networks, such as the power law of the degree distribution [18]. To put it more formally, a network (graph) is defined as N = ⟨V, E⟩, where V is the set of network nodes (vertices) and E is the set of edges (links) connecting these nodes. The degree of a node v ∈ V is k(v) = |E(v)|, where E(v) is the set of edges incident to node v. For a weighted network, we define the function w(e), which returns the weight of edge e ∈ E. Also for convenience, we define the function W(v) = {w(e) : e ∈ E(v)}, i.e. the multiset of weights incident to node v. The degree distribution P(k) measures the frequency of a particular degree k in a network (P(k = u) = |{v : v ∈ V ∧ k(v) = u}|) and serves as a common method for summarizing and characterizing networks. One of the limitations of the degree measure is that it ignores any disparity in the interactions between a node and its neighbors. In other words, the degree measure assumes uniform interaction across each node's neighbors. This can give an incorrect perception of the effective node degree. For example, an agent may have 10 or more neighbors but mainly interact with only two of them. Should that agent be considered 2 times more connected than an agent with only 5 neighbors but also interacting primarily with two of them? We introduce in this paper a new measure for analyzing weighted networks: the continuous degree, or the C-degree.

Definition 1. The C-degree of a node v in a network is r(v), where

$$r(v) = \begin{cases} 0 & \text{if } v \text{ is disconnected} \\ 2^{\,-\sum_{e \in E(v)} \frac{w(e)}{s(v)} \log_2 \frac{w(e)}{s(v)}} & \text{otherwise} \end{cases}$$

where $s(v) = \sum_{w \in W(v)} w$ is called the strength of node v. Intuitively, the quantity $\frac{w(e)}{s(v)}$ represents the probability of an interaction over edge e.
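Definition 1 translates directly into code; a minimal sketch (the function name is ours):

```python
import math

def c_degree(weights):
    """Continuous degree of a node, given the weights of its incident edges:
    2 raised to the entropy of the interaction probability distribution."""
    s = sum(weights)            # strength s(v)
    if s == 0:
        return 0.0              # disconnected node
    entropy = 0.0
    for w in weights:
        if w > 0:               # 0 * log(0) is taken as 0
            p = w / s           # probability of interacting over this edge
            entropy -= p * math.log2(p)
    return 2.0 ** entropy

print(c_degree([1, 1, 1, 1]))   # uniform interaction over 4 neighbors -> 4.0
print(c_degree([20, 0, 0, 0]))  # effectively one neighbor -> 1.0
```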
The set $\{(e, \frac{w(e)}{s(v)}) : e \in E(v)\}$ is the interaction probability distribution for node v. The quantity $H(v) = -\sum_{e \in E(v)} \frac{w(e)}{s(v)} \log_2 \frac{w(e)}{s(v)}$ is then the entropy of the interaction probability distribution, or how many bits are needed to encode the interaction probability distribution. The entropy quantifies the disparity in the interaction distribution: the more uniform the interaction distribution is, the higher the entropy, and vice versa. (This is in contrast to the Y measure, which decreases if the interaction distribution becomes more uniform; a brief discussion of the Y measure is given in Section 6.) The purpose of the power of 2 is to convert the entropy back into the number of neighbors that are effectively being used. Figure 4 compares the continuous degree distribution to the (discrete) degree distribution in a simple weighted network of four nodes. A node on the boundary has an out-degree of 1, while an internal node has an out-degree of 2. Intuitively, however, only one of the internal nodes is fully utilizing its degree of 2 (the one to the left), while the other node (to the right) is mostly using one neighbor only. The C-degree measure captures this and shows that only one internal node has a C-degree of 2, while the other internal node has a C-degree of 1.38.

Figure 4: Continuous vs discrete degree distribution.

What sets our measure apart from previous work in graph analysis is that it is a continuous generalization of the degree measure that captures the disparity of interaction (the difference in weights) among the neighbors of a node. In particular, if every node interacts with all its neighbors equally, then the C-degree becomes identical to the traditional (discrete) degree measure of the same node. However, if there is a disparity in a node's interactions with its neighbors, then the C-degree will capture such disparity, unlike the traditional degree measure. We prove these properties of the C-degree in Section 5, but before doing so, let us illustrate how the C-degree can help in verifying the stability of a network of adaptive agents.

4.
VERIFYING STABILITY USING THE C-DEGREE

Returning to our case study (refer to Section 2), Figure 5 plots the average C-degree over the 100 agents against time. When agents are using WPL, the C-degree does stabilize. When agents use GIGA-WoLF, however, the C-degree does not converge and continues to decrease. This observation suggests that we need to run the simulation for a longer time period in order to verify GIGA-WoLF's stability.

Figure 5: The average C-degree of WPL and GIGA-WoLF for 200,000 time steps.

When the simulator is allowed to run for 600,000 time steps, the global performance metric (the ATST in the case of DTAP) of GIGA-WoLF slowly starts to diverge after 250,000 time steps, and the corresponding C-degree continues to decrease. WPL's C-degree remains stable, consistent with WPL's global performance metric.

Figure 6: Comparing the ATST for 600,000 time steps of the DTAP problem for WPL and GIGA-WoLF.

5. PROPERTIES OF THE C-DEGREE

The C-degree satisfies three properties that establish its connection to the original degree metric (the first two properties) and its ability to capture the disparity of interactions among the neighbors of a node (the third property).

1. Preserving maximum degree: ∀v ∈ V : r(v) ≤ k(v). Furthermore, r(v) = k(v) iff ∀e ∈ E(v) : w(e) = C, where C is some constant. In other words, the C-degree is maximum and equals the original degree when there is no disparity between weights.

2. Preserving minimum degree: ∀v ∈ V : r(v) = 0 iff v is disconnected. Furthermore, r(v) = 1 iff ∃e ∈ E(v) : w(e) > 0 and ∀e' ∈ E(v), e' ≠ e : w(e') = 0. In other words, the C-degree equals one when all edges, except one edge, have zero weight.

3. Consistent partial order over nodes: The C-degree imposes a simple partial order that is consistent with the above two properties. If two nodes have the same number of neighbors, the same summation of weights (strength), and the individual edge weights are the same except for two edges, then the node with the more uniform weights has the higher C-degree. A formal definition of this property is given in Lemma 4.

The intuition behind the three properties can be clarified through the following numeric example. Let v1, v2, v3, and v4 be four nodes where W(v1) = {5, 5, 5, 5}, W(v2) = {9, 5, 5, 1}, W(v3) = {9, 8, 2, 1}, and W(v4) = {20, 0, 0, 0}. The third property then guarantees that the C-degree imposes the ordering r(v1) > r(v2) > r(v3). All three properties together guarantee that k(v1) = r(v1) > r(v2) > r(v3) > r(v4) = 1. Below we prove the three properties.
Figure 7: The average C-degree of WPL and GIGA-WoLF for 600,000 time steps.

The plot of the C-degree tells us that GIGA-WoLF learns at a slower pace than WPL. Furthermore, while agents using WPL continue to effectively use (at least) two neighbors, agents using GIGA-WoLF almost learn a deterministic policy (interacting primarily with only one neighbor). It is interesting to note that the average payoff for GIGA-WoLF starts to diverge when the average C-degree of GIGA-WoLF falls below the average C-degree of WPL. GIGA-WoLF's divergence in this case study is not surprising in itself, since GIGA-WoLF's convergence was proved only for 2-player-2-action games, and GIGA-WoLF was reported to diverge in some games with a larger number of either players or actions [11, 19]. A more in-depth analysis that takes the specifics of GIGA-WoLF into account is needed to fully explain the root of GIGA-WoLF's divergence in the DTAP domain. Such an analysis is beyond the scope of this paper and should not distract from the main point we are trying to make: the common practice of using a global performance metric to verify the stability of an adaptive MAS is not reliable and can hide threatening instabilities. Graph analysis can provide useful insight in that respect when equipped with informative network measures, such as the measure we have presented.

Lemma 2. The C-degree satisfies the minimum degree axiom.

Proof. When all weights are zero except only one weight that is greater than zero, the entropy (the exponent of the C-degree) is zero, and therefore the C-degree is 1.

Lemma 3. The C-degree satisfies the maximum degree axiom.

Proof. Under uniform interaction, all the weights incident to a node v are equal to a constant $W_v$. Therefore

$$\forall v \in V, e \in E(v): \quad \frac{w(e)}{s(v)} = \frac{W_v}{\sum_{e' \in E(v)} W_v} = \frac{1}{k(v)}$$

We then have

$$\forall v: \quad r(v) = 2^{\,-\sum_{e \in E(v)} \frac{w(e)}{s(v)} \log_2 \frac{w(e)}{s(v)}} = 2^{\,-k(v) \frac{1}{k(v)} \log_2 \frac{1}{k(v)}} = 2^{\log_2 k(v)} = k(v)$$

In other words, both the degree and the C-degree of a node become equivalent under uniform interaction.
The C-degree is also maximum in this case, because the exponent is the entropy of the interaction distribution, which is maximum when the interaction is uniform over edges.

Lemma 4. The C-degree satisfies the consistent partial order property.

Proof. Let i, j be two nodes such that k(i) = k(j) = n, s(i) = s(j) = s, |W(i) ∩ W(j)| = n − 2, {w_i1, w_i2} = W(i) − W(j), {w_j1, w_j2} = W(j) − W(i), and |w_i1 − w_i2| < |w_j1 − w_j2|. Without loss of generality, we can assume that w_i1 ≥ w_i2 and w_j1 ≥ w_j2; therefore w_i1 − w_i2 < w_j1 − w_j2. We also have w_i1 + w_i2 = w_j1 + w_j2 (because the two nodes have the same strength and their remaining weights coincide); let c = (w_i1 + w_i2)/s = (w_j1 + w_j2)/s be the common probability mass of the two differing edges. Therefore w_j1/s > w_i1/s ≥ c/2. Then from Lemma 5 we have h(c, w_i1/s) > h(c, w_j1/s), or

$$-\frac{w_{i1}}{s}\log_2\frac{w_{i1}}{s} - \left(c - \frac{w_{i1}}{s}\right)\log_2\left(c - \frac{w_{i1}}{s}\right) > -\frac{w_{j1}}{s}\log_2\frac{w_{j1}}{s} - \left(c - \frac{w_{j1}}{s}\right)\log_2\left(c - \frac{w_{j1}}{s}\right)$$

Therefore H(i) > H(j), because the rest of the entropy terms (corresponding to W(i) ∩ W(j)) are equal, and consequently r(i) > r(j).

Lemma 5. The quantity $h(C, x) = -x \log_2(x) - (C - x)\log_2(C - x)$ is symmetric around, and maximized at, $x = \frac{C}{2}$, for $C \geq x \geq 0$.

Proof.

$$h(C, C/2 + \delta) = -(C/2 + \delta)\log_2(C/2 + \delta) - (C/2 - \delta)\log_2(C/2 - \delta) = h(C, C/2 - \delta)$$

Therefore h(C, x) is symmetric around C/2. Furthermore, h(C, x) is maximized when

$$\frac{\partial h(C, x)}{\partial x} = 0 = \log_2(C - x) - \log_2(x)$$

(the constant terms of the two derivatives cancel), or $\log_2 x = \log_2(C - x)$. Therefore h(C, x) is maximized at $x = C - x = \frac{C}{2}$.

The following section discusses the similarities and differences between the C-degree and previously developed network measures in the field of graph analysis.

6. RELATED WORK

Some researchers have analyzed how individual agents' policies co-evolve over time, either theoretically [13, 20, 5] or experimentally [6], as the number of agents increases, but inspecting individual agents' policies does not scale well with the size of the network. A recent work used data mining techniques to extract patterns from log files of the messages communicated between agents [21]. Inspecting individual messages exchanged between the agents becomes less practical and less useful as the network gets larger. Some of the previous work provided generic frameworks for analyzing multi-agent systems, but these frameworks limited analysis to a few agents and ignored the underlying network structure, unlike the work presented here [22, 23].
Most of the work in analyzing (communication) networks and distributed systems relied on simple heuristics that are intuitive, easy to understand, and experimentally verified to work adequately. Large-scale and in-depth analysis of network behavior has received attention recently, assuming the underlying nodes are relatively simple with fixed behavior [24]. Here we are interested in analyzing networks of learning agents, where the dynamics of the network change over time even if the outside world remains unchanged.

Surveying all network measures that were proposed to analyze weighted graphs is beyond the scope of this paper. Instead, we focus on a sample of these measures that are most closely related to our proposed measure (the interested reader may refer to survey papers on the subject such as [9]). The strength of a node (the summation of the weights incident to the node) becomes identical to the node degree if all weights are equal to 1. The strength measure, however, fails to capture the disparity of interaction between an individual node and its neighbors (the consistent partial order property). For example, suppose that a node v1 has the multiset of incident weights {1, 9}, while another node v2 has the multiset of incident weights {5, 5}. Both nodes will have the same strength, despite the fact that node v2 is interacting with its neighbors more uniformly than node v1. A more recent work [25] analyzed a graph's total weight, Σ_{e∈E} w_e, against the graph's total number of edges, |E|, over time. That work also analyzed the degree of a node, k(v), against the node's strength, s(v). These measures again fail to capture the disparity in interaction between a node and its neighbors. The network measure Y(v) = Σ_{e∈E(v)} (w_e / s(v))^2 successfully captures the disparity of interaction within a node v [26]. However, the Y measure is not a generalization of the degree measure, as it fails to satisfy the first two properties of the C-degree; the Y measure is therefore less intuitive than the C-degree.
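The v1/v2 example can be made concrete. The snippet below compares strength, the Y measure of [26], and the C-degree (again assuming the 2^entropy form described earlier, which is our inference from the text) on the two weight multisets {1, 9} and {5, 5}:

```python
import math

def strength(weights):
    # s(v): the sum of the weights incident to the node.
    return sum(weights)

def y_measure(weights):
    # Disparity measure of [26]: Y(v) = sum of (w_e / s(v))**2.
    s = sum(weights)
    return sum((w / s) ** 2 for w in weights)

def c_degree(weights):
    # Assumed 2**entropy form of the C-degree (our inference, see above).
    s = sum(weights)
    return 2 ** -sum((w / s) * math.log2(w / s) for w in weights if w > 0)

v1 = [1.0, 9.0]   # skewed interaction
v2 = [5.0, 5.0]   # uniform interaction

print(strength(v1), strength(v2))    # 10.0 10.0 -- strength hides the disparity
print(y_measure(v1), y_measure(v2))  # ~0.82 vs 0.5 -- Y detects it
print(c_degree(v1), c_degree(v2))    # ~1.38 vs 2.0 -- C-degree detects it,
                                     # and gives v2 its full degree of 2
```

Strength is identical for both nodes, while both Y and the C-degree separate them; only the C-degree additionally reduces to the discrete degree in the uniform case.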
An interesting method for generalizing the node degree is generating an ensemble of unweighted networks that are sampled from the original weighted network [27] (the weight of an edge reflects the probability of generating the edge in a sample network). The effective node degree is then the average over the samples. While the ensemble approach satisfies the first two properties of the C-degree, it still fails to satisfy the third property, consistency in handling disparity.

7. CONCLUSION AND FUTURE WORK

In this paper we presented a case study of 100 networked adaptive agents where the global performance metric can hide an underlying instability in the system, and showed that this instability leads to a significant drop in performance later on. We proposed the use of graph analysis to analyze agent interactions over time. We also developed a new network measure, the C-degree, which generalizes the degree of a node to take weights into account. Our methodology successfully exposed the hidden instability in the case study. We finally proved that the C-degree of a node reduces to the original (discrete) degree when the node interacts with its neighbors uniformly; otherwise the C-degree captures the disparity in a node's interactions with its neighbors. The analysis presented here, as mentioned earlier, is a preliminary step that we hope will trigger more research applying network analysis and mining techniques to multi-agent systems. The C-degree, which we proposed here, focuses on capturing the disparity of interaction between a node and its neighbors. In domains where the (absolute) quantity of interaction is crucial to understanding the dynamics, alternatives to the C-degree may be more effective.

8. REFERENCES

[1] S. Abdallah, V. Lesser, Multiagent reinforcement learning and self-organization in a network of agents, in: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, 2007, pp. 1-8.
[2] J. A. Boyan, M. L. Littman, Packet routing in dynamically changing networks: A reinforcement learning approach, in: Proceedings of the Annual Conference on Advances in Neural Information Processing Systems, 1994, pp. 671-678.
[3] L. Peshkin, V. Savova, Reinforcement learning for adaptive routing, in: Proceedings of the International Joint Conference on Neural Networks, 2002, pp. 1825-1830.
[4] Y.-H. Chang, T. Ho, Mobilized ad-hoc networks: A reinforcement learning approach, in: Proceedings of the First International Conference on Autonomic Computing, 2004, pp. 240-247.
[5] S. Abdallah, V. Lesser, A multiagent reinforcement learning algorithm with non-linear dynamics, Journal of Artificial Intelligence Research 33 (2008) 521-549.
[6] M. Ghavamzadeh, S. Mahadevan, R. Makar, Hierarchical multi-agent reinforcement learning, Autonomous Agents and Multi-Agent Systems 13 (2) (2006) 197-229.
[7] M. E. J. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E 74 (2006) 036104.
[8] J. Park, A.-L. Barabási, Distribution of node characteristics in complex networks, Proceedings of the National Academy of Sciences 104 (2007) 17916-17920.
[9] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: Structure and dynamics, Physics Reports 424 (2006) 175-308.
[10] L. Panait, S. Luke, Cooperative multi-agent learning: The state of the art, Autonomous Agents and Multi-Agent Systems 11 (3) (2005) 387-434.
[11] M. Bowling, Convergence and no-regret in multiagent learning, in: Proceedings of the Annual Conference on Advances in Neural Information Processing Systems, 2005, pp. 209-216.
[12] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1999.
[13] M. Bowling, M. Veloso, Multiagent learning using a variable learning rate, Artificial Intelligence 136 (2) (2002) 215-250.
[14] V. Conitzer, T. Sandholm, AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, Machine Learning 67 (1-2) (2007) 23-43.
[15] Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question?, Artificial Intelligence 171 (7) (2007) 365-377.
[16] B. Banerjee, J. Peng, Generalized multiagent learning with performance bound, Autonomous Agents and Multi-Agent Systems 15 (3) (2007) 281-312.
[17] S. Abdallah, V. Lesser, Learning the task allocation game, in: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, 2006, pp. 850-857.
[18] A. Clauset, C. Rohilla Shalizi, M. E. J. Newman, Power-law distributions in empirical data, arXiv e-prints, arXiv:0706.1062.
[19] M. Bowling, Convergence and no-regret in multiagent learning, Technical Report TR04-11, University of Alberta (2004).
[20] P. Vrancx, K. Tuyls, R. Westra, Switching dynamics of multi-agent learning, in: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, 2008, pp. 307-313.
[21] E. Serrano, J. J. Gómez-Sanz, J. A. Botía, J. Pavón, Intelligent data analysis applied to debug complex software systems, Neurocomputing 72 (13-15) (2009) 2785-2795.
[22] J. Jin, R. T. Maheswaran, R. Sanchez, P. Szekely, VizScript: visualizing complex interactions in multi-agent systems, in: Proceedings of the 12th International Conference on Intelligent User Interfaces, 2007, pp. 369-372.
[23] T. Bosse, D. N. Lam, K. S. Barber, Tools for analyzing intelligent agent systems, Web Intelligence and Agent Systems 6 (4) (2008) 355-371.
[24] V. Paxson, End-to-end routing behavior in the Internet, SIGCOMM Computer Communication Review 36 (5) (2006) 41-56.
[25] M. McGlohon, L. Akoglu, C. Faloutsos, Weighted graphs and disconnected components: patterns and a generator, in: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2008, pp. 524-532.
[26] E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai, A.-L. Barabási, Global organization of metabolic fluxes in the bacterium Escherichia coli, Nature 427 (2004) 839.
[27] S. E. Ahnert, D. Garlaschelli, T. M. A. Fink, G. Caldarelli, Ensemble approach to the analysis of weighted networks, Physical Review E 76 (1) (2007) 016101.
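As a closing illustration of the ensemble approach of [27] discussed in Section 6: a minimal Monte Carlo sketch, under the assumption that edge weights have been rescaled into [0, 1] and are read as sampling probabilities; the function name `ensemble_degree` is ours, not the authors' implementation.

```python
import random

def ensemble_degree(incident_probs, samples=10_000, seed=0):
    """Effective degree of one node under the ensemble approach:
    each incident edge is independently kept or dropped in every
    sampled unweighted network (kept with probability equal to its
    weight), and the effective degree is the mean degree over the
    sampled networks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += sum(1 for p in incident_probs if rng.random() < p)
    return total / samples

# The mean degree converges to the sum of the probabilities
# (0.9 + 0.5 + 0.1 = 1.5 here) -- the node's strength when weights are
# probabilities, consistent with Section 6's observation that the
# ensemble degree still fails the disparity-consistency property.
print(ensemble_degree([0.9, 0.5, 0.1]))   # ~1.5
```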
