Bandwidth Allocation for Best Effort Traffic to Achieve 100% Throughput




Masoumeh Karimi, Zhuo Sun, and Deng Pan
Florida International University, Miami, FL
E-mails: {mkri001, zsun003, pnd}@fiu.edu

Abstract

Generalized Processor Sharing (GPS) is a powerful model and there are many practical scheduling algorithms that can perfectly emulate it. GPS is widely used as an ideal fairness model to schedule packets for guaranteed performance traffic. However, there has not been a way for GPS to properly handle best effort traffic. In this paper, we propose a bandwidth allocation scheme for GPS called Queue Length Proportional (QLP) in crossbar switches without speedup. QLP dynamically obtains a feasible bandwidth matrix to schedule best effort flows. In QLP, the amount of service that each flow receives is proportional to the length of its backlogged queue. We analytically prove that QLP is strongly stable and hence provides 100% throughput for any admissible traffic, no matter whether the traffic distribution is uniform or non-uniform. Moreover, we show that QLP is feasible, which means the allocated bandwidth does not exceed the available capacity. We also discuss how to track the queue length in GPS. Finally, we perform simulations to verify the theoretical results and to measure the performance of QLP.

I. INTRODUCTION

Generalized Processor Sharing (GPS) has long been known as a simple and powerful fluid model for traffic scheduling [1]. All fair queueing algorithms ultimately emulate the GPS ideal model because it achieves perfect fairness for packet scheduling [2], [3], [4]. GPS is a theoretical fluid model and divides the available bandwidth into logically independent channels. Thus, the traffic of each flow is smoothly transmitted through its own exclusive channel from the input port to the output port. GPS is widely used as an ideal fairness model to schedule packets for guaranteed performance traffic. However, there has not been a way for GPS to properly handle best effort traffic.
The objective of this paper is to address the appropriate scheduling of best effort traffic so that it can be employed in crossbar switches. Crossbar switches have received significant attention due to their non-blocking capability and large bandwidth utilization in comparison with bus based switches [6], [7]. The challenge of bandwidth allocation in a crossbar switch is how to efficiently share the available capacity of each input port and output port. Simple proportional bandwidth allocation for a shared link is not suitable for crossbar switches [8], [9] because the flows of a switch are subject to two bandwidth constraints: the available bandwidth at both the input port and the output port of the flow. The scheme should be efficient, to fully utilize the available bandwidth, and feasible, in order to be applied in practice.

Our motivation for this work arises from the fact that in crossbar switches we need to dynamically obtain an admissible bandwidth matrix to properly handle best effort traffic. Guaranteed performance flows reserve resources for an allocated transmission rate [10]. Best effort flows, in contrast, try to make the best use of the available transmission capacity but have no guarantee on the quality of service [7].

The bandwidth allocation scheme plays several important roles in guaranteeing the high performance of a switch [11]. First, the scheme helps to determine the traffic admission policy and the buffer management strategy. Second, an efficient scheme makes it possible for a switch to achieve 100% throughput. Third, the scheme is used as the scheduling criterion by fair scheduling algorithms. There are many fair scheduling algorithms designed for bandwidth allocation in different crossbar switch architectures [2], [3], [12], [13], [14], [15]. However, most of them focus on providing quality of service for guaranteed performance traffic, and there is relatively less work on how to better support best effort traffic.
In this paper, we present a bandwidth allocation scheme for GPS called Queue Length Proportional (QLP) to properly handle best effort traffic in crossbar switches without speedup. In QLP, the amount of service that each flow receives, or its dedicated bandwidth, is proportional to the length of its backlogged queue. QLP essentially favors the queues with the greatest occupancy and thus assists the crossbar switch to be more work-conserving. We conduct a theoretical analysis to prove that QLP is strongly stable and therefore achieves 100% throughput for any admissible traffic, no matter whether the traffic distribution is uniform or non-uniform. We also show that QLP is feasible, which means the allocated bandwidth does not exceed the available capacity. Furthermore, we discuss how to track the queue length in GPS. Lastly, we conduct simulations to verify the analytical results and to measure the performance of QLP.

The organization of this paper is as follows. In Section II, we present the abstract switch model and the QLP bandwidth allocation scheme. In Section III, we theoretically analyze the stability and throughput of QLP and then provide some discussions. We show the simulation results in Section IV. Finally, we conclude the paper in Section V.

II. QUEUE LENGTH PROPORTIONAL (QLP) SCHEME

In this section, we present our bandwidth allocation scheme. First, we briefly explain the switch model. Then, we describe the QLP bandwidth allocation algorithm for best effort traffic.

A. The Abstract Switch Model

The considered switch architecture includes $N$ input ports and $N$ output ports, connected by a crossbar with no internal speedup. Let $In_i$ denote the $i$-th input port and $Out_j$ denote the $j$-th output port. The available bandwidth of each input port and output port, and also of the crossbar, is $R$. Define the traffic from $In_i$ destined to $Out_j$ to be flow $F_{ij}$. Use $R_{ij}(t)$ to represent the allocated bandwidth of $F_{ij}$ at time $t$. Denote the queue of packets at $In_i$ destined to $Out_j$ as $Q_{ij}$.

B. Algorithm Description

In this subsection we present our Queue Length Proportional (QLP) bandwidth allocation scheme and investigate its feasibility property. As mentioned earlier, the simple proportional bandwidth allocation policy of GPS does not apply to switches [8], [9], [11]. For a GPS server that works at a fixed bandwidth $R$, the rate $R\,\phi_j / \sum_j \phi_j$ will guarantee the performance of each flow, where $\phi_j$ is the weight of each flow [1]. However, in contrast with a single server, the flows of a switch are subject to two bandwidth constraints: the available bandwidth at both the input port and the output port of the flow. Naive bandwidth allocation at the output port may make the flows violate the bandwidth constraints at their input ports, and vice versa.

QLP dynamically assigns bandwidth to each best effort flow in proportion to its backlogged queue length. We use the queue length $Q_{ij}(t)$ as the dynamic weight of each flow $F_{ij}$ and define $Q_j(t) = \sum_i Q_{ij}(t)$ to be the number of bits queued at all input ports directed to a particular $Out_j$ at time $t$, and $Q_i(t) = \sum_j Q_{ij}(t)$ to be the number of bits queued at a particular $In_i$ destined to different output ports at time $t$. We also denote the allocated bandwidth of flow $F_{ij}$ with respect to the constraint of its input port and of its output port as $R_{ij.in}(t)$ and $R_{ij.out}(t)$, respectively. Recalling the GPS fluid model, the traffic of each flow can smoothly stream from the input port to the output port through its own exclusive channel, without buffering in the middle, as illustrated in Figure 1.
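To make the two-constraint problem concrete, the following Python sketch allocates each output port's capacity among its inputs in proportion to queue length while ignoring the input side. The function names, the `queues[i][j]` matrix layout and the example backlog are illustrative assumptions, not taken from the paper.

```python
def proportional_share(weights, capacity=1.0):
    """GPS-style share of one link: flow k gets capacity * w_k / sum(w)."""
    total = sum(weights)
    if total == 0:
        return [0.0 for _ in weights]
    return [capacity * w / total for w in weights]


def output_side_allocation(queues, capacity=1.0):
    """Naive allocation (hypothetical helper): split each output's capacity
    among inputs by queue length, ignoring the input port constraints.
    queues[i][j] is the backlog of flow F_ij."""
    n = len(queues)
    rates = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column = [queues[i][j] for i in range(n)]
        shares = proportional_share(column, capacity)
        for i in range(n):
            rates[i][j] = shares[i]
    return rates


# Only input 0 is backlogged, toward both outputs.
queues = [[4, 4],
          [0, 0]]
rates = output_side_allocation(queues)
input_load = [sum(row) for row in rates]
print(input_load)  # input 0 is asked to send at twice its capacity
```

In this example each output hands its whole capacity to input 0, so input 0 would need to transmit at rate 2 on a port of capacity 1, which is exactly the kind of violation the text describes.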
Thus, by considering the bandwidth constraints at both the input port and the output port of each flow $F_{ij}$ we have

$$R_{ij.in}(t) = R\,\frac{Q_{ij}(t)}{Q_i(t)} = R\,\frac{Q_{ij}(t)}{\sum_j Q_{ij}(t)} \tag{1}$$

$$R_{ij.out}(t) = R\,\frac{Q_{ij}(t)}{Q_j(t)} = R\,\frac{Q_{ij}(t)}{\sum_i Q_{ij}(t)} \tag{2}$$

In order to avoid bandwidth violation at the input port and the output port, we take the overall allocated bandwidth $R_{ij}(t)$ of $F_{ij}$ to be the minimum [8] of the two calculated bandwidths, as follows:

$$R_{ij}(t) = \min\{R_{ij.in}(t),\, R_{ij.out}(t)\} \tag{3}$$

$$= \min\left\{R\,\frac{Q_{ij}(t)}{Q_i(t)},\; R\,\frac{Q_{ij}(t)}{Q_j(t)}\right\} \tag{4}$$

$$= R\,\frac{Q_{ij}(t)}{\max\{Q_i(t),\, Q_j(t)\}} \tag{5}$$

Fig. 1. GPS ideal fluid model used for scheduling in a crossbar switch.

As can be seen, the QLP bandwidth allocation scheme essentially favors the queues with the greatest occupancy. It treats all queues fairly and assists the crossbar switch to be more work-conserving [20]. One of the advantages of our scheme is its simplicity. It does not require any sorting or re-sorting process. Using the calculated queue lengths, QLP just needs to find the maximum between the aggregated queue length of the flows waiting at a particular input port and that of the flows directed to a specific output port.

Now we discuss the important properties of QLP, feasibility and stability. A bandwidth allocation scheme should efficiently utilize the available bandwidth while maintaining feasibility.

Definition 1: The allocated bandwidth $R_{ij}(t)$ of $F_{ij}$ is feasible if no over-subscription happens at any input port or output port. In other words, the aggregated assigned rate of the flows does not exceed the available capacity, i.e.

$$R_i \le R \quad \text{and} \quad R_j \le R \tag{6}$$

where $R_i = \sum_j R_{ij}(t)$ and $R_j = \sum_i R_{ij}(t)$ are the aggregate rates allocated at $In_i$ and $Out_j$.

For ease of presentation, assume normalized bandwidth, $R = 1$. Now, we show that our scheme is feasible.
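Before the formal argument, Equation (5) and Definition 1 can be tried on concrete numbers. The sketch below is a minimal Python rendering under stated assumptions: `qlp_allocation` and `is_feasible` are hypothetical names, queue lengths are given as an N x N list of lists, and empty queues receive zero bandwidth (as in the weight factor used later in the stability proof).

```python
def qlp_allocation(queues, capacity=1.0):
    """Equation (5): R_ij = R * Q_ij / max(Q_i, Q_j), where Q_i is the row sum
    (backlog at input i) and Q_j is the column sum (backlog toward output j).
    Flows with an empty queue get zero bandwidth."""
    n = len(queues)
    row_sum = [sum(queues[i]) for i in range(n)]
    col_sum = [sum(queues[i][j] for i in range(n)) for j in range(n)]
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            q = queues[i][j]
            if q > 0:
                rates[i][j] = capacity * q / max(row_sum[i], col_sum[j])
    return rates


def is_feasible(rates, capacity=1.0, tol=1e-12):
    """Definition 1: no input port and no output port is over-subscribed."""
    n = len(rates)
    rows_ok = all(sum(rates[i]) <= capacity + tol for i in range(n))
    cols_ok = all(sum(rates[i][j] for i in range(n)) <= capacity + tol
                  for j in range(n))
    return rows_ok and cols_ok


queues = [[4, 4],
          [0, 0]]
rates = qlp_allocation(queues)
print(rates, is_feasible(rates))
```

For this backlog the row sum of input 0 is 8 and each column sum is 4, so both flows of input 0 receive 4/8 of the capacity and the input port is exactly filled rather than over-subscribed.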
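According to the QLP description we can write

$$R_i = \sum_{j=1}^{N} R_{ij}(t) = \sum_{j=1}^{N} \frac{Q_{ij}(t)}{\max\{\sum_j Q_{ij}(t),\, \sum_i Q_{ij}(t)\}}$$

$$R_j = \sum_{i=1}^{N} R_{ij}(t) = \sum_{i=1}^{N} \frac{Q_{ij}(t)}{\max\{\sum_j Q_{ij}(t),\, \sum_i Q_{ij}(t)\}}$$

Since $\sum_{j=1}^{N} Q_{ij}(t) \le \max\{\sum_j Q_{ij}(t), \sum_i Q_{ij}(t)\}$ and $\sum_{i=1}^{N} Q_{ij}(t) \le \max\{\sum_j Q_{ij}(t), \sum_i Q_{ij}(t)\}$, every term of the first sum is at most $Q_{ij}(t)/\sum_j Q_{ij}(t)$ and every term of the second sum is at most $Q_{ij}(t)/\sum_i Q_{ij}(t)$, so we have

$$R_i \le 1 \quad \text{and} \quad R_j \le 1 \tag{7}$$

which indicates that QLP is feasible. We study the stability of our scheme in the next section.

III. PERFORMANCE ANALYSIS

A. Stability and Throughput

In this subsection, we theoretically prove that QLP is strongly stable for any admissible traffic, which implies that QLP achieves 100% throughput. We first adopt the following definitions presented in [23]. The switch size is denoted by $P = N \times N = N^2$.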

Definition 2: $U$ is the set of vectors $Y = (Y_1, \ldots, Y_P)$, $Y \in \mathbb{R}^{+P}$, such that

$$\sum_{i=1}^{N} Y_{(i+jN)} \le 1, \quad j = 0, 1, \ldots, (N-1) \tag{8}$$

$$\sum_{j=0}^{N-1} Y_{(i+jN)} \le 1, \quad i = 1, \ldots, N \tag{9}$$

Definition 3: $\|Y\|$ is the Euclidean norm of vector $Y$, i.e., $\|Y\| = \sqrt{y_1^2 + \ldots + y_i^2 + \ldots + y_P^2} = \sqrt{\sum_{i=1}^{P} y_i^2}$.

Definition 4: $\hat{Y}$ is the normalized vector parallel to $Y$, given that $Y \ne 0$, i.e., $\hat{Y} = Y / \|Y\| = \tilde{Y} / \|\tilde{Y}\|$.

Definition 5: $\tilde{Y}$ is the maximal vector parallel to $Y$, given that $Y \ne 0$ and $k \in \mathbb{R}$, i.e., $\tilde{Y} = \max_k kY$.

Definition 6: $\Psi_Y$ is the symmetric matrix associated with the projection operator along the direction of $\hat{Y}$, given that $Y \ne 0$, i.e., $\Psi_Y = \hat{Y}^T \hat{Y}$.

Property 1: Given that $Y \ne 0$, we have $Y \Psi_Y = Y$, which is a straightforward result of Definition 6.

The following variables are used to represent the status of the crossbar switch. Their initial values are assumed to be zero at time $t = 0$.

$Q_{ij}(t)$ is the number of bits buffered in $Q_{ij}$ at time $t$, belonging to flow $F_{ij}$.
$Q(t)$ is the vector of queue lengths at time $t$, i.e., $Q(t) = (Q_{11}, \ldots, Q_{ij}, \ldots, Q_{NN})$.
$A_{ij}(t)$ is the number of bits arriving at $Q_{ij}$ up to time $t$.
$A(t)$ is the vector of the number of arrivals at time $t$, i.e., $A(t) = (A_{11}, \ldots, A_{ij}, \ldots, A_{NN})$.
$D_{ij}(t)$ is the number of bits departing from $Q_{ij}$ up to time $t$.
$D(t)$ is the vector of the number of departures at time $t$, i.e., $D(t) = (D_{11}, \ldots, D_{ij}, \ldots, D_{NN})$.

Assume that the number of arriving bits $A_{ij}(t)$ satisfies the Strong Law of Large Numbers (SLLN), i.e.

$$\lim_{t \to \infty} \frac{A_{ij}(t)}{t} = \lambda_{ij} \tag{10}$$

where $\lambda_{ij}$ is the arrival rate of $Q_{ij}$. Consider that the incoming traffic is admissible, which means that there is no over-subscription at any input port or output port, i.e., for all $\lambda_{ij} \ge 0$ we have

$$\forall i,\ \sum_j \lambda_{ij} \le 1, \quad \text{and} \quad \forall j,\ \sum_i \lambda_{ij} \le 1 \tag{11}$$

Correspondingly, $\Lambda$ is the vector of the average arrival rates, i.e., $\Lambda = (\lambda_{11}, \ldots, \lambda_{ij}, \ldots, \lambda_{NN})$, and due to the admissible traffic we have $E[A(t)] = \Lambda$.
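Definitions 3, 4 and 6 and Property 1 are easy to check numerically. The following Python sketch treats vectors as row vectors stored in plain lists; the helper names and the example vector are assumptions made for illustration only.

```python
import math


def norm(y):
    """Definition 3: Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in y))


def normalize(y):
    """Definition 4: unit vector parallel to y (y must be non-zero)."""
    n = norm(y)
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return [v / n for v in y]


def projection_matrix(y):
    """Definition 6: Psi_Y = Yhat^T Yhat for a row vector Yhat."""
    u = normalize(y)
    return [[u[a] * u[b] for b in range(len(u))] for a in range(len(u))]


def row_times_matrix(x, m):
    """Product of a row vector x with a matrix m."""
    return [sum(x[a] * m[a][b] for a in range(len(x)))
            for b in range(len(m[0]))]


y = [3.0, 4.0]
psi = projection_matrix(y)
projected = row_times_matrix(y, psi)
print(norm(y), projected)  # Property 1 says projected should equal y
```

Because `psi` projects onto the direction of `y`, multiplying `y` by it returns `y` itself up to floating-point rounding, which is the content of Property 1.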
Similar to that in [25][18], the evolution equation of the switch for the interval $[t, t+1]$ is described as follows:

$$Q(t+1) = Q(t) + A(t) - D(t) \tag{12}$$

Before investigating the stability of our scheme, we first calculate the average departure rate in the following lemma.

Lemma 1: For a crossbar switch without speedup, the average departure rate in the QLP bandwidth allocation scheme is proportional to the queue length, i.e.

$$E[D(t)] = Q(t) \tag{13}$$

Proof: According to the previous section, the dedicated portion of the available bandwidth for each queue $Q_{ij}$ at time $t$ can be described as a weight factor $w_{ij}(t)$. It means that for flow $F_{ij}$ we have

$$w_{ij}(t) = \begin{cases} 0, & \text{if } Q_{ij}(t) = 0 \\ \dfrac{Q_{ij}(t)}{\max\{\sum_j Q_{ij}(t),\, \sum_i Q_{ij}(t)\}}, & \text{otherwise} \end{cases} \tag{14}$$

which is positive when there is an offered load $Q_{ij}(t) > 0$. Consequently, $R_{ij}(t)$, the allocated bandwidth of flow $F_{ij}$ at time $t$, will be $R\,Q_{ij}(t) / \max\{\sum_j Q_{ij}(t), \sum_i Q_{ij}(t)\}$. The departure rate of each flow is equal to its allocated bandwidth:

$$D_{ij}(t) = R_{ij}(t) = R\,\frac{Q_{ij}(t)}{\max\{\sum_j Q_{ij}(t),\, \sum_i Q_{ij}(t)\}} = Q_{ij}(t)$$

which is proportional to its queue length at time $t$. By considering the normalized available bandwidth $R = 1$, we have $D_{ij}(t) = Q_{ij}(t)$, and thus $E[D(t)] = Q(t)$.

In order to prove 100% throughput of our scheme, we introduce the following definition and lemma from [23][25].

Definition 7: A system of queues is strongly stable if

$$\limsup_{t \to \infty} E[\|Q(t)\|] < \infty \tag{15}$$

which implies 100% throughput and a bounded delay guarantee.

Lemma 2: Given a system of queues whose evolution is described by a DTMC with state vector $S(t) \in \mathbb{N}^M$, and whose state space $H$ is a subset of the Cartesian product of a denumerable state space $H_Q$ and a finite state space $H_K$, then, if a lower bounded function $V(Q(t))$, called a Lyapunov function, $V: \mathbb{N}^P \to \mathbb{R}$, can be found such that $E[V(Q_{t+1}) \mid S(t)] < \infty$ for all $S(t)$, and there exist $\epsilon \in \mathbb{R}^+$, $B \in \mathbb{R}^+$ such that for all $S(t)$ with $\|Q(t)\| > B$

$$E[V(Q(t+1)) - V(Q(t)) \mid S(t)] < -\epsilon \|Q(t)\| \tag{16}$$

then the system of queues is strongly stable. It means that the queue length does not grow infinitely, which implies 100% throughput and bounded average delay.
Proof: We need to show $\limsup_{t \to \infty} E[\|Q(t)\|] < \infty$. For the detailed proof, see Theorem 2 in [23].

Now, we present the main theorem for the stability of QLP.

Theorem 1: For a crossbar switch without speedup, the QLP bandwidth allocation scheme is strongly stable for any admissible traffic, i.e., it achieves 100% throughput.

Proof: According to Lemma 2 and Equation (16), we can define a quadratic Lyapunov function $V(t) = Q(t) Z Q^T(t)$ as in [24][25], if there exists a symmetric copositive matrix $Z \in \mathbb{R}^{P \times P}$. Similarly, we can consider

$$Z = I - \rho \Psi_\Lambda \tag{17}$$

where $I$ is an identity matrix, $\rho \in \mathbb{R}$ is such that $0 \le \rho \le 1$, $\Lambda = E[A(t)]$ is the vector of the average arrival rates, and by Definition 6 we have $\Psi_\Lambda = \hat{\Lambda}^T \hat{\Lambda}$. It is easy to prove that $Z$ is positive (semi)definite. We also assume that the state vector is $S(t) = Q(t)$.
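The positive (semi)definiteness of $Z$ can be spelled out in a few lines. The derivation below is a sketch that uses only Equation (17), Definition 6 and the Cauchy-Schwarz inequality; the row vector $x$ is an arbitrary test vector introduced here for illustration.

```latex
% For any row vector x in R^P, with \hat{\Lambda} a unit row vector:
\begin{align*}
x Z x^T &= x (I - \rho \hat{\Lambda}^T \hat{\Lambda}) x^T
         = \|x\|^2 - \rho\, (x \hat{\Lambda}^T)^2 \\
        &\ge \|x\|^2 - \rho\, \|x\|^2 \|\hat{\Lambda}\|^2
         \qquad \text{(Cauchy-Schwarz, } \rho \ge 0\text{)} \\
        &= (1 - \rho)\, \|x\|^2 \;\ge\; 0
         \qquad \text{(since } \|\hat{\Lambda}\| = 1,\ 0 \le \rho \le 1\text{)}
\end{align*}
```

For $\rho < 1$ the bound is strictly positive whenever $x \ne 0$, so $Z$ is positive definite; for $\rho = 1$ it is only semidefinite, with $x Z x^T = 0$ exactly when $x$ is parallel to $\Lambda$.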

Now, we need to prove that there exist $\epsilon \in \mathbb{R}^+$, $B \in \mathbb{R}^+$ ($B$ large enough) and $\rho \in \mathbb{R}$ such that for $\|Q(t)\| > B$ we have

$$E[Q(t+1) Z Q^T(t+1) - Q(t) Z Q^T(t) \mid Q(t)] < -\epsilon \|Q(t)\|$$

For notational convenience, we write the time index as a subscript, e.g., $Q_{t+1}$ is equivalent to $Q(t+1)$. Therefore

$$E[Q_{t+1} Z Q_{t+1}^T - Q_t Z Q_t^T \mid Q_t] < -\epsilon \|Q_t\| \tag{18}$$

By substituting $Q_{t+1}$ and $Q_{t+1}^T$ from the evolution equation (12) into the left hand side of (18), and using the symmetry of $Z$ to combine the cross terms, we have

$$E[Q_{t+1} Z Q_{t+1}^T - Q_t Z Q_t^T \mid Q_t] = E[(Q_t + A_t - D_t) Z (Q_t^T + A_t^T - D_t^T) - Q_t Z Q_t^T \mid Q_t] \tag{19}$$

$$= E[2 (A_t - D_t) Z Q_t^T + (A_t - D_t) Z (A_t - D_t)^T \mid Q_t] \tag{20}$$

For simplicity, we find the limit of (20), normalized by $\|Q_t\|$, when $\|Q_t\|$ tends to infinity:

$$\lim_{\|Q_t\| \to \infty} \frac{E[2 (A_t - D_t) Z Q_t^T + (A_t - D_t) Z (A_t - D_t)^T \mid Q_t]}{\|Q_t\|}$$

The terms $(A_t - D_t)$ and $(A_t - D_t)^T$ are bounded since the numbers of arrivals and departures in the time interval $[t, t+1]$ are bounded. Also, we know that $Z$ is a positive (semi)definite matrix. It means that the limit of $E[(A_t - D_t) Z (A_t - D_t)^T \mid Q_t] / \|Q_t\|$ when $\|Q_t\| \to \infty$ is 0. As a result, the remaining part of the expression is

$$\lim_{\|Q_t\| \to \infty} \frac{E[2 (A_t - D_t) Z Q_t^T \mid Q_t]}{\|Q_t\|}$$

By Definition 4 and knowing that $Q_t^T = \hat{Q}_t^T \|Q_t\|$, we obtain

$$\lim_{\|Q_t\| \to \infty} \frac{E[2 (A_t - D_t) Z (\hat{Q}_t^T \|Q_t\|) \mid Q_t]}{\|Q_t\|} \tag{21}$$

$$= E[2 (A_t - D_t) Z \hat{Q}_t^T \mid Q_t] \tag{23}$$

By using (17) in (23), we have

$$2 E[(A_t - D_t)(I - \rho \Psi_\Lambda) \hat{Q}_t^T \mid Q_t] \tag{24}$$

By Definition 6 and Property 1 (which gives $\Lambda \Psi_\Lambda = \Lambda$), together with $E[A_t] = \Lambda$, (24) can be written as

$$2 (\Lambda \hat{Q}_t^T - E[D_t] \hat{Q}_t^T - \rho \Lambda \hat{Q}_t^T + \rho E[D_t] \Psi_\Lambda \hat{Q}_t^T)$$

By replacing the result of Lemma 1 for $E[D_t]$, we obtain

$$2 (\Lambda \hat{Q}_t^T (1 - \rho) - Q_t \hat{Q}_t^T + \rho Q_t \Psi_\Lambda \hat{Q}_t^T) \tag{25}$$

As can be seen, (25) is a function of $\rho$ and $\hat{Q}_t$, i.e.,

$$f(\rho, \hat{Q}_t) = 2 (\Lambda \hat{Q}_t^T (1 - \rho) - Q_t \hat{Q}_t^T + \rho Q_t \Psi_\Lambda \hat{Q}_t^T) \tag{26}$$

In order to prove stability, we need to show that for the entire domain of $\hat{Q}_t$ there exists $\rho$ such that (26) is always less than a finite negative constant, i.e.,

$$\exists \rho,\ \forall \hat{Q}_t : f(\rho, \hat{Q}_t) < -\epsilon$$

Since $\hat{Q}(t) = Q(t) / \|Q(t)\| = Q(t) / \sqrt{\sum_{ij} Q_{ij}^2(t)}$ is a normalized vector, for a given $\rho$ the domain of $f(\rho, \hat{Q}_t)$ is the surface of the unit sphere such that $\hat{Q}_t \in \mathbb{R}^{+P}$.
On the other hand, for a given vector $\hat{Q}_t$, the variation of $f(\rho, \hat{Q}_t)$ is linear in the scalar $\rho$. Knowing that $0 \le \rho \le 1$, we analyze two cases below, for all values of $\hat{Q}_t$.

Case 1, when $\rho = 1$: $f(1, \hat{Q}_t) = 2 (-Q_t \hat{Q}_t^T + Q_t \Psi_\Lambda \hat{Q}_t^T)$.

If $\hat{Q}_t$ is parallel to $\Lambda$, we have $\hat{Q}_t = \hat{\Lambda}$, and by Definition 6 we can write $\Psi_\Lambda = \hat{Q}_t^T \hat{Q}_t$. Thus $f(1, \hat{Q}_t) = 2 (-Q_t \hat{Q}_t^T + Q_t \hat{Q}_t^T \hat{Q}_t \hat{Q}_t^T) = 2 Q_t (-\hat{Q}_t^T + \hat{Q}_t^T) = 0$.

If $\hat{Q}_t$ is not parallel to $\Lambda$, we find $Q_t \hat{Q}_t^T > Q_t \Psi_\Lambda \hat{Q}_t^T$, which gives a negative value for $f(1, \hat{Q}_t)$. As a result, for all values of $\hat{Q}_t$ we obtain $f(\rho, \hat{Q}_t)|_{\rho=1} = f(1, \hat{Q}_t) \le 0$.

Case 2, when $0 \le \rho < 1$, or in other words $-1 \le \rho - 1 < 0$: To investigate this case, we can write $f$ as

$$f(\rho, \hat{Q}_t) = f(\rho, \hat{Q}_t)|_{\rho=1} + (\rho - 1) \frac{\partial}{\partial \rho} f(\rho, \hat{Q}_t)$$

where the partial derivative of $f$ is $\frac{\partial}{\partial \rho} f(\rho, \hat{Q}_t) = 2 (-\Lambda \hat{Q}_t^T + Q_t \Psi_\Lambda \hat{Q}_t^T)$.

If $\hat{Q}_t$ is parallel to $\Lambda$, i.e., $\hat{Q}_t = \hat{\Lambda}$ and thus $\Psi_\Lambda = \hat{Q}_t^T \hat{Q}_t$, we can write the partial derivative of $f$ as $\frac{\partial}{\partial \rho} f(\rho, \hat{Q}_t) = 2 (-\hat{Q}_t \hat{Q}_t^T + Q_t \hat{Q}_t^T \hat{Q}_t \hat{Q}_t^T)$, which is strictly positive, i.e., $\frac{\partial}{\partial \rho} f(\rho, \hat{Q}_t) > 0$. Since in Case 1 we obtained $f(1, \hat{Q}_t) \le 0$, after comparison with $f(\rho, \hat{Q}_t) = f(1, \hat{Q}_t) + (\rho - 1) \frac{\partial}{\partial \rho} f(\rho, \hat{Q}_t)$ for $0 \le \rho < 1$, we find that $f(\rho, \hat{Q}_t) < 0$.

If $\hat{Q}_t$ is not parallel to $\Lambda$, we have $\Lambda \hat{Q}_t^T < Q_t \Psi_\Lambda \hat{Q}_t^T$ and therefore $\frac{\partial}{\partial \rho} f(\rho, \hat{Q}_t)$ is strictly positive. Similarly, it can be shown that for $0 \le \rho < 1$ we have $f(\rho, \hat{Q}_t) < 0$.

Hence, QLP is always stable for a crossbar switch without speedup for any admissible traffic, no matter whether the traffic distribution is uniform or non-uniform. It means that the queue length at the input buffers does not grow to infinity and there exists a finite upper bound $B < \infty$ at which the backlogged queues will settle.

B. Discussions

In this subsection, we discuss how to find the queue length of each flow in GPS.
To track the queue length of flow $F_{ij}$, similar to Equation (12), we can write an evolution equation for queue $Q_{ij}$ during the interval $[t_1, t_2]$ as follows:

$$Q_{ij}(t_2) = Q_{ij}(t_1) + A_{ij}(t_2, t_1) - D_{ij}(t_2, t_1) \tag{27}$$

where $Q_{ij}(t_1)$ is the remaining backlogged queue of $F_{ij}$ at time $t_1$. We know that $D_{ij}(t_2, t_1)$, the number of bits departed from a particular flow $F_{ij}$ during the interval $[t_1, t_2]$ in GPS, can be obtained as

$$D_{ij}(t_2, t_1) = \int_{t_1}^{t_2} R_{ij}(t)\, dt \tag{28}$$

Assuming that the allocated bandwidth of $F_{ij}$ is fixed to a constant $R_{ij}$ during the interval $[t_1, t_2]$, i.e., $R_{ij}(t) = R_{ij}$, Equation (28) can be calculated as

$$D_{ij}(t_2, t_1) = \int_{t_1}^{t_2} R_{ij}\, dt = R_{ij} (t_2 - t_1) \tag{29}$$

Thus

$$Q_{ij}(t_2) = Q_{ij}(t_1) + A_{ij}(t_2, t_1) - R_{ij} (t_2 - t_1) \tag{30}$$

As can be seen, the length of each queue $Q_{ij}$ can be found during any interval $[t_1, t_2]$.
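Equation (30) translates directly into a one-line update. The Python sketch below is illustrative only: the function name and the units (bits and seconds) are assumptions, and the clamp at zero is an addition that Equation (30) does not state, included because a fluid queue cannot hold a negative number of bits.

```python
def track_queue(q_start, arrived_bits, rate, t1, t2):
    """Equation (30): Q(t2) = Q(t1) + A(t2, t1) - R_ij * (t2 - t1),
    for a rate R_ij held constant over [t1, t2].
    The max(0, ...) clamp is an assumption, not part of Equation (30)."""
    if t2 < t1:
        raise ValueError("interval must satisfy t1 <= t2")
    departed = rate * (t2 - t1)
    return max(0.0, q_start + arrived_bits - departed)


# 1000 bits queued, 500 bits arrive, served at 100 bit/s for 5 s.
print(track_queue(1000.0, 500.0, 100.0, 0.0, 5.0))
```

In a scheduler that recomputes the QLP rates at discrete instants, this update would be applied per flow between two consecutive recomputations, with `rate` taken from the most recent allocation.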

IV. SIMULATION RESULTS

In this section, we carry out simulations to verify the theoretical results in Section III and to evaluate the performance of QLP. We consider a $16 \times 16$ crossbar switch. Each input and output has a bandwidth of $R = 1$ Gbps, and the crossbar has a speedup of one. We set the packet length to be distributed between 40 and 1,500 bytes [26]. For the destination of the packets, we consider both uniform and non-uniform traffic patterns.

For uniform traffic, the destination of a new incoming packet is uniformly distributed among all the output ports, i.e., $\lambda_{ij} = \eta / N$, where $\eta$ is the effective load and $N$ is the switch size. The load $\eta$ takes one of the 10 values in $[0.1, 1]$ with a step of 0.1.

For non-uniform traffic, we use the same model as that in [27]. The traffic arrival rate $\lambda_{ij}$ is defined by $i$, $j$ and an unbalanced probability $w$ as follows:

$$\lambda_{ij}(t) = \begin{cases} R \left( w + \dfrac{1 - w}{N} \right), & \text{if } i = j \\ R\, \dfrac{1 - w}{N}, & \text{if } i \ne j \end{cases}$$

In this case, $\eta$ is fixed to 1 and $w$ takes one of the 11 values in $[0, 1]$ with a step of 0.1. When $w = 0$, the traffic arrival is uniformly distributed among the outputs, i.e., $\lambda_{ij}(t) = R / N$. Otherwise, the incoming packets at $In_i$ are directed more to $Out_i$ than to the other outputs; $Out_i$ is called the hotspot destination. A special case happens when $w = 1$, i.e., $\lambda_{ii}(t) = R$.

To constrain the burstiness of flow $F_{ij}$, we consider a leaky bucket model $(\eta \lambda_{ij}, \sigma_{ij})$, where $\sigma_{ij}$ is the burst size of $F_{ij}$ [7]. We set $\sigma_{ij}$ of every flow to a fixed value of 10,000 bytes, and the burst may arrive at any time during a simulation run.

To evaluate the performance of our scheme, we compare our simulation data with Localized Independent Packet Scheduling (LIPS), in which an input port or output port makes scheduling decisions solely based on the state information of its local crosspoint buffers [6].
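The uniform and unbalanced arrival-rate matrices defined in this section can be generated and checked for admissibility with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, bandwidth is normalized to 1 by default, and the admissibility test is the row and column sum condition of Equation (11).

```python
def uniform_rates(n, load):
    """Uniform traffic: lambda_ij = load / n for every input-output pair."""
    return [[load / n for _ in range(n)] for _ in range(n)]


def unbalanced_rates(n, w, capacity=1.0):
    """Unbalanced traffic: lambda_ii = R * (w + (1 - w) / n) on the hotspot
    output and lambda_ij = R * (1 - w) / n on every other output."""
    off_diagonal = capacity * (1.0 - w) / n
    return [[capacity * (w + (1.0 - w) / n) if i == j else off_diagonal
             for j in range(n)]
            for i in range(n)]


def is_admissible(rates, capacity=1.0, tol=1e-9):
    """Equation (11): no input (row) or output (column) is over-subscribed."""
    n = len(rates)
    rows_ok = all(sum(rates[i]) <= capacity + tol for i in range(n))
    cols_ok = all(sum(rates[i][j] for i in range(n)) <= capacity + tol
                  for j in range(n))
    return rows_ok and cols_ok


rates = unbalanced_rates(16, 0.5)
print(rates[0][0], rates[0][1], is_admissible(rates))
```

Each row of the unbalanced matrix sums to $w + (1 - w) = 1$ times the capacity, so this pattern loads every input and every output to exactly 100% for any $w$, which is why it is a demanding test of throughput.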
We consider four LIPS implementation versions with different arbitration rules, as follows:

FP (Fixed Priority) assigns a fixed priority order to all the virtual queues of the same input to the same output and always picks the candidate with the highest priority;
RD (Random) makes the arbitration on a random basis;
RR (Round Robin) alternately chooses eligible candidates in a round robin manner to avoid starvation;
OPF (Oldest Packet First) uses the packet arrival time as the arbitration criterion, i.e., the packet arriving earlier has higher priority.

Now, we investigate the results on the throughput and the average delay.

A. Throughput

To verify Theorem 1, we present the simulation data to show that our scheme achieves 100% throughput. Figure 2(a) displays the throughput under uniform traffic. The throughput of the different schemes grows consistently with the effective load and finally reaches 100% when the effective load becomes 1; only the throughput of FP decreases slightly when the load is at its maximum. Figure 2(b) shows the throughput under non-uniform traffic. It clearly reflects the non-monotonic performance variations of the other methods. As expected, QLP yields significantly higher throughput than the other schemes when the speedup is one. The other schemes achieve their lowest throughput when the unbalanced probability is around 0.5, and then the throughput gradually improves until the unbalanced probability becomes 1. In fact, at this point all the packets of $In_i$ go to $Out_i$ and no scheduling is necessary. The results confirm that QLP achieves 100% throughput and outperforms the other four methods when the speedup is one.

Fig. 2. Throughput of QLP in a 16 x 16 switch. (a) Uniform traffic, with different loads. (b) Non-uniform traffic, with different unbalanced probabilities.

B. Average Delay

Next, we study the delay performance of QLP. We measure the total time that a packet stays in the switch.
It is the interval from the time that the last bit of a packet arrives at its input to the time that the last bit of the packet is sent to the output. We plot the average delay of the different schemes on a logarithmic scale, and the average delay is measured in seconds.

Figure 3(a) displays the average delay under uniform traffic. As can be seen, the delay grows gradually when the effective load increases, and jumps when the effective load becomes 1. Surprisingly, for effective loads greater than 0.8, QLP outperforms the other four schemes. Figure 3(b) depicts the average delay under non-uniform traffic. As expected, QLP behaves favorably in comparison with the other four methods, and the delay difference is more than one decade. The average delay of the other schemes increases with the unbalanced probability and reaches its maximum when the unbalanced probability is around 0.5. Then, the average delay of all schemes drops when the unbalanced probability is equal to 1, because at this point all traffic of an input is destined to the same output and no switching is necessary. It is observed that different unbalanced probabilities do not significantly affect the average delay of QLP, which demonstrates that our scheme works well under non-uniform traffic.

Fig. 3. Average delay of QLP (logarithmic scale) in a 16 x 16 switch. (a) Uniform traffic, with different loads. (b) Non-uniform traffic, with different unbalanced probabilities.

V. CONCLUSIONS

In this paper, we have presented the Queue Length Proportional (QLP) bandwidth allocation scheme for GPS to schedule best effort traffic in crossbar switches without speedup. In QLP, the amount of service that each flow receives is proportional to the length of its backlogged queue. QLP essentially favors the queues with the greatest occupancy and thus assists the crossbar switch to be more work-conserving. By theoretical analysis, we have proved that QLP is strongly stable and therefore provides 100% throughput for any admissible traffic, no matter whether the traffic distribution is uniform or non-uniform. We have also shown that QLP is feasible, which means the allocated bandwidth does not exceed the available capacity. Furthermore, we have discussed how to track the queue length in GPS. Finally, we have conducted simulations to verify the analytical results.

REFERENCES

[1] A. Parekh and R. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: the single node case," IEEE/ACM Trans. Networking, vol. 1, no. 3, pp. 344-357, Jun. 1993.
[2] A. Demers, S. Keshav, and S. Shenker, "Analysis and simulation of a fair queueing algorithm," ACM SIGCOMM 89, vol. 19, no. 4, pp. 3-12, Austin, TX, Sept. 1989.
[3] H. Zhang, "WF2Q: worst-case fair weighted fair queueing," IEEE INFOCOM 96, pp. 120-128, San Francisco, CA, Mar. 1996.
[4] J. Xu and R. J. Lipton, "On Fundamental Tradeoffs between Delay Bounds and Computational Complexity in Packet Scheduling Algorithms," IEEE/ACM Trans. on Netw., vol. 13, no. 1, pp. 15-28, Feb. 2005.
[5] S. He, S. Sun, W. Zhao, Y. Zheng, and W. Gao, "On Guaranteed Smooth Switching for Buffered Crossbar Switches," IEEE/ACM Transactions on Networking, vol. 16, no. 3, pp. 718-731, 2008.
[6] D. Pan and Y. Yang, "Localized Independent packet scheduling for buffered crossbar switches," IEEE Trans. on Comp., vol. 58, Feb. 2009.
[7] J. Kurose and K. Ross, "Computer networking: a top-down approach," Addison Wesley, 4th edition, 2007.
[8] X. Zhang, S. R. Mohanty, and L. N. Bhuyan, "Adaptive Max-Min Fair Scheduling in Buffered Crossbar Switches Without Speedup," 26th IEEE International Conference on Computer Communications, INFOCOM 07, pp. 454-462, Qualcomm Inc., San Diego, May 2007.
[9] M. R. Hosaagrahara and H. Sethu, "Max-Min Fair Scheduling in Input-Queued Switches," IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 4, pp. 462-475, Apr. 2008.
[10] Gerald R. Ash, "Traffic engineering and QoS optimization of integrated voice and data networks," Morgan Kaufmann, first edition, 2006.
[11] D. Pan and Y. Yang, "Max-min fair bandwidth allocation algorithms for packet switches," IEEE Int. Par. and Dist. Processing Symp. (IPDPS), Long Beach, CA, Mar. 2007.
[12] S. Chuang, S. Iyer, and N. McKeown, "Practical algorithms for performance guarantees in buffered crossbars," IEEE INFOCOM 05, Miami, FL, March 2005.
[13] M. Shreedhar and G. Varghese, "Efficient fair queuing using deficit round robin," IEEE/ACM Trans. Netw., vol. 4, no. 3, pp. 375-385, 1996.
[14] N. Ni and L. Bhuyan, "Fair scheduling for input buffered switches," Cluster Computing, vol. 6, no. 2, pp. 105-114, Hingham, MA, Apr. 2003.
[15] X. Zhang and L. Bhuyan, "Deficit round-robin scheduling for input-queued switches," IEEE Journal on Selected Areas in Communications, no. 4, pp. 584-594, May 2003.
[16] D. Pan and Y. Yang, "Credit based fair scheduling for packet switched networks," IEEE INFOCOM, pp. 843-854, Miami, FL, March 2005.
[17] M. J. Karol, M. J. Hluchyj, and S. P. Morgan, "Input Versus Output Queueing on a Space-Division Packet Switch," IEEE Transactions on Communications, vol. 35, no. 12, pp. 1347-1356, Dec. 1987.
[18] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, "Achieving 100% throughput in an input queued switch," IEEE Transactions on Communications, vol. 47, no. 8, pp. 1260-1267, 1999.
[19] D. Stephens and H. Zhang, "Implementing distributed packet fair queueing in a scalable switch architecture," IEEE INFOCOM, San Francisco, CA, March 1998.
[20] X. Zhang and L. Bhuyan, "An Efficient Scheduling Algorithm for Combined-Input-Crosspoint-Queued (CICQ) Switches," IEEE GLOBECOM, Dallas, TX, November 2004.
[21] L. Mhamdi and M. Hamdi, "Output queued switch emulation by a one-cell-internally buffered crossbar switch," IEEE GLOBECOM, San Francisco, CA, Dec. 2003.
[22] J. Turner, "Strong performance guarantees for asynchronous crossbar schedulers," IEEE/ACM Transactions on Networking, to appear, 2009.
[23] E. Leonardi, M. Mellia, F. Neri, and M. A. Marsan, "On the stability of input-queued switches with speed-up," IEEE/ACM Trans. Netw., vol. 9, no. 1, pp. 104-118, 2001.
[24] P. R. Kumar and S. P. Meyn, "Stability of queueing networks and scheduling policies," IEEE Transactions on Automat. Control, vol. 40, pp. 251-260, Feb. 1995.
[25] M. A. Marsan, et al., "Packet Scheduling in Input-Queued Cell-Based Switches," IEEE INFOCOM, Alaska, USA, April 2001.
[26] G. Passas and M. Katevenis, "Packet Mode Scheduling in Buffered Crossbar (CICQ) Switches," Proc. IEEE Workshop on High Performance Switching and Routing (HPSR 2006), pp. 105-112, Poznan, Poland, June 2006.
[27] R. Rojas-Cessa, E. Oki, Z. Jing, and H. J. Chao, "CIXB-1: Combined input-one-cell-crosspoint buffered switch," IEEE Workshop on High Performance Switching and Routing, Dallas, TX, July 2001.