Paper Performace Evaluatio of the MSMPS Algorithm uder Differet Distributio Traffic Grzegorz Dailewicz ad Marci Dziuba Faculty of Electroics ad Telecommuicatios, Poza Uiversity of Techology, Poza, Polad Abstract I this paper, the Maximal Size Matchig with Permaet Selectio (MSMPS) schedulig algorithm ad its performace evaluatio, uder differet traffic models, are described. I this article, computer simulatio results uder ouiformly, diagoally ad li-diagoally distributed traffic models are preseted. The simulatios was performed for differet switch sizes: 4 4, 8 8 ad. Results for MSMPS algorithm ad for other algorithms well kow i the literature are discussed. All results are preseted for switch size but simulatio results are represetative for other switch sizes. Mea Time Delay ad efficiecy were compared ad cosidered. It is show that our algorithm achieve similar performace results like aother algorithms, but it does ot eed ay additioal calculatios. This iformatio causes that MSMPS algorithm ca be easily implemeted i hardware. Keywords coectio patter, diagoally distributed traffic, li-diagoally distributed traffic, MQL matrix, o-uiformly distributed traffic, switchig fabric.. Itroductio Several well kow schedulig algorithms have bee proposed i the literature [] []. All these algorithms, which are resposible for cofiguratio of a switchig fabric, are very sophisticated ad they achieve a good efficiecy ad short time delay. Durig desigig of a ew algorithm, a theoretical approach is applied. It meas that desigers do ot pay attetio to algorithm implemetatio costraits. Most of well kow algorithms, which achieve the good performace results, are very difficult for implemetatio i the real switchig fabric hardware. This is due to very complicated calculatios which must be performed durig algorithms work. The high calculatios complexity makes this algorithms impractical. Istead, most of the ew geeratio switches ad routers use much simpler schedulig algorithms to cotrol ad cofigure switchig fabric. Oe of this kid of algorithms is MSMPS [7], which achieve the similar performace results like the rest of algorithms but does ot eed to perform a lot of complicated calculatio. Other importat fact, which ifluece o switches ad routers performace, is switchig fabric buffers architecture. I our research we study a switchig fabric with VOQ (Virtual Output Queue) system [], [8]. This bufferig system has bee proposed to solve a HOL (Head of Lie) effect. I VOQ system each switchig fabric iput has a separate queue for a packet directed to particular output of a switchig fabric. Usig this kid of architecture, its performace depeds oly o a good schedulig algorithm. Algorithm should be very fast, achieve the good results (high efficiecy ad short time delay) ad be easy to implemet i hardware. Before each packet will be sed through the switch, it should be decided which packet, from which VOQ will be chose. This decisio is take i each time slot the basic uit of time i simulatio eviromet. To solve this problem i hardware, a few schedulig mechaisms are used. There are three basic methods: radom selectio, first i first out (FIFO) ad roud-robi. I the preseted architecture cetralized schedulig mechaism is used. I this mechaism all decisios cosiderig settig up coectios betwee switchig fabric iputs ad outputs (coectio patters) are made by algorithm or driver implemeted i a separated cotrol module. Driver ca cotrol some coected switchig fabrics located i differet equipmets (i.e., routers). Such solutio ca be used i the ew geeratio etworks for example i Software Defied Networks (SDN) [9]. Routers are resposible for direct packets i data paths but high level decisios (routig) are moved to separate module or device which is located out of routers. Routig decisios are set to routers to execute suitable coectio patters i each switchig fabric of each router. Cetralized schedulig mechaism has a huge advatage over traditioal schedulig mechaism. I todays etwork odes, where 0 Gbit/s ports are used, each time slot is equivalet to the 50 s. This time i ot eough to realize traditioal schedulig mechaism, which based o sedig cotrol sigal. This sigal cosists of three parts: demad, cofirmatio ad acceptace. Nowadays, all algorithms are desiged i such a way, that the umber of cotrol sigals is miimized. The best solutio is sedig oly oe sigal betwee cotrol module ad switchig fabric. All this thigs are fulfilled by MSMPS algorithm. This paper is orgaized as follows. I Sectio, the switch architecture is discussed. I Sectio 3 of this article, all simulatio parameters are explaied. I Sectio 4 traffic distributio models which are used i our research, are described. The i Sectio 5 computer simulatio results uder differet traffic patters are show. Results achieved 74
Performace Evaluatio of the MSMPS Algorithm uder Differet Distributio Traffic for MSMPS algorithm, are compared with aother algorithms well kow i the literature. I Sectio, same coclusio are give.. Switch Architecture The geeral VOQ switch architecture is preseted i Fig. [0]. Fig.. Geeral VOQ switch architecture. I our research we use switchig fabric with iput queuig system (Iput Queued switches), where buffers are placed at the iputs. Each iput has separated queue which is divided ito N idepedet VOQs. The total umber of virtual queues depeds o the umber of iputs ad outputs. It was assumed that i preseted switch, the umber of iputs ad outputs is equal ad i geeral case is N. Based o this assumptio, total umber of VOQs i switchig fabric, with N umber of iputs/outputs, is equal to N. Additioally, each virtual queue is deoted by VOQ (i, j), where i is the iput port umber ad j is the output port umber. It ca be assumed that: 0 i N ad 0 j N. Betwee iputs ad outputs modules, the switchig fabric is placed. I the switchig fabric, there are electrical or optical equipmets which have to be properly cofigured whe all coectios betwee iputs ad outputs are established. Implemeted algorithm is resposible for a proper cofiguratio of metioed equipmets. The most importat module, i preseted symmetrical switch, is schedulig system module. This is a module, where algorithms are implemeted. I the schedulig module all iformatio about queues coditios are stored. It meas that schedulig system has kowledge about umbers of packets waitig i all queues, to be sed through the switch. This iformatio is ecessary to make a right decisio by MSMPS algorithm about coectio patter i the switchig fabric. 3. Algorithm Descriptio MSMPS algorithm is based o permaet coectios patter betwee iputs ad outputs. For example, from Fig., coectio patter for 4 4 switch ca be observed. Fig.. Coectio patter for 4 4 switch. Permaet coectios patter provides fair access to the each output. It meas that all outputs i switch are treated equally. As metioed before, schedulig module has iformatio about VOQ coditios. This iformatio is stored i MQL matrix (Matrix of Queue Legths). This kid of matrix was the easiest way to store this iformatio. Figure 3 shows MQL matrix for 4 4 switch. Fig. 3. MQL matrix for 4 4 switch. Iformatio is updated i each time slot. Each cell (oe positio i matrix) i matrix MQL ad each VOQ has uique address. This correlatio allows attribute oe cell to oe VOQ. For example cell [0;0] correspods to the VOQ (0,0). I cell [0;0] iformatio about umber of packets waitig i VOQ (0,0) are stored. If there is o packets i VOQ, suitable positio i matrix is filled by 0. It ca be see from Fig. 3 ca be observed that matrix has N rows ad N colums. It correspods to the 4 4 switch, which is preseted i our example. Based o permaet coectios ad iformatio, stored i MQL matrix, 75
Grzegorz Dailewicz ad Marci Dziuba MSMPS algorithm makes decisios about coectios to be set up i switchig fabric. The mai purpose is to avoid empty coectios. Empty coectio meas that there is o packets to be sed from a iput to a output. Algorithm gives priority to the most filled VOQs. More details about MSMPS algorithm ca be foud i [7]. 4. Simulatio Coditios I this paper, performace results for some schedulig algorithms, well kow i the literature, ad for MSMPS algorithm are preseted. All graphs are plotted as the results of computer simulatios. Packets are icomig at the iputs accordig to Beroulli arrival model [], []. Uder this model, oly oe packet ca arrive at the iput i each time slot (basic of time uit). It was assumed that oe packet may occupy oly oe time slot. I Beroulli model, probability that packet will arrive at the iput is equal to p, p ε(0 < p ). () Simulatio results are preseted as a mea value of te idepedet simulatio rus. Number of iteratio i oe simulatio ru is equal to 500,000, where the first 30,000 steps are reserved for obtaiig covergece i the simulatio eviromet. It was assumed also that our switchig fabric is strict sese oblockig. It meas that there is always possible to establish coectio betwee each suitable ad idle iput ad suitable ad idle output of the switchig fabric. Performace results cosider the efficiecy ad Mea Time Delay parameters. Efficiecy is parameter which was calculated accordig to Eq. (). Numerator is the umber of packets passed i -th time slots through the switchig fabric. Deomiator is the umber of packets which have arrived to the switch buffers i -th time slot [7]. a q =, () b time slot umber, a umber of packets passed i time slot through the switchig fabric, b umber of packets which ca be sed through the switchig fabric i time slot. Mea Time Delay (MTD) is calculated accordig to Eq. (3). Numerator is a sum of differece betwee time whe a packet is trasferred by the switch ad the time whe the packet has arrived to the buffer system. Deomiator is a total umber of packets served by the switchig fabric. t out t i MT D =, (3) k MT D t i t out k Mea Time Delay, time slots umber, time whe a packet arrived to the VOQ, time whe the same packet is trasferred by the switchig fabric, umber of packets. Three distributed traffic models were take ito accout i this paper. Each of this model determies the probability that packet which appear at the iput, will be directed to the certai output. These cosidered traffic models are described i followig subsectios. 4.. No-uiformly Distributed Traffic The probability of the packet arrivig at the iput i, directed to the output j is preseted i Table. For readability, table shows traffic distributio i 4 4 switch. Aalogous traffic distributio is used for other switch sizes: 8 8 ad. It ca be observed from Table that i this type of traffic model, some outputs have higher probability of beig selected [3]. This probability ca be defied as: p i j ad it ca be calculated accordig to the Eq. 4: p i j (N ) for i = j, for i j. N umber of switch iputs/outputs., (4) Table No-uiformly distributed traffic i 4 4 switch with VOQ Output 0 Output Output Output Iput 0 Iput Iput Iput 3 4.. Diagoally Distributed Traffic I this type of distributio model, the traffic is cocetrated i two diagoals of the table (traffic matrix). The probability that packet is appeared at the suitable iput i ad it will be directed to the output j is equal to p i j =. Probability for the rest of iputs (ot placed i two diagoals) is p i j = 0 [], [4] []. From Table it ca be observed that iput i has packets oly for output i ad for output ((i + (N-)) mod N). 7
Performace Evaluatio of the MSMPS Algorithm uder Differet Distributio Traffic Table Diagoally distributed traffic i 4 4 switch with VOQ Output 0 Output Output Output Iput 0 0 0 Iput 0 0 Iput 0 0 Iput 3 0 0 4.3. Li-diagoally Distributed Traffic Li-diagoally distributed model is a modificatio of diagoally distributed model. Cosidered li-diagoally model ad its probabilities are preseted i Table 3. It ca be see from this table that a load decrease liearly from oe diagoal to the other. I geeral case, probability ca be calculated accordig to the followig formula [7]: N d p d = p (5) N(N + )/ with d = 0,...,N, the p i j = p d if j = (i+d) mod N, ad p d p N d probability of packet arrivig i li-diagoally distributed traffic, probability of packet arrivig i Beroulli, process, umber of switch iputs/outputs, output umber. Fig. 4. The efficiecy for Beroulli arrivals with ouiformly distributed traffic i switches. Fig. 5. The efficiecy for Beroulli arrivals with li-diagoally distributed traffic i switches. Table 3 Li-diagoally distributed traffic i 4 4 switch with VOQ Output 0 Output Output Output 4 Iput 0 0 p 0 p 0 p 0 p 3 Iput 0 p 4 0 p 0 p 0 p 3 4 Iput 0 p 0 p 0 p 0 p 3 Iput 3 0 p 0 p 3 0 p 4 0 p 5. Simulatio Results Aalysis I this sectio performace of the MSMPS algorithm will be compared with aother algorithms for VOQ switches. Up today, several schedulig algorithms are preseted i the literature [] []. It was compared ad aalyzed results for: islip which was preseted i [], Maximal Matchig with Roud-Robi Selectio (MMRRS) [], [3], [4], Hierarchical Roud-Robi Matchig (HRRM) [5] ad Parallel Iterative Matchig (PIM) []. The efficiecy is plotted i Figs. 4, 5 ad. This parameter was calculated accordig to Eq.. Similarly as Fig.. The efficiecy for Beroulli arrivals with diagoally distributed traffic i switches. for MTD, results oly for switch size are preseted. From Figs. 4 ad 5 it ca be observed that for low traffic load (betwee 0 0%) our algorithm achieve the worst results compared to other algorithms. Coducted simulatios cofirm, that MSMPS algorithm ca ot cope with low traffic load for differet traffic models. The reaso is that our algorithm focused very much o access aligmet for all outputs, istead of avoidig of empty coectios. Coectios where o packets are to be sed through the switch [7]. Above 0% load, efficiecy of MSMPS algo- 77
Grzegorz Dailewicz ad Marci Dziuba rithm icreases ad reaches mea value about 0.95 with growig tedecy. Differet pheomea ca be observed for other algorithms. All of them maitai efficiecy o a high level about. But above 0% load, PIM ad islip rapidly decreases with ouiformly ad li-diagoally distributed traffic. Oly MMRRS maitai efficiecy about for both metioed traffic distributios. It looks differet with diagoally distributed traffic. Efficiecy for MSMPS algorithm systematically decreases for over 40% load, efficiecy is uder 0.9. This type of distributio caused that Fig. 7. The MTD for Beroulli arrivals with ouiformly distributed traffic i switches. packets are cocetrated i two diagoals of the traffic matrix (Table ). For this traffic model our algorithm achieve the worst results. The MTD is a fuctio of traffic load ad is plotted i Figs. 7, 8 ad 9. MTD is measured i time slots, where oe slot is the basic of time uit i preseted system. Computer simulatios were performed for differet switch sizes. Oly the results for switch size are show. The authors assume that the iput buffers are ifiitely log, ad have preseted results for Beroulli arrivals with differet distributio traffic. From Fig. 7 it ca be see that for ouiformly distributed traffic MSMPS algorithm achieve the best results (the lowest MTD) compared to other algorithms. Up to 75% load, oly HRRM algorithm achieve similar results. The highest MTD, for this type of distributio, has reached MMRRS algorithm. For 0% load, MMRRS algorithm has already achieved 4 cells delay, whe the rest of algorithms reached results close to 0. Very similar results are achieved by all algorithms with li-diagoally distributio traffic Fig. 8. MSMPS algorithm achieve almost the same results like for ouiformly distributio. The same situatio ca be observed with MMRRS algorithm. Iterestig situatio occurred above 0% load, whe MTD for PIM ad islip algorithm rapidly icrease. It ca be caused by arbiters sychroizatio problem. From the Fig. 9, with results for diagoal distributio traffic, it ca be see that MTD for our algorithm rapidly icreased. This is due to our algorithm based o permaet coectio patters ad for high load some outputs are blocked. Accordig to this fact, to much empty coectios are established. This effect ca be elimiated by set up coectios (betwee iputs ad outputs) for more tha oe time slot. Acceptable results are reached by MMRRS algorithm which behave extremely well for diagoal distributio traffic.. Coclusios ad Future Work Fig. 8. The MTD for Beroulli arrivals with li-diagoally distributed traffic i switches. I this paper, performace results for MSMPS schedulig algorithm for VOQ switches uder differet traffic patters were show ad described. Its performace cofirms that MSMPS algorithm ca be used i practice. This algorithm achieved high efficiecy ad i the same time low latecy is provided. I the ext studies, implemetatio of MSMPS algorithm i separate chips or i the switchig fabric equipmet will be discussed. Our algorithm works i simply way ad there is o additioal calculatio eeded. MSMPS algorithm ca be also modified to support differet traffic priorities ad switch architectures. Ackowledgemets Fig. 9. The MTD for Beroulli arrivals with diagoally distributed traffic i switches The work described i this paper was fiaced from the research fudig as research grat UMO-0/0/B/ ST7/03959. 78
Performace Evaluatio of the MSMPS Algorithm uder Differet Distributio Traffic Refereces [] N. McKeow, The islip schedulig algorithm for iput-queued switches, IEEE/ACM Tras. Netw., vol. 7, pp. 88 00, 999. [] A. Baraowska ad W. Kabaciński, The ew packet schedulig algorithms for VOQ switches, i Telecommuicatios ad Networkig ICT 004, J. Neuma de Souza, P. Dii, ad P. Lorez, Eds. LNCS 34, pp. 7 7. Spriger, 004. [3] A. Baraowska ad W. Kabaciński, MMRS ad MMRRS packet schedulig algorithms for VOQ switches, i Proc. MMB & PGTS 004 th GI/ITG Cof. Measur. Eval. Comp. Commu. Sys. (MMB) & 3rd Polish-Germa Teletr. Symp. (PGTS), Dresde, Germay, 004. [4] A. Baraowska ad W. Kabaciński, Evaluatio of MMRS ad MM- RRS packet schedulig algorithms for VOQ switches uder bursty packet arrivals, i Proc. Worksh. High Perfor. Switch. Rout. HPSR 005, Hog Kog, Chia, 005, pp. 37 33. [5] A. Baraowska ad W. Kabaciński, Hierarchical roud-robi matchig for virtual output queuig switches, i Proc. Adv. Idustr. Cof. Telecommu. AICT 005, Lisbo, Portugal, 005, pp. 9 0. [] T. Aderso et al., High-speed switch schedulig for local-area etworks, ACM Tras. Comp. Sys., vol., o. 4, pp. 39 35, 993. [7] G. Dailewicz ad M. Dziuba, The ew MSMPS packet schedulig algorithm for VOQ switches, i Proc. 8th IEEE, IET It. Symp. Commu. Sys. Netw. Digit. Sig. Process. CSNDSP 0, Pozań, Polad, 0. [8] Y. Tamir ad G. Frazier, High performace multiqueue buffers for VLSI commuicatio switches, i Proc. 5th A. It. Symp. Comp. architec. ISCA 988, Hoolulu, Hawaii, USA, 988, pp. 343 354. [9] Myug-Ki Shi, Ki-Hyuk Nam, ad Hyoug-Ju Kim, Softwaredefied etworkig (SDN): A referece architecture ad ope APIs, i Proc. It. Cof. ICT Coverg. ICTC 0, Jeju, Korea, 0. [0] A. Baraowska ad W. Kabaciński, Hierarchiczy algorytm plaowaia przesyłaia pakietów dla przełączika z VOQ, i Pozańskie Warsztaty Telekomuikacyje PWT 004, Pozań, Polad, 004 (i Polish). [] H. Joatha Chao ad B. Liu, High Performace Switches ad Routers. New Jersey: Wiley, 007, pp. 95 97. [] P. Giaccoe, D. Shah, ad S. Prabhakar, A implemetable parallel scheduler for iput-queued switches, IEEE Micro, vol., o., pp. 9 5, 00. [3] K. Yoshigoe ad K. J. Christese, A evolutio to crossbar switches with virtual ouptut queuig ad buffered cross poits, IEEE Network, vol. 7, o. 5, pp. 48 5, 003. [4] D. Shah, P. Giaccoe, ad B. Prabhakar, Efficet radomized algorithms for iput-queued switch schedulig, IEEE Micro, vol., o., pp. 0 8, 00. [5] P. Giaccoe, B. Prabhakar, ad D. Shah, Radomized schedulig algorithms for high-aggregate badwidth switches, IEEE J. Selec. Areas Commu., vol., o. 4, pp. 54 559, 003. [] Y. Jiag ad M. Hamdi, A fully desychroized roud-robi matchig scheduler for a VOQ packet switch architecture, i Proc. IEEE Worksh. High Perform. Switch. Routig HPSR 00, Dallas, TX, USA, 00, pp. 407 4. [7] A. Biaco, P. Giaccoe, E. Leoardi, ad F. Neri, A framework for differetial frame-based matchig algorithms i iput-queued switches, i Proc. 3rd A. Joit Cof. IEEE Comp. Commu. Soc. IEEE INFOCOM 004, Hog Kog, Chia, 004. Grzegorz Dailewicz received the M.Sc. ad Ph.D. degrees i Telecommuicatios from the Poza Uiversity of Techology, Polad, i 993 ad 00, respectively. Sice 993, he has bee workig i the Istitute of Electroics, Poza Uiversity of Techology. Curretly he is a Professor of Poza Uiversity of Techology ad workig as a Vice Dea of the Faculty of Electroics ad Telecommuicatios. His scietific iterests cover photoic broadbad switchig systems with special regard to the realizatio of multicast coectios i such systems. He has published about 40 papers. E-mail: Grzegorz.Dailewicz@et.put.poza.pl Chair of Commuicatio ad Computer Networks Faculty of Electroics ad Telecommuicatios Poza Uiversity of Techology Polaka st 3 0-95 Poza, Polad Marci Dziuba received the M.Sc. degree i Computer Sciece ad Robotics from the Poza Uiversity of Techology, Polad, i 00. Sice 00, he is a Ph.D. studet at Poza Uiversity of Techology, Chair of Commuicatio ad Computer Networks Faculty of Electroics ad Telecommuicatios. E-mail: Marci.Dziuba@put.poza.pl Chair of Commuicatio ad Computer Networks Faculty of Electroics ad Telecommuicatios Poza Uiversity of Techology Polaka st 3 0-95 Poza, Polad 79