A Stgmergy Approach for Open Source Software Developer Communty Smulaton Xaohu Cu, Justn Beaver, Jm Treawell an Thomas Potok Oak Rge Natonal Laboratory Oak Rge, TN 37831 Laura Pullum Lockhee Martn Corporaton St. Paul, MN 55164-055 Abstract The stgmergy collaboraton approach proves a hypothesze explanaton about how onlne groups work together. In ths research, we presente a stgmergy approach for bulng an agent base open source software (OSS) eveloper communty collaboraton smulaton. We use group of actors who collaborate on OSS projects as our frame of reference an nvestgate how the choces actors make n contrbuton ther work on the projects etermnate the global status of the whole OSS projects. In our smulaton, the forum posts an project coes serve as the gtal pheromone an the mofe Perre-Paul Grasse pheromone moel s use for computng eveloper agent behavors selecton probablty. Keywors-component; OSS, Pheromone, Agent, Moel I. INTRODUCTION In most onlne collaboratve groups, the onlne envronment proves a share meum for storng nformaton so that t can be nterprete by other nvuals later. Ths store nformaton can be represente as stgmergy sgnals, servng as long-term memory for the group. The stgmergy collaboraton approach proves a hypothesze explanaton about how these onlne groups work together. The open source software (OSS) eveloper communty s a new kn of onlne software evelopment group where partcpants can rea, mofy, an restrbute software source coe wthout cost. Gven the sgnfcant amount of etale nformaton avalable on ther nternal structure, OSS evelopment communtes offer a unque opportunty to unerstan the large self-organze entty. In ths paper, we present a stgmergy approach [1] for examnng how the behavor of nvuals affects the emergence of an OSS project s global status. II. STIGMERGY COLLABORATION IN OSS COMMUNITIES The stgmergy term was frst propose by Perre-Paul Grasse n the 1950s n conjuncton wth hs research on termtes []. Grasse showe that a partcular confguraton of a termte colony s envronment can trggere a termte to mofy ts envronment (rop a mu n a partcular place for bulng or mantanng the nest). The mofcaton n turn stmulates the orgnal or other termtes n colony to further transform ts envronment. Grasse mae a general efnton of stgmergy as: the stmulaton of the workers by the very performances they have acheve. The stgmergy process has been observe n termtes, ants, bees, an wasps n a we range of actvtes. In termte colony, a hghly complex termte nest s not cause by the net bulng knowlege of nvual termte. It s just a collectve behavor result from large numbers of nvual termtes performng extraornarly smple actons n response to ther local envronment. There are no rect communcatons between termte workers for coornatng ther nest bulng actons. The mofe envronment cause by termte smple actons serve as the coornate sgnals. The state of the nest structure trggers some behavors, whch then mofy the nest structure an trgger new behavors untl the constructon s over. The concept of stgmergy proves a theory for explanng how sparate, strbute, a hoc contrbutons from nvuals coul lea to the emergence of large collaboratve enterprses. Whle people are more ntellgent than socal nsects, nvuals n open software evelopment groups use essentally the same stgmergc mechansm for collaboratng. Mockus et al. [3] pont out, OSS evelopers rarely f ever meet face-to-face or even va telephone. Partcpants n OSS projects manly engage n onlne scusson forums or threae emal messages as a central way to observe, partcpate n, an contrbute to publc scussons of topcs of nterest to ongong project partcpants [4]. The newly evelope software source coes are uploae to the communty webste for beng scrutnze by the member of the communty. Any bug, error or lackng functonalty wll be pont out an entce communty member to take up the short come. These communcaton messages functons as stgmergy sgnals, albet a much more ynamc one than the mu use by termtes. The stgmergy sgnal can serve as a long-term memory for the group. It can be pcke up by any nvual at any tme. III. RELATED WORKS In most onlne hostng envronments, the project relate actons are logge an the log nformaton can later be mne to unerstan the communty structure an nteracton patterns. Ths log ata prove enormous etal nformaton for analyzng [5, 6]. Crowston, et al. [7] propose a moel for effectve work practces n OSS evelopment. The moel was base largely on an exstng moel of group
effectveness ntally propose by Hackman [8] n 1986. Smth, et al. [9] presente an agent-base OSS smulaton moel that nclues software moules complexty, the software's ftness for purpose, the motvaton of evelopers, an the role of users n esgnng requrements. In the researches of how the OSS evelopers collaborate, researches conser the OSS movement as a self-organzng system an a collaboratve socal network [10]. Socal network analyss was use for analyzng OSS communty structure. Actor relatonshps are represente as noes an lnks. The actor can be user or eveloper. Every noe represents an actor wthn network; lnk(,j) enotes socal te between actors an j. However, n OSS communty, the rect connecton between actors s unusual. They exchange the nformaton through the forum or emal-lst nrectly. Most of tme, the actor even on t know who s hs message recever before he sen out hs message. Ellott [1] argue that collaboraton n small groups (roughly -5) reles upon socal negotaton to evolve an gue ts process an creatve output. Collaboraton n large groups (roughly 5-n) s enable by stgmergy. Heylghen [11] propose to stngush the stgmergy n OSS communty as rect an nrect. In OSS evelopment, the unfnshe jobs, serve as the rect stgmergy, whch stmulate other actors to come to fnsh the jobs. Inrect stgmergy can be recognze n forums where bugs or functon requests are poste. These forums are regularly consulte by the evelopers, thus attractng ther attenton to tasks that seem worth performng. IV. TECHNICAL APPROACH Our approach to reproucng the complex envronment of OSS software evelopment communty was to evelop an agent-base stgmergy collaboraton moel. The moel wll represent how the OSS communty collaboraton an how each nvual eveloper s chose of what software element to evelop an how many effort to contrbute wll mpact the status of the whole OSS project. Our hypothess s the collaboratons of nvual OSS evelopers an users are stgmergy collaboratons. The forum posts, emal lst an unfnshe source coes serve as the gtal stgmergy. The peer-to-peer communcatons between nvual OSS members are not occurrng very often. The measures use n ths research for representng OSS project status are lste n Table 1. To smplfer the smulaton, we assume there are only two kns of agents n the smulaton, the eveloper an the user. User agent can change to eveloper agent f they want to. The agents o not nteract wth each other rectly. Instea, they go through the forums for nformaton exchange. There are two kns of forums, the publc forum an the eveloper forum. The publc forum can be access an post message by any agents who are nterestng n the software project. Most of tme, t serve as the message exchange boar between users an evelopers. Users can post questons about how to use the software, bugs they foun urng the software usng an functons they wsh to be nclue n the software. Each problem wll be represente by one forum threa. The other users an the evelopers occasonally go through the forums, answer the questons an get the frst han nformaton about the bug problems an the wsh lst functons. TABLE I. OSS Communty Status Metrc Developers Software Releases Downloas OSS COMMUNITY STATUS MEASURES Font sze an style A count of the group s core membershp A count of the number of orchestrate actons that the group has performe Measures the egree to whch the group s actons are foun to be useful n the communty In ths smulaton, the evelopers use a mofe Ant Colony Algorthm moel to choose whch forum threa problem he/she wll contrbute to solve. In ths algorthm, each forum threa serves as one potental gtal tral to fferent software evelopment rectons an the post messages n ths threa represent the gtal pheromone la own. Every tme, when user or eveloper posts a new message n ths forum threa, a new pheromone s epost on the tral. The pheromone content of a forum threa can be upate an ecaye. Pheromone upate: when a message s poste n a forum threa, the pheromone for ths threa s ncremente by a constant, γ. The nomnal value of γ s one. Equaton (1) escrbes the pheromone upate proceure when a message s pose by actor a n a post threa at tme t. t+ t P 1 = P + γ (1) Pheromone ecay: to account for pheromone ecay, each threa pheromone values are perocally multple by the ecay factor, ε-τ. The ecay rate s τ>=0. A hgh ecay rate wll quckly reuce the amount of remanng pheromone, whle a low pheromone ecay rate wll egrae the pheromone value slowly. The nomnal pheromone ecay nterval s one ay; we call t ecay pero. Equaton () escrbes pheromone ecay. t + 1 t P = *ε τ P If no message has been poste n a threa n qute some tme, the pheromone for ths a threa wll be ecay to a near zero value. The threa wll be remove from the eveloper s potental threa select recton. Accorng to the pheromone theory, forum users wll most lkely jon an post message n the threa that has hghest pheromone content. Threa selecton: Agents wll ranomly chose a threa base on the amount of pheromone present on each forum ()
threa. The equaton (3) escrbes the threa s probablty ρ beng chosen. U t = θ * N = 1 u * p Z (5) ρ = N ( P = 1 t ( P + K) t F + K) N s the total number of forum threas. The constants F an K are use to tune the behavor of forum users. The value of K etermnes the senstvty of the probablty calculatons of small amounts of pheromone. If K s large, then large amounts of pheromone wll have to be present before an apprecable effect wll be seen n the message postng probablty. The nomnal value of K s zero. Smlarly, F may be use to moulate the fferences between pheromone amounts. For example, F > 1 wll accentuate fferences between lnks, whle F < 1 wll eemphasze them. F = 1 yels a smple normalzaton. The nomnal value of F s two. Another forum n the smulaton s eveloper forum. It serves as the nternal forum an use by evelopers post ther ongong works. It represents the emal-lst an SVN repostory n real OSS project. Each eveloper s contrbuton s stmulate by other evelopers ongong works poste n the eveloper forum. The probablty c for eveloper contnually contrbute on a software element evelopng s moele as termte mu rop probablty an s gven by: f ( ) c = n *( k + f ( ) 1 δ (, j) f ( ) = max(0.0, (1 ) σ α δ (, j) [0,1] j L Here, s the ssmlarty value between eveloper contrbute post an the eveloper j contrbute post message n emal-lst an SVN n repostory, s the message length. α [0,1] s a ataepenent scalng parameter, an σ s the total number of post messages n eveloper forum n pre-efne tme pero L [1,15]. The equaton (3) an equaton (4) ece the behavor of agents n the smulaton. These nvual s behavors can change the OSS project status. In our smulaton, one of the measures for the status of the OSS project, the software utlty (number of ownloas), s moele by the equaton (5). U t represents the number of the software been ownloae each ay. F (3) ), (4) s constant. u s the number of users who post messages n forum threa. p s the number of evelopers who post messages n forum threa, an Z s the total number of users an evelopers n the forum. The membershp measure (number of evelopers), s P moele by the equaton (6). t represente the ncreasng rate of the number of the evelopers. P t = α * u N = 1 Z * p α s constant. Other parameters are same as equaton (5). V. EXPERIMENTS We use etale OSS communty log ata on SourceForge to llustrate an valate the moel s theoretcal mechansms. SourceForge, an onlne center for OSS evelopment communtes, proves collaboratve resources for approxmately 00,000 projects. Ths ata consste of all the actvty nformaton of OSS software evelopers an users that regstere on SourceForge from 003 to 008. We evelope scrpts that query the SourceForge Research Data Archve hoste by the Unversty of Notre Dame [6], for project ata that meet our crtera. We were ntereste n OSS projects that reache a mnmum team sze of 0 evelopers at some pont n the project lfetme n orer to nsure that the socal an techncal factors were well represente. In aton, we elmnate projects that not appear to use the SourceForge collaboraton tools as a sgnfcant means for communcaton an coornaton. In all, we entfe 67 projects as vable for use as tranng ata for our OSS moel. From these ata, we can reconstruct how the local behavor of settng up the project teams that create nvuals le to the emergence of the project global status measures. A. Determnng Moel s Accuracy of Ft to Emprcal Data We frst trane the propose agent-base moel wth the ata collecte from the log fles of OSS project web ste an then compare the closeness of the smulate results to the emprcal ata. In ths exercse, the moel was tune wth actual OSS project ata for 9 tme slces (one tme slce = one month), an then smulate for 1, 3, 6, 9, an 1 atonal tme slces. At each nterval, an assessment of the Accuracy of Ft s recore. The Accuracy of Ft measure s a quanttatve etermnaton of how well a moel represents the unerlyng ata. It s a measure that ncates the correctness of the moel wth respect to the ata that was (6)
use to construct the moel. The Accuracy of Ft s etermne through an analyss of the Equalty of Means an the Equalty of Varances between the moele values of the OSS project measures an the assocate actual OSS project measures values. The Equalty of Means Hypothess Test [17] quantfes the confence that the mean values of two gven atasets are equvalent. The equaton s shown n Fgure 1. By comparng the Equalty of Means between actual open source software communty status values an the status value generate by OSS communty smulaton, the ablty of the OSS moel to accurately characterze the unerlyng ata set s reveale. If the Test Statstc for the gven qualty measure s less than the crtcal value for that measure, then the null hypothess must be accepte, the means are etermne to be equvalent, an the moel s sa to prove an accurate ft for the unerlyng ata. For ths stuy, a confence of =0.9, or 90% was use for all Equalty of Means calculatons. Thus, there s 90% confence that all Equalty of Means etermnatons are correct. Fgure 1. Hypothess Test for Equalty of Means. The approach to calculatng the Equalty of Varances s to use a textbook rule of thumb test recommene n [18], where n comparng the moele ata to the actual ata, f the rato of the maxmum varance value to the mnmum varance value must be less than three to conser the varances equvalent. The applcaton of ths rule of thumb s approprate n that t bouns the relatonshp of the varances. The goal for the Equalty of Varances s not to get a quantfable confence on the accuracy of the moele ata (whch s alreay accomplshe through the Equalty of Means test), but to get a screte ncaton that the varance of the moele ata s on the orer of the varance of the actual ata. Fgure. Equalty of Means Valaton. B. Results The results of applyng the Accuracy of Ft statstcal tests, Equalty of Means, an Equalty of Varances are shown n Fgure an Fgure 3. The Equalty of Means analyss, shown n Fgure, hghlghts the statstcal threshol for ths test as a black lne. The nterpretaton of ths graph s that those ata ponts below the threshol lne ncate that the smulaton was able to mantan a mean value consstent wth the unerlyng ata for those metrcs. Smlarly, ata ponts above the threshol ncate that the smulaton verge from the strbuton upon whch the smulaton s base. Fgure shows that the mean values of all three OSS communty status measures were consstent wth ther actual mean values for up to three tme slces (three months). In the case of Group Membershp, the mean value was consstent for nne smulate tme slces. Thus the moel performe relate well n remanng consstent wth the unerlyng mean values of the ata for short-term smulaton, but became less consstent as the smulaton progresse. The results of the Equalty of Varances test, shown n Fgure 3, are smlar to the Equalty of Means n ther nterpretaton. Values above the threshol lne ncate ponts of unequal varance between actual an smulate ata, an values below the threshol ncate ponts of consstency across the varances. The moel well at smulatng the varances for moele values consstent wth the unerlyng OSS ata.
Fgure 3. Equalty of Varances Valaton. VI. CONCLUSION We use group of agents who collaborate on projects through forums as our frame of reference an nvestgate how the choces agents make n contrbuton ther work on the projects etermnate the global status of the whole OSS projects. Our hypothess s that the collaboratons of nvual OSS evelopers an users are stgmergy collaboratons. We propose a stgmergy collaboraton OSS moel to prouce a smulaton that accurately represents the collaboraton n an OSS communty. We compare the smulate output to emprcally observe behavors n fferent OSS project forums. The smulaton s able to partally reprouce the forum evoluton tren n many OSS projects. The closeness of the smulate results to the emprcal ata ncates that our moel may reflect the processes that occur n OSS evoluton. Our next step wll be extenng the research moel to other self organze communtes. Our hypothess s f two systems obey the same mathematcal laws, we can perform experments on one system an nfer how another system mght behave uner smlar contons. ACKNOWLEDGMENT Prepare by Oak Rge Natonal Laboratory, P.O. Box 008, Oak Rge, Tennessee 37831-685, manage by UT- Battelle, LLC, for the U.S. Department of Energy uner contract DE-AC05-00OR75; an prepare by Lockhee Martn Company. The vews an conclusons contane n ths ocument are those of the authors an shoul not be nterprete as representng the offcal polces, ether expresse or mple, of the Oak Rge Natonal Laboratory, Lockhee Martn Company, the Department of Energy or the U.S. government. [] Dorgo, M., E. Bonabeau, an G. Theraulaz, Ant algorthms an stgmergy. Future Generaton Computer Systems, 000. 16(8): p. 851-871. [3] Aurs Mockus, R.T.F., James D. Herbsleb, Two case stues of open source software evelopment: Apache an Mozlla. ACM Trans. Softw. Eng. Methool., 00. 11(3): p. 37. [4] Yutaka Yamauch, M.Y., Takesh Shnohara, Toru Isha, Collaboraton wth Lean Mea: how open-source software succees. CSCW, 000: p. 9. [5] Xu, J., Y. Gao, an G. Maey. A Dockng Experment: Swarm an Repast for Socal Network Moelng. n Seventh Annual Swarm Researchers Meetng (Swarm003). 003. Notre Dame, IN. [6] Chrstley, S. an G. Maey, Collecton of Actvty Data for SourceForge Projects. 005, Dept. of Computer Scence an Engneerng, Unversty of Notre Dame: Notre Dame, IN. [7] K. Crowston, H.A., J. Howson, an C. Masango, Towars a Portfolo of FLOSS Project Success Measures, n The 4th Workshop on Open Source Software Engneerng, Internatonal Conference on Software Engneerng (ICSE) 004. [8] Hackman, J.R., The esgn of work teams. The Hanbook of Organzatonal Behavor, e. J.W. Lorsch. 1986, Englewoo Clffs, NJ: Prentce-Hall. [9] Nel Smth, A.C., Juan Fernánez-Raml, sers an Developers: An Agent-Base Smulaton of Open Source Software Evoluton, n SPW/ProSm 006. 006. [10] Jn Xu, Y.G., Scott Chrstley, Gregory R. Maey, A Topologcal Analyss of the Open Souce Software Development Communty n HICSS 005. 005. [11] 11. heylgen, F., e. Why s Open Access Development so Successful? Open Source Jahrbuch, e. M.B.R.A.G. B. Lutterbeck. 007, Lehmanns Mea. [1] Dorgo, M., Ant colony optmzaton an swarm ntellgence 4th Internatonal Workshop, ANTS 004, Brussels, Belgum, September 5-8, 004 : proceengs. Lecture notes n computer scence. 004, Berln: Sprnger. [13] Bankes, S., Exploratory Moelng for Polcy Analyss. Operatons Research, 1993. 41(3): p. 435-449. [14] Sargent, R.G. Verfcaton an Valaton of Smulaton Moels. n Proc. 003 of Wnter Smulaton Conference. 003. New Orleans, Lousana. [15] Macal, C.M. an M.J. North, Valaton of an Agent-Base Moel of Deregulate Electrc Power Markets, n North Amercan Assocaton for Computatonal an Socal Organzaton (NAACSOS) Conference. 005: Notre Dame, Inana. [16] Axtell, R., et al., Algnng smulaton moels: a case stuy an results. Computatonal an Mathematcal Organzaton Theory, 1995. 1(): p. 18. [17] Sncch, W.M.a.T., Statstcs for Engneers an the Scences. 1995, Upper Sale Rge, NJ: Prentce-Hall. [18] Voss, A.M.D.a.D.T., Desgn an Analyss of Experments. 1999, New York, NY: Sprnger-Verlag New York, Inc. REFERENCES [1] Ellott, M., Stgmergc Collaboraton: The Evoluton of Group Work. M/C Journal, 006. 9().