1 The Cathedral and the Bazaar Eric Steven Raymond cf text and copyright at: ~ esr/writings Abstract I anatomize a successful open-source project, fetchmail, that was run as a deliberate test of some surprising theories about so=ware engineering suggested by the history of Linux. I discuss these theories in terms of two fundamentally di:erent development styles, the cathedral model of most of the commercial world versus the bazaar model of the Linux world. I show that these models derive from opposing assumptions about the nature of the so=ware-debugging task. I then make a sustained argument from the Linux experience for the proposition that Given enough eyeballs, all bugs are shallow, suggest productive analogies with other self-correcting systems of selfish agents, and conclude with some exploration of the implications of this insight for the future of so=ware. 1 The Cathedral and the Bazaar Linux is subversive. Who would have thought even five years ago (1991) that a world-class operating system could coalesce as if by magic out of part-time hacking by several thousand developers scattered all over the planet, connected only by the tenuous strands of the Internet? Certainly not I. By the time Linux swam onto my radar screen in early 1993, I had already been involved in Unix and open-source development for ten years. I was one of the first gnu contributors in the mid-1980s. I had released a good deal of open-source so=ware onto the net, developing or co-developing several programs (nethack, Emacs s vc and gud modes, xlife, and others) that are still in wide use today. I thought I knew how it was done. Linux overturned much of what I thought I knew. I had been preaching the Unix gospel of small tools, rapid prototyping and evo- 1
2 2 2 THE MAIL MUST GET THROUGH lutionary programming for years. But I also believed there was a certain critical complexity above which a more centralized, a priori approach was required. I believed that the most important so=ware (operating systems and really large tools like the Emacs programming editor) needed to be built like cathedrals, carefully cra=ed by individual wizards or small bands of mages working in splendid isolation, with no beta to be released before its time. Linus Torvalds s style of development release early and o=en, delegate everything you can, be open to the point of promiscuity came as a surprise. No quiet, reverent cathedral-building here rather, the Linux community seemed to resemble a great babbling bazaar of di:ering agendas and approaches (aptly symbolized by the Linux archive sites, who d take submissions from anyone) out of which a coherent and stable system could seemingly emerge only by a succession of miracles. The fact that this bazaar style seemed to work, and work well, came as a distinct shock. As I learned my way around, I worked hard not just at individual projects, but also at trying to understand why the Linux world not only didn t fly apart in confusion but seemed to go from strength to strength at a speed barely imaginable to cathedral-builders. By mid-1996 I thought I was beginning to understand. Chance handed me a perfect way to test my theory, in the form of an opensource project that I could consciously try to run in the bazaar style. So I did and it was a significant success. This is the story of that project. I ll use it to propose some aphorisms about e:ective open-source development. Not all of these are things I first learned in the Linux world, but we ll see how the Linux world gives them particular point. If I m correct, they ll help you understand exactly what it is that makes the Linux community such a fountain of good so=ware and, perhaps, they will help you become more productive yourself. 2 The Mail Must Get Through Since 1993 I d been running the technical side of a small free-access Internet service provider called Chester County InterLink (ccil) in West Chester, Pennsylvania. I co-founded ccil and wrote our unique multiuser bulletin-board so=ware you can check it out by telnetting
3 3 to locke.ccil.org. Today it supports almost three thousand users on thirty lines. The job allowed me 24-hour-a-day access to the net through ccil s 56K line in fact, the job practically demanded it! I had gotten quite used to instant Internet . I found having to periodically telnet over to locke to check my mail annoying. What I wanted was for my mail to be delivered on snark (my home system) so that I would be notified when it arrived and could handle it using all my local tools. The Internet s native mail forwarding protocol, smtp (Simple Mail Transfer Protocol), wouldn t suit, because it works best when machines are connected full-time, while my personal machine isn t always on the net, and doesn t have a static ip address. What I needed was a program that would reach out over my intermittent dialup connection and pull across my mail to be delivered locally. I knew such things existed, and that most of them used a simple application protocol called pop (Post O?ce Protocol). pop is now widely supported by most common mail clients, but at the time, it wasn t built-in to the mail reader I was using. I needed a pop3 client. So I went out on the net and found one. Actually, I found three or four. I used one of them for a while, but it was missing what seemed an obvious feature, the ability to hack the addresses on fetched mail so replies would work properly. The problem was this: suppose someone named joe on locke sent me mail. If I fetched the mail to snark and then tried to reply to it, my mailer would cheerfully try to ship it to a nonexistent joe on snark. Hand-editing reply addresses to tack quickly got to be a serious pain. This was clearly something the computer ought to be doing for me. But none of the existing pop clients knew how! And this brings us to the first lesson: # 1 Every good work of so=ware starts by scratching a developer s personal itch. Perhaps this should have been obvious (it s long been proverbial that Necessity is the mother of invention ) but too o=en so=ware developers spend their days grinding away for pay at programs they neither need nor love. But not in the Linux world which may explain why the average quality of so=ware originated in the Linux community is so high.
4 4 2 THE MAIL MUST GET THROUGH So, did I immediately launch into a furious whirl of coding up a brand-new pop3 client to compete with the existing ones? Not on your life! I looked carefully at the pop utilities I had in hand, asking myself which one is closest to what I want? Because # 2 Good programmers know what to write. Great ones know what to rewrite (and reuse). While I don t claim to be a great programmer, I try to imitate one. An important trait of the great ones is constructive laziness. They know that you get an A not for e:ort but for results, and that it s almost always easier to start from a good partial solution than from nothing at all. Linus Torvalds, for example, didn t actually try to write Linux from scratch. Instead, he started by reusing code and ideas from Minix, a tiny Unix-like operating system for pc clones. Eventually all the Minix code went away or was completely rewritten but while it was there, it provided sca:olding for the infant that would eventually become Linux. In the same spirit, I went looking for an existing pop utility that was reasonably well coded, to use as a development base. The source-sharing tradition of the Unix world has always been friendly to code reuse (this is why the gnu project chose Unix as a base os, in spite of serious reservations about the os itself). The Linux world has taken this tradition nearly to its technological limit; it has terabytes of open sources generally available. So spending time looking for some else s almost-good-enough is more likely to give you good results in the Linux world than anywhere else. And it did for me. With those I d found earlier, my second search made up a total of nine candidates fetchpop, PopTart, get-mail, gwpop, pimp, pop-perl, popc, popmail and upop. The one I first settled on was fetchpop by Seung-Hong Oh. I put my header-rewrite feature in it, and made various other improvements which the author accepted into his 1.9 release. A few weeks later, though, I stumbled across the code for popclient by Carl Harris, and found I had a problem. Though fetchpop had some good original ideas in it (such as its backgrounddaemon mode), it could only handle pop3 and was rather amateurishly coded (Seung-Hong was at that time a bright but inexperienced programmer, and both traits showed). Carl s code was better, quite
5 5 professional and solid, but his program lacked several important and rather tricky-to-implement fetchpop features (including those I d coded myself). Stay or switch? If I switched, I d be throwing away the coding I d already done in exchange for a better development base. A practical motive to switch was the presence of multiple-protocol support. pop3 is the most commonly used of the post-o?ce server protocols, but not the only one. Fetchpop and the other competition didn t do pop2, rpop, or apop, and I was already having vague thoughts of perhaps adding imap (Internet Message Access Protocol, the most recently designed and most powerful post-o?ce protocol) just for fun. But I had a more theoretical reason to think switching might be as good an idea as well, something I learned long before Linux. # 3 Plan to throw one away; you will, anyhow. (Fred Brooks, The Mythical Man-Month, Chapter 11) Or, to put it another way, you o=en don t really understand the problem until a=er the first time you implement a solution. The second time, maybe you know enough to do it right. So if you want to get it right, be ready to start over at least once [jb]. Well (I told myself) the changes to fetchpop had been my first try. So I switched. A=er I sent my first set of popclient patches to Carl Harris on 25 June 1996, I found out that he had basically lost interest in popclient some time before. The code was a bit dusty, with minor bugs hanging out. I had many changes to make, and we quickly agreed that the logical thing for me to do was take over the program. Without my actually noticing, the project had escalated. No longer was I just contemplating minor patches to an existing pop client. I took on maintaining an entire one, and there were ideas bubbling in my head that I knew would probably lead to major changes. In a so=ware culture that encourages code-sharing, this is a natural way for a project to evolve. I was acting out this principle: # 4 If you have the right attitude, interesting problems will find you. But Carl Harris s attitude was even more important. He understood that
6 6 3 THE IMPORTANCE OF HAVING USERS # 5 When you lose interest in a program, your last duty to it is to hand it o: to a competent successor. Without ever having to discuss it, Carl and I knew we had a common goal of having the best solution out there. The only question for either of us was whether I could establish that I was a safe pair of hands. Once I did that, he acted with grace and dispatch. I hope I will do as well when it comes my turn. 3 The Importance of Having Users And so I inherited popclient. Just as importantly, I inherited popclient s user base. Users are wonderful things to have, and not just because they demonstrate that you re serving a need, that you ve done something right. Properly cultivated, they can become co-developers. Another strength of the Unix tradition, one that Linux pushes to a happy extreme, is that a lot of users are hackers too. Because source code is available, they can be e:ective hackers. This can be tremendously useful for shortening debugging time. Given a bit of encouragement, your users will diagnose problems, suggest fixes, and help improve the code far more quickly than you could unaided. # 6 Treating your users as co-developers is your least-hassle route to rapid code improvement and e:ective debugging. The power of this e:ect is easy to underestimate. In fact, pretty well all of us in the open-source world drastically underestimated how well it would scale up with number of users and against system complexity, until Linus Torvalds showed us di:erently. In fact, I think Linus s cleverest and most consequential hack was not the construction of the Linux kernel itself, but rather his invention of the Linux development model. When I expressed this opinion in his presence once, he smiled and quietly repeated something he has o=en said: I m basically a very lazy person who likes to get credit for things other people actually do. Lazy like a fox. Or, as Robert Heinlein famously wrote of one of his characters, too lazy to fail. In retrospect, one precedent for the methods and success of Linux can be seen in the development of the gnu Emacs Lisp library and
7 7 Lisp code archives. In contrast to the cathedral-building style of the Emacs C core and most other gnu tools, the evolution of the Lisp code pool was fluid and very user-driven. Ideas and prototype modes were o=en rewritten three or four times before reaching a stable final form. And loosely-coupled collaborations enabled by the Internet, a la Linux, were frequent. Indeed, my own most successful single hack previous to fetchmail was probably Emacs vc (version control) mode, a Linux-like collaboration by with three other people, only one of whom (Richard Stallman, the author of Emacs and founder of the Free So=ware Foundation) I have met to this day. It was a front- end for sccs, rcs and later cvs from within Emacs that o:ered one-touch version control operations. It evolved from a tiny, crude sccs.el mode somebody else had written. And the development of vc succeeded because, unlike Emacs itself, Emacs Lisp code could go through release/test/improve generations very quickly. 4 Release Early, Release O=en Early and frequent releases are a critical part of the Linux development model. Most developers (including me) used to believe this was bad policy for larger than trivial projects, because early versions are almost by definition buggy versions and you don t want to wear out the patience of your users. This belief reinforced the general commitment to a cathedralbuilding style of development. If the overriding objective was for users to see as few bugs as possible, why then you d only release a version every six months (or less o=en), and work like a dog on debugging between releases. The Emacs C core was developed this way. The Lisp library, in e:ect, was not because there were active Lisp archives outside the fsf s control, where you could go to find new and development code versions independently of Emacs s release cycle [qr]. The most important of these, the Ohio State elisp archive, anticipated the spirit and many of the features of today s big Linux archives. But few of us really thought very hard about what we were doing, or about what the very existence of that archive suggested about problems in the fsf s cathedral- building development model. I made one serious attempt around 1992 to get a lot of the Ohio code
8 8 4 RELEASE EARLY, RELEASE OFTEN formally merged into the o?cial Emacs Lisp library. I ran into political trouble and was largely unsuccessful. But by a year later, as Linux became widely visible, it was clear that something di:erent and much healthier was going on there. Linus s open development policy was the very opposite of cathedralbuilding. Linux s Internet archives were burgeoning, multiple distributions were being floated. And all of this was driven by an unheardof frequency of core system releases. Linus was treating his users as co-developers in the most e:ective possible way: # 7 Release early. Release o=en. And listen to your customers. Linus s innovation wasn t so much in doing quick-turnaround releases incorporating lots of user feedback (something like this had been Unix-world tradition for a long time), but in scaling it up to a level of intensity that matched the complexity of what he was developing. In those early times (around 1991) it wasn t unknown for him to release a new kernel more than once a day! Because he cultivated his base of co-developers and leveraged the Internet for collaboration harder than anyone else, this worked. But how did it work? And was it something I could duplicate, or did it rely on some unique genius of Linus Torvalds? I didn t think so. Granted, Linus is a damn fine hacker. How many of us could engineer an entire production-quality operating system kernel from scratch? But Linux didn t represent any awesome conceptual leap forward. Linus is not (or at least, not yet) an innovative genius of design in the way that, say, Richard Stallman or James Gosling (of NeWS and Java) are. Rather, Linus seems to me to be a genius of engineering and implementation, with a sixth sense for avoiding bugs and development dead-ends and a true knack for finding the minimum-e:ort path from point A to point B. Indeed, the whole design of Linux breathes this quality and mirrors Linus s essentially conservative and simplifying design approach. So, if rapid releases and leveraging the Internet medium to the hilt were not accidents but integral parts of Linus s engineeringgenius insight into the minimum-e:ort path, what was he maximizing? What was he cranking out of the machinery? Put that way, the question answers itself. Linus was keeping his hacker/users constantly stimulated and rewarded stimulated by
9 9 the prospect of having an ego-satisfying piece of the action, rewarded by the sight of constant (even daily) improvement in their work. Linus was directly aiming to maximize the number of personhours thrown at debugging and development, even at the possible cost of instability in the code and user-base burnout if any serious bug proved intractable. Linus was behaving as though he believed something like this: # 8 Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone. Or, less formally, Given enough eyeballs, all bugs are shallow. I dub this: Linus s Law. My original formulation was that every problem will be transparent to somebody. Linus demurred that the person who understands and fixes the problem is not necessarily or even usually the person who first characterizes it. Somebody finds the problem, he says, and somebody else understands it. And I ll go on record as saying that finding it is the bigger challenge. But the point is that both things tend to happen rapidly. Here, I think, is the core di:erence underlying the cathedralbuilder and bazaar styles. In the cathedral-builder view of programming, bugs and development problems are tricky, insidious, deep phenomena. It takes months of scrutiny by a dedicated few to develop confidence that you ve winkled them all out. Thus the long release intervals, and the inevitable disappointment when longawaited releases are not perfect. In the bazaar view, on the other hand, you assume that bugs are generally shallow phenomena or, at least, that they turn shallow pretty quickly when exposed to a thousand eager co-developers pounding on every single new release. Accordingly you release o=en in order to get more corrections, and as a beneficial side e:ect you have less to lose if an occasional botch gets out the door. And that s it. That s enough. If Linus s Law is false, then any system as complex as the Linux kernel, being hacked over by as many hands as the Linux kernel, should at some point have collapsed under the weight of unforseen bad interactions and undiscovered deep bugs. If it s true, on the other hand, it is su?cient to explain Linux s relative lack of bugginess and its continuous uptimes spanning months or even years.
10 10 4 RELEASE EARLY, RELEASE OFTEN Maybe it shouldn t have been such a surprise, at that. Sociologists years ago discovered that the averaged opinion of a mass of equally expert (or equally ignorant) observers is quite a bit more reliable a predictor than that of a single randomly-chosen one of the observers. They called this the Delphi e:ect. It appears that what Linus has shown is that this applies even to debugging an operating system that the Delphi e:ect can tame development complexity even at the complexity level of an os kernel. One special feature of the Linux situation that clearly helps along the Delphi e:ect is the fact that the contributors for any given project are self- selected. An early respondent pointed out that contributions are received not from a random sample, but from people who are interested enough to use the so=ware, learn about how it works, attempt to find solutions to problems they encounter, and actually produce an apparently reasonable fix. Anyone who passes all these filters is highly likely to have something useful to contribute. I am indebted to my friend Je: Dutky for pointing out that Linus s Law can be rephrased as Debugging is parallelizable. Je: observes that although debugging requires debuggers to communicate with some coordinating developer, it doesn t require significant coordination between debuggers. Thus it doesn t fall prey to the same quadratic complexity and management costs that make adding developers problematic. In practice, the theoretical loss of e?ciency due to duplication of work by debuggers almost never seems to be an issue in the Linux world. One e:ect of a release early and o=en policy is to minimize such duplication by propagating fed-back fixes quickly [jh]. Brooks (the author of The Mythical Man-Month ) even made an o:-hand observation related to Je: s: The total cost of maintaining a widely used program is typically 40 percent or more of the cost of developing it. Surprisingly this cost is strongly a:ected by the number of users. More users find more bugs. (my emphasis). More users find more bugs because adding more users adds more di:erent ways of stressing the program. This e:ect is amplified when the users are co- developers. Each one approaches the task of bug characterization with a slightly di:erent perceptual set and analytical toolkit, a di:erent angle on the problem. The Delphi e:ect seems to work precisely because of this variation. In the specific context of debugging, the variation also tends to reduce duplication of e:ort.
11 11 So adding more beta-testers may not reduce the complexity of the current deepest bug from the developer s point of view, but it increases the probability that someone s toolkit will be matched to the problem in such a way that the bug is shallow to that person. Linus coppers his bets, too. In case there are serious bugs, Linux kernel version are numbered in such a way that potential users can make a choice either to run the last version designated stable or to ride the cutting edge and risk bugs in order to get new features. This tactic is not yet formally imitated by most Linux hackers, but perhaps it should be; the fact that either choice is available makes both more attractive. [hbs] 5 When Is A Rose Not A Rose? Having studied Linus s behavior and formed a theory about why it was successful, I made a conscious decision to test this theory on my new (admittedly much less complex and ambitious) project. But the first thing I did was reorganize and simplify popclient a lot. Carl Harris s implementation was very sound, but exhibited a kind of unnecessary complexity common to many C programmers. He treated the code as central and the data structures as support for the code. As a result, the code was beautiful but the data structure design ad-hoc and rather ugly (at least by the high standards of this old lisp hacker). I had another purpose for rewriting besides improving the code and the data structure design, however. That was to evolve it into something I understood completely. It s no fun to be responsible for fixing bugs in a program you don t understand. For the first month or so, then, I was simply following out the implications of Carl s basic design. The first serious change I made was to add imap support. I did this by reorganizing the protocol machines into a generic driver and three method tables (for pop2, pop3, and imap). This and the previous changes illustrate a general principle that s good for programmers to keep in mind, especially in languages like C that don t naturally do dynamic typing: # 9 Smart data structures and dumb code works a lot better than the other way around.
12 12 5 WHEN IS A ROSE NOT A ROSE? Brooks, Chapter 9: Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won t usually need your [code]; it ll be obvious. Actually, he said flowcharts and tables. But allowing for thirty years of terminological/cultural shi=, it s almost the same point. At this point (early September 1996, about six weeks from zero) I started thinking that a name change might be in order a=er all, it wasn t just a pop client any more. But I hesitated, because there was as yet nothing genuinely new in the design. My version of popclient had yet to develop an identity of its own. That changed, radically, when fetchmail learned how to forward fetched mail to the smtp port. I ll get to that in a moment. But first: I said above that I d decided to use this project to test my theory about what Linus Torvalds had done right. How (you may well ask) did I do that? In these ways: I released early and o=en (almost never less o=en than every ten days; during periods of intense development, once a day). I grew my beta list by adding to it everyone who contacted me about fetchmail. I sent chatty announcements to the beta list whenever I released, encouraging people to participate. And I listened to my beta testers, polling them about design decisions and stroking them whenever they sent in patches and feedback. The payo: from these simple measures was immediate. From the beginning of the project, I got bug reports of a quality most developers would kill for, o=en with good fixes attached. I got thoughtful criticism, I got fan mail, I got intelligent feature suggestions. Which leads to: # 10 If you treat your beta-testers as if they re your most valuable resource, they will respond by becoming your most valuable resource. One interesting measure of fetchmail s success is the sheer size of the project beta list, fetchmail-friends. At the time of last revision (August 2000) it has 249 members and is adding two or three a week. Actually, as I revise in late May 1997 the list is beginning to lose members from its high of close to 300 for an interesting reason.
13 13 Several people have asked me to unsubscribe them because fetchmail is working so well for them that they no longer need to see the list tra?c! Perhaps this is part of the normal life-cycle of a mature bazaar-style project. 6 Popclient becomes Fetchmail The real turning point in the project was when Harry Hochheiser sent me his scratch code for forwarding mail to the client machine s smtp port. I realized almost immediately that a reliable implementation of this feature would make all the other mail delivery modes next to obsolete. For many weeks I had been tweaking fetchmail rather incrementally while feeling like the interface design was serviceable but grubby inelegant and with too many exiguous options hanging out all over. The options to dump fetched mail to a mailbox file or standard output particularly bothered me, but I couldn t figure out why. (If you don t care about the technicalia of Internet mail, the next two paragraphs can be safely skipped.) What I saw when I thought about smtp forwarding was that popclient had been trying to do too many things. It had been designed to be both a mail transport agent (mta) and a local delivery agent (mda). With smtp forwarding, it could get out of the mda business and be a pure mta, handing o: mail to other programs for local delivery just as sendmail does. Why mess with all the complexity of configuring a mail delivery agent or setting up lock-and-append on a mailbox when port 25 is almost guaranteed to be there on any platform with tcp/ip support in the first place? Especially when this means retrieved mail is guaranteed to look like normal sender-initiated smtp mail, which is really what we want anyway. (Back to a higher level...) Even if you didn t follow the preceding technical jargon, there are several important lessons here. First, this smtp-forwarding concept was the biggest single payo: I got from consciously trying to emulate Linus s methods. A user gave me this terrific idea all I had to do was understand the implications.
14 14 6 POPCLIENT BECOMES FETCHMAIL # 11 The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better. Interestingly enough, you will quickly find that if you are completely and self- deprecatingly truthful about how much you owe other people, the world at large will treat you like you did every bit of the invention yourself and are just being becomingly modest about your innate genius. We can all see how well this worked for Linus! (When I gave my talk at the Perl conference in August 1997, hacker extraordinaire Larry Wall was in the front row. As I got to the last line above he called out, religious-revival style, Tell it, tell it, brother!. The whole audience laughed, because they knew this had worked for the inventor of Perl, too.) A=er a very few weeks of running the project in the same spirit, I began to get similar praise not just from my users but from other people to whom the word leaked out. I stashed away some of that ; I ll look at it again sometime if I ever start wondering whether my life has been worthwhile : ). But there are two more fundamental, non-political lessons here that are general to all kinds of design. # 12 O=en, the most striking and innovative solutions come from realizing that your concept of the problem was wrong. I had been trying to solve the wrong problem by continuing to develop popclient as a combined mta/mda with all kinds of funky local delivery modes. Fetchmail s design needed to be rethought from the ground up as a pure mta, a part of the normal smtp-speaking Internet mail path. When you hit a wall in development when you find yourself hard put to think past the next patch it s o=en time to ask not whether you ve got the right answer, but whether you re asking the right question. Perhaps the problem needs to be reframed. Well, I had reframed my problem. Clearly, the right thing to do was (1) hack smtp forwarding support into the generic driver, (2) make it the default mode, and (3) eventually throw out all the other delivery modes, especially the deliver-to-file and deliver-to-standardoutput options. I hesitated over step 3 for some time, fearing to upset long-time popclient users dependent on the alternate delivery mechanisms.
15 15 In theory, they could immediately switch to.forward files or their non-sendmail equivalents to get the same e:ects. In practice the transition might have been messy. But when I did it, the benefits proved huge. The cru=iest parts of the driver code vanished. Configuration got radically simpler no more grovelling around for the system mda and user s mailbox, no more worries about whether the underlying os supports file locking. Also, the only way to lose mail vanished. If you specified delivery to a file and the disk got full, your mail got lost. This can t happen with smtp forwarding because your smtp listener won t return ok unless the message can be delivered or at least spooled for later delivery. Also, performance improved (though not so you d notice it in a single run). Another not insignificant benefit of this change was that the manual page got a lot simpler. Later, I had to bring delivery via a user-specified local mda back in order to allow handling of some obscure situations involving dynamic slip. But I found a much simpler way to do it. The moral? Don t hesitate to throw away superannuated features when you can do it without loss of e:ectiveness. Antoine de Saint- Exupéry (who was an aviator and aircra= designer when he wasn t being the author of classic children s books) said: # 13 Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away. When your code is getting both better and simpler, that is when you know it s right. And in the process, the fetchmail design acquired an identity of its own, di:erent from the ancestral popclient. It was time for the name change. The new design looked much more like a dual of sendmail than the old popclient had; both are mtas, but where sendmail pushes then delivers, the new popclient pulls then delivers. So, two months o: the blocks, I renamed it fetchmail. There is a more general lesson in this story about how smtp delivery came to fetchmail. It is not only debugging that is parallelizable; development and (to a perhaps surprising extent) exploration of design space is, too. When your development mode is rapidly iterative, development and enhancement may become special cases of debugging fixing bugs of omission in the original capabilities or concept of the so=ware.
16 16 7 FETCHMAIL GROWS UP Even at a higher level of design, it can be very valuable to have the thinking of lots of co-developers random-walking through the design space near your product. Consider the way a puddle of water finds a drain, or better yet how ants find food: exploration essentially by di:usion, followed by exploitation mediated by a scalable communication mechanism. This works very well; as with Harry Hochheiser and me, one of your outriders may well find a huge win nearby that you were just a little too close-focused to see. 7 Fetchmail Grows Up There I was with a neat and innovative design, code that I knew worked well because I used it every day, and a burgeoning beta list. It gradually dawned on me that I was no longer engaged in a trivial personal hack that might happen to be useful to few other people. I had my hands on a program every hacker with a Unix box and a slip/ppp mail connection really needs. With the smtp forwarding feature, it pulled far enough in front of the competition to potentially become a category killer, one of those classic programs that fills its niche so competently that the alternatives are not just discarded but almost forgotten. I think you can t really aim or plan for a result like this. You have to get pulled into it by design ideas so powerful that a=erward the results just seem inevitable, natural, even foreordained. The only way to try for ideas like that is by having lots of ideas or by having the engineering judgment to take other peoples good ideas beyond where the originators thought they could go. Andy Tanenbaum had the original idea to build a simple native Unix for ibm pcs, for use as a teaching tool (he called it Minix). Linus Torvalds pushed the Minix concept further than Andrew probably thought it could go and it grew into something wonderful. In the same way (though on a smaller scale), I took some ideas by Carl Harris and Harry Hochheiser and pushed them hard. Neither of us was original in the romantic way people think is genius. But then, most science and engineering and so=ware development isn t done by original genius, hacker mythology to the contrary. The results were pretty heady stu: all the same in fact, just the kind of success every hacker lives for! And they meant I would have to set my standards even higher. To make fetchmail as good as I
17 17 now saw it could be, I d have to write not just for my own needs, but also include and support features necessary to others but outside my orbit. And do that while keeping the program simple and robust. The first and overwhelmingly most important feature I wrote a=er realizing this was multidrop support the ability to fetch mail from mailboxes that had accumulated all mail for a group of users, and then route each piece of mail to its individual recipients. I decided to add the multidrop support partly because some users were clamoring for it, but mostly because I thought it would shake bugs out of the single-drop code by forcing me to deal with addressing in full generality. And so it proved. Getting rfc 822 address parsing right took me a remarkably long time, not because any individual piece of it is hard but because it involved a pile of interdependent and fussy details. But multidrop addressing turned out to be an excellent design decision as well. Here s how I knew: # 14 Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected. The unexpected use for multi-drop fetchmail is to run mailing lists with the list kept, and alias expansion done, on the client side of the Internet connection. This means someone running a personal machine through an isp account can manage a mailing list without continuing access to the isp s alias files. Another important change demanded by my beta testers was support for 8-bit mime (Multipurpose Internet Mail Extensions) operation. This was pretty easy to do, because I had been careful to keep the code 8-bit clean. Not because I anticipated the demand for this feature, but rather in obedience to another rule: # 15 When writing gateway so=ware of any kind, take pains to disturb the data stream as little as possible and *never* throw away information unless the recipient forces you to! Had I not obeyed this rule, 8-bit mime support would have been di?cult and buggy. As it was, all I had to do is read the mime standard (rfc 1652) and add a trivial bit of header-generation logic. Some European users bugged me into adding an option to limit the number of messages retrieved per session (so they can control costs from their expensive phone networks). I resisted this for a long
18 18 8 A FEW MORE LESSONS FROM FETCHMAIL time, and I m still not entirely happy about it. But if you re writing for the world, you have to listen to your customers this doesn t change just because they re not paying you in money. 8 A Few More Lessons From Fetchmail Before we go back to general so=ware-engineering issues, there are a couple more specific lessons from the fetchmail experience to ponder. Nontechnical readers can safely skip this section. The rc (control) file syntax includes optional noise keywords that are entirely ignored by the parser. The English-like syntax they allow is considerably more readable than the traditional terse keywordvalue pairs you get when you strip them all out. These started out as a late-night experiment when I noticed how much the rc file declarations were beginning to resemble an imperative minilanguage. (This is also why I changed the original popclient server keyword to poll ). It seemed to me that trying to make that imperative minilanguage more like English might make it easier to use. Now, although I m a convinced partisan of the make it a language school of design as exemplified by Emacs and html and many database engines, I am not normally a big fan of English-like syntaxes. Traditionally programmers have tended to favor control syntaxes that are very precise and compact and have no redundancy at all. This is a cultural legacy from when computing resources were expensive, so parsing stages had to be as cheap and simple as possible. English, with about 50% redundancy, looked like a very inappropriate model then. This is not my reason for normally avoiding English-like syntaxes; I mention it here only to demolish it. With cheap cycles and core, terseness should not be an end in itself. Nowadays it s more important for a language to be convenient for humans than to be cheap for the computer. There remain, however, good reasons to be wary. One is the complexity cost of the parsing stage you don t want to raise that to the point where it s a significant source of bugs and user confusion in itself. Another is that trying to make a language syntax Englishlike o=en demands that the English it speaks be bent seriously out of shape, so much so that the superficial resemblance to natural
19 19 language is as confusing as a traditional syntax would have been. (You see this bad e:ect in a lot of so-called fourth generation and commercial database-query languages.) The fetchmail control syntax seems to avoid these problems because the language domain is extremely restricted. It s nowhere near a general-purpose language; the things it says simply are not very complicated, so there s little potential for confusion in moving mentally between a tiny subset of English and the actual control language. I think there may be a wider lesson here: # 16 When your language is nowhere near Turing-complete, syntactic sugar can be your friend. Another lesson is about security by obscurity. Some fetchmail users asked me to change the so=ware to store passwords encrypted in the rc file, so snoopers wouldn t be able to casually see them. I didn t do it, because this doesn t actually add protection. Anyone who s acquired permissions to read your rc file will be able to run fetchmail as you anyway and if it s your password they re a=er, they d be able to rip the necessary decoder out of the fetchmail code itself to get it. All.fetchmailrc password encryption would have done is give a false sense of security to people who don t think very hard. The general rule here is: # 17 A security system is only as secure as its secret. Beware of pseudo-secrets. 9 Necessary Preconditions for the Bazaar Style Early reviewers and test audiences for this paper consistently raised questions about the preconditions for successful bazaar-style development, including both the qualifications of the project leader and the state of code at the time one goes public and starts to try to build a co-developer community. It s fairly clear that one cannot code from the ground up in bazaar style [in]. One can test, debug and improve in bazaar style, but it would be very hard to originate a project in bazaar mode. Linus didn t try it. I didn t either. Your nascent developer community needs to have something runnable and testable to play with.
20 20 9 NECESSARY PRECONDITIONS FOR THE BAZAAR STYLE When you start community-building, what you need to be able to present is a plausible promise. Your program doesn t have to work particularly well. It can be crude, buggy, incomplete, and poorly documented. What it must not fail to do is (a) run, and (b) convince potential co-developers that it can be evolved into something really neat in the foreseeable future. Linux and fetchmail both went public with strong, attractive basic designs. Many people thinking about the bazaar model as I have presented it have correctly considered this critical, then jumped from it to the conclusion that a high degree of design intuition and cleverness in the project leader is indispensable. But Linus got his design from Unix. I got mine initially from the ancestral popclient (though it would later change a great deal, much more proportionately speaking than has Linux). So does the leader/ coordinator for a bazaar-style e:ort really have to have exceptional design talent, or can he get by on leveraging the design talent of others? I think it is not critical that the coordinator be able to originate designs of exceptional brilliance, but it is absolutely critical that the coordinator be able to recognize good design ideas from others. Both the Linux and fetchmail projects show evidence of this. Linus, while not (as previously discussed) a spectacularly original designer, has displayed a powerful knack for recognizing good design and integrating it into the Linux kernel. And I have already described how the single most powerful design idea in fetchmail (smtp forwarding) came from somebody else. Early audiences of this paper complimented me by suggesting that I am prone to undervalue design originality in bazaar projects because I have a lot of it myself, and therefore take it for granted. There may be some truth to this; design (as opposed to coding or debugging) is certainly my strongest skill. But the problem with being clever and original in so=ware design is that it gets to be a habit you start reflexively making things cute and complicated when you should be keeping them robust and simple. I have had projects crash on me because I made this mistake, but I managed not to with fetchmail. So I believe the fetchmail project succeeded partly because I restrained my tendency to be clever; this argues (at least) against design originality being essential for successful bazaar projects. And consider Linux. Suppose Linus Torvalds had been trying to pull o:
On System Design Jim Waldo On System Design Jim Waldo Perspectives 2006-6 In an Essay Series Published by Sun Labs December 2006 This work first appeared as part of the OOPSLA 2006 Essays track, October
Why Johnny Can t Encrypt: A Usability Evaluation of PGP 5.0 Alma Whitten School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 firstname.lastname@example.org J. D. Tygar 1 EECS and SIMS University
How to be a Programmer: A Short, Comprehensive, and Personal Summary by Robert L. Read How to be a Programmer: A Short, Comprehensive, and Personal Summary by Robert L. Read Published 2002 Copyright 2002,
In Security and Usability: Designing Secure Systems that People Can Use, eds. L. Cranor and G. Simson. O'Reilly, 2005, pp. 679-702 CHAPTER THIRTY- FOUR Why Johnny Can t Encrypt A Usability Evaluation of
LAW S ORDER This page intentionally left blank LAW S ORDER WHAT ECONOMICS HAS TO DO WITH LAW AND WHY IT MATTERS David D. Friedman PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Copyright 2000 by Princeton
How to Make Software An introduction to software development by Nick Jenkins Nick Jenkins, 2005 http://www.nickjenkins.net This work is licensed under the Creative Commons (Attribution-NonCommercial-ShareAlike)
When Accountability Knocks, Will Anyone Answer? Charles Abelmann Richard Elmore with Johanna Even Susan Kenyon Joanne Marshall CPRE Research Report Series RR-42 Consortium for Policy Research in Education
Massachusetts Institute of Technology Artificial Intelligence Laboratory AI Working Paper 316 October, 1988 How to do Research At the MIT AI Lab by a whole bunch of current, former, and honorary MIT AI
Making Smart IT Choices Understanding Value and Risk in Government IT Investments Sharon S. Dawes Theresa A. Pardo Stephanie Simon Anthony M. Cresswell Mark F. LaVigne David F. Andersen Peter A. Bloniarz
NOVEMBER 2005 OLIVER BAKEWELL & ANNE GARBUTT SEKA Resultatredovisningsprojekt The use and abuse of the logical framework approach INTRODUCTION The logical framework approach (LFA) has come to play a central
SOFTWARE ENGINEERING Report on a conference sponsored by the NATO SCIENCE COMMITTEE Garmisch, Germany, 7th to 11th October 1968 Chairman: Professor Dr. F. L. Bauer Co-chairmen: Professor L. Bolliet, Dr.
THE PH.D. GRIND A Ph.D. Student Memoir Philip J. Guo email@example.com http://www.pgbovine.net/phd-memoir.htm Current release: July 16, 2012 Original release: June 29, 2012 To everyone who aspires to
DOI 10.1007/s10734-007-9065-5 Is that paper really due today? : differences in first-generation and traditional college students understandings of faculty expectations Peter J. Collier Æ David L. Morgan
by BRIAN CLARK Founder of Copyblogger & Scribe HOW TO CREATE COMPELLING CONTENT THAT RANKS WELL IN SEARCH ENGINES Once upon a time, there was something called SEO copywriting. These SEO copywriters seemed
A First Encounter with Machine Learning Max Welling Donald Bren School of Information and Computer Science University of California Irvine November 4, 2011 2 Contents Preface Learning and Intuition iii
The Science of Engagement An exploration into the true nature of engagement - what it means and what causes it. Grounded in science, not fiction. Foreword For a while now, we at Weber Shandwick have been
FREE ONLINE EDITION (non-printable free online version) If you like the book, please support the author and InfoQ by purchasing the printed book: http://www.lulu.com/content/899349 (only $22.95) Brought
LONGMAN COMMUNICATION 3000 1 Longman Communication 3000 The Longman Communication 3000 is a list of the 3000 most frequent words in both spoken and written English, based on statistical analysis of the
Getting Things Done: The Science behind Stress-Free Productivity Francis Heylighen and Clément Vidal ECCO - Evolution, Complexity and Cognition research group Vrije Universiteit Brussel (Free University
HANDBOOK FOR NEW EMPLOYEES ============================================================ HANDBOOK FOR NEW EMPLOYEES ======================================================== A fearless adventure in knowing
Donau-Universität Krems Zentrum für praxisorientierte Informatik Dr. Karl-Dorrek Straße 30 A-3500 Krems/Donau Understanding a hacker s mind A psychological insight into the hijacking of identities a White
Why don t they just leave? (The Booklet) Written by Brian Fox - 1 - Forward Pioneering a Different Future During the past five years I have come to learn more about domestic violence and abuse than I ever