What Goes Around Comes Around

Size: px
Start display at page:

Download "What Goes Around Comes Around"


1 What Goes Around Comes Around Michael Stonebraker Joseph M. Hellerstein Abstract This paper provides a summary of 35 years of data model proposals, grouped into 9 different eras. We discuss the proposals of each era, and show that there are only a few basic data modeling ideas, and most have been around a long time. Later proposals inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a worthwhile exercise to study previous proposals. In addition, we present the lessons learned from the exploration of the proposals in each era. Most current researchers were not around for many of the previous eras, and have limited (if any) understanding of what was previously learned. There is an old adage that he who does not understand history is condemned to repeat it. By presenting ancient history, we hope to allow future researchers to avoid replaying history. Unfortunately, the main proposal in the current XML era bears a striking resemblance to the CODASYL proposal from the early 1970 s, which failed because of its complexity. Hence, the current era is replaying history, and what goes around comes around. Hopefully the next era will be smarter. I Introduction Data model proposals have been around since the late 1960 s, when the first author came on the scene. Proposals have continued with surprising regularity for the intervening 35 years. Moreover, many of the current day proposals have come from researchers too young to have learned from the discussion of earlier ones. Hence, the purpose of this paper is to summarize 35 years worth of progress and point out what should be learned from this lengthy exercise. We present data model proposals in nine historical epochs: Hierarchical (IMS): late 1960 s and 1970 s Network (CODASYL): 1970 s Relational: 1970 s and early 1980 s Entity-Relationship: 1970 s Extended Relational: 1980 s Semantic: late 1970 s and 1980 s Object-oriented: late 1980 s and early 1990 s Object-relational: late 1980 s and early 1990 s

2 What Goes Around Comes Around 3 Semi-structured (XML): late 1990 s to the present In each case, we discuss the data model and associated query language, using a neutral notation. Hence, we will spare the reader the idiosyncratic details of the various proposals. We will also attempt to use a uniform collection of terms, again in an attempt to limit the confusion that might otherwise occur. Throughout much of the paper, we will use the standard example of suppliers and parts, from [CODD70], which we write for now in relational form in Figure 1. Supplier (sno, sname, scity, sstate) Part (pno, pname, psize, pcolor) Supply (sno, pno, qty, price) A Relational Schema Figure 1 Here we have Supplier information, Part information and the Supply relationship to indicate the terms under which a supplier can supply a part. II IMS Era IMS was released around 1968, and initially had a hierarchical data model. It understood the notion of a record type, which is a collection of named fields with their associated data types. Each instance of a record type is forced to obey the data description indicated in the definition of the record type. Furthermore, some subset of the named fields must uniquely specify a record instance, i.e. they are required to be a key. Lastly, the record types must be arranged in a tree, such that each record type (other than the root) has a unique parent record type. An IMS data base is a collection of instances of record types, such that each instance, other than root instances, has a single parent of the correct record type. This requirement of tree-structured data presents a challenge for our sample data, because we are forced to structure it in one of the two ways indicated in Figure 2. These representations share two common undesirable properties: 1) Information is repeated. In the first schema, Part information is repeated for each Supplier who supplies the part. In the second schema, Supplier information is repeated for each part he supplies. Repeated information is undesirable, because it offers the possibility for inconsistent data. For example, a repeated data element could be changed in some, but not all, of the places it appears, leading to an inconsistent data base. 2) Existence depends on parents. In the first schema it is impossible for there to be a part that is not currently supplied by anybody. In the second schema, it is impossible to have a supplier which does not currently supply anything. There is no support for these corner cases in a strict hierarchy.

3 4 Chapter 1: Data Models and DBMS Architecture Supplier (sno, sname, scity, sstate) Part (pno, pname, psize, pcolor) Part (pno, pname, psize, pcolor, qty, price) Supplier (sno, sname, scity, sstate, qty, price) Two Hierarchical Organizations Figure 2 IMS chose a hierarchical data base because it facilitates a simple data manipulation language, DL/1. Every record in an IMS data base has a hierarchical sequence key (HSK). Basically, an HSK is derived by concatenating the keys of ancestor records, and then adding the key of the current record. HSK defines a natural order of all records in an IMS data base, basically depth-first, left-to-right. DL/1 intimately used HSK order for the semantics of commands. For example, the get next command returns the next record in HSK order. Another use of HSK order is the get next within parent command, which explores the subtree underneath a given record in HSK order. Using the first schema, one can find all the red parts supplied by Supplier 16 as: Get unique Supplier (sno = 16) Until failure do Get next within parent (color = red) Enddo The first command finds Supplier 16. Then we iterate through the subtree underneath this record in HSK order, looking for red parts. When the subtree is exhausted, an error is returned. Notice that DL/1 is a record-at-a-time language, whereby the programmer constructs an algorithm for solving his query, and then IMS executes this algorithm. Often there are multiple ways to solve a query. Here is another way to solve the above specification:

4 What Goes Around Comes Around 5 Until failure do Get next Part (color = red) Enddo Although one might think that the second solution is clearly inferior to the first one; in fact if there is only one supplier in the data base (number 16), the second solution will outperform the first. The DL/1 programmer must make such optimization tradeoffs. IMS supported four different storage formats for hierarchical data. Basically root records can either be: Stored sequentially Indexed in a B-tree using the key of the record Hashed using the key of the record Dependent records are found from the root using either Physical sequentially Various forms of pointers. Some of the storage organizations impose restrictions on DL/1 commands. For example the purely sequential organization will not support record inserts. Hence, it is appropriate only for batch processing environments in which a change list is sorted in HSK order and then a single pass of the data base is made, the changes inserted in the correct place, and a new data base written. This is usually referred to as old-master-new-master processing. In addition, the storage organization that hashes root records on a key cannot support get next, because it has no easy way to return hashed records in HSK order. These various quirks in IMS are designed to avoid operations that would have impossibly bad performance. However, this decision comes at a price: One cannot freely change IMS storage organizations to tune a data base application because there is no guarantee that the DL/1 programs will continue to run. The ability of a data base application to continue to run, regardless of what tuning is performed at the physical level will be called physical data independence. Physical data independence is important because a DBMS application is not typically written all at once. As new programs are added to an application, the tuning demands may change, and better DBMS performance could be achieved by changing the storage organization. IMS has chosen to limit the amount of physical data independence that is possible. In addition, the logical requirements of an application may change over time. New record types may be added, because of new business requirements or because of new government requirements. It may also be desirable to move certain data elements from one record type to another. IMS supports a certain level of logical data independence, because DL/1 is actually defined on a logical data base, not on the actual physical data base that is stored. Hence, a DL/1 program can be written initially by defining the logical

5 6 Chapter 1: Data Models and DBMS Architecture data base to be exactly same as the physical data base. Later, record types can be added to the physical data base, and the logical data base redefined to exclude them. Hence, an IMS data base can grow with new record types, and the initial DL/1 program will continue to operate correctly. In general, an IMS logical data base can be a subtree of a physical data base. It is an excellent idea to have the programmer interact with a logical abstraction of the data, because this allows the physical organization to change, without compromising the runability of DL/1 programs. Logical and physical data independence are important because DBMS application have a much longer lifetime (often a quarter century or more) than the data on which they operate. Data independence will allow the data to change without requiring costly program maintenance. One last point should be made about IMS. Clearly, our sample data is not amenable to a tree structured representation as noted earlier. Hence, there was quickly pressure on IMS to represent our sample data without the redundancy or dependencies mentioned above. IMS responded by extending the notion of logical data bases beyond what was just described. Supplier (sno, sname, scity, sstate) Part (pno, pname, psize, pcolor) Supply (pno, qty, price) Two IMS Physical Data Bases Figure 3 Suppose one constructs two physical data bases, one containing only Part information and the second containing Supplier and Supply information as shown in the diagram of Figure 3. Of course, DL/1 programs are defined on trees; hence they cannot be used directly on the structures of Figure 3. Instead, IMS allowed the definition of the logical data base shown in Figure 4. Here, the Supply and Part record types from two different data bases are fused (joined) on the common value of part number into the hierarchical structure shown.

6 What Goes Around Comes Around 7 Basically, the structure of Figure 3 is actually stored, and one can note that there is no redundancy and no bad existence dependencies in this structure. The programmer is presented with the hierarchical view shown in Figure 4, which supports standard DL/1 programs. Supplier (sno, sname, scity, sstate) Supply(pno, qty, price) Part (pno, pname, psize, pcolor) An IMS Logical Data Base Figure 4 Speaking generally, IMS allow two different tree-structured physical data bases to be grafted together into a logical data base. There are many restrictions (for example in the use of the delete command) and considerable complexity to this use of logical data bases, but it is a way to represent non-tree structured data in IMS. The complexity of these logical data bases will be presently seen to be pivotial in determining how IBM decided to support relational data bases a decade later. We will summarize the lessons learned so far, and then turn to the CODASYL proposal. Lesson 1: Physical and logical data independence are highly desirable Lesson 2: Tree structured data models are very restrictive Lesson 3: It is a challenge to provide sophisticated logical reorganizations of tree structured data Lesson 4: A record-at-a-time user interface forces the programmer to do manual query optimization, and this is often hard.

7 8 Chapter 1: Data Models and DBMS Architecture III CODASYL Era In 1969 the CODASYL (Committee on Data Systems Languages) committee released their first report [CODA69], and then followed in 1971 [CODA71] and 1973 [CODA73] with language specifications. CODASYL was an ad-hoc committee that championed a network data model along with a record-at-a-time data manipulation language. This model organized a collection of record types, each with keys, into a network, rather than a tree. Hence, a given record instance could have multiple parents, rather than a single one, as in IMS. As a result, our Supplier-Parts-Supply example could be represented by the CODASYL network of Figure 5. Supplier (sno, sname, scity, sstate) Part (pno, pname, psize, pcolor) Supplies Supplied_by Supply(qty, price) A CODASYL Network Figure 5 Here, we notice three record types arranged in a network, connected by two named arcs, called Supplies and Supplied_by. A named arc is called a set in CODASYL, though it is not technically a set at all. Rather it indicates that for each record instance of the owner record type (the tail of the arrow) there is a relationship with zero or more record instances of the child record type (the head of the arrow). As such, it is a 1-to-n relationship between owner record instances and child record instances. A CODASYL network is a collection of named record types and named set types that form a connected graph. Moreover, there must be at least one entry point (a record type that is not a child in any set). A CODASYL data base is a collection of record instances and set instances that obey this network description.

8 What Goes Around Comes Around 9 Notice that Figure 5 does not have the existence dependencies present in a hierarchical data model. For example, it is ok to have a part that is not supplied by anybody. This will merely be an empty instance of the Supplied_by set. Hence, the move to a network data model solves many of the restrictions of a hierarchy. However, there are still situations that are hard to model in CODASYL. Consider, for example, data about a marriage ceremony, which is a 3-way relationship between a bride, a groom, and a minister. Because CODASYL sets are only two-way relationships, one is forced into the data model indicated in Figure 6. Bride Groom Participates-1 Participates-2 Ceremony Participates-3 Minister A CODASYL Solution Figure 6 This solution requires three binary sets to express a three-way relationship, and is somewhat unnatural. Although much more flexible than IMS, the CODASYL data model still had limitations. The CODASYL data manipulation language is a record-at-a-time language whereby one enters the data base at an entry point and then navigates to desired data by following sets. To find the red parts supplied by Supplier 16 in CODASYL, one can use the following code:

9 10 Chapter 1: Data Models and DBMS Architecture Find Supplier (SNO = 16) Until no-more { Find next Supply record in Supplies Find owner Part record in Supplied_by Get current record -check for red } One enters the data base at supplier 16, and then iterates over the members of the Supplies set. This will yield a collection of Supply records. For each one, the owner in the Supplied_by set is identified, and a check for redness performed. The CODASYL proposal suggested that the records in each entry point be hashed on the key in the record. Several implementations of sets were proposed that entailed various combinations of pointers between the parent records and child records. The CODASYL proposal provided essentially no physical data independence. For example, the above program fails if the key (and hence the hash storage) of the Supplier record is changed from sno to something else. In addition, no logical data independence is provided, since the schema cannot change without affecting application programs. The move to a network model has the advantage that no kludges are required to implement graph-structured data, such as our example. However, the CODASYL model is considerably more complex than the IMS data model. In IMS a programmer navigates in a hierarchical space, while a CODASYL programmer navigates in a multi-dimensional hyperspace. In IMS the programmer must only worry about his current position in the data base, and the position of a single ancestor (if he is doing a get next within parent ). In contrast, a CODASYL programmer must keep track of the: The last record touched by the application The last record of each record type touched The last record of each set type touched The various CODASYL DML commands update these currency indicators. Hence, one can think of CODASYL programming as moving these currency indicators around a CODASYL data base until a record of interest is located. Then, it can be fetched. In addition, the CODASYL programmer can suppress currency movement if he desires. Hence, one way to think of a CODASYL programmer is that he should program looking at a wall map of the CODASYL network that is decorated with various colored pins indicating currency. In his 1973 Turing Award lecture, Charlie Bachmann called this navigating in hyperspace [BACH73].

10 What Goes Around Comes Around 11 Hence, the CODASYL proposal trades increased complexity for the possibility of easily representing non-hierarchical data. CODASYL offers poorer logical and physical data independence than IMS. There are also some more subtle issues with CODASYL. For example, in IMS each data base could be independently bulk-loaded from an external data source. However, in CODASYL, all the data was typically in one large network. This much larger object had to be bulk-loaded all at once, leading to very long load times. Also, if a CODASYL data base became corrupted, it was necessary to reload all of it from a dump. Hence, crash recovery tended to be more involved than if the data was divided into a collection of independent data bases. In addition, a CODASYL load program tended to be complex because large numbers of records had to be assembled into sets, and this usually entailed many disk seeks. As such, it was usually important to think carefully about the load algorithm to optimize performance. Hence, there was no general purpose CODASYL load utility, and each installation had to write its own. This complexity was much less important in IMS. Hence, the lessons learned in CODASYL were: Lesson 5: Networks are more flexible than hierarchies but more complex Lesson 6: Loading and recovering networks is more complex than hierarchies IV Relational Era Against this backdrop, Ted Codd proposed his relational model in 1970 [CODD70]. In a conversation with him years later, he indicated that the driver for his research was the fact that IMS programmers were spending large amounts of time doing maintenance on IMS applications, when logical or physical changes occurred. Hence, he was focused on providing better data independence. His proposal was threefold: Store the data in a simple data structure (tables) Access it through a high level set-at-a-time DML No need for a physical storage proposal With a simple data structure, one has a better change of providing logical data independence. With a high level language, one can provide a high degree of physical data independence. Hence, there is no need to specify a storage proposal, as was required in both IMS and CODASYL. Moreover, the relational model has the added advantage that it is flexible enough to represent almost anything. Hence, the existence dependencies that plagued IMS can be easily handled by the relational schema shown earlier in Figure 1. In addition, the three-

11 12 Chapter 1: Data Models and DBMS Architecture way marriage ceremony that was difficult in CODASYL is easily represented in the relational model as: Ceremony (bride-id, groom-id, minister-id, other-data) Codd made several (increasingly sophisticated) relational model proposals over the years [CODD79, CODDXX]. Moreover, his early DML proposals were the relational calculus (data language/alpha) [CODD71a] and the relational algebra [CODD72a]. Since Codd was originally a mathematician (and previously worked on cellular automata), his DML proposals were rigorous and formal, but not necessarily easy for mere mortals to understand. Codd s proposal immediately touched off the great debate, which lasted for a good part of the 1970 s. This debate raged at SIGMOD conferences (and it predecessor SIGFIDET). On the one side, there was Ted Codd and his followers (mostly researchers and academics) who argued the following points: a) Nothing as complex as CODASYL can possibly be a good idea b) CODASYL does not provide acceptable data independence c) Record-at-a-time programming is too hard to optimize d) CODASYL and IMS are not flexible enough to easily represent common situations (such as marriage ceremonies) On the other side, there was Charlie Bachman and his followers (mostly DBMS practitioners) who argued the following: a) COBOL programmers cannot possibly understand the new-fangled relational languages b) It is impossible to implement the relational model efficiently c) CODASYL can represent tables, so what s the big deal? The highlight (or lowlight) of this discussion was an actual debate at SIGMOD 74 between Codd and Bachman and their respective seconds [RUST74]. One of us was in the audience, and it was obvious that neither side articulated their position clearly. As a result, neither side was able to hear what the other side had to say. In the next couple of years, the two camps modified their positions (more or less) as follows: Relational advocates a) Codd is a mathematician, and his languages are not the right ones. SQL [CHAM74] and QUEL [STON76] are much more user friendly.

12 What Goes Around Comes Around 13 b) System R [ASTR76] and INGRES [STON76] prove that efficient implementations of Codd s ideas are possible. Moreover, query optimizers can be built that are competitive with all but the best programmers at constructing query plans. c) These systems prove that physical data independence is achievable. Moreover, relational views [STON75] offer vastly enhanced logical data independence, relative to CODASYL. d) Set-at-a-time languages offer substantial programmer productivity improvements, relative to record-at-a-time languages. CODASYL advocates a) It is possible to specify set-at-a-time network languages, such as LSL [TSIC76], that provide complete physical data independence and the possibility of better logical data independence. b) It is possible to clean up the network model [CODA78], so it is not so arcane. Hence, both camps responded to the criticisms of the other camp. The debate then died down, and attention focused on the commercial marketplace to see what would happen. Fortuitously for the relational camp, the minicomputer revolution was occurring, and VAXes were proliferating. They were an obvious target for the early commercial relational systems, such as Oracle and INGRES. Happily for the relational camp, the major CODASYL systems, such as IDMS from Culinaine Corp. were written in IBM assembler, and were not portable. Hence, the early relational systems had the VAX market to themselves. This gave them time to improve the performance of their products, and the success of the VAX market went hand-in-hand with the success of relational systems. On mainframes, a very different story was unfolding. IBM sold a derivative of System R on VM/370 and a second derivative on VSE, their low end operating system. However, neither platform was used by serious business data processing users. All the action was on MVS, the high-end operating system. Here, IBM continued to sell IMS, Cullinaine successfully sold IDMS, and relational systems were nowhere to be seen. Hence, VAXes were a relational market and mainframes were a non-relational market. At the time all serious data management was done on mainframes. This state of affairs changed abruptly in 1984, when IBM announced the upcoming release of DB/2 on MVS. In effect, IBM moved from saying that IMS was their serious DBMS to a dual data base strategy, in which both IMS and DB/2 were declared strategic. Since DB/2 was the new technology and was much easier to use, it was crystal clear to everybody who the long-term winner was going to be.

13 14 Chapter 1: Data Models and DBMS Architecture IBM s signal that it was deadly serious about relational systems was a watershed moment. First, it ended once-and-for-all the great debate. Since IBM held vast marketplace power at the time, they effectively announced that relational systems had won and CODASYL and hierarchical systems had lost. Soon after, Cullinaine and IDMS went into a marketplace swoon. Second, they effectively declared that SQL was the de facto standard relational language. Other (substantially better) query languages, such as QUEL, were immediately dead. For a scathing critique of the semantics of SQL, consult [DATE84]. A little known fact must be discussed at this point. It would have been natural for IBM to put a relational front end on top of IMS, as shown in Figure 7. This architecture would have allowed IMS customers to continue to run IMS. New application could be written to the relational interface, providing an elegant migration path to the new technology. Hence, over time a gradual shift from DL/1 to SQL would have occurred, all the while preserving the high-performance IMS underpinnings In fact, IBM attempted to execute exactly this strategy, with a project code-named Eagle. Unfortunately, it proved too hard to implement SQL on top of the IMS notion of logical data bases, because of semantic issues. Hence, the complexity of logical data bases in IMS came back to haunt IBM many years later. As a result, IBM was forced to move to the dual data base strategy, and to declare a winner of the great debate. Old programs new programs Relational interface IMS The Architecture of Project Eagle Figure 7 In summary, the CODASL versus relational argument was ultimately settled by three events:

14 What Goes Around Comes Around 15 a) the success of the VAX b) the non-portability of CODASYL engines c) the complexity of IMS logical data bases The lessons that were learned from this epoch are: Lesson 7: Set-a-time languages are good, regardless of the data model, since they offer much improved physical data independence. Lesson 8: Logical data independence is easier with a simple data model than with a complex one. Lesson 9: Technical debates are usually settled by the elephants of the marketplace, and often for reasons that have little to do with the technology. Lesson 10: Query optimizers can beat all but the best record-at-a-time DBMS application programmers. V The Entity-Relationship Era In the mid 1970 s Peter Chen proposed the entity-relationship (E-R) data model as an alternative to the relational, CODASYL and hierarchical data models [CHEN76]. Basically, he proposed that a data base be thought of a collection of instances of entities. Loosely speaking these are objects that have an existence, independent of any other entities in the data base. In our example, Supplier and Parts would be such entities. In addition, entities have attributes, which are the data elements that characterize the entity. In our example, the attributes of Part would be pno, pname, psize, and pcolor. One or more of these attributes would be designated to be unique, i.e. to be a key. Lastly, there could be relationships between entities. In our example, Supply is a relationship between the entities Part and Supplier. Relationships could be 1-to-1, 1-to-n, n-to-1 or m-to-n, depending on how the entities participate in the relationship. In our example, Suppliers can supply multiple parts, and parts can be supplied by multiple suppliers. Hence, the Supply relationship is m-to-n. Relationships can also have attributes that describe the relationship. In our example, qty and price are attributes of the relationship Supply. A popular representation for E-R models was a boxes and arrows notation as shown in Figure 8. The E-R model never gained acceptance as the underlying data model that is implemented by a DBMS. Perhaps the reason was that in the early days there was no query language proposed for it. Perhaps it was simply overwhelmed by the interest in the relational model in the 1970 s. Perhaps it looked too much like a cleaned up version of the CODASYL model. Whatever the reason, the E-R model languished in the 1970 s.

15 16 Chapter 1: Data Models and DBMS Architecture Part Pno, pname, psize, pcolor. Supply qty, price Supplier Sno, sname, scity, sstate An E-R Diagram Figure 8 There is one area where the E-R model has been wildly successful, namely in data base (schema) design. The standard wisdom from the relational advocates was to perform data base design by constructing an initial collection of tables. Then, one applied normalization theory to this initial design. Throughout the decade of the 1970 s there were a collection of normal forms proposed, including second normal form (2NF) [CODD71b], third normal form [CODD71b], Boyce-Codd normal form (BCNF) [CODD72b], fourth normal form (4NF) [FAGI77a], and project-join normal form [FAGI77b]. There were two problems with normalization theory when applied to real world data base design problems. First, real DBAs immediately asked How do I get an initial set of tables? Normalization theory had no answer to this important question. Second, and perhaps more serious, normalization theory was based on the concept of functional dependencies, and real world DBAs could not understand this construct. Hence, data base design using normalization was dead in the water. In contrast, the E-R model became very popular as a data base design tool. Chen s papers contained a methodology for constructing an initial E-R diagram. In addition, it was straightforward to convert an E-R diagram into a collection of tables in third normal form [WONG79]. Hence, a DBA tool could perform this conversion automatically. As such, a DBA could construct an E-R model of his data, typically using a boxes and arrows drawing tool, and then be assured that he would automatically get a good relational schema. Essentially all data base design tools, such as Silverrun from Magna Solutions, ERwin from Computer Associates, and ER/Studio from Embarcadero work in this fashion. Lesson 11: Functional dependencies are too difficult for mere mortals to understand. Another reason for KISS (Keep it simple stupid). VI R++ Era Beginning in the early 1980 s a (sizeable) collection of papers appeared which can be described by the following template:

JCR or RDBMS why, when, how?

JCR or RDBMS why, when, how? JCR or RDBMS why, when, how? Bertil Chapuis 12/31/2008 Creative Commons Attribution 2.5 Switzerland License This paper compares java content repositories (JCR) and relational database management systems

More information

The End of an Architectural Era (It s Time for a Complete Rewrite)

The End of an Architectural Era (It s Time for a Complete Rewrite) The End of an Architectural Era (It s Time for a Complete Rewrite) Michael Stonebraker Samuel Madden Daniel J. Abadi Stavros Harizopoulos MIT CSAIL {stonebraker, madden, dna, stavros}@csail.mit.edu ABSTRACT

More information

Anatomy of a Database System

Anatomy of a Database System Anatomy of a Database System Joseph M. Hellerstein and Michael Stonebraker 1 Introduction Database Management Systems (DBMSs) are complex, mission-critical pieces of software. Today s DBMSs are based on

More information


THE ASSOCIATIVE MODEL OF DATA THE ASSOCIATIVE MODEL OF DATA SECOND EDITION SIMON WILLIAMS LAZY SOFTWARE Copyright 2000, 2002 Lazy Software Ltd All rights reserved. No part of this publication may be reproduced, stored in a retrieval

More information

A History and Evaluation of System R

A History and Evaluation of System R COMPUTNG PRACTCES A History and Evaluation System R Donald D. Chamberlin Morton M. Astrahan Michael W. Blasgen James N. Gray W. Frank King Bruce G. Lindsay Raymond Lorie James W. Mehl Thomas G. Price Franco

More information

Data Abstraction and Hierarchy

Data Abstraction and Hierarchy Data Abstraction and Hierarchy * This research was supported by the NEC Professorship of Software Science and Engineering. Barbara Liskov Affiliation: MIT Laboratory for Computer Science Cambridge, MA,

More information

Problem solving and program design

Problem solving and program design Keil and Johnson, C++, 1998 1-1 Chapter 1 Problem solving and program design 1. Problems, solutions, and ways to express them 2. The branch control structure 3. The loop control structure 4. Modular design

More information

Practical File System Design

Practical File System Design Practical File System Design:The Be File System, Dominic Giampaolo half title page page i Practical File System Design with the Be File System Practical File System Design:The Be File System, Dominic Giampaolo

More information

Base Tutorial: From Newbie to Advocate in a one, two... three!

Base Tutorial: From Newbie to Advocate in a one, two... three! Base Tutorial: From Newbie to Advocate in a one, two... three! BASE TUTORIAL: From Newbie to Advocate in a one, two... three! Step-by-step guide to producing fairly sophisticated database applications

More information

A Veritable Bucket of Facts Origins of the Data Base Management System Thomas Haigh

A Veritable Bucket of Facts Origins of the Data Base Management System Thomas Haigh A Veritable Bucket of Facts Origins of the Data Base Management System Thomas Haigh University of Wisconsin--Milwaukee Bolton Hall, Room 582, P.O. Box 413, Milwaukee, WI 53201-0413 thaigh@computer.org

More information

Data migration, a practical example from the business world

Data migration, a practical example from the business world Data migration, a practical example from the business world Master of Science Thesis in Software Engineering and Technology LINA RINTAMÄKI Chalmers University of Technology University of Gothenburg Department

More information

IP ASSETS MANAGEMENT SERIES. Successful Technology Licensing


More information

One Size Fits All : An Idea Whose Time Has Come and Gone

One Size Fits All : An Idea Whose Time Has Come and Gone One Size Fits All : An Idea Whose Time Has Come and Gone Michael Stonebraker Computer Science and Artificial Intelligence Laboratory, M.I.T., and StreamBase Systems, Inc. stonebraker@csail.mit.edu Uğur

More information

Hints for Computer System Design 1

Hints for Computer System Design 1 Hints for Computer System Design 1 Butler W. Lampson Abstract Computer Science Laboratory Xerox Palo Alto Research Center Palo Alto, CA 94304 Studying the design and implementation of a number of computer

More information

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT Laboratory for Computer Science chord@lcs.mit.edu

More information

How To Write a Good PRD. Martin Cagan Silicon Valley Product Group

How To Write a Good PRD. Martin Cagan Silicon Valley Product Group How To Write a Good PRD Martin Cagan Silicon Valley Product Group HOW TO WRITE A GOOD PRD Martin Cagan, Silicon Valley Product Group OVERVIEW The PRD describes the product your company will build. It

More information

A Federated Architecture for Information Management

A Federated Architecture for Information Management A Federated Architecture for Information Management DENNIS HEIMBIGNER University of Colorado, Boulder DENNIS McLEOD University of Southern California An approach to the coordinated sharing and interchange

More information

What are requirements?

What are requirements? 2004 Steve Easterbrook. DRAFT PLEASE DO NOT CIRCULATE page 1 C H A P T E R 2 What are requirements? The simple question what are requirements? turns out not to have a simple answer. In this chapter we

More information


SOFTWARE ENGINEERING SOFTWARE ENGINEERING Report on a conference sponsored by the NATO SCIENCE COMMITTEE Garmisch, Germany, 7th to 11th October 1968 Chairman: Professor Dr. F. L. Bauer Co-chairmen: Professor L. Bolliet, Dr.

More information

On System Design. Jim Waldo

On System Design. Jim Waldo On System Design Jim Waldo On System Design Jim Waldo Perspectives 2006-6 In an Essay Series Published by Sun Labs December 2006 This work first appeared as part of the OOPSLA 2006 Essays track, October

More information

Abstract. 1. Introduction. Butler W. Lampson Xerox Palo Alto Research Center David D. Redell Xerox Business Systems

Abstract. 1. Introduction. Butler W. Lampson Xerox Palo Alto Research Center David D. Redell Xerox Business Systems Experience with Processes and Monitors in Mesa 1 Abstract Butler W. Lampson Xerox Palo Alto Research Center David D. Redell Xerox Business Systems The use of monitors for describing concurrency has been

More information


WHEN ARE TWO ALGORITHMS THE SAME? WHEN ARE TWO ALGORITHMS THE SAME? ANDREAS BLASS, NACHUM DERSHOWITZ, AND YURI GUREVICH Abstract. People usually regard algorithms as more abstract than the programs that implement them. The natural way

More information

developer.* The Independent Magazine for Software Professionals Code as Design: Three Essays by Jack W. Reeves

developer.* The Independent Magazine for Software Professionals Code as Design: Three Essays by Jack W. Reeves developer.* The Independent Magazine for Software Professionals Code as Design: Three Essays by Jack W. Reeves Introduction The following essays by Jack W. Reeves offer three perspectives on a single theme,

More information

Direct Manipulation Interfaces

Direct Manipulation Interfaces I b HUMAN-COMPUTER INTERACTION, 1985, Volume 1, pp. 311-338 4,T. Copyright 0 1985, Lawrence Erlbaum Associates, Inc. Direct Manipulation Interfaces Edwin L. Hutchins, James D. Hollan, and Donald A. Norman

More information

The Design of the Borealis Stream Processing Engine

The Design of the Borealis Stream Processing Engine The Design of the Borealis Stream Processing Engine Daniel J. Abadi 1, Yanif Ahmad 2, Magdalena Balazinska 1, Uğur Çetintemel 2, Mitch Cherniack 3, Jeong-Hyon Hwang 2, Wolfgang Lindner 1, Anurag S. Maskey

More information

Object-Oriented Programming in Oberon-2

Object-Oriented Programming in Oberon-2 Hanspeter Mössenböck Object-Oriented Programming in Oberon-2 Second Edition Springer Verlag Berlin Heidelberg 1993, 1994 This book is out of print and is made available as PDF with the friendly permission

More information


A PROBLEM-ORIENTED LANGUAGE PROGRAMMING A PROBLEM-ORIENTED LANGUAGE Charles H. Moore 2 Preface This is an unpublished book I wrote long ago. Just after I'd written the first versions of Forth. Perhaps it explains the motivation behind

More information


PLANNING ANALYSIS PLANNING ANALYSIS DESIGN PLANNING ANALYSIS Apply Requirements Analysis Technique (Business Process Automation, Business Process Improvement, or Business Process Reengineering) Use Requirements-Gathering Techniques (Interview,

More information

A Process for COTS Software Product Evaluation

A Process for COTS Software Product Evaluation A Process for COTS Software Product Evaluation Santiago Comella-Dorda John Dean Grace Lewis Edwin Morris Patricia Oberndorf Erin Harper July 2004 TECHNICAL REPORT ESC-TR-2003-017 Pittsburgh, PA 15213-3890

More information

Introduction to Data Mining and Knowledge Discovery

Introduction to Data Mining and Knowledge Discovery Introduction to Data Mining and Knowledge Discovery Third Edition by Two Crows Corporation RELATED READINGS Data Mining 99: Technology Report, Two Crows Corporation, 1999 M. Berry and G. Linoff, Data Mining

More information