A Fragment-Based Approach for Efficiently Creating Dynamic Web Content


JIM CHALLENGER, PAUL DANTZIG, ARUN IYENGAR, and KAREN WITTING
IBM Research

This article presents a publishing system for efficiently creating dynamic Web content. Complex Web pages are constructed from simpler fragments. Fragments may recursively embed other fragments. Relationships between Web pages and fragments are represented by object dependence graphs. We present algorithms for efficiently detecting and updating Web pages affected after one or more fragments change. We also present algorithms for publishing sets of Web pages consistently; different algorithms are used depending upon the consistency requirements. Our publishing system provides an easy method for Web site designers to specify and modify inclusion relationships among Web pages and fragments. Users can update content on multiple Web pages by modifying a template. The system then automatically updates all Web pages affected by the change. Our system accommodates both content that must be proofread before publication, which typically comes from humans, and content that has to be published immediately, which typically comes from automated feeds. We discuss some of our experiences with real deployments of our system as well as its performance. We also quantitatively present characteristics of fragments used at a major deployment of our publishing system, including fragment sizes, update frequencies, and inclusion relationships.

Categories and Subject Descriptors: H.4.0 [Information Systems Applications]: General

General Terms: Design, Performance

Additional Key Words and Phrases: Caching, dynamic content, fragments, publishing, Web, Web performance

1. INTRODUCTION

Many Web sites need to provide dynamic content. Examples include sports sites, stock market sites, and virtual stores or auction sites where information on available products is constantly changing.

Based on: A Publishing System for Efficiently Creating Dynamic Web Content, by Jim Challenger, Arun Iyengar, Karen Witting, Cameron Ferstat, and Paul Reed, which appeared in Proceedings of INFOCOM 2000, © 2000 IEEE. Authors' address: IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598. © 2005 ACM. ACM Transactions on Internet Technology, Vol. 5, No. 2, May 2005.

There are several problems with providing dynamic data to clients efficiently and consistently. A key problem with dynamic data is that it can be expensive to create; a typical dynamic page may require several orders of magnitude more CPU time to serve than a typical static page of comparable size. The overhead for dynamic data is a major problem for Web sites that receive substantial request volumes. Significant hardware may be needed for such sites.

A key requirement for many Web sites providing dynamic data is to completely and consistently update pages that have changed. In other words, if a change to underlying data affects multiple pages, all such pages should be correctly updated. In addition, a bundle of several changed pages may have to be made visible to clients at the same time. For example, publishing pages in bundles instead of individually may prevent situations where a client views a first page, clicks on a hypertext link to view a second page, and sees information on the second page that is older and not consistent with the information on the first page. Depending upon the way in which dynamic data are being served, achieving complete and consistent updates can be difficult or inefficient.

Many Web sites cache dynamic data in memory or a file system in order to reduce the overhead of recalculating Web pages every time they are requested [Iyengar and Challenger 1997]. In these systems, it is often difficult to identify which cached pages are affected by a change to underlying data that modifies several dynamic Web pages. In making sure that all obsolete data are invalidated, deleting some current data from the cache may be unavoidable. Consequently, cache miss rates after an update may be high, adversely affecting performance. In addition, multiple cache invalidations from a single update must be made consistently.

This article presents a system for efficiently and consistently publishing dynamic Web content. In order to reduce the overhead of generating dynamic pages from scratch, our system composes dynamic pages from simpler entities known as fragments. Fragments typically represent parts of Web pages that change together; when a change to underlying data occurs that affects several Web pages, the fragments affected by the change can easily be identified. It is possible for a fragment to recursively embed another fragment.

Our system provides a user-friendly method for managing complex Web pages composed of fragments. Users specify how Web pages are composed from fragments by creating templates in a markup language. Templates are parsed to determine inclusion relationships among fragments and Web pages. These inclusion relationships are represented by a graph known as an object dependence graph (ODG). Graph traversal algorithms are applied to ODGs in order to determine how changes should be propagated throughout the Web site after one or more fragments change.

Our system allows multiple independent authors to provide content as well as multiple independent proofreaders to approve some pages for publication and reject others. Publication may proceed in multiple stages in which a set of pages must be approved in one stage before it is passed to the next. Our system can also include a link checker, which verifies that a Web page has no broken hypertext links at the time the page is published. The system is also scalable enough to handle high request rates.

In addition to describing the architecture of our fragment-based Web publishing system in detail, this article also presents the first empirical study that we are aware of covering a large-scale real deployment of a fragment-based Web publishing system. We present object size distributions, update frequency distributions, and characteristics of the fragment inclusion relationships. These statistics can be used to architect efficient Web publishing systems. For example, we have used fragment size distributions to optimize the manner in which fragments are stored on disk [Iyengar et al. 2001].

The remainder of the article is organized as follows. Section 2 discusses our methodology for constructing Web pages from fragments. Section 3 describes the architecture of our system in detail. Section 4 describes the performance of our system and presents statistics obtained from deploying our system at real Web sites. Section 5 discusses related work. Finally, Section 6 summarizes our main results and conclusions.

2. CONSTRUCTING WEB PAGES FROM FRAGMENTS

2.1 Overview

A key feature of our system is that it composes complex Web pages from simpler fragments (Figure 9). A page is a complete entity that may be served to a client. We say that a fragment or page is atomic if it doesn't include any other fragments and complex if it includes other fragments. An object is either a page or a fragment.

Our approach is efficient because the overhead for composing an object from simpler fragments is usually minor. By contrast, the overhead for constructing the object from scratch as an atomic fragment is generally much higher. Using the fragment approach, it is possible to achieve significant performance improvements without caching dynamic pages and dealing with the difficulties of keeping caches consistent. For optimal performance, our system has the ability to cache dynamic pages. Caching capabilities are integrated with fragment management.

The fragment-based approach for generating Web pages makes it easier to design Web sites in addition to improving performance. It is easy to design a set of Web pages with a common look and feel. It is also easy to embed common information into several Web pages. Sets of Web pages containing similar information can be managed together. For example, it is easy to update common information represented by a single fragment but embedded within multiple pages; in order to update the common information everywhere, only the fragment needs to be changed. By contrast, if the Web pages are stored statically in a file system, identifying and updating all pages affected by a change can be difficult. Once all changed pages have been identified, care must be taken to update all of them in a way that preserves consistency.

Dynamic Web pages that embed fragments are implicitly updated any time an embedded fragment changes, so consistency is automatically achieved. Consistency becomes an issue with the fragment-based approach when the pages are being published to a cache or file system.

Our system provides several different methods for consistently publishing Web pages in these situations; each method provides a different level of consistency.

Fragments also provide a mechanism by which remote caches can store some parts of dynamic and personalized pages. The remote cache stores the static parts of a page. When a page is requested, the cache requests the dynamic or personalized fragments of the page from the server. Typically, these constitute only a small fraction of the page. The cache then composes and serves the composite page.

HTML contains an OBJECT tag, which allows an arbitrary program object to be embedded in an HTML page. While the OBJECT tag has the potential to achieve fragment composition at the client, a major drawback is that major browsers don't support the tag properly. Therefore, we couldn't rely on the OBJECT tag for our Web sites.

One of the key things our publishing system enables is separation of the creative process from the mechanical process of building a Web site. Previously, the content, look, and feel of large sites we were involved with had to be carefully planned well in advance of the creation of the first page. Changes to the original plans were quite difficult to execute, even in the best of circumstances. Last-minute changes tended to be impossible, resulting in a choice between delayed or flawed site publication. With our publishing system, the entire look and feel of a site can be changed and republished within minutes. Aside from the cost savings, this has allowed tremendous creativity on the part of designers. Entire site designs can be created, experimented with, changed, discarded, and replaced several times a day during the construction of the site. This can take place in parallel with and independently of the creation of site content.

A specific example of this was demonstrated just before a new site look for the 2000 Sydney Olympic Games Web site was made public. One day before the site was to go live to the public, it was decided that the search facility was not working sufficiently well and had to be removed. This change affected thousands of pages and would previously have delayed publication of the site by as much as several days. Using our system, the site authors simply removed the search button from the appropriate fragment and republished the fragment. Ten minutes later, the change was complete, every page had been rebuilt, and the site went live on schedule.

2.2 Object Dependence Graphs

When pages are constructed from fragments, it is important to construct a fragment f1 before any object containing f1 is constructed. In order to construct objects in an efficient order, our system represents relationships between fragments and Web pages by graphs known as object dependence graphs (ODGs) (Figures 1 and 2). Object dependence graphs may have several different edge types. An inclusion edge indicates that an object embeds a fragment. A link edge indicates that an object contains a hypertext link to another object.

Fig. 1. A set of Web pages containing fragments.

Fig. 2. The object dependence graph (ODG) corresponding to Figure 1.

In the ODG in Figure 2, all but one of the edges are inclusion edges. For example, the edge from f4 to P1 indicates that P1 contains f4; thus, when f4 changes, f4 should be updated before P1 is updated. The graph resulting from only inclusion edges is a directed acyclic graph. The edge from P3 to P2 is a link edge, which indicates that P2 contains a hypertext link to P3. A key reason for maintaining link edges is to prevent dangling or inconsistent hypertext links. In this example, the link edge from P3 to P2 indicates that publishing P2 before P3 will result in a broken hypertext link. Similarly, when both P2 and P3 change, publishing a current version of P2 before publishing a current version of P3 could present inconsistent information to clients who view an updated version of P2, click on the hypertext link to an outdated version of P3, and then see information that is obsolete relative to the referring page. Link edges can form cycles within an ODG. This would occur, for example, if two pages both contain hypertext links to each other.

There are two methods for creating and modifying ODGs. Using one approach, users specify how Web pages are composed from fragments by creating templates in a markup language. Templates are parsed to determine inclusion relationships among fragments and Web pages. Using the second approach, a program may directly manipulate edges and vertices of an ODG by using an API. We have deployed systems using both approaches. The first approach is easier for Web page designers to use but incurs higher overheads.

Our system allows an arbitrary number of edge types to exist in ODGs. So far, we have only found practical use for inclusion and link edges.
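To make the ODG structure concrete, the following is a minimal sketch of how such a graph might be represented in Java. The class and method names are illustrative only; they are not the actual API of our system.

    import java.util.*;

    // Minimal sketch of an object dependence graph (ODG). An inclusion
    // edge (f, p) means that object p embeds fragment f; a link edge
    // (q, p) means that p contains a hypertext link to q.
    public class ObjectDependenceGraph {
        public enum EdgeType { INCLUSION, LINK }

        // adjacency lists keyed by object name
        private final Map<String, Map<String, EdgeType>> out = new HashMap<>();

        public void addVertex(String obj) {
            out.putIfAbsent(obj, new HashMap<>());
        }

        public void addEdge(String from, String to, EdgeType type) {
            addVertex(from);
            addVertex(to);
            out.get(from).put(to, type);
        }

        // objects reached from obj over edges of the given type, e.g.,
        // everything that directly embeds a given fragment
        public Set<String> successors(String obj, EdgeType type) {
            Set<String> result = new HashSet<>();
            for (Map.Entry<String, EdgeType> e :
                    out.getOrDefault(obj, Collections.<String, EdgeType>emptyMap()).entrySet()) {
                if (e.getValue() == type) result.add(e.getKey());
            }
            return result;
        }
    }

For the graph in Figure 2, the edge from f4 to P1 would be added as addEdge("f4", "P1", EdgeType.INCLUSION), and the link edge from P3 to P2 as addEdge("P3", "P2", EdgeType.LINK).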

We suspect that there may be other types of important relationships that can be represented by other edge types.

When our system becomes aware of changes to a set S of one or more objects, it performs a depth-first graph traversal using topological sort [Cormen et al. 1990] to determine all vertices reachable from S by following inclusion edges. The topological sort orders vertices such that whenever there is an edge from a vertex v to another vertex u, v appears before u in the topological sort. For example, a valid topological sort of the graph in Figure 2 after P3, f4, and f2 change would be P3, f4, f2, f5, P2, f1, f3, and P1. This topological sort ignores link edges. Objects are updated in an order consistent with the topological sort.

Our system updates objects in parallel when possible. In the previous example, P3, f4, and f2 can be updated in parallel. After f2 is updated, f1 and f5 may be updated in parallel. A number of other objects may be constructed in parallel in a manner consistent with the inclusion edges of the ODG.

After a set of pages, U, has been updated (or generated for the first time), the pages in U are published so that they can be viewed by clients. In some cases, the pages are published to file systems. In other cases, they are published to caches. Pages may be published either locally on the system generating them or to a remote system.

It is often a requirement for a set of multiple pages to be published consistently. Consistency can be guaranteed by publishing all changed (or newly generated) pages in a single atomic action. One potential drawback to this method of publication is that the publication process may be relatively long. For example, pages may have to be proofread before publication. If everything is published together in a single atomic action, there can be considerable delay before any information is made available. Therefore, incremental publication, wherein information is published in stages instead of all together, is often desirable. The disadvantage of incremental publication is that the consistency guarantees are not as strong. Our system provides three different methods for incremental publication, each providing a different level of consistency.

The first incremental publishing method guarantees that a freshly published page will not contain a hypertext link to either an obsolete or unpublished page. This consistency guarantee applies to pages reached by following several hypertext links. More specifically, if P1 and P2 are two pages in U, and a client views an updated version of P1 and follows one or more hypertext links to view P2, then the client is guaranteed to see a version of P2 that is not obsolete with respect to the version of P1 that the client viewed. (A version of P2 is obsolete with respect to a version of P1 if the version of P2 was outdated at the time the version of P1 became current, regardless of whether P1 or P2 have any fragments in common.) For example, consider the Web pages in Figure 3. A client can access P3 by starting at P1, following a hypertext link to P2, and then following a second hypertext link to P3. Suppose that both P1 and P3 change. The first incremental publishing method guarantees that the new version of P1 will not be published before the new version of P3, regardless of whether P2 has changed.
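Before turning to how these methods are implemented, here is a minimal sketch of the inclusion-edge traversal and topological ordering described above: a depth-first search from the changed set, emitting vertices in reverse postorder, yields an order in which every fragment precedes the objects that embed it. The class and method names are ours, not the system's.

    import java.util.*;

    // Sketch of the update-ordering step: from a set of changed objects,
    // find everything reachable over inclusion edges and order it so that
    // each fragment is updated before any object that embeds it.
    public class UpdateOrder {
        // embedders maps each object to the objects that directly embed it
        public static List<String> order(Map<String, List<String>> embedders,
                                         Set<String> changed) {
            List<String> postorder = new ArrayList<>();
            Set<String> visited = new HashSet<>();
            for (String obj : changed) dfs(obj, embedders, visited, postorder);
            Collections.reverse(postorder); // reverse postorder = topological order
            return postorder;
        }

        private static void dfs(String v, Map<String, List<String>> g,
                                Set<String> visited, List<String> out) {
            if (!visited.add(v)) return;
            for (String u : g.getOrDefault(v, Collections.<String>emptyList()))
                dfs(u, g, visited, out);
            out.add(v);
        }
    }

For the change set {P3, f4, f2} of Figure 2, this produces an order like the topological sort given above, with f2 preceding f1 and f5, and every fragment preceding the pages that embed it; independent objects in the order can be constructed in parallel, as described above.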

Fig. 3. A set of Web pages connected by hypertext links.

Fig. 4. A set of Web pages containing common fragments.

This incremental publishing method is implemented by first determining the set R of all pages that can be reached by following hypertext links from a page in U. R includes all pages of U; it may also include previously published pages that haven't changed. R is determined by traversing link edges in reverse order starting from pages in U. Let K be the subgraph of the ODG consisting of all nodes in R and the link edges in the ODG connecting nodes in R. The strongly connected components of K are determined. A strongly connected component of a directed graph is a maximal subset of vertices S such that every vertex in S has a directed path to every other vertex in S. A good algorithm for finding strongly connected components in directed graphs is contained in Cormen et al. [1990]. A graph K_scc is constructed in which each vertex represents a different strongly connected component of K and there is an edge between two nodes (S, T) of K_scc if and only if there exist nodes s in S and t in T such that (s, t) is an edge in K. K_scc is then topologically sorted and examined in an order consistent with the topological sort to locate pages in U. Each time a page in U whose updated version hasn't been published yet is examined, the page is published together with all other pages in U belonging to the same strongly connected component. Each set of pages that is published together in an atomic action is known as a bundle.

The second incremental publishing method guarantees that any two pages in U that both contain a common changed fragment are published in the same bundle. For example, consider the Web pages in Figure 4. Suppose that both f1 and f2 change. Since P1 and P3 both embed f1, their updated versions must be published together. Since P2 and P3 both embed f2, their updated versions must also be published together. Thus, updated versions of all three Web pages must be published together. Note that updated versions of P1 and P2 must be published together, even though the two pages don't embed a common fragment.

In order to implement this approach, the set of all changed fragments contained within each changed object d1 is determined. We call this set the changed fragment set for d1 and denote it by C(d1). All changed objects are constructed in topological sorting order. When a changed object d1 is constructed, C(d1) is calculated as the union of {f2} and C(f2) for each changed fragment f2 such that an inclusion edge (f2, d1) exists in the ODG.
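The changed-fragment-set computation just described translates directly into code. The following sketch assumes objects arrive in the topological order produced earlier, so that C(f) is already known for every changed fragment f embedded in d; the names are illustrative, not the system's.

    import java.util.*;

    public class ChangedFragmentSets {
        // topologicalOrder: changed objects, fragments before embedders;
        // changedEmbedded: maps d to the changed fragments f with an
        // inclusion edge (f, d) in the ODG
        public static Map<String, Set<String>> compute(
                List<String> topologicalOrder,
                Map<String, Set<String>> changedEmbedded) {
            Map<String, Set<String>> C = new HashMap<>();
            for (String d : topologicalOrder) {
                Set<String> set = new HashSet<>();
                for (String f : changedEmbedded.getOrDefault(d,
                        Collections.<String>emptySet())) {
                    set.add(f);                                             // f itself
                    set.addAll(C.getOrDefault(f,
                            Collections.<String>emptySet()));               // plus C(f)
                }
                C.put(d, set);                                              // C(d)
            }
            return C;
        }
    }

For Figure 4 with f1 and f2 changed, this yields C(P1) = {f1}, C(P2) = {f2}, and C(P3) = {f1, f2}.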

Fig. 5. Another set of related Web pages.

After all changed fragment sets have been determined, an undirected bipartite graph D is constructed in which the vertices of D are the pages in U on one side and the changed fragments contained within pages of U on the other side. For each page P in U, an undirected edge (f, P) is created for each f in C(P). D is examined to determine its connected components (two vertices are part of the same connected component if and only if there is a path between the vertices in the graph). All pages in U belonging to the same connected component are published in the same bundle.
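A minimal sketch of this grouping, using union-find in place of an explicit bipartite graph (the connected components are the same either way); the class and names are ours.

    import java.util.*;

    // Groups changed pages into bundles: pages whose changed fragment
    // sets overlap, directly or transitively, end up in one bundle.
    public class FragmentBundles {
        private final Map<String, String> parent = new HashMap<>();

        private String find(String x) {
            parent.putIfAbsent(x, x);
            String p = parent.get(x);
            if (!p.equals(x)) {
                p = find(p);
                parent.put(x, p); // path compression
            }
            return p;
        }

        private void union(String a, String b) {
            parent.put(find(a), find(b));
        }

        // changedFragmentSets: page P -> C(P), its changed fragment set
        public Collection<List<String>> bundles(
                Map<String, Set<String>> changedFragmentSets) {
            for (Map.Entry<String, Set<String>> e : changedFragmentSets.entrySet())
                for (String f : e.getValue())
                    union(e.getKey(), "fragment:" + f); // one edge (f, P) of D

            Map<String, List<String>> byRoot = new HashMap<>();
            for (String page : changedFragmentSets.keySet())
                byRoot.computeIfAbsent(find(page), r -> new ArrayList<>()).add(page);
            return byRoot.values(); // each list is published as one bundle
        }
    }

Applied to the changed fragment sets computed above for Figure 4, all three pages fall into a single bundle, matching the discussion above; the "fragment:" prefix merely keeps fragment vertices from colliding with page names.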

The third incremental publishing method satisfies the consistency guarantees of both the first and second methods. In other words:

(1) A freshly published page will not contain a hypertext link to either an obsolete or unpublished page. More specifically, if P1 and P2 are two pages in U, and a client views an updated version of P1 and follows one or more hypertext links to view P2, then the client is guaranteed to see a version of P2 that is not obsolete with respect to the version of P1 that the client viewed.

(2) Any two changed pages that both contain a common changed fragment are published together.

This method generally results in publishing fewer but larger bundles than the first two approaches. For example, consider the Web pages in Figure 5. Suppose that both P1 and f1 change. Updated versions of P2 and P3 must be published together because they both embed f1. Since P1 contains a hypertext link to P3, the updated version of P1 cannot be published before the bundle containing updated versions of P2 and P3. If, instead, the first incremental publishing method were used to publish the Web pages in Figure 5, the updated version of P1 could not be published before the updated version of P3. However, the updated version of P2 would not have to be published in the same bundle as the updated version of P3. If the second incremental publishing method were used, updated versions of both P2 and P3 would have to be published together in the same bundle. However, publication of the updated version of P1 would be allowed to precede publication of the bundle containing updated versions of P2 and P3.

The third incremental publishing method is implemented by constructing K as in the first method and D as in the second. Additional edges are then added to K between pages in U to ensure that all Web pages belonging to the same connected component of D belong to the same strongly connected component of K. The same procedure as in the first method is then applied to K to publish pages in bundles.

Incremental publishing methods can be designed for other consistency requirements as well. For example, consider Figure 3. Suppose that both P1 and P3 change. It may be desirable to publish updated versions of P1 and P3 in the same bundle. This would avoid the following situation, which could occur using the first method. A client views an old version of P1. After following hypertext links, the client arrives at a new version of P3. The browser's cache is then used to go back to the old version of P1. The client reloads P1 in order to obtain a version consistent with P3 but still sees the old version, because the new version of P1 has not yet been published. It is straightforward to implement an incremental publishing method that would publish P1 and P3 in the same bundle using techniques similar to the ones just described.

2.3 Combined Content Pages

Many Web sites contain information that is fed from multiple sources. Some of the information, such as the latest scores from a sporting event, is generated automatically by a computer. Other information, such as news stories, is generated by humans. Both types of information are subject to change. A page containing both human-generated and computer-generated information is known as a combined content page.

A key problem with serving combined content pages is the different rates at which sources produce content. Computer-generated content tends to be produced at a relatively high rate, often as fast as the most sophisticated timing technology permits. Human-generated content is produced at a much lower rate. Thus, it is difficult for humans to keep pace with automated feeds. By the time an editor has finished with a page, the actual results on the page may have changed. If the editor takes time to update the page, the results may have changed yet again.

A requirement for many of the Web sites we have helped design is that computer-generated content should not be delayed by humans. Computer-generated results, such as the latest results from a sporting event, are often extremely important and should be published as soon as possible. If computer-generated results are combined with human-edited content using conventional Web publishing systems, publication of the computer-generated results can be significantly delayed. What is needed is a scheme to combine data feeds of differing speeds so that information arriving at high rates is not unnecessarily delayed.

In order to provide combined content pages, our system divides fragments into two categories. Immediate fragments contain vital information that should be published quickly with minimal proofreading; on the sports Web sites that use our system, the latest results in a sporting event would be published as immediate fragments. Quality controlled fragments don't have to be published as quickly as immediate fragments but have content that must be examined in order to determine whether the fragments are suitable to be published.
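As a sketch of how these two fragment categories might drive publication, consider the following; the types and method names are ours, invented for illustration, and the real system's behavior is as described in the surrounding text.

    import java.util.*;

    // Immediate fragments trigger publication right away with minimal
    // proofreading; quality controlled fragments wait for approval.
    public class UpdateRouter {
        public enum Category { IMMEDIATE, QUALITY_CONTROLLED }

        private final Deque<String> proofreadQueue = new ArrayDeque<>();

        public void onFragmentChanged(String fragment, Category category) {
            if (category == Category.IMMEDIATE) {
                publishAffectedPages(fragment);   // published quickly
            } else {
                proofreadQueue.add(fragment);     // held for human review
            }
        }

        public void onApproved(String fragment) {
            proofreadQueue.remove(fragment);
            publishAffectedPages(fragment);       // published after proofreading
        }

        private void publishAffectedPages(String fragment) {
            // in the real system: ODG traversal, assembly, and bundle publication
            System.out.println("publishing pages affected by " + fragment);
        }
    }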

Background stories on athletes are typically published as quality controlled fragments at the sports sites that use our system.

Combined content Web pages consist of a mixture of immediate and quality controlled fragments. When one or more immediate fragments change, the Web pages affected by the changes are updated and published with minimal proofreading. If both immediate and quality controlled fragments change, the system first performs the updates resulting from the immediate fragments and publishes the updated Web pages with only minor proofreading to ensure that the immediate fragments are properly placed in the Web pages; the immediate fragments themselves are not proofread. The system subsequently performs the updates resulting from the quality controlled fragments and only publishes these updated Web pages after they have been proofread.

Multiple versions of a combined content page may be published using this approach. The first version would be the page before any updates. The second version might contain updates to all immediate fragments but not to any quality controlled fragments. The third version might contain updates to all fragments. It is possible for an update to an immediate fragment f1 to be published before an update to a quality controlled fragment f2 even though f2 changed before f1. This might occur if the changes to f2 are delayed in publication due to proofreading.

3. SYSTEM ARCHITECTURE

Web pages produced by our system typically consist of multiple fragments. Each fragment may originate from a different source and may be produced at a different rate than other fragments. Fragments may be nested, permitting the construction of complex and sophisticated pages. Completed pages are written to sinks, which may be file systems, caches, or even other HTTP servers. The Trigger Monitor is the software that takes objects from one or more sources, constructs pages, and writes the constructed pages to one or more sinks (Figure 6). Relationships among fragments are maintained in a persistent ODG, which preserves state information in the event of a system crash.

Whenever the Trigger Monitor is notified of a modification or addition of one or more objects, it fetches new copies of the changed objects from the appropriate source. The ODG is updated by parsing the new and changed objects. The graph traversal algorithms described in Section 2.2 are then applied to the latest version of the ODG to determine all Web pages that need to be updated and an efficient order for updating them. Finally, bundles of published pages are written to the sinks.

Since the Trigger Monitor is aware of all fragments and pages, synchronization is possible to prevent corruption of the pages. The ODG is used as the synchronization object to keep the fragment space consistent. Many trigger handlers, each with their own sources and sinks, may be configured to use a common ODG. This design permits, for example, a slow-moving, carefully edited, human-generated set of pages and fragments to be integrated with a high-speed, automated, database-driven content source.

Fig. 6. Schematic of the publish process.

Because the ODG is aware of the entire fragment space and the interrelationship of the objects within that space, synchronization points can be chosen to ensure that multiple, differently-sourced, differently-paced content streams remain consistent.

Multiple Trigger Monitor instances may be chained, the sinks of earlier instances becoming the sources for later ones. This allows publication to take place in multiple stages. We have typically used the following stages in real deployments (Figure 7):

- Development is the first step in the process. Fragments that appear on many Web pages (such as generic headers and footers) as well as the overall site design are produced here. The output of development may be structurally complete but lacking in content.

- Staging takes as its input, or source, the output, or sink, of Development. Editors polish pages and combine content from various sources. Finished pages are the result.

- Quality Assurance takes as its source the sink of Staging. Pages are examined here for correctness and appropriateness. The quality assurance people examine whole pages, which include content from the automated results feed. In determining whether to reject a page, however, the quality assurance people do not scrutinize fragments corresponding to automated results.

- Automated Results are produced when a database trigger is generated as the result of an update. The trigger causes programs to be executed that extract current results and compose relevant updated pages and fragments. Unlike the previous stages, no human intervention occurs in this stage.

Fig. 7. Schematic of the publish process.

- Production is where pages are served from. Its source is the sink of Quality Assurance, and its sinks are the serving directories and caches.

Note how one stage can use the sink of another stage as its source. The automated feed updates sources at the same time as, but independently of, the human-driven stages. This achieves the dual goals of keeping the entire site consistent while immediately publishing content from automated feeds. Stages can be added and deleted easily. Data sources can be added and deleted with little or no disruption to the flow.

3.1 Actual Deployments

We now describe how our publishing system is typically deployed. Approximately three dozen different object types are produced by various data sources. These objects include entities such as images, PDF files, style sheets, movies, and HTML fragments. Objects are categorized into four primary classes based on how they participate in page assembly, and two secondary classes based on whether they are distributed as servable pages. Table I describes the four primary object types.

Table I. The Four Primary Object Types

                                      Embeds Other Fragments   Doesn't Embed Other Fragments
    Embedded in other fragment        Intermediate             Leaf
    Not embedded in other fragment    Top-Level                Binary

Table II. Secondary Object Types

    Input to Assembly   Generated by Assembly
    Intermediate        Partial
    Top-Level           Servable HTML

Binary objects such as images, sound clips, and movies are embedded in pages by virtue of HTML tags and therefore do not affect the page assembly process. The publishing system passes such objects directly from their source location (e.g., authors, web-cams) to the sink, which distributes them to the origin server's serving directory.

The page assembly process produces two secondary classes of objects. A top-level object is transformed into a final, servable HTML page after page assembly. This page is sent to the sink for distribution to the servers. An intermediate object is transformed into a partially assembled object by embedding those objects it refers to. The partials are written to a persistent cache for potential reuse in subsequent page assemblies. Table II summarizes the two secondary object types.

Objects sent to the publishing system generally go through four steps: (1) read the object from its source, (2) update the object dependence graph, (3) assemble the pages affected by the object, and (4) save the input object and assembled objects to a persistent cache and to the sink for distribution.

Figure 8 shows the flow of data through the publishing system for each type of object. The two repositories, source and assembled, are disk-based caches. The source repository caches new objects for potential reuse in subsequent publish operations. The assembled repository caches partial objects for potential reuse. Finally, servable pages are written to sinks for distribution to the servers.

The act of publishing a page consists of the authoring system sending a message to the publishing system containing the names of the objects that have changed and that have been validated as ready for distribution. The first step taken by the publishing system is to fetch each object from the authoring system. If the object is a fragment of any sort, that is, a leaf, intermediate, or top-level object, it is then saved into the source repository. Binary objects (which do not participate in assembly) do not need to be cached. The fragments are then analyzed for dependencies, and the ODG is updated to reflect the new state of the system. After ensuring that the dependencies in the ODG are consistent with the new objects received, the ODG update task issues a request that returns a list of objects needing reassembly.
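The following sketch strings these four steps together. Every interface here is a placeholder we invented for illustration; none corresponds to the system's actual classes.

    import java.util.*;

    // Placeholder collaborators for the four publish steps.
    interface Source { byte[] fetch(String name); }
    interface Odg { List<String> updateAndAnalyze(List<String> changed); }
    interface Assembler { byte[] assemble(String name); boolean isPartial(String name); }
    interface Repository { void put(String name, byte[] data); }
    interface Sink { void write(String name, byte[] data); }

    public class PublishPipeline {
        private final Source source;
        private final Odg odg;
        private final Assembler assembler;
        private final Repository sourceRepo;
        private final Repository assembledRepo;
        private final Sink sink;

        public PublishPipeline(Source source, Odg odg, Assembler assembler,
                               Repository sourceRepo, Repository assembledRepo,
                               Sink sink) {
            this.source = source; this.odg = odg; this.assembler = assembler;
            this.sourceRepo = sourceRepo; this.assembledRepo = assembledRepo;
            this.sink = sink;
        }

        public void publish(List<String> changedNames) {
            // 1. read each changed object and cache its source
            // (the real system skips the source repository for binary objects)
            for (String name : changedNames)
                sourceRepo.put(name, source.fetch(name));
            // 2. update the ODG and get the objects needing reassembly
            List<String> toRebuild = odg.updateAndAnalyze(changedNames);
            for (String name : toRebuild) {
                // 3. assemble each affected object
                byte[] assembled = assembler.assemble(name);
                // 4. partials go to the assembled repository; servable
                //    pages go to the sinks for distribution
                if (assembler.isPartial(name)) assembledRepo.put(name, assembled);
                else sink.write(name, assembled);
            }
        }
    }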

Fig. 8. Dataflow through the publishing system.

Page assembly now occurs. The page assembler pulls needed objects from the cache. If a partial object is being embedded in an object requiring reassembly, then the page assembler pulls the embedded object from the assembled repository (rather than the source repository). This extra cache eliminates the need to reassemble embedded partial objects, which might be costly.

Finally, the output of the assembly process is written to the appropriate place. If the output of assembly is a partial object, it is written to the assembled repository for possible use in a future assembly. If the output of assembly is a servable page, then it is written to the sinks for distribution to the content servers. Triggered binary objects are also written to sinks.

There are three major disk-based caches used in the publishing process:

- The source repository. All page fragments are placed into this repository upon entering the publishing system and retrieved during page assembly. The publish message plays the role of a cache-invalidation message for the source repository.

- The assembled repository. During page assembly, intermediate fragments (Table II) are built into partial fragments. These partial fragments are saved for reuse during page assembly. Objects in this cache are invalidated when ODG analysis determines that at least one of the fragments making up the partial fragment has changed.

- The object dependence graph (ODG). This is, in effect, a cache, because all of the information in it also resides in the database containing the fragments (which is, however, prohibitively expensive to search).

All three of these caches can be rebuilt in the event of a failure by republishing all the fragments in the authoring system's database. It is critical that these caches be persistent: the time to rebuild all three caches after a total failure can be several hours for a major Web site. The caches are implemented as disk-backed hash tables [Iyengar et al. 2001]. The cache interfaces are implemented as Java(2) Map interfaces, compatible with the Java(2) HashMap, making it trivial to exchange a memory-based hash table with a disk-backed hash table.
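As a toy illustration of that interchangeability, the sketch below persists an entire HashMap to a file on every update. The real system uses proper disk-backed hash tables [Iyengar et al. 2001]; this naive whole-file serialization stands in only to show the Map-compatible interface.

    import java.io.*;
    import java.util.*;

    // Toy disk-backed Map, interchangeable with HashMap<String, byte[]>
    // wherever a Map is expected. Not the system's actual implementation.
    public class DiskBackedMap extends AbstractMap<String, byte[]> {
        private final File file;
        private final HashMap<String, byte[]> mem;

        @SuppressWarnings("unchecked")
        public DiskBackedMap(File file) throws IOException, ClassNotFoundException {
            this.file = file;
            if (file.exists()) {
                try (ObjectInputStream in =
                         new ObjectInputStream(new FileInputStream(file))) {
                    mem = (HashMap<String, byte[]>) in.readObject(); // warm start
                }
            } else {
                mem = new HashMap<>();
            }
        }

        @Override public byte[] put(String key, byte[] value) {
            byte[] old = mem.put(key, value);
            flush();                            // persist every update
            return old;
        }

        @Override public Set<Entry<String, byte[]>> entrySet() {
            return mem.entrySet();
        }

        private void flush() {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(mem);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

Because each cache lives in a single file, copying that file is enough to replicate the publishing system's state elsewhere, a property exploited below.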

We exploit the disk-based caches in several ways:

- Rapid start-up after shutdown or failure. The time required to restart the entire publishing system is about three orders of magnitude shorter with a warm cache than without one.

- Space consumed by the publishing system can easily overflow main memory. The disk-based caches provide sufficient storage for situations where main memory would be insufficient.

- Replication of the publishing system. It is often desirable to have several instances of the publishing system installed for purposes such as development, test, recovery, and so on. Because each cache is implemented as a single file, it is easy to replicate the state of the publishing system for use elsewhere.

3.2 Examples

To demonstrate how a site might be built from fragments, we present an example from a Web site for a French Open tennis tournament. A site architect views the player page for Steffi Graf (shown in Figure 9) as consisting of a standard header, sidebar, and footer, with biographical information and recent results thrown in. The site architect composes HTML similar to the following, establishing a general layout for the site:

<html>
<!-- %include(header.frg) -->
<table>
  <tr>
    <td><!-- %include(sidebr.frg) --></td>
    <td><table>
      <tr><!-- %fragment(graf_bio.frg) --></tr>
      <tr><!-- %fragment(graf_score.frg) --></tr>
    </table></td>
  </tr>
</table>
<!-- %include(footer.frg) -->
</html>

where footer.frg consists of

<!-- %fragment(factoid.frg) -->
<!-- %fragment(copyr.frg) -->

The form just shown for specifying fragments is concise and efficient. Since the fragments are specified by HTML comments, browsers and other entities that don't recognize fragments will simply ignore the directives.
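A sketch of how such directives could be parsed out of a template to derive the ODG's inclusion edges; the regular expression and class are ours, not the system's actual parser.

    import java.util.*;
    import java.util.regex.*;

    // Extracts the fragment names referenced by %include/%fragment
    // directives hidden in HTML comments, as in the template above.
    public class FragmentDirectiveParser {
        private static final Pattern DIRECTIVE =
            Pattern.compile("<!--\\s*%(?:include|fragment)\\((\\S+?)\\)\\s*-->");

        // Each returned name yields one inclusion edge (fragment, template)
        // in the object dependence graph.
        public static List<String> embeddedFragments(String template) {
            List<String> names = new ArrayList<>();
            Matcher m = DIRECTIVE.matcher(template);
            while (m.find()) names.add(m.group(1));
            return names;
        }
    }

Applied to footer.frg above, embeddedFragments returns factoid.frg and copyr.frg.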

Fig. 9. Sample screen shot demonstrating the use of fragments.

Our system also allows fragments to be specified using the ESI language [Tsimelzon et al.], a standard that has been developed for specifying fragments. When the first version of our publishing system was developed, ESI was not yet in existence.

Prior to the beginning of play, the contents of graf_score.frg will be empty, since no matches have commenced. This means the part of the page outlined by the dashed box in Figure 9 will, at first, be empty. The first publication of this fragment will result in the ODG seen to the right of Steffi Graf's player page in Figure 9. Again, the objects and edges within the dashed box will not yet be within the ODG, since no match play has yet occurred.

Using fragments in this way permits many architects, editors, and even automated systems to modify the page simultaneously. Our system ensures that all changes are properly included in the final page that is seen by the user. An architect updating the structure of the page does not need to know anything about copyrights, trademarks, the size of the sponsors' logos, the look and feel of the site, or any of the data that will be included on the page. Similarly, an editor wishing to change the look and feel of a site does not need to understand the structure of any particular page.

Certain types of major site changes are greatly facilitated by our publishing system. For example, changing the sidebar to reflect the end of a long event is as simple as updating sidebr.frg.

To change the look and feel of a site, an editor only needs to change header.frg and footer.frg. For both of these kinds of changes, the system will use the ODG from Figure 9 to determine that Steffi Graf's page must be rebuilt (along with many others). Once all pages have been rebuilt, they will be republished. The user will see the changes on every page, although the vast majority of underlying fragments will not have changed.

More static information, like player biographies, can be kept up to date in one place but used on many pages. For example, graf_bio.frg is used on our example page, but may also be used in many other places. To include a new photo or update the information included in the biography, the editors need only concern themselves with updating graf_bio.frg. The system ensures that all pages that include graf_bio.frg will automatically be rebuilt.

Since scoring information will change frequently once a tennis match is in progress, updating that aspect of a page can be handled by an automated process. As a match begins, graf_score.frg is updated to include the match in progress. This means that once the final has begun, graf_score.frg will consist of HTML similar to

<!-- %fragment(final.frg) -->
<!-- %fragment(semi.frg) -->

When the updated graf_score.frg is published, the system will detect that it now includes final.frg and semi.frg and will update the ODG as shown in the dashed box within Figure 9. Now, as the final match progresses, only final.frg needs to be updated and published through our system. As part of the publication process, the system will detect that final.frg is included in graf_score.frg, causing graf_score.frg to be rebuilt using the updated score. Likewise, the system will detect that Steffi Graf's page must be rebuilt as well, and a new page will be built including the updated scoring information. Eventually, when the match completes, the complete page shown in the example is produced.

The score for the final match will be displayed on many pages other than Steffi Graf's player page. For instance, Martina Hingis's player page will also include these results, as will the scoreboard page while the match is in progress. A page listing match-ups between different players will also contain the score. To update all of these pages, the automated system only updates one fragment. This keeps the automated system independent of the site design.

A more complex example of a Web page with fragments is shown in Figure 10, which depicts the Athletics Home Page from the 2000 Olympic Games Web site on October 1, 2000. Both the header and footer are in separate frames. This reduces the size of pages and the amount of information that needs to be loaded when navigating between pages. It also allows clients to access the top and bottom navigation elements at any time, since these elements do not move when scrolling through pages. The page contains a total of 46 fragments, a typical number for the Web site. The header contains 1 top-level fragment and 12 embedded fragments. The footer contains 1 top-level fragment and 3 embedded fragments. Neither the header nor the footer was changed during the games. The Athletics Home Page frame contains 9 top-level fragments and 20 embedded fragments. This page was updated frequently, and fragments were an essential component in reducing the overhead for updates.

Fig. 10. Another example of a Web page composed of fragments.

Fig. 11. The CPU time in milliseconds required to construct and publish bundles of various sizes.

4. SYSTEM PERFORMANCE AND DEPLOYMENT STATISTICS

This section describes the performance of a Java implementation of our system running on an IBM IntelliStation containing a 333 MHz Pentium II processor with 256 Mbytes of memory and the Windows NT (version 4.0) operating system. The distribution of Web page sizes is similar to the one for the 1998 Olympic Games Web site [Iyengar et al. 1999] as well as to more recent Web sites deploying our system; the average Web page size is around 10 Kbytes. Fragment sizes are typically several hundred bytes but usually less than 1 Kbyte. The distribution of fragment sizes is also representative of real Web sites deploying our system.

Figure 11 shows the CPU time in milliseconds required for constructing and publishing bundles of various sizes. Times are averaged over 100 runs. All 100 runs were submitted simultaneously, so the times in the figure reflect the ability of the runs to be executed in parallel. The solid curve depicts times when all objects that need to be constructed are explicitly triggered. The dotted line depicts times when a single fragment that is included in multiple pages is triggered; the pages that need to be built as a result of the change to the fragment are determined from the ODG. Graph traversal algorithms applied to the ODG have relatively low overhead. By contrast, each object that is triggered has to be read from disk and parsed; these operations consume considerable CPU time. As the graph indicates, it is preferable to trigger a few objects that are included in multiple pages than to trigger all objects that need to be constructed.

Our implementation allows multiple complex objects to be constructed in parallel. As a result, we are able to achieve nearly 100% CPU utilization, even when construction of an object is blocked due to I/O, by concurrently constructing other objects.

Fig. 12. The breakdown in CPU time required to construct and publish a typical complex Web page.

The breakdown of where CPU time is consumed is shown in Figure 12. CPU time is divided into the following categories:

- Retrieve, parse: time to read all triggered objects from disk and parse them to determine included fragments.
- ODG update: time for updating the ODG based on the information obtained from parsing objects and for analyzing the ODG to determine all objects that need to be updated and an efficient order for updating them.
- Assembly: time to update all objects.
- Save data: time to save all updated objects on disk.
- Send ack: time to send an acknowledgment message via HTTP that publication is complete.

In the bars marked 1 to 100, one fragment included in 100 others was triggered. The 100 pages that needed to be constructed were determined from the ODG. In the bars marked 100 to 100, the 100 pages that needed to be constructed were all triggered. The times shown in Figure 12 are the average times for a single page. The total average time for constructing and publishing a page in the 1 to 100 case is milliseconds (represented by the aggregate of all bars); the corresponding time for the 100 to 100 case is milliseconds. The retrieve and parse time is significantly higher for the 100 to 100 case because the system is reading and parsing 100 objects compared with 1 in the 1 to 100 case. Since the source for every object that is triggered must be saved, the time it takes to save the data is somewhat longer when 100 objects are triggered than when only one object is triggered.

Figure 13 shows how the construction and publication time averaged over 100 runs varies with the number of embedded fragments within a Web page. While the time grows with the number of fragments, the overhead for assembling a Web page with several embedded fragments is still relatively low.

Fig. 13. The average CPU time in milliseconds required to construct and publish a complex Web page as a function of the number of embedded fragments. In each case, one fragment in the page was triggered.

Fig. 14. The average CPU time in milliseconds required to construct and publish a complex Web page as a function of the number of fragments triggered.

A Web page constructed from multiple database accesses may require over a second of CPU time to construct from scratch without using fragments. The benefits of reducing database accesses by assembling Web pages from pre-existing fragments can thus be significant.

Figure 14 shows how the construction and publication time averaged over 100 runs varies with the number of fragments that are triggered for a Web page containing 20 fragments. Since each fragment that is triggered has to be read from disk and parsed, overhead increases with the number of fragments triggered.

4.1 Deployment Statistics

We now present statistics we collected from a deployment of our publishing system at a major Web site (the 2000 Olympic Games Web site). The statistics in this section can be used to design efficient publishing systems. This deployment did not make use of link edges or incremental publication as described in Section 2.2.

Standard third-party tools were used to check hypertext links for correctness. Pages were extensively tested to make sure that they worked as designed. The publication system used two stages. The first stage consisted of development, staging, and quality assurance. The second stage consisted of production. Real-time results were fed directly to the production server, and quality assurance for such pages was handled after the pages were published. As the site grew in size over the course of the event, such changes were only made at night.

Fig. 15. The distribution of object sizes in the source repository. Each bar represents the number of objects contained in the size range whose upper limit is shown on the X-axis.

Fig. 16. The distribution of object sizes in the assembled repository. Each bar represents the number of objects contained in the size range whose upper limit is shown on the X-axis.

Figures 15 and 16 characterize the object size distributions within two of the disk-based caches used in the publishing system.

Figure 17 shows the distribution of the number of incoming edges for ODG nodes. This graph represents the number of top-level fragments embedded in the object corresponding to the node. The total number of fragments embedded in the object may be higher due to the fact that a fragment may recursively embed another fragment. Figure 18 shows the distribution of the number of outgoing edges for ODG nodes. This graph represents the number of objects that directly embed the object corresponding to the node.


More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

Skynax. Mobility Management System. System Manual

Skynax. Mobility Management System. System Manual Skynax Mobility Management System System Manual Intermec by Honeywell 6001 36th Ave. W. Everett, WA 98203 U.S.A. www.intermec.com The information contained herein is provided solely for the purpose of

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Registry Tuner. Software Manual

Registry Tuner. Software Manual Registry Tuner Software Manual Table of Contents Introduction 1 System Requirements 2 Frequently Asked Questions 3 Using the Lavasoft Registry Tuner 5 Scan and Fix Registry Errors 7 Optimize Registry

More information

Change Management for Rational DOORS User s Guide

Change Management for Rational DOORS User s Guide Change Management for Rational DOORS User s Guide Before using this information, read the general information under Appendix: Notices on page 58. This edition applies to Change Management for Rational

More information

Enterprise Mobile Application Development: Native or Hybrid?

Enterprise Mobile Application Development: Native or Hybrid? Enterprise Mobile Application Development: Native or Hybrid? Enterprise Mobile Application Development: Native or Hybrid? SevenTablets 855-285-2322 Contact@SevenTablets.com http://www.seventablets.com

More information

Scaling Web Applications in a Cloud Environment using Resin 4.0

Scaling Web Applications in a Cloud Environment using Resin 4.0 Scaling Web Applications in a Cloud Environment using Resin 4.0 Abstract Resin 4.0 offers unprecedented support for deploying and scaling Java and PHP web applications in a cloud environment. This paper

More information

Improve application performance and scalability with Adobe ColdFusion 9

Improve application performance and scalability with Adobe ColdFusion 9 Adobe ColdFusion 9 Performance Brief Improve application performance and scalability with Adobe ColdFusion 9 Table of contents 1: Executive summary 2: Statistics summary 3: Existing features 7: New features

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Unicenter Desktop DNA r11

Unicenter Desktop DNA r11 Data Sheet Unicenter Desktop DNA r11 Unicenter Desktop DNA is a scalable migration solution for the management, movement and maintenance of a PC s DNA (including user settings, preferences and data.) A

More information

Ligero Content Delivery Server. Documentum Content Integration with

Ligero Content Delivery Server. Documentum Content Integration with Ligero Content Delivery Server Documentum Content Integration with Ligero Content Delivery Server Prepared By Lee Dallas Principal Consultant Armedia, LLC April, 2008 1 Summary Ligero Content Delivery

More information

q for Gods Whitepaper Series (Edition 7) Common Design Principles for kdb+ Gateways

q for Gods Whitepaper Series (Edition 7) Common Design Principles for kdb+ Gateways Series (Edition 7) Common Design Principles for kdb+ Gateways May 2013 Author: Michael McClintock joined First Derivatives in 2009 and has worked as a consultant on a range of kdb+ applications for hedge

More information

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files

More information

Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment

Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Network (RHN) Satellite server is an easy-to-use, advanced systems management platform

More information

Running a Workflow on a PowerCenter Grid

Running a Workflow on a PowerCenter Grid Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

More information

An Introduction To The Web File Manager

An Introduction To The Web File Manager An Introduction To The Web File Manager When clients need to use a Web browser to access your FTP site, use the Web File Manager to provide a more reliable, consistent, and inviting interface. Popular

More information

Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment

Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Satellite server is an easy-to-use, advanced systems management platform for your Linux infrastructure.

More information

Novell ZENworks 10 Configuration Management SP3

Novell ZENworks 10 Configuration Management SP3 AUTHORIZED DOCUMENTATION Software Distribution Reference Novell ZENworks 10 Configuration Management SP3 10.3 November 17, 2011 www.novell.com Legal Notices Novell, Inc., makes no representations or warranties

More information

5.1 Features 1.877.204.6679. sales@fourwindsinteractive.com Denver CO 80202

5.1 Features 1.877.204.6679. sales@fourwindsinteractive.com Denver CO 80202 1.877.204.6679 www.fourwindsinteractive.com 3012 Huron Street sales@fourwindsinteractive.com Denver CO 80202 5.1 Features Copyright 2014 Four Winds Interactive LLC. All rights reserved. All documentation

More information

Understanding the Value of In-Memory in the IT Landscape

Understanding the Value of In-Memory in the IT Landscape February 2012 Understing the Value of In-Memory in Sponsored by QlikView Contents The Many Faces of In-Memory 1 The Meaning of In-Memory 2 The Data Analysis Value Chain Your Goals 3 Mapping Vendors to

More information

NS DISCOVER 4.0 ADMINISTRATOR S GUIDE. July, 2015. Version 4.0

NS DISCOVER 4.0 ADMINISTRATOR S GUIDE. July, 2015. Version 4.0 NS DISCOVER 4.0 ADMINISTRATOR S GUIDE July, 2015 Version 4.0 TABLE OF CONTENTS 1 General Information... 4 1.1 Objective... 4 1.2 New 4.0 Features Improvements... 4 1.3 Migrating from 3.x to 4.x... 5 2

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems Finding a needle in Haystack: Facebook

More information

Designing portal site structure and page layout using IBM Rational Application Developer V7 Part of a series on portal and portlet development

Designing portal site structure and page layout using IBM Rational Application Developer V7 Part of a series on portal and portlet development Designing portal site structure and page layout using IBM Rational Application Developer V7 Part of a series on portal and portlet development By Kenji Uchida Software Engineer IBM Corporation Level: Intermediate

More information

Availability Digest. www.availabilitydigest.com. Raima s High-Availability Embedded Database December 2011

Availability Digest. www.availabilitydigest.com. Raima s High-Availability Embedded Database December 2011 the Availability Digest Raima s High-Availability Embedded Database December 2011 Embedded processing systems are everywhere. You probably cannot go a day without interacting with dozens of these powerful

More information

Content Management Implementation Guide 5.3 SP1

Content Management Implementation Guide 5.3 SP1 SDL Tridion R5 Content Management Implementation Guide 5.3 SP1 Read this document to implement and learn about the following Content Manager features: Publications Blueprint Publication structure Users

More information

Design and Performance of a Web Server Accelerator

Design and Performance of a Web Server Accelerator Design and Performance of a Web Server Accelerator Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias IBM Research T. J. Watson Research Center P. O. Box 74 Yorktown Heights, NY 598 Abstract

More information

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip Load testing with WAPT: Quick Start Guide This document describes step by step how to create a simple typical test for a web application, execute it and interpret the results. A brief insight is provided

More information

Patterns of Information Management

Patterns of Information Management PATTERNS OF MANAGEMENT Patterns of Information Management Making the right choices for your organization s information Summary of Patterns Mandy Chessell and Harald Smith Copyright 2011, 2012 by Mandy

More information

Original-page small file oriented EXT3 file storage system

Original-page small file oriented EXT3 file storage system Original-page small file oriented EXT3 file storage system Zhang Weizhe, Hui He, Zhang Qizhen School of Computer Science and Technology, Harbin Institute of Technology, Harbin E-mail: wzzhang@hit.edu.cn

More information

DnsCluster: A networking tool for automatic domain zone updating

DnsCluster: A networking tool for automatic domain zone updating DnsCluster: A networking tool for automatic domain zone updating Charalambos Alatas and Constantinos S. Hilas * Dept. of Informatics and Communications Technological Educational Institute of Serres Serres,

More information

F9 Integration Manager

F9 Integration Manager F9 Integration Manager User Guide for use with QuickBooks This guide outlines the integration steps and processes supported for the purposes of financial reporting with F9 Professional and F9 Integration

More information

WHITE PAPER: DATA PROTECTION. Veritas NetBackup for Microsoft Exchange Server Solution Guide. Bill Roth January 2008

WHITE PAPER: DATA PROTECTION. Veritas NetBackup for Microsoft Exchange Server Solution Guide. Bill Roth January 2008 WHITE PAPER: DATA PROTECTION Veritas NetBackup for Microsoft Exchange Server Solution Guide Bill Roth January 2008 White Paper: Veritas NetBackup for Microsoft Exchange Server Solution Guide Content 1.

More information

Lightweight Data Integration using the WebComposition Data Grid Service

Lightweight Data Integration using the WebComposition Data Grid Service Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

CentreWare Internet Services Setup and User Guide. Version 2.0

CentreWare Internet Services Setup and User Guide. Version 2.0 CentreWare Internet Services Setup and User Guide Version 2.0 Xerox Corporation Copyright 1999 by Xerox Corporation. All rights reserved. XEROX, The Document Company, the digital X logo, CentreWare, and

More information

Business Continuity: Choosing the Right Technology Solution

Business Continuity: Choosing the Right Technology Solution Business Continuity: Choosing the Right Technology Solution Table of Contents Introduction 3 What are the Options? 3 How to Assess Solutions 6 What to Look for in a Solution 8 Final Thoughts 9 About Neverfail

More information

TIBCO ActiveSpaces Use Cases How in-memory computing supercharges your infrastructure

TIBCO ActiveSpaces Use Cases How in-memory computing supercharges your infrastructure TIBCO Use Cases How in-memory computing supercharges your infrastructure is a great solution for lifting the burden of big data, reducing reliance on costly transactional systems, and building highly scalable,

More information

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

More information

How To Backup A Database In Navision

How To Backup A Database In Navision Making Database Backups in Microsoft Business Solutions Navision MAKING DATABASE BACKUPS IN MICROSOFT BUSINESS SOLUTIONS NAVISION DISCLAIMER This material is for informational purposes only. Microsoft

More information

Maximum Availability Architecture. Oracle Best Practices For High Availability. Backup and Recovery Scenarios for Oracle WebLogic Server: 10.

Maximum Availability Architecture. Oracle Best Practices For High Availability. Backup and Recovery Scenarios for Oracle WebLogic Server: 10. Backup and Recovery Scenarios for Oracle WebLogic Server: 10.3 An Oracle White Paper January, 2009 Maximum Availability Architecture Oracle Best Practices For High Availability Backup and Recovery Scenarios

More information

Case Management for Blaise using Lotus Notes. Fred Wensing, Australian Bureau of Statistics Brett Martin, Statistics New Zealand

Case Management for Blaise using Lotus Notes. Fred Wensing, Australian Bureau of Statistics Brett Martin, Statistics New Zealand Case Management for Blaise using Lotus Notes Fred Wensing, Australian Bureau of Statistics Brett Martin, Statistics New Zealand Introduction A significant aspect of field based interviewing is the need

More information

Getting started with API testing

Getting started with API testing Technical white paper Getting started with API testing Test all layers of your composite applications, not just the GUI Table of contents Executive summary... 3 Introduction... 3 Who should read this document?...

More information

Avaya P330 Load Balancing Manager User Guide

Avaya P330 Load Balancing Manager User Guide Avaya P330 Load Balancing Manager User Guide March 2002 Avaya P330 Load Balancing Manager User Guide Copyright 2002 Avaya Inc. ALL RIGHTS RESERVED The products, specifications, and other technical information

More information

Vector HelpDesk - Administrator s Guide

Vector HelpDesk - Administrator s Guide Vector HelpDesk - Administrator s Guide Vector HelpDesk - Administrator s Guide Configuring and Maintaining Vector HelpDesk version 5.6 Vector HelpDesk - Administrator s Guide Copyright Vector Networks

More information

Web-based Reporting and Tools used in the QA process for the SAS System Software

Web-based Reporting and Tools used in the QA process for the SAS System Software Web-based Reporting and Tools used in the QA process for the SAS System Software Jim McNealy, SAS Institute Inc., Cary, NC Dawn Amos, SAS Institute Inc., Cary, NC Abstract SAS Institute Quality Assurance

More information

orrelog Ping Monitor Adapter Software Users Manual

orrelog Ping Monitor Adapter Software Users Manual orrelog Ping Monitor Adapter Software Users Manual http://www.correlog.com mailto:info@correlog.com CorreLog, Ping Monitor Users Manual Copyright 2008-2015, CorreLog, Inc. All rights reserved. No part

More information

Solutions for Disaster Recovery Using Grass Valley Integrated Playout Systems

Solutions for Disaster Recovery Using Grass Valley Integrated Playout Systems APPLICATION NOTE Solutions for Disaster Recovery Using Grass Valley Integrated Playout Systems Alex Lakey and Guido Bozward February 2012 In mission-critical broadcast playout environments, the best practice

More information

A Tool for Evaluation and Optimization of Web Application Performance

A Tool for Evaluation and Optimization of Web Application Performance A Tool for Evaluation and Optimization of Web Application Performance Tomáš Černý 1 cernyto3@fel.cvut.cz Michael J. Donahoo 2 jeff_donahoo@baylor.edu Abstract: One of the main goals of web application

More information

The Benefits of Continuous Data Protection (CDP) for IBM i and AIX Environments

The Benefits of Continuous Data Protection (CDP) for IBM i and AIX Environments The Benefits of Continuous Data Protection (CDP) for IBM i and AIX Environments New flexible technologies enable quick and easy recovery of data to any point in time. Introduction Downtime and data loss

More information

Symantec Mail Security for Domino

Symantec Mail Security for Domino Getting Started Symantec Mail Security for Domino About Symantec Mail Security for Domino Symantec Mail Security for Domino is a complete, customizable, and scalable solution that scans Lotus Notes database

More information

New Generation of Software Development

New Generation of Software Development New Generation of Software Development Terry Hon University of British Columbia 201-2366 Main Mall Vancouver B.C. V6T 1Z4 tyehon@cs.ubc.ca ABSTRACT In this paper, I present a picture of what software development

More information

Google Sites: Creating, editing, and sharing a site

Google Sites: Creating, editing, and sharing a site Google Sites: Creating, editing, and sharing a site Google Sites is an application that makes building a website for your organization as easy as editing a document. With Google Sites, teams can quickly

More information

SiteCelerate white paper

SiteCelerate white paper SiteCelerate white paper Arahe Solutions SITECELERATE OVERVIEW As enterprises increases their investment in Web applications, Portal and websites and as usage of these applications increase, performance

More information

NetApp Big Content Solutions: Agile Infrastructure for Big Data

NetApp Big Content Solutions: Agile Infrastructure for Big Data White Paper NetApp Big Content Solutions: Agile Infrastructure for Big Data Ingo Fuchs, NetApp April 2012 WP-7161 Executive Summary Enterprises are entering a new era of scale, in which the amount of data

More information

EMC DOCUMENTUM MANAGING DISTRIBUTED ACCESS

EMC DOCUMENTUM MANAGING DISTRIBUTED ACCESS EMC DOCUMENTUM MANAGING DISTRIBUTED ACCESS This white paper describes the various distributed architectures supported by EMC Documentum and the relative merits and demerits of each model. It can be used

More information

In-memory Tables Technology overview and solutions

In-memory Tables Technology overview and solutions In-memory Tables Technology overview and solutions My mainframe is my business. My business relies on MIPS. Verna Bartlett Head of Marketing Gary Weinhold Systems Analyst Agenda Introduction to in-memory

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Index Terms Domain name, Firewall, Packet, Phishing, URL. BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet

More information

This paper defines as "Classical"

This paper defines as Classical Principles of Transactional Approach in the Classical Web-based Systems and the Cloud Computing Systems - Comparative Analysis Vanya Lazarova * Summary: This article presents a comparative analysis of

More information

CHAPTER 17: File Management

CHAPTER 17: File Management CHAPTER 17: File Management The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

VERITAS Cluster Server v2.0 Technical Overview

VERITAS Cluster Server v2.0 Technical Overview VERITAS Cluster Server v2.0 Technical Overview V E R I T A S W H I T E P A P E R Table of Contents Executive Overview............................................................................1 Why VERITAS

More information

Model Simulation in Rational Software Architect: Business Process Simulation

Model Simulation in Rational Software Architect: Business Process Simulation Model Simulation in Rational Software Architect: Business Process Simulation Mattias Mohlin Senior Software Architect IBM The BPMN (Business Process Model and Notation) is the industry standard notation

More information

1 An application in BPC: a Web-Server

1 An application in BPC: a Web-Server 1 An application in BPC: a Web-Server We briefly describe our web-server case-study, dwelling in particular on some of the more advanced features of the BPC framework, such as timeouts, parametrized events,

More information

Multimedia Applications. Mono-media Document Example: Hypertext. Multimedia Documents

Multimedia Applications. Mono-media Document Example: Hypertext. Multimedia Documents Multimedia Applications Chapter 2: Basics Chapter 3: Multimedia Systems Communication Aspects and Services Chapter 4: Multimedia Systems Storage Aspects Chapter 5: Multimedia Usage and Applications Documents

More information

Addressing Mobile Load Testing Challenges. A Neotys White Paper

Addressing Mobile Load Testing Challenges. A Neotys White Paper Addressing Mobile Load Testing Challenges A Neotys White Paper Contents Introduction... 3 Mobile load testing basics... 3 Recording mobile load testing scenarios... 4 Recording tests for native apps...

More information

Tableau Server Scalability Explained

Tableau Server Scalability Explained Tableau Server Scalability Explained Author: Neelesh Kamkolkar Tableau Software July 2013 p2 Executive Summary In March 2013, we ran scalability tests to understand the scalability of Tableau 8.0. We wanted

More information

Server Manager Performance Monitor. Server Manager Diagnostics Page. . Information. . Audit Success. . Audit Failure

Server Manager Performance Monitor. Server Manager Diagnostics Page. . Information. . Audit Success. . Audit Failure Server Manager Diagnostics Page 653. Information. Audit Success. Audit Failure The view shows the total number of events in the last hour, 24 hours, 7 days, and the total. Each of these nodes can be expanded

More information

2) What is the structure of an organization? Explain how IT support at different organizational levels.

2) What is the structure of an organization? Explain how IT support at different organizational levels. (PGDIT 01) Paper - I : BASICS OF INFORMATION TECHNOLOGY 1) What is an information technology? Why you need to know about IT. 2) What is the structure of an organization? Explain how IT support at different

More information

Chapter 1 - Web Server Management and Cluster Topology

Chapter 1 - Web Server Management and Cluster Topology Objectives At the end of this chapter, participants will be able to understand: Web server management options provided by Network Deployment Clustered Application Servers Cluster creation and management

More information

PTC Integrity Eclipse and IBM Rational Development Platform Guide

PTC Integrity Eclipse and IBM Rational Development Platform Guide PTC Integrity Eclipse and IBM Rational Development Platform Guide The PTC Integrity integration with Eclipse Platform and the IBM Rational Software Development Platform series allows you to access Integrity

More information

Cloud Server. Parallels. An Introduction to Operating System Virtualization and Parallels Cloud Server. White Paper. www.parallels.

Cloud Server. Parallels. An Introduction to Operating System Virtualization and Parallels Cloud Server. White Paper. www.parallels. Parallels Cloud Server White Paper An Introduction to Operating System Virtualization and Parallels Cloud Server www.parallels.com Table of Contents Introduction... 3 Hardware Virtualization... 3 Operating

More information

SharePoint Server 2010 Capacity Management: Software Boundaries and Limits

SharePoint Server 2010 Capacity Management: Software Boundaries and Limits SharePoint Server 2010 Capacity Management: Software Boundaries and s This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references,

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Step into the Future: HTML5 and its Impact on SSL VPNs

Step into the Future: HTML5 and its Impact on SSL VPNs Step into the Future: HTML5 and its Impact on SSL VPNs Aidan Gogarty HOB, Inc. Session ID: SPO - 302 Session Classification: General Interest What this is all about. All about HTML5 3 useful components

More information

The Sierra Clustered Database Engine, the technology at the heart of

The Sierra Clustered Database Engine, the technology at the heart of A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel

More information