Structured authoring and XML

White paper
Sarah O'Keefe, President

Copyright 2009, Scriptorium Publishing Services, Inc.

Structured authoring and XML represent a significant paradigm shift in content creation. Implementing structured authoring with XML allows organizations to enforce content organization requirements. The addition of hierarchy and metadata to content improves reuse and content management. These benefits, however, must be weighed against the effort required to implement a structured authoring approach. The business case is compelling for larger writing organizations; they will be the first to adopt structured authoring. Over time, improvements in available tools will reduce the cost of implementing structured authoring and make it affordable for smaller organizations.

What is structured authoring?

Structured authoring is a publishing workflow that lets you define and enforce consistent organization of information in documents, whether printed or online. In traditional publishing, content rules are captured in a style guide and enforced by (human) editors, who read the information and verify that it conforms to the approved style. A few simple examples of content rules are as follows:

- A heading must be followed by an introductory paragraph.
- A bulleted list must contain at least two items.
- A graphic must have a caption.

In structured authoring, a file (either a document type definition (DTD) or a schema) captures these content rules. Authors work in software that validates their documents; the software verifies that the documents they create conform to the rules in the definition file.

Consider, for example, a simple structured document: a recipe. A typical recipe requires several components: a name, a list of ingredients, and instructions. The style guide for a particular cookbook states that the list of ingredients should always precede the instructions. In an unstructured authoring environment, the cookbook editor must review the recipes to ensure that the author has complied with the style guideline. In a structured environment, the recipe structure requires and enforces the specified organization.

Elements and hierarchy

Structured authoring is based on elements. An element is a unit of content; it can contain text or other elements. You can view the hierarchy of elements inside other elements as a set of nodes and branches. Elements can be organized in hierarchical trees. In a recipe, the ingredient list can be broken down into ingredients, which in turn contain items, quantities, and preparation methods, as shown in Figure 1.

Figure 1: Recipe hierarchy

The element hierarchy allows you to associate related information explicitly. The structure specifies that the List element is a child of the Recipe element. The List element contains Ingredient elements, and each Ingredient element contains two or three child elements (Quantity, Item, and optionally Preparation). In an unstructured, formatted document, these relationships are implied by the typography, but unstructured publishing software (a word processor or desktop publishing application) does not capture the actual relationship.

In structured documents, the following terms denote hierarchical relationships:

- Tree: The hierarchical order of elements.
- Branch: A section of the hierarchical tree.

- Leaf: An element with no descendant elements. Name, for example, is a leaf element in Figure 1.
- Parent/child: A child element is one level lower in the hierarchy than its parent. In Figure 1, Name, List, and Instructions are all children of Recipe. Conversely, Recipe is the parent of Name, List, and Instructions.
- Sibling: Elements are siblings when they are at the same level in the hierarchy and have the same parent element. Quantity, Item, and Preparation are siblings.

Element attributes

You can store additional information about the elements in attributes. An attribute is a name-value pair that is associated with a particular element. In the recipe example, attributes might be used in the top-level Recipe element to provide additional information about the recipe, such as the author and cuisine type (Figure 2).

    Recipe
        Author = "John Doe"
        Cuisine = "American"

Figure 2: Attributes capture additional information about an element

Attributes provide a way of further classifying information. If each recipe has a cuisine assigned, you could easily locate all Greek recipes by searching for the attribute. Without attributes, this information would not be available in the document. To sort recipes by cuisine in an unstructured document, a cook would need to read each recipe.

Formatting structured documents

To format structured documents, you associate formatting with particular elements or element sequences. Such formatting is usually highly automated; once an author assigns elements to content, the formatting is implemented automatically to create the final output files.

What is XML?

Extensible Markup Language (XML) defines a standard for storing structured content in text files. The standard is maintained by the World Wide Web Consortium (W3C). [1]

XML is closely related to other markup languages, such as Standard Generalized Markup Language (SGML). Implementing SGML is an enormous undertaking. Because of this complexity, SGML's acceptance has been limited to industries producing large volumes of highly structured information (for example, aerospace, telecommunications, and government). XML is a simplified form of SGML that's designed to be easier to implement. [2] As a result, XML is attractive to many industries that create technical documents (including parts catalogs, training manuals, reports, and user guides).

1. Detailed information: http://www.w3.org/XML/
2. SGML vs. XML details: http://www.w3.org/TR/NOTE-sgml-xml-971215

XML syntax

XML is a markup language, which means that content is enclosed by tags. In XML, element tags are enclosed in angle brackets:

    <element>This is element text.</element>

A closing tag is indicated by a forward slash in front of the element name. Attributes are stored inside the opening tag:

    <element my_attribute="my_value">This is element text.</element>

XML does not provide a set of predefined tags. Instead, you define your own tags and the relationships among the tags. This makes it possible to define and implement a content structure that matches the requirements of your information. Figure 3 shows an XML file that contains a recipe.

    <Recipe Cuisine = "Italian" Author = "Unknown">
      <Name>Marinara Sauce</Name>
      <List>
        <Ingredient>
          <Quantity>2 tbsp.</Quantity>
          <Item>olive oil</Item>
        </Ingredient>
        <Ingredient>
          <Quantity>2 cloves</Quantity>
          <Item>garlic</Item>
          <Preparation>minced</Preparation>
        </Ingredient>
        <Ingredient>
          <Quantity>1/2 tsp.</Quantity>
          <Item>hot red pepper</Item>
        </Ingredient>
        <Ingredient>
          <Quantity>28 oz.</Quantity>
          <Item>canned tomatoes, preferably San Marzano</Item>
        </Ingredient>
        <Ingredient>
          <Quantity>2 tbsp.</Quantity>
          <Item>parsley</Item>
          <Preparation>chopped</Preparation>
        </Ingredient>
      </List>
      <Instructions>
        <Para>Heat olive oil in a large saucepan on medium. Add garlic and hot red pepper and sweat until fragrant. Add tomatoes, breaking up into smaller pieces. Simmer on medium-low heat for at least 20 minutes. Add parsley, simmer for another five minutes. Serve over long pasta.</Para>
      </Instructions>
    </Recipe>

Figure 3: A recipe in XML
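The hierarchy terms introduced earlier (parent, child, leaf) can be made concrete by loading a recipe with an XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and an abbreviated two-ingredient copy of the recipe. The element names Ingredient, Quantity, and Item are assumed from the prose description of the recipe, and the helper functions outline and leaves are illustrative names, not part of any tool discussed in this paper.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the recipe (two ingredients only).
# Ingredient, Quantity, and Item are assumed element names.
RECIPE_XML = """
<Recipe Cuisine="Italian" Author="Unknown">
  <Name>Marinara Sauce</Name>
  <List>
    <Ingredient>
      <Quantity>2 tbsp.</Quantity>
      <Item>olive oil</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>2 cloves</Quantity>
      <Item>garlic</Item>
      <Preparation>minced</Preparation>
    </Ingredient>
  </List>
  <Instructions>
    <Para>Heat olive oil in a large saucepan on medium.</Para>
  </Instructions>
</Recipe>
"""


def outline(element, depth=0):
    """Return one indented line per element, showing the tree structure."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def leaves(element):
    """Return the tags of all elements that have no child elements."""
    return [e.tag for e in element.iter() if len(e) == 0]


recipe = ET.fromstring(RECIPE_XML)
print("\n".join(outline(recipe)))   # indented element tree
print(recipe.attrib)                # attributes of the top-level element
print(leaves(recipe))               # leaf elements in document order
```

The outline shows Name, List, and Instructions as children of Recipe, and the attributes arrive as a separate name-value mapping rather than as part of the text content.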

XML is said to be well-formed when basic tagging rules are followed. For example:

- Every opening tag has a corresponding closing tag, and empty elements use a terminating slash:

      <element>This element has content</element>
      <empty_element />

- Attribute values are enclosed in quotation marks:

      <element attribute="name">This is a legal attribute</element>
      <element attribute=name>This is not well-formed.</element>

- Tags are nested and do not cross over each other:

      <element>This is <strong>correct.</strong></element>
      <element>This is <strong>not correct.</element></strong>

XML is said to be valid when the structure of the XML matches the structure specified in the structure definition. When the structure does not match, the XML file is invalid (Figure 4).

Figure 4: Invalid structure. The structure is invalid because the List element is required before the Instructions element.

Entities

An XML entity is a placeholder. Entities allow you to reuse information; for example, you could define an entity for a copyright statement:

    <!ENTITY copyright "Copyright 2008 Scriptorium Publishing Services, Inc. All rights reserved.">

To reference the entity, you refer to the entity name:

    &copyright;

The entity text is displayed instead of the entity name:

    Copyright 2008 Scriptorium Publishing Services, Inc. All rights reserved.

Storing common information in entities lets you make a change in one location (the entity definition) and have the change show up everywhere that references the entity.

Entities are also used to include information that can't be easily rendered as text. Graphics, for example, can be referenced as entities. In the following example, the entity definition contains the entity name, graphic file name, and file type:

    <!ENTITY my_image SYSTEM "image.gif" NDATA gif>

In the XML file, a Graphic element references this entity:

    <Graphic entity = "my_image" />

How are XML and structured authoring related?

Structured authoring is a concept. XML is a specification that lets you implement structured authoring using plain text files. In the past, most structured authoring implementations were based on SGML; today, XML is the standard. The terms XML and structured authoring are often used almost interchangeably. Unlike SGML, XML is widely used outside the technical publishing world, especially for data interchange and web services applications.

Defining structure in XML

In XML, you define your structure using either a DTD or a schema. In either case, you specify elements and how they are related to each other. For example, a Recipe element definition might read as follows in a DTD:

    <!ELEMENT Recipe (Name, History?, List, Instructions)>

In an XML schema, the definition is itself an XML document. For the Recipe element, a simplified definition would read as follows:

    <xsd:complexType name="Recipe">
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="History" type="xsd:string" minOccurs="0" maxOccurs="1"/>
        <xsd:element name="List" type="xsd:string"/>
        <xsd:element name="Instructions" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>

Once you define the structure, authors create documents that comply with the structure. At a bare minimum, this allows you to specify, for instance, that the list of ingredients in a recipe must occur before the instructions.

Schemas are especially useful in XML-based programming applications, where they allow you to validate and restrict data inside the structure. DTDs are more common in publishing applications, partly because of the legacy with SGML. In long, technical documents that consist mostly of paragraphs, the validation provided by schemas does not add a significant amount of value.
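The difference between well-formed and valid XML can be sketched in a few lines. This example assumes only Python's standard library, whose ElementTree parser rejects XML that is not well-formed but does not validate against a DTD or schema. To stand in for validation, it checks the order of the top-level children against the content model from the DTD example, (Name, History?, List, Instructions), using a regular expression. The names check and RECIPE_MODEL are hypothetical, and a real validating parser would check every element rather than only the children of Recipe.

```python
import re
import xml.etree.ElementTree as ET

# Content model from the DTD example: (Name, History?, List, Instructions).
# Each child tag is followed by a comma so that names match as whole tokens.
RECIPE_MODEL = re.compile(r"Name,(History,)?List,Instructions,")


def check(xml_text):
    """Classify a document as 'not well-formed', 'invalid', or 'valid'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Basic tagging rules were broken (for example, crossed tags).
        return "not well-formed"
    if root.tag != "Recipe":
        return "invalid"
    children = "".join(child.tag + "," for child in root)
    return "valid" if RECIPE_MODEL.fullmatch(children) else "invalid"


# List precedes Instructions, as the content model requires.
print(check("<Recipe><Name/><List/><Instructions/></Recipe>"))   # valid
# Instructions precedes List: well-formed, but not valid.
print(check("<Recipe><Name/><Instructions/><List/></Recipe>"))   # invalid
# Crossed tags: the parser rejects the document outright.
print(check("<Recipe><Name>crossed <b>tags</Name></b></Recipe>"))  # not well-formed
```

The second document parses without error, which is the point: well-formedness is a property of the tagging, while validity depends on the structure definition.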

The impact of structured authoring on a publishing workflow

Thirty years ago, technical writers began to make the transition from typewriters to computer-based writing. Initially, authors stored text in word-processing files, but formatting was done in a separate typesetting operation (Figure 5).

Figure 5: First-generation word-processing workflow. Text is stored in a file but does not have any formatting associated with it (unformatted text, then typesetting, then print-ready galleys).

Next, the transition from dedicated word-processing equipment to personal computers led to word-processing software with the added ability to control formatting with embedded formatting codes. Authors learned how to write and format their documents (Figure 6).

    [font Arial] [size 12 points] Formatting codes let you embed formatting instructions [bold] directly [bold] in the document.[hrt]

Figure 6: Codes let you embed formatting information in a document

Formatting codes were soon grouped into paragraph styles or tags. Instead of specifying font, font size, alignment, and the like, the author specified a style code, which contained a group of formatting settings (Figure 7).

    [style = Heading1] Understanding styles
    [style = Normal] Paragraph styles let you store formatting settings as a group.

Figure 7: Paragraph styles reference a style sheet instead of encoding style explicitly for each paragraph

With paragraph style sheets, a template designer could define the look and feel of documents for an entire workgroup by setting up a formatting template. In some environments, templates are enforced strictly; in others, individual authors are allowed to customize formatting to suit their document and their personal preferences. When formatting and content development are separated, this special formatting becomes impossible.

In a structured authoring environment, authors create documents by assembling elements and text in an order permitted by the structure definition document (Figure 8). You might think of structured authoring as being similar to template-based authoring with a strict template. Authors do not assign formatting; formatting is automatically assigned based on the structure of the document. Each output format has its own formatting specification.

Figure 8: Structured authoring from the author's point of view

Changing perceptions

XML and structured authoring result in a completely different way of looking at information. Instead of the familiar page- and paragraph-based metaphor, structured authoring requires that authors consider information as a hierarchy with a separate formatting layer (Figure 9).

Figure 9: Representing a document as a series of layers

A document's formatting can imply a certain structure (for example, a large, sans-serif font often indicates an important heading), but unstructured files do not describe how paragraphs are related to each other (Figure 10). XML makes it possible to encode structure into a document explicitly (Figure 11).

Figure 10: Formatting can imply structure. A large, sans-serif font at the top of the page indicates a heading. The indent and the space above and below the list visually group the ingredient list into a unit.

    <Recipe Cuisine = "Italian" Author = "Unknown">
      <Name>Marinara Sauce</Name>
      <List>
        <Ingredient>
          <Quantity>2 tbsp.</Quantity>
          <Item>olive oil</Item>
        </Ingredient>
        <Ingredient>
          <Quantity>2 cloves</Quantity>
          <Item>garlic</Item>
          <Preparation>minced</Preparation>
        </Ingredient>
        <Ingredient>
          <Quantity>1/2 tsp.</Quantity>
          <Item>hot red pepper</Item>
        </Ingredient>
        <Ingredient>
          <Quantity>28 oz.</Quantity>
          <Item>canned tomatoes, preferably San Marzano</Item>
        </Ingredient>
        <Ingredient>
          <Quantity>2 tbsp.</Quantity>
          <Item>parsley</Item>
          <Preparation>chopped</Preparation>
        </Ingredient>
      </List>
      <Instructions>
        <Para>Heat olive oil in a large saucepan on medium. Add garlic and hot red pepper and sweat until fragrant. Add tomatoes, breaking up into smaller pieces. Simmer on medium-low heat for at least 20 minutes. Add parsley, simmer for another five minutes. Serve over long pasta.</Para>
      </Instructions>
    </Recipe>

Figure 11: XML captures structure explicitly. The List element groups the Ingredient elements. The connections among Quantity, Item, and Preparation are captured because they are children of the Ingredient element.

Adding metadata to documents

Metadata is information that describes or classifies other information. A word-processing document usually contains basic metadata, such as the document's title, author, and keywords. Structured authoring supports metadata with elements and attributes. Element names themselves can provide metadata; for example, naming elements GlossaryTerm and GlossaryDefinition encapsulates a lot of information about the elements' content.

Attributes provide a way to label elements with additional information. Once the attributes are set up, you can then include, exclude, or process information based on the value of the attributes.

In structured authoring, you can assign metadata to elements in a document. With metadata, you label information with identifiers, such as:

- Version
- User level
- Revision date
- Author

Element attributes give you much finer control over metadata than the basic file-level information you can store in word-processing documents.
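Including or excluding content based on attribute values, as described above, amounts to a query over the element tree. The sketch below assumes Python's standard xml.etree.ElementTree module and its limited XPath support. The Cookbook wrapper element, the two Greek recipes, and the function name recipes_by are hypothetical and exist only to illustrate the search for all recipes of a given cuisine.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of recipes; only the Marinara Sauce entry
# comes from this paper, and Cookbook is an assumed wrapper element.
COOKBOOK_XML = """
<Cookbook>
  <Recipe Cuisine="Italian" Author="Unknown"><Name>Marinara Sauce</Name></Recipe>
  <Recipe Cuisine="Greek" Author="John Doe"><Name>Tzatziki</Name></Recipe>
  <Recipe Cuisine="Greek" Author="Unknown"><Name>Horiatiki Salad</Name></Recipe>
</Cookbook>
"""


def recipes_by(cookbook, attribute, value):
    """Return the names of recipes whose attribute equals the given value."""
    matches = cookbook.findall(f"Recipe[@{attribute}='{value}']")
    return [recipe.findtext("Name") for recipe in matches]


cookbook = ET.fromstring(COOKBOOK_XML)
print(recipes_by(cookbook, "Cuisine", "Greek"))    # ['Tzatziki', 'Horiatiki Salad']
print(recipes_by(cookbook, "Author", "Unknown"))   # ['Marinara Sauce', 'Horiatiki Salad']
```

The same query against unstructured documents would require someone to read each recipe, because the cuisine would not be recorded anywhere a program could find it.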

Workflow options

XML and structured authoring do not provide an actual workflow; they must be incorporated into a complete workflow. Before establishing a structured publishing workflow, you should consider the following tasks:

- Defining content sources
- Establishing content repositories
- Implementing content reuse
- Delivering formatted output

Defining content sources

It's important to examine existing content to establish how it is currently developed and how it will benefit from structure. Other questions include the following:

- Can all of the content be stored in a single location?
- Is it necessary to keep different versions of the same content, or can information come from a single source?
- Who develops the content?
- How often is it updated?
- Are there dependencies between different sets of content?

The following table shows a simplified audit of content:

Information product    | Current tool | Benefit from structure? | Dependencies               | Updated?
-----------------------|--------------|-------------------------|----------------------------|-------------
User guide             | FrameMaker   | Yes                     |                            | Twice a year
Training manuals       | PowerPoint   | Yes                     | Uses info from user guides | Quarterly
Online help            | Dreamweaver  | Yes                     | Uses info from user guides | Monthly
Release notes          | Word         | No                      | No                         |
Marketing white papers | Word         | No                      | No                         |

Establishing content repositories

A content repository or database is not required to work in XML. However, a content repository makes it possible to manage content modules, which allows you to do the following:

- Search content by elements and attributes
- Locate content created by a specific author
- Locate content by topic

- Identify content chunks that are being used in multiple locations
- Extract chunks that match certain criteria

XML works very well with content repositories; as a text format, XML is easier to manage than the proprietary binary formats of word-processing programs. Structured authoring improves consistency across documents. This makes it easier to manage them in a content repository. Content can be automatically chunked at specified element levels, which makes content reuse easier (Figure 12).

Figure 12: Structured authoring with a content repository (extract chunks based on search criteria; assemble chunks into documents; structured document, ready for publishing; decompose edited document into chunks; store updated chunks)

Implementing content reuse

Content reuse, or single sourcing, doesn't require an XML-based workflow. In XML, though, it's easier to enforce the consistency that's required to make content reuse work.

Content reuse means that you develop a particular chunk of information once, and then use it wherever it's needed. Reuse can occur across media; for example, a chunk of content is used in both the printed manual and in the online help for a product. In other cases, you might write a chunk of information that's needed in several different printed books. Reusing that chunk minimizes maintenance and ensures consistency across all of the information products (Figure 13).

Figure 13: Content reuse minimizes the total amount of information being developed. Module A is used in both documents. Module B is used only in Document 2. Module C is used in both documents.

Delivering formatted output

Structured authoring separates structure and formatting; this provides both the greatest advantage and the greatest challenge in the structured environment.

Authors are accustomed to working in a visual environment. Requiring them to work solely with tags is impractical, yet providing an approximation of the final output's appearance will bias them toward a particular medium. For example, an authoring environment that looks something like a printed page makes it more difficult to consider how content will function in an online help format.

Figure 14 outlines how structured information is processed to produce the final deliverable documents in specific formats.

Instead of working with commercial authoring applications, some XML developers prefer to use open-source tools. While it is possible to create print and PDF output from open-source solutions, it is generally easier to do so through commercial tools (which tend to produce higher-quality output).

Print transformation to paragraph styles:

- Name: Put text in Heading1 paragraph tag.
- Ingredient: Put in Indent paragraph tag. Put Quantity first, then Item. If there is a Preparation element, insert a comma, then the Preparation text.
- Instructions: For each Para, put the text in the Body paragraph tag.

Result after the transformation is applied:

    Marinara Sauce

        2 tbsp. olive oil
        2 cloves garlic, minced
        1/2 tsp. hot red pepper
        28 oz. canned tomatoes
        2 tbsp. parsley, chopped

    Heat olive oil in a large saucepan on medium. Add garlic and hot red pepper and sweat until fragrant. Add tomatoes, breaking up into smaller pieces. Simmer on medium-low heat for at least 20 minutes. Add parsley, simmer for another five minutes. Serve over long pasta.

Figure 14: Processing structured documents to produce final output
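The print transformation rules in Figure 14 can be expressed as a short program. This is a sketch only, assuming Python's standard xml.etree.ElementTree module and an abbreviated recipe; production workflows typically use a dedicated transformation language or the publishing tool's own mapping tables. The element names Ingredient, Quantity, and Item are assumed from the prose description of the recipe, and to_styled_paragraphs is a hypothetical function name.

```python
import xml.etree.ElementTree as ET

# Abbreviated recipe; Ingredient, Quantity, and Item are assumed names.
RECIPE_XML = """
<Recipe Cuisine="Italian" Author="Unknown">
  <Name>Marinara Sauce</Name>
  <List>
    <Ingredient>
      <Quantity>2 tbsp.</Quantity>
      <Item>olive oil</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>2 cloves</Quantity>
      <Item>garlic</Item>
      <Preparation>minced</Preparation>
    </Ingredient>
  </List>
  <Instructions>
    <Para>Heat olive oil in a large saucepan on medium.</Para>
  </Instructions>
</Recipe>
"""


def to_styled_paragraphs(recipe):
    """Apply the Figure 14 rules; return (paragraph style, text) pairs."""
    # Name: put the text in the Heading1 paragraph tag.
    paragraphs = [("Heading1", recipe.findtext("Name"))]
    # Ingredient: Indent paragraph tag; Quantity first, then Item,
    # then a comma and the Preparation text if Preparation is present.
    for ingredient in recipe.iter("Ingredient"):
        text = f"{ingredient.findtext('Quantity')} {ingredient.findtext('Item')}"
        preparation = ingredient.findtext("Preparation")
        if preparation is not None:
            text += f", {preparation}"
        paragraphs.append(("Indent", text))
    # Instructions: each Para goes in the Body paragraph tag.
    for para in recipe.findall("Instructions/Para"):
        paragraphs.append(("Body", para.text.strip()))
    return paragraphs


recipe = ET.fromstring(RECIPE_XML)
for style, text in to_styled_paragraphs(recipe):
    print(f"[{style}] {text}")
```

Because the rules refer only to element names, a second function with different rules could produce online help or another format from the same source file without any change to the content.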

Roles and responsibilities

The roles and responsibilities in a typical publishing group change when structured authoring is implemented. This section explains how traditional roles change and describes the new role of the document architect. Note that in a small group, one person may hold any or all of these roles.

Document architect

The document architect defines and implements document structure. The document architect must identify information types and establish their required structure. For example, a document architect would build a structure for a company's training manuals (Figure 15).

    Lesson 1: Zoo maintenance

    In this lesson, students will learn:
    - Zoo maintenance types
    - Working with animals
    - Cleaning cages safely
    - Handling inquisitive visitors

    Have the following items ready:
    - Zoo maps
    - Pictures of injuries caused by lions
    - Bleach and buckets
    - Videotape of misbehaving visitors

Figure 15: Structuring training manuals (the lesson content is mapped to a structure with the labels Lesson, Lesson Title, Objectives, ObjList, Objective, and Resources)

Template designer

The template designer is responsible for establishing the look and feel of content deliverables, such as books, online help, e-learning, and so on. In traditional desktop publishing, the template designer is usually a tools expert who can create templates in the appropriate publishing tools. In a structured authoring environment, the designer might also be asked to learn Extensible Stylesheet Language (XSL) and XSL Formatting Objects (XSL-FO) to create HTML, PDF, and other output from the XML files. [3]

3. More information about XSL and XSL-FO: http://www.w3.org/Style/XSL/

Writer

In a structured workflow, writers, as always, create content. In the 1990s, writers often were asked to take on additional formatting and publishing responsibilities; in a structured workflow, these tasks are generally automated. The document architect establishes the overall structure of the documents; the template designer implements a look and feel that is automatically assigned based on the structure of the document.

Many writers who are new to structure are uncomfortable with the perceived lack of control over the final document. They have become accustomed to tweaking the final output to make it look right. Any implementation of a structured workflow must anticipate some resistance and perhaps even outright hostility from a minority of writers. [4]

This resistance seems misplaced, though. Instead of wrestling with formatting problems, writers can focus on content and organization, typically a better fit for writers' skills and interests than desktop publishing. After the initial transition and learning curve, working within a structure increases writer productivity and improves the quality and consistency of the final output.

Technical editors

By enforcing correct structure during content development, a structured workflow eliminates the need for editors to check a document for structure. Instead, editors can focus on word choice, grammar, and overall organization. By automating some of the most tedious parts of the editing job, a structured authoring environment makes it possible for editors to do a more thorough edit in the same amount of time.

Editors are also uniquely positioned to assist with structure implementation. Technical editors see more of a total document library than any other member of a publishing team (with the possible exception of production editors). Because of this familiarity with the overall documentation set, editors are excellent resources to assist in establishing an information architecture. Editors may also have the skills to establish the needed taxonomy for metadata. Taxonomy is a classification system; in a structured workflow, classifying includes defining element names and hierarchy, which elements need attributes, and what values those attributes could have.

Production editors

In many companies today, writers and editors handle production tasks, but there are a few documentation teams that still have production editors. With formatting generated automatically by the structure of a document, the workload for these production editors should decrease.

Many production editors will refocus their efforts on the transformation part of the workflow. Instead of correcting formatting errors after the fact, they will get involved in defining the transformation files that assign formatting based on structure. Production editors will verify that output formatting is working correctly.

4. Wider resistance may indicate the new structure does not accommodate all types of content. A thorough analysis of multiple document types before implementation will minimize this problem. However, it is important to have a change process in place to handle revisions to the structure.
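The idea of a transformation that assigns formatting based on structure can be illustrated with a short sketch. In practice this job is typically done with XSL, as noted in the template designer section; the Python below is only a stand-in for that technique, and the rule table RULES, the function to_html, and the sample markup are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample document using the Figure 15 element names.
SOURCE = """<Lesson>
  <Title>Lesson 1: Zoo maintenance</Title>
  <ObjList>
    <Objective>Zoo maintenance types</Objective>
    <Objective>Working with animals</Objective>
  </ObjList>
</Lesson>"""

# Hypothetical formatting rules: each structure element maps to one HTML tag.
RULES = {"Lesson": "div", "Title": "h1", "ObjList": "ul", "Objective": "li"}


def to_html(element):
    """Render an element and its children using the rule table.

    An element with no rule raises KeyError, so unformatted structure is
    noticed instead of silently dropped. Text that follows a child element
    (tail text) is ignored to keep the sketch short.
    """
    tag = RULES[element.tag]
    text = escape(element.text or "").strip()
    children = "".join(to_html(child) for child in element)
    return f"<{tag}>{text}{children}</{tag}>"


html_out = to_html(ET.fromstring(SOURCE))
print(html_out)
```

Changing the look of every title in a document set then means editing one rule, not every document, which is why verifying the transformation becomes the production editor's main task.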

A difficult transition?

The transition from free-form writing to structure can be difficult. Just as some writers dislike working in a template-driven environment where formatting is constrained, some dislike the regimentation of structured authoring.

Structured authoring offers the business organization compelling advantages, including improved consistency and increased productivity because manual editing and formatting time are decreased. The widespread implementation of structured workflows will likely result in structure being used to deliver information in ways we have not yet even anticipated. It is indisputable, though, that structured information is more valuable than unstructured information. These advantages must be weighed against the arguments from the writers that writing in a structured environment is less interesting.

Developing a business case for structured authoring and XML

Not every content-creation group will benefit from structured authoring and XML. Sometimes, the expense of implementation outweighs the benefits realized, especially in smaller groups with a smaller amount of content.

There are a number of imperatives that lead to implementation of structured authoring and XML. The following are some of the most common scenarios:
- Enabling content exchange between incompatible applications
- Extracting information from databases for publication
- Reducing content duplication and reusing information
- Extracting information based on structure and metadata
- Improving formatting consistency
- Reducing author learning curve
- Improving compliance with required document structure, especially in regulated industries

Enabling content exchange

XML is platform- and vendor-neutral, which makes it an excellent choice as an intermediate format. It is quite common in a single company for two departments to standardize on different, incompatible publishing tools. As a result, the information developed in one department cannot be reused in another department without extensive manual conversion and reformatting work. This leads to content silos, where each department owns a separate, private set of information, often with significant amounts of content duplication (Figure 16).

Structured authoring and XML can eliminate this silo mentality without necessarily forcing either group to implement the preferred software tools of the other group. Each group authors in its preferred application, and then exports to XML for interchange. Intensive coordination is required to ensure that the structures used by each group are compatible.

[Figure 16: Content silos limit information exchange. Department A and Department B; information exchange between them is difficult and tedious.]

To make this system work, each group must use a publishing tool that supports XML import/export (Figure 17).

[Figure 17: Breaking down content silos. Department A and Department B; XML serves as interchange format.]

Extracting information from databases for publication

XML provides a useful intermediate format for content that's exported from a database. Most commonly, database publishing is used for parts catalogs, directories, and similar large data sets. The records are extracted from the database and marked up as XML; the XML is then processed to produce the final output (Figure 18).

[Figure 18: Database publishing with XML. Database -> extract records and create XML -> XML file -> apply formatting template -> final document.]

Traditionally, database publishing has required customized, application-specific solutions. XML offers a generalized and significantly less expensive approach, which better separates the data generation task from the output formatting task.

Reducing content duplication and reusing information

Imposing structure results in improved consistency of content. When combined with an XML-based content repository, structure makes it easier to manage content. Once content is under control, you can search for particular chunks of information and reuse them.

The alternative, in a disorganized environment, is that content is written several times. The first writer creates a piece of information. A second writer needs that same information but doesn't know that the first writer already created it. The second writer rewrites the information. Now, there is duplicated content, which is probably inconsistent. As the two information sources are maintained and updated, they diverge further.

Minimizing the total amount of content being created and modified is one of the most powerful ways to reduce the total cost of content development. Creating what's needed just once requires that all of the writers can locate content as necessary.

Reusing content results in decreased costs, especially as documents are updated from one version to the next. If documents are also translated, significant cost savings will be realized in that effort. The cost savings from translation alone can justify the implementation of an entire structured workflow.

Extracting information based on structure and metadata

Once information is structured and stored with metadata attached to it, it becomes much easier to search for specific information. Consider a structured environment in which each major topic has the following attributes:
- Author name
- Revision date
- Product/topic
- User level
- Platform

Based on these attributes, you could perform a search that extracts all of the topics written in the past year that are Windows specific and for administrators.

Improving formatting consistency

In a structured environment, formatting is handled automatically based on the structure. Formatting by rule greatly improves consistency across a document set: authors or production editors are not required to remember, for example, that in a list of bullets, the first bullet gets a special paragraph tag. Instead, the software applies these types of formatting rules automatically.

Reducing author learning curve

Instead of learning to format documents using a specific publishing tool, writers focus on creating and organizing content. The process of formatting information is automated, which greatly reduces the need for writers to act as their own desktop publishers. However, writers do need to learn to assign useful metadata tagging to documents.

Improving compliance with required document structure

United States government contractors, especially those who work with the military, have long been required to deliver documents using specific standards. The aerospace industry also has specific rules for documents such as aircraft maintenance manuals, and the pharmaceutical industry has rules for labeling. SGML and XML have been used heavily to enforce such standards.

Does your organization need structure?

Armed with basic information about structured authoring, the next logical question is whether your publishing workflow should be moved to a structured environment. In some scenarios, the decision is simple:
- Content interchange. XML provides an excellent medium for content interchange. If you need to move content from one format to another, structured content will allow you to automate and systematize the process.
- Enforcing uniformity across a document set. Defining a structure lets you apply and enforce consistency across documents. Larger workgroups, higher turnover, and complex formatting requirements for output all make the automation provided by a structured workflow more appealing.
- Content management. XML files are in text format, which lends itself to setting up a repository for storage. You can also divide files into small chunks and place them in the repository. The larger the volume of content being produced, the more useful and compelling content management becomes.
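The metadata search described earlier (topics from the past year that are Windows specific and written for administrators) can be sketched in a few lines. Everything concrete here is an assumption for illustration: the topics and topic elements, the attribute names, the sample values, and the function find_topics. The revision date stands in for "written in the past year", and the reference date is fixed so the result is repeatable.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Hypothetical repository extract: one element per topic, metadata as attributes.
REPOSITORY = """<topics>
  <topic id="t1" author="A. Writer" revised="2009-03-01"
         product="Widget" level="administrator" platform="Windows"/>
  <topic id="t2" author="A. Writer" revised="2007-06-15"
         product="Widget" level="administrator" platform="Windows"/>
  <topic id="t3" author="B. Editor" revised="2009-01-20"
         product="Widget" level="end-user" platform="Windows"/>
  <topic id="t4" author="B. Editor" revised="2009-02-10"
         product="Gadget" level="administrator" platform="Linux"/>
</topics>"""


def find_topics(root, today, platform, level, max_age_days=365):
    """Return ids of topics matching platform and level, revised recently."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        topic.get("id")
        for topic in root.iter("topic")
        if topic.get("platform") == platform
        and topic.get("level") == level
        and date.fromisoformat(topic.get("revised")) >= cutoff
    ]


root = ET.fromstring(REPOSITORY)
matches = find_topics(root, date(2009, 6, 1), "Windows", "administrator")
print(matches)
```

A search like this only works if writers fill in the attributes consistently, which is why metadata appears in the training list for a structured workflow.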

Structure is not the solution for all content development workflows. In some environments, implementing structure will be more trouble than it's worth. The following are some examples where structure probably doesn't make sense:
- Fiction and other creative writing. Fiction is unlikely to fit into a predefined structure, and it probably doesn't require the type of reuse and management that technical content does.
- Low-value content. If you do not plan to reuse content, or if a document doesn't contain sufficient information, the effort of structuring it is probably not worth it. Day-to-day business communications, such as email and memos, generally fall in this category. Be on the lookout, though, for higher-value content, such as complex proposals, that could be reused.
- Small sets of technical content. Organizations with thousands of pages of content need to consider structure. Organizations with tens of thousands or more pages almost certainly need both structure and content management. An organization that only manages 100 pages of content doesn't need elaborate structure and content management. Somewhere between 100 and 1,000 pages, there is a point where the value of structure outweighs the implementation cost.

Implementing a structured workflow

If you decide to establish a structured workflow, expect a lengthy and probably painful transition. In an environment where formatting templates are already established and enforced consistently, the addition of structured templates should be relatively straightforward. A workgroup making a transition from a free-form authoring environment where templates aren't used to structured authoring should expect major disruption. Structured authoring will completely change the authoring experience.

A minimal implementation process requires that you do all of the following:
- Analyze content and develop structure definitions
- Design a new publishing workflow
- Roll out the new workflow
- Train users
- Set up a maintenance process

Analyzing content and developing structure definitions

Document analysis requires different skills from template design. Instead of creating formatting tags based on a document's appearance, the document architect must identify content elements. Often, formatting is a visual indicator of structure (for example, headings are usually larger than surrounding text), but structure elements may be needed in areas where formatting does not provide a cue.

The document architect begins by reviewing existing documents and analyzing their structure. Any structure that's developed must also take into account new document types that might be needed.
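As a rough illustration of what a structure definition captures, the sketch below records which child elements each element may contain and which attribute values are allowed, then checks a document against those rules. Real projects would normally express this as a DTD or schema; the STRUCTURE table, the level attribute and its values, and the validate function are hypothetical. The check covers only allowed children and attribute values, not element order or how many times an element may occur.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure definition: allowed children and attribute values.
STRUCTURE = {
    "Lesson": {
        "children": {"Title", "ObjList", "Resources"},
        "attributes": {"level": {"beginner", "advanced"}},
    },
    "Title": {"children": set(), "attributes": {}},
    "ObjList": {"children": {"Objective"}, "attributes": {}},
    "Objective": {"children": set(), "attributes": {}},
    "Resources": {"children": set(), "attributes": {}},
}


def validate(element, errors=None):
    """Collect structure errors for an element and its descendants."""
    errors = [] if errors is None else errors
    rule = STRUCTURE.get(element.tag)
    if rule is None:
        errors.append(f"unknown element <{element.tag}>")
        return errors
    # A missing attribute yields None, which is never an allowed value.
    for name, allowed in rule["attributes"].items():
        value = element.get(name)
        if value not in allowed:
            errors.append(
                f"<{element.tag}> attribute {name}={value!r} "
                f"not in {sorted(allowed)}"
            )
    for child in element:
        if child.tag not in rule["children"]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            validate(child, errors)
    return errors


good = ET.fromstring(
    '<Lesson level="beginner"><Title>Zoo maintenance</Title>'
    "<ObjList><Objective>Working with animals</Objective></ObjList></Lesson>"
)
bad = ET.fromstring(
    '<Lesson level="expert"><Title>Zoo maintenance</Title>'
    "<Objective>Working with animals</Objective></Lesson>"
)
good_errors = validate(good)
bad_errors = validate(bad)
print(good_errors)
print(bad_errors)
```

Keeping the rules in one table also gives the maintenance process a single place to record approved changes to the structure.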

Analysis should also include consideration of how well content meshes with industry-standard structures. Adopting a standard means significantly less development time because DTDs and schemas are often available at no cost.

NOTE: Some established standards are DITA for topic-based technical documentation (http://dita.xml.org), S1000D for military equipment (http://www.s1000d.org), and SPL for pharmaceutical labeling (http://www.fda.gov/oc/datacouncil/spl.html).

Designing a new publishing workflow

Once a structure definition is established, it's time for the most controversial part of the implementation process: choosing tools. The following tools will be needed:
- Authoring tool for creating structured documents
- Formatting tool in which automated formatting definitions are set up
- Content management system to keep track of content

A detailed discussion of tools is beyond the scope of this document. The content management system is likely to be by far the most expensive component of a structured workflow and requires the most extensive analysis.

Rolling out the new workflow

A rollout will require two major tasks: notifying users about what's coming and installing the software, servers, and systems that make everything work. Larger numbers of users will add complexity, as will different location types. For example, rolling out a new system to users in two offices would be relatively simple. Integrating hundreds of users in remote home offices adds a degree of difficulty.

Training users

Users need training in several different knowledge areas:
- Structured authoring concepts
- Basic XML concepts
- Creating usable metadata
- Working with a content management system

If writers are not accustomed to creating content for multiple output formats, they may also need training on how to write modular, delivery-neutral information.

Setting up a maintenance process

Once the structured workflow is established, it's critical to set up a process that allows authors to request changes to the structure and the metadata framework.

Summary

Structured authoring offers the prospect of automated formatting and better management of information. New skills are required both to implement a structured workflow and to work within it. Treating content as complex data that can be managed and manipulated requires a significant shift in mindset from authors, editors, and other publishing professionals.

Scriptorium Publishing Services, Inc.
PO Box 12761
Research Triangle Park, NC 27709-2761 USA
info@scriptorium.com
919-481-2701
www.scriptorium.com