The Basics of Deploying Structured Authoring Volume 1: Differences in Base Material

This short series of articles is intended for those who are considering a shift to structured authoring. This first installment is about the differences in the base material between direct text editors and structured authoring. Expecting both to operate in the same manner can make it hard to enjoy the benefits related to structured authoring.

Because this piece is intended for readers who are new to structured authoring, its practices and benefits are first introduced briefly below. The rest of the article focuses on the differences between traditional text editing and structured authoring. These differences include (1) the transition from micromanaging to regularized layouts, (2) conditioned content, and (3) using references to manage content.

Structured Authoring in a Nutshell

The core concept of structured authoring is that content is divided between elements that follow a specific scheme, that can be reused in different contexts, and to which you can apply separately specified styling rules.

In this respect, a major difference from editing text with a program like MS Word is that all the content can be transferred between documents and systems which are based on the same content scheme.

Imagine, for example, that you have made a table with a specific layout in MS Word and you then try to paste the whole table to a different text editor.

How likely do you consider it, based on prior experience, that the table will be successfully transferred over in this manner?

I bet that the majority will agree that at least a part of the table will be broken by this process.

In the case of structured authoring, the table’s layout will be based on a scheme which in turn uses a more widespread markup language. Such languages include HTML for web applications and DITA for technical documentation, for example. Thus, if an element based on such a scheme is copied from one document, it can be freely applied to any content based on the same scheme. Instead of being limited to individual programs, the same material is applicable in more contexts, and it can even be separated from other content in an automated manner with the help of the identifiers that schemes involve.

Because structured authoring is based on schemes that are independent of specific programs, and because it separates content into individually processed elements, specific elements can be deployed in different contexts as needed. It can be used, for example, to manage pieces of reused content through single source versions. Such methods allow structured authoring to reduce both the labour involved in writing and the risks related to updating content.

More detailed explanations on the subject are available elsewhere. For example, this older article (only in Finnish) explains the related technical terminology in more detail.

Three Notable Differences in Base Material

Transitioning to structured authoring involves growing accustomed to a distinct workspace, among other things. The benefits of direct text editors such as MS Word include how the workspace largely corresponds to the look of the end result. For the most part, the same cannot be said of the workspaces for structured authoring. Because of this, it is best to clarify these differences in workspaces to new users and to those who are considering deploying structured authoring.

From Micromanagement to Regularizing

*Structured base material and one of its publication formats. Note the identifier values that were added during publishing and the differences in the styles of lists and the note element.*

The main benefit of direct text editing software is that it lets you manage the layout of content separately on each page. This makes the final position of each piece of content in the end result predictable, so you can determine each detail of a document and how it will appear in the final product. At times, such specific and targeted adjustments may be required. At other times, though, the felt need to manage each detail like this is overstated and stems mainly from habits formed with earlier tools. In this respect, structured authoring also demonstrates how other practices can be sound in many contexts.

If the practices related to direct text editing software thus incentivize micromanaging content in this respect, one might instead call the approach that characterizes structured authoring ‘regularizing’.

In this context, ‘regularizing’ refers to the layout of content being linked to a set of standardized rules.

The main difference related to this approach involves the discrepancy between the base material and the published version. The layout-related details of the final publication are determined by the rules being applied during the actual publishing process. This entails that page layout, for example, must be anticipated by constructing the content layout while being conscious of these rules and their limitations. These details cannot be controlled directly as part of the writing process.
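To make the separation between base material and publishing rules more concrete, here is a minimal Python sketch. It is only an illustration of the principle, not how DoX CMS or any particular system implements it. The element types, the two rule sets, and the function name `publish` are all assumptions made up for this example.

```python
# Hypothetical base material: each element has a type and text,
# but carries no layout details of its own.
content = [
    {"type": "title", "text": "Safety"},
    {"type": "note", "text": "Wear gloves."},
    {"type": "item", "text": "Switch off the power."},
]

# Two hypothetical rule sets. They are applied only at publishing time.
PLAIN_RULES = {
    "title": lambda t: t.upper(),
    "note": lambda t: "NOTE: " + t,
    "item": lambda t: "- " + t,
}
HTML_RULES = {
    "title": lambda t: f"<h1>{t}</h1>",
    "note": lambda t: f'<p class="note">{t}</p>',
    "item": lambda t: f"<li>{t}</li>",
}


def publish(elements, rules):
    """Render every element with the rule registered for its type."""
    return [rules[e["type"]](e["text"]) for e in elements]


for line in publish(content, PLAIN_RULES):
    print(line)
```

The writer only ever edits `content`. Changing how notes look in every publication means editing one rule, not every note.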

Although this feature can feel limiting in many respects, it is also a condition for the high level of reusability that structured authoring enables. For example, translated content need not be directly manipulated to separately manage its layout. Instead, the automated process that determines its layout follows the same rules as it did for the original. Thus, if content is ordered with enough foresight and the rules that handle its layout have been written properly, translations involve practically no added layout-related labour.

The transition to structured authoring thus involves relinquishing the power to micromanage content layouts that direct text editors can grant. The process of content creation becomes distinct from managing the layout, which is a change that I have personally found mostly liberating. At the same time, one must remain conscious of how rules for layouts have their limits and how the ordering of content will affect the manner in which these rules get implemented. It remains important to anticipate such details. For example, a large image may force a page break, and if it sits in the middle of a section, that break leaves empty space there. Such images are therefore best placed at the ends of their respective sections. Should said sections instead be set to always start from a fresh page, it may be best to include such images as the first elements within them.

Conditional Content

The second notable difference is that structured authoring often involves base material that shows contents from multiple differing publications. The same base material may thus include contents related to different applications, so it does not correspond to any single publication.

The reason behind this feature is that such organization of content allows the shared portions to be used in multiple publications without them having to be copied or referenced each time. The parts that are conditioned, on the other hand, allow content, such as specifications related to different models, to only be used in their respective publications. To account for this, writers must mentally simulate how all the different pieces of content will be set in the published versions as they work on such base material.

In some instances, even the same content may be replicated within the same base material. The preferred way to handle such situations is to use the original version as a source that the rest may reference (see below). Referencing is not always available, though. Replication is most commonly used to resolve cases where some otherwise shared element must be arranged differently between the publications that include it. For example, a proper page layout may in some instances require a forced page break after a specific element. If a forced page break is not available as its own element, one may be added as part of the styling rules of another element with a corresponding identifier. When the identified version of the element in question is conditioned to only be used in said publication(s), the default version must in turn be conditioned to be used in all other publications.

In the case of structured authoring, the same base material includes the contents of multiple publications. The contents of the final publications are selected from this whole based on how they are conditioned. To successfully handle such base material it is thus important to perceive the differences between the contents of different publications.
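The selection step can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the mechanism of any specific system: the `only_for` field, the publication names, and the sample specifications are all invented for the example.

```python
# Hypothetical base material. 'only_for' lists the publications an element
# is conditioned to; None marks shared content used in every publication.
# The specification values are made-up sample data.
base_material = [
    {"text": "Installation steps", "only_for": None},
    {"text": "Model A: max load 50 kg", "only_for": {"model_a"}},
    {"text": "Model B: max load 80 kg", "only_for": {"model_b"}},
    {"text": "Warranty terms", "only_for": None},
]


def select_for(publication, elements):
    """Keep shared elements and those conditioned to this publication."""
    return [
        e["text"]
        for e in elements
        if e["only_for"] is None or publication in e["only_for"]
    ]


print(select_for("model_a", base_material))
print(select_for("model_b", base_material))
```

Note how the base material holds four elements while each publication only receives three of them. This is the gap that the writer has to keep in mind while working.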

Referenced Content

The third main difference relates to how, in structured authoring, not all content used in a publication shows as part of the base material. Such is the case with content that is based on references to other content. In these instances, the base material will generally just show an empty element of the relevant kind, which has been designated as referencing its content from elsewhere.

In DoX CMS, there are several features which operate like this. The most noteworthy among them are content references (conref), variables, and attachments.

Content references consist of source elements that are used to manage the referenced content and target elements that retrieve their content from such sources. A source element is a normal part of content which has an identifier value. A target element is an element of the same type that references the identifier value of the source element, so that its contents are replaced with those of the source.

Thus, only the source element shows its contents as they will appear in the published version(s). As part of the base material, target elements are often left empty or their content only consists of a brief reminder about the contents of the used source element.
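The relationship between source and target elements can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the actual conref mechanism of DoX CMS or DITA: the field names `id`, `conref`, and `type`, and the function `resolve_conrefs`, are hypothetical.

```python
# Hypothetical elements. The first note is a source (it has an identifier),
# the last note is a target (it references that identifier and is empty).
elements = [
    {"id": "warn-power", "type": "note",
     "text": "Disconnect the power before servicing."},
    {"type": "para", "text": "Open the cover."},
    {"type": "note", "conref": "warn-power", "text": ""},
]


def resolve_conrefs(elements):
    """Return a copy where each target shows the text of its source."""
    sources = {e["id"]: e for e in elements if "id" in e}
    resolved = []
    for e in elements:
        ref = e.get("conref")
        if ref is None:
            resolved.append(e)
            continue
        source = sources[ref]  # raises KeyError if the source is missing
        if source["type"] != e["type"]:
            raise ValueError("source and target must be of the same type")
        resolved.append({**e, "text": source["text"]})
    return resolved


for e in resolve_conrefs(elements):
    print(e["type"], "|", e["text"])
```

Editing the text of the source element changes every target that references it the next time the content is resolved.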

Meanwhile, variables are smaller phrase elements whose content is controlled from the outside in a manner unique to DoX CMS. Generally, they only contain strings of text, but they may also include images, for example. In the case of publishing variables, the content, such as the name of the relevant client, is determined during the publishing process. The base material shows variables only as identifiers wrapped in two sets of curly braces, such as ‘{{ProductID}}’.
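As a rough illustration of the idea, the following Python sketch fills in such placeholders at publishing time. It only mimics the double curly brace notation mentioned above. The function name `fill_variables`, the sample values, and the choice to leave unknown variables untouched are assumptions of this example and say nothing about how DoX CMS works internally.

```python
import re

# Matches identifiers wrapped in two sets of curly braces, e.g. {{ProductID}}.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def fill_variables(text, values):
    """Replace each {{Name}} with its value; leave unknown names as they are."""
    def replace(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return VARIABLE.sub(replace, text)


base = "Manual for {{ProductID}}, prepared for {{Client}}."
# Hypothetical values chosen during publishing.
print(fill_variables(base, {"ProductID": "X-100", "Client": "Example Oy"}))
```

The same base text can be published for another client simply by passing in a different set of values.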

Attached files, including images, generally show as part of the content preview. Yet the principle by which they operate still involves retrieving them based on a reference value rather than them being included directly as part of the base material. The content preview does not necessarily show the proper dimensions for these images as those remain subject to the rules that determine the layout. They also cannot be arranged freely in relation to other elements as part of the writing process. Instead, all such layout-related details rely on the rules which specify them during the process of publishing.

Because structured authoring uses referenced content like this, which is not part of the base material being worked on, the matter must be handled with care. It is usually best to keep source content in a location that is separate from all publications and through which such content can be managed. One should also get used to not being able to place images as freely, except with the help of stylesheets.

How You Benefit from These Changes

If users’ expectations are based on the merits of direct text editors such as control over page layout, the aforementioned features of structured authoring may sound like nothing but limitations. This is far from the truth. For this reason, I will end this piece by emphasizing the value gained from these differences and the things that they allow you to do.

The main benefit is that you need not directly control every piece of content to be published. The value thus gained is particularly salient in relation to translations. If the base material is organized in a manner that accounts for the rules that determine layout and that leaves elbow room for language-specific differences, each translation can be compiled using the same set of rules.

Additionally, you may simultaneously change shared content between multiple publications by only editing one piece of content. Should any such content need to be changed or otherwise updated, the change occurs at the same time in each position where said content is used. This removes the risk that some content between versions is accidentally left in the prior state.

For example, variables allow controlling details such as contact information in each position simultaneously. You no longer need to remember where said content has been used or to search for it in the base material for various publications.

As will become evident as this series proceeds, shifting to structured authoring does require preparatory labour. The rules related to layouts must be written separately, for example, and this requires related expertise. (DoX Systems offers tailored stylesheets for new customers as part of deploying the system.) As compensation for such preparatory efforts, though, the process of writing may then be dedicated fully to producing content. In other words, the required preparations are largely a one-time investment of time and effort, which removes the need to keep dedicating additional effort to such matters over the course of creating content.