Babylon as a Feature

Multi-lingual documentation, made simple

The Tower of Babylon is a myth meant to explain why the world’s peoples speak different languages. Refer to Bibli­og­raphy [0] below for details. In modern IT systems, it’s often a requirement to support multiple languages.

Such internationalization (i18n for short) is a tough challenge, and this post describes a simple solution to just a tiny part of it: multilingual documents. Our solution combines the plain-text format AsciiDoc with a simple yet versatile build script to support multiple languages (like EN and DE) and multiple output formats (like PDF and HTML).

Let’s start with some requirements:

  • Imagine you need to maintain documents.
  • The desired output format is PDF or HTML, although our approach could easily handle *.docx or LaTeX. But we will keep it simple for now.
  • Several people constantly provide updates to these documents, so they need to collaborate without interfering with each other.
  • Changes should be reviewed and approved by somebody else.
  • From time to time, you need to release updated versions of your documents.
  • Readers care about the version number, so it needs to appear inside the documents themselves.
  • Maybe it is self-evident for you, but we strive for a high degree of automation. So please don’t come up with a “Save document as PDF” function within a word processor.

If some of these requirements sound familiar because source code is maintained the same way, here is the good news: you will recognize several of our proposals.

Let’s visualize the situation: Figure 1 depicts a few authors that indepen­dently update distinct parts of an English and a German document.

Figure 1: Authors maintain documents

Figure 2 shows three hypothetical document releases with two languages.

Figure 2: Document releases

 

What Kind of Documents?

We (Ben and Gernot) are (co-)authors and maintainers of a few documents, for example, an extensive glossary of software architecture termi­nology (refer to Bibli­og­raphy [1] below) and a number of technical curricula (see Bibli­og­raphy [2] below).

We maintain these documents (together with a group of additional authors) in English and German. Our problem is that we write and speak only these two languages, but you will see below that additional languages can be easily integrated.

 

Collab­o­ration First

As software developers, you will have experienced the numerous advantages of professional version control, namely git. Combined with services like GitLab or GitHub, you get a rock-solid and proven platform for collaboration, including pull/merge requests (in our case: document reviews and approvals).

Therefore, we obviously maintain our documents on such a git platform.

Pull and merge requests require that differ­ences between documents can be automat­i­cally deter­mined, so the technical format for documents needs to be plain text. Several such formats are used in practice (see our explanatory box below). Several of these lack the babylonic features we require to process several languages automat­i­cally, which is why we decided to use AsciiDoc (see Bibli­og­raphy [3] below). AsciiDoc is open-source and provides several incredibly powerful features that will come in handy later on.

Markup Languages

A few markup languages have become popular in software developer communities:

  • Markdown is likely the most common markup language. Used primarily for shorter documents, like blog posts (this one has actually been authored in Markdown). On the positive side, it is extremely easy to use. However, it also has a few downsides: 
    • There are several dialects in the wild, each adding features that are usually incompatible with the other dialects.
    • No built-in support to modularize/structure documents.
  • AsciiDoc is our language of choice, as it has been designed with large documents and language simplicity in mind, has excellent documen­tation and is used in several open-source projects. For example, the arc42 architecture template relies on AsciiDoc.
  • Textile has been designed to be a shorthand syntax for creating HTML. We haven’t seen it in our projects and therefore did not consider using it.
  • reStructuredText and Sphinx: Used heavily in the Python world. Can create a variety of output formats, like HTML, LaTeX, Windows Help, ePub, and others.

Wikipedia has a nice overview of these and other light­weight markup languages.

 

AsciiDoc HelloWorld
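
As a minimal sketch, a first AsciiDoc document could look like the following. The file name hello.adoc, the title, and the wording are assumptions for illustration, not the original listing.

```asciidoc
= Hello AsciiDoc(uments)
:toc:

// hello.adoc (hypothetical file name)
// A line starting with two slashes is a comment and is not rendered.

== First Section

AsciiDoc is *plain text* with lightweight markup.

* List items start with an asterisk.
* Links look like this: https://asciidoc.org[AsciiDoc].
```

Assuming the Asciidoctor command-line tool is installed, running `asciidoctor hello.adoc` should write hello.html next to the source file.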

Using the AsciiDoc processor (either on your favorite shell or wrapped in a build script), you get the following output from the text above:

Image 1: Screenshot Hello Asciidoc(uments)

We compiled the AsciiDoc with Gradle, using the following simple build file:

 

Split Documents into Parts

Now that we know how to create a document, let’s prepare for more compli­cated stuff. At first, we should modularize our document and split it into distinct parts. It’s like creating a larger software system from distinct compo­nents or modules, but for AsciiDoc documents. Luckily, AsciiDoc comes with a highly practical feature called include, which allows for the modular­ization of documents – see the following diagram.

Figure 3: Document made up of distinct parts

Of course, these include direc­tives may contain path or directory information so that you can organize your files in adequate ways.
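
A main document assembled from parts might be sketched as follows. All directory and file names are hypothetical; only the include directive itself comes from AsciiDoc.

```asciidoc
= Example Document

// Each part lives in its own file; the paths are relative to this file.
include::chapters/01-introduction.adoc[]

include::chapters/02-requirements.adoc[]

// leveloffset shifts the heading levels of the included part down by one.
include::appendix/glossary.adoc[leveloffset=+1]
```

The blank lines between the directives keep the last paragraph of one part from running into the first line of the next.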

 

Hey Babylon: Multiple Languages

For multiple languages, you have two different options to organize your content (explained in Fig. 4 for EN and DE, English and German):

  • Put EN content in an English-only file tree and DE content in a second, German-only file tree.
  • Put EN and DE content in the same files, and find a clever mechanism to separate these languages when creating output for a single language.

Figure 4: Multi-language options

Let’s consider an important text passage in both English and German (we took the liberty of using the introductory paragraph of the Agile Manifesto):

We are uncovering better ways of developing
software by doing it and helping others do it.

Wir erschließen bessere Wege, Software zu entwickeln,
indem wir es selbst tun und anderen dabei helfen.

We have the two language versions next to each other, but we need to create an English-only output, without the German stuff in it.

Excursion: The C Preprocessor

A few old-gener­ation devel­opers might remember the days of the C programming language. Programs sometimes contained nerdy state­ments like the following:

In C or C++, these condi­tional includes are quite common. Sometimes, even the behavior of the compiler is controlled via such direc­tives. We tell you this for a reason, just read on.

But We Are Writing Documents, Not C?

If we had a similar directive, a kind of condi­tional compi­lation, for our documents, then we could for example write #ifdef ENGLISH #include page-1-EN.adoc, and leave out the other languages for a moment.

The AsciiDoc processors have learned their lessons from history and came up with a condi­tional include on steroids: One can include specific parts of a file, for example just the English parts. Such include state­ments can even be written with variables, and these variables can be set during the build process. Wow!

Fig. 5 gives an overview.

Figure 5: One build per language

AsciiDoc performs this magic by using tags, explicitly marked parts of a document. Here is a simple example:
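
A tagged file might look like the following sketch. The file name manifesto.adoc and the tag names EN and DE are assumptions chosen for illustration.

```asciidoc
// manifesto.adoc (hypothetical file name)

// tag::EN[]
We are uncovering better ways of developing
software by doing it and helping others do it.
// end::EN[]

// tag::DE[]
Wir erschließen bessere Wege, Software zu entwickeln,
indem wir es selbst tun und anderen dabei helfen.
// end::DE[]
```

Because the tag markers sit in comment lines, they never show up in the rendered output.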

We can then tell AsciiDoc to pass the tag for EN when including the file. See the following image.

Figure 6: Include only certain parts
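
On the including side, the tag can be fixed or taken from an attribute. The sketch below shows both variants as alternatives; manifesto.adoc, main.adoc, and the attribute name language are hypothetical.

```asciidoc
// Variant 1: always pull in only the region tagged EN.
include::manifesto.adoc[tags=EN]

// Variant 2: let the build choose the language through an attribute.
// The attribute could be set on the command line, for example:
//   asciidoctor -a language=DE main.adoc
include::manifesto.adoc[tags={language}]
```

With the second variant the source files stay identical for every language; only the attribute value passed by the build changes.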

Now our build script needs to iterate over all the desired output languages, call the AsciiDoc processor, and create a distinct output for each one. Common build tools like Gradle, Maven, or make each have their own mechanisms for this; a detailed explanation would exceed the scope of this article. The structure of such a build script (in Gradle) looks as follows:

You find a specific task defin­ition per language (here: EN and DE), where the generic Render­Doc­u­mentTask gets called with the filename and the language as parameters. The heavy lifting of AsciiDoc conversion is done by the Asciidoctor Gradle plugin.

More Conditions

AsciiDoc offers additional options to include conditions in your documents: you can use ifeval:: or the plain old ifdef::.
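
Both directives can be sketched as follows. The attribute names draft and language, as well as the sample sentences, are assumptions for illustration.

```asciidoc
// ifdef keeps the enclosed lines only if the attribute is set at all.
ifdef::draft[]
NOTE: This document is a draft.
endif::[]

// ifeval keeps the enclosed lines only if the comparison is true.
ifeval::["{language}" == "DE"]
Diese Fassung ist ein Entwurf.
endif::[]
```

Each conditional block is closed with endif::[], much like #endif in the C preprocessor.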

But let’s have a look at a more realistic example.

 

Config­uring the Output

When we started with this toolchain, we knew that we had to find a way to be able to create either a PDF file or an HTML repre­sen­tation of our documents. Fortu­nately, AsciiDoc allows us to do both.

PDF Files

AsciiDoc allows you to create a PDF theme which is used to configure the output. It allows you to configure all sorts of stuff, like a cover image, the position of elements on the pages, background images, and more. You can even use variables in the theme file, which are in our case filled with language-dependent text, like the date in the footer (you can have a look at our PDF theme here). All you need to do is to tell the Asciidoctor task where to look for the theme, and that’s it. Let’s have a look at our gradle task to generate the PDF.

We removed every­thing from the task that is not relevant to the PDF creation (you can check the full file here). You have to enable pdf as the backend (line 11) and then set the name of the theme (pdf-style), the directory where to look for the fonts that are used (pdf-fontsdir), and the directory where to look for the theme (pdf-stylesdir). Why are there two more lines that don’t seem to be related to PDF? Well, glad you asked!
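
The same three attributes can also be written as attribute entries, which makes their roles easier to see. This is only a sketch: it assumes the attributes are set in the document header instead of in the Gradle task, and the title, theme name, and directories are placeholders.

```asciidoc
= Example Curriculum
:pdf-style: custom
:pdf-stylesdir: themes
:pdf-fontsdir: fonts

// With these values, Asciidoctor PDF looks for the theme file
// themes/custom-theme.yml and loads fonts from the fonts directory.
// Newer Asciidoctor PDF releases name the first two attributes
// pdf-theme and pdf-themesdir, so check the version you use.
```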

HTML Files?

The two additional lines you see in the code snippet above can be used to also style the HTML output. Asciidoctor has a default theme that is used for HTML output. If you want to adjust the result, all you have to do is provide a CSS file that contains all the magic you want for your result. Enable HTML as the backend and tell AsciiDoc where to find the stylesheet (stylesheet) and where to look for images or fonts that might be referenced in the stylesheet (stylesheet-dir). You can check one of our examples below to see the PDF and HTML results.
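
For plain Asciidoctor, a custom stylesheet can be sketched with two header attributes. Note the assumptions: stylesdir is the directory attribute documented by Asciidoctor and is used here in place of the name from our build file, and the title, file, and directory names are placeholders.

```asciidoc
= Example Curriculum
:stylesheet: custom.css
:stylesdir: styles

// With these values, the HTML converter reads styles/custom.css
// in place of the default Asciidoctor stylesheet.
```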

Ok, that’s fine for a single project, but the Advanced Level has more than ten curricula, so we would have to copy the themes to each project. If we adjusted the PDF theme in one repos­itory, how can we make sure that all other curricula also benefit from the changes?

A Family of Similar Documents

To define the HTML theme and the PDF theme only once, we moved them to separate repositories. These repositories are then linked into each curriculum repository as git submodules. This has several advantages:

  • There is only one place where we have to change the themes. If we’re working on a specific curriculum and want to improve on one of the themes, we can open the submodule and commit/push our changes.
  • Owners (curators) of other curricula don’t have to think about doing the same changes. All they have to do is to update the respective submodule.
  • Should owners of a curriculum not want to upgrade the themes for whatever reason, they can decide to just keep their submodules at the revision they are happy with.

We also identified the copyright of each curriculum as a candidate for a separate submodule. It is changed every year (to add the current year to it) and has to be done in each repos­itory. Extracting the copyright file as a submodule allows us to only change one single file. Everyone who updates their curriculum also updates the submodule to the latest revision, and that’s it.

 

Real World Examples

The Curriculum for Software Architecture, iSAQB CPSA‑F®

Worldwide courses and classes in software architecture are taught based upon the iSAQB Software Architecture Foundation curriculum, guiding thousands of developers towards their “Professional for Software Architecture” certification, CPSA-F. Therefore, the iSAQB needs to provide versions in different languages, both in HTML and PDF formats. This curriculum consists of approximately 40 learning goals (LGs) in 5 parts, resulting in about 30 pages. Every two years, the iSAQB releases an updated version of the curriculum, based on new ideas and input from the international software architecture community.

We (Ben and Gernot) belong to the core maintainers’ group of this document.

Let’s dissect its structure:

  • The entry point is the file curriculum-foundation.adoc, which contains a number of include statements.
  • The first is adoc, which defines several variables that are used all around the document. Among others, the document type (book), the position of the table-of-contents (left), and the location of the image directory.
  • Now a list of all learning goals is included. Please note that this list is generated as part of the build process, to ensure we always have an up-to-date list of learning goals.
  • Next all the chapters are included, one by one, which in turn include important terms, all learning goals, and the refer­ences helpful for this chapter.
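
The structure described above can be sketched roughly as follows. This is not the real curriculum-foundation.adoc: the settings are inlined here instead of living in a separate included file, and every path, file name, and the attribute name language are hypothetical.

```asciidoc
= Example Curriculum
:doctype: book
:toc: left
:imagesdir: images

// The attributes above set the document type, the position of the
// table of contents, and the image directory for the whole document.

// Generated during the build, then included like any other part.
include::generated/learning-goals.adoc[]

// One include per chapter; each chapter pulls in its own terms,
// learning goals, and references.
include::chapters/01-basics.adoc[tags={language}]

include::chapters/02-design.adoc[tags={language}]
```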

This allows us to change and review each learning goal without conflicting with other learning goals of the document. We keep both the English and the German translation of a learning goal in a single file, so if one language is changed, the other one is less likely to be forgotten.

For trans­la­tions in other languages, we added the possi­bility to easily upload PDF files to the repos­itory which will be added to the next release automatically.

The Curricula of the iSAQB Advanced Level CPSA‑A®

We use the same template for each Advanced Level module as in the previous example. This ensures a clear and consistent design and structure across the documents, so that participants can navigate the different modules with ease, always knowing where to find what. Updating the formatting takes little effort since this is done via the submodules. Only changes to the build environment or GitHub Actions require manual adjustments in each repository.

A Large Glossary

We maintain a glossary of software architecture termi­nology (available for free from the iSAQB), with close to a dozen authors. A few parts of this document change quite frequently (new terms are added, expla­na­tions are updated), while others are highly stable (e.g., the intro­duction, copyright notice, and authors’ biographies).

We maintained this glossary in GitHub before, but we had to manually create a PDF and upload it to Leanpub. The current approach with AsciiDoc and our build pipeline allows us to create a new release by creating a new git tag and pushing it to GitHub. That’s it.

 

Summary

You can maintain multi-lingual documents with a pragmatic, simple, and free (as in open-source) toolchain that is developer-friendly and proven in practice. Business- and other non-IT people might miss their favorite word processing tool, but the benefit of multiple languages organized along the principle of one fact, one place will help you in the long run. Until then – may the power of expressive wording be with you.

 

Bibli­og­raphy

[0] Tower of Babylon: Brief expla­nation and history on Wikipedia

[1] iSAQB Glossary of Software Architecture Termi­nology, available in the following formats:

[2] iSAQB public documents

[3] AsciiDoc

 

About the Authors

Gernot Starke, INNOQ Fellow, co-founder of arc42.org and aim42.org. He “drinks his own champagne”:
Within iSAQB, he leads the Foundation Level Working Group and urgently needs to create and manage multi­lingual documents.
That’s why he sat down with Ben to create (and use) the toolchain described here.

Ben Wolf is an architect, iSAQB member, and a developer at INNOQ. He barely puts up with bad code and does not shy away from enormous refac­torings. He shares his ideas about software quality and proper software devel­opment as a trainer, consultant, and speaker at confer­ences and meetups.
It is important to him that we recognize that the attitude of a team is crucial for good software quality and far exceeds the value that is provided by technology alone.

 

 
