
Multilingual Antora

Antora is the static site generator of choice for AsciiDoc and technical documentation. However, out of the box, Antora does not support multilingual documentation. That said, with a few tweaks a multilingual site is possible, though still with some caveats. Included here are a few comments as well as a modified Antora Default UI for multilingual support, using the version as the language indicator.

This post is a work in progress. And there are probably several ways to accomplish this. The following is one attempt and at this point I’m not sure if this is the best or not. I will certainly fix things as I figure them out! Any suggestions or pull requests welcome!

1. Language by version

It has been suggested in community discussions that the Antora version could be used as a language selector. This seems reasonable as that still allows different components for the site, ie blog and book and docs for example.

Also, if you actually are translating documentation for software versions, you may simply be able to append the language symbol to the software version. Since I am not versioning my documentation I don’t have to do that and will not be able to comment how well that works at this point. And since here we are using the version in the html header and elsewhere to specify 'language', this will be an issue.

I’m going to put that aside for now…​

On the other hand, if you don’t have versions in your docs (i.e. a blog, book, or other material), the version selector works very well with ISO language symbols and no significant changes to the UI. The default UI version selector at the bottom left and top right just becomes the 'language selector'. In addition, the toolbar 'language' selector will grey out the languages where the current document has no translation. It acts as a document language switcher, whereas the component version (now language) switcher at the bottom left changes the language of the site and leaves you at the start page. It just works! Nice feature IMO!

Also you can set the version to the two or three letter iso language symbol and the display_version to the language name in the document source yaml config. Since the display_version is optional this gives you the choice of using either the language symbol or language name in the version language selectors.

en_US or en_CA or en?

I would recommend using 'en' over 'en_US'.

The former is the language code, the latter is the 'locale', which also designates the country and cultural aspects of the language. This is meaningful and not just a preference.

For example, documentation written for fr_CA should require accents on capital letters (É vs E), whereas fr_FR accepts but does not require this. fr_CA will use letter size paper and comma thousands separator, and, although not specified in the locale, 120v for espresso machines. fr_FR will use a4 paper size and space thousands separator and 220v, which makes much better espresso. Commercial web sites have to take this into account for their potential customers, and maybe yours does too.

Another important point to bear in mind is that users in en_CA, en_GB and en_AU may very well set their browsing preference or Android app or whatever to:

  1. en_CA < home country language

  2. en < The language itself

  3. en_US < US English

If you want the widest audience for your language, you should be using <html lang="en"> in your header and then additional locale metadata tags for en_US as required. The same principle of course applies to fr, fr_FR, fr_CA and many other languages.
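To make the fallback argument concrete, here is a minimal sketch of how a reader's ordered preferences could be matched against the languages a site offers. It is a simplified lookup written for illustration, not the algorithm of any particular browser or app, and the function names (normalise, pickLanguage) are made up for this example.

```javascript
'use strict'

// Normalise 'en_CA' or 'en-ca' to 'en-ca' so both spellings compare equal.
function normalise (tag) {
  return tag.replace(/_/g, '-').toLowerCase()
}

// Return the first offered language that matches the reader's ordered
// preferences: try the exact tag first, then the bare language code.
function pickLanguage (preferred, available, fallback) {
  const offered = new Map(available.map((tag) => [normalise(tag), tag]))
  for (const wanted of preferred.map(normalise)) {
    if (offered.has(wanted)) return offered.get(wanted)
    const base = wanted.split('-')[0]
    if (offered.has(base)) return offered.get(base)
  }
  return fallback
}

// The Canadian reader from the list above.
const reader = ['en_CA', 'en', 'en_US']

console.log(pickLanguage(reader, ['en', 'fr'], 'fr'))    // en
console.log(pickLanguage(reader, ['en_GB', 'fr'], 'fr')) // fr
```

With this kind of matching, a site tagged plain 'en' is found by every English reader, while one tagged only with a specific locale can be missed by readers from other countries.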

1.1. Version/Language sort order

The version sort order does not seem to be configurable. (https://docs.antora.org/antora/latest/how-component-versions-are-sorted/).

This creates a situation where the 'default' language of the Nav Explorer menu (bottom left) is not easily configurable. It sorts in reverse order, and this can be a show stopper for some people. I have four languages on my personal site, and the most obscure, starting with W, is listed first. Oh well…​

I would much prefer French or English to be listed first, but it seems this is just not possible.

The only sort of acceptable workaround I found for this issue was to set a start_page in the playbook to version@component:module:page.adoc. (In the case of this site [email protected]) Then at least the component initially opens up on the specified language.

Normally a politically correct list of languages should be alphabetic (with perhaps a default language on top). In any case, if language sort order is important, then using the 'version' as language symbol could be a real problem.

(If anyone has any suggestions on other work arounds, please let me know.)

2022-02-08: Finally found a solution to this issue.

The component versions list is actually already a flex column. Simply tweaking the CSS to column-reverse mostly solves the issue of language order. Just remember that the languages are sorted by the version name (which in my case is the ISO code). This may not be the order required by the 'Display' language. But it’s a huge improvement.

2. Components

The components (i.e. web site sections, for non-Antora readers) also have some issues without some tweaks. Most importantly, the component title in the default UI does not 'translate'. Well, you can translate it of course in the config file, but Antora will only display the value for the 'first' component with all 'languages'. This of course makes perfect sense for technical documentation for software versions. The component will not change between v1 and v1.2. But in many situations the component or section is a meaningful and translatable term or title that must be translated.

Fortunately a little tweaking of the nav-explorer template can totally fix this. This is done in the UI below.

More importantly another issue is that when a user clicks on a component in the component explorer menu, the webpage will display in the 'default' version/language of that component. This is totally expected when your components are unilingual 'software docs'. When you click on another component, the version of the current component would be irrelevant. However if the 'version' is a language the opposite is true. An Italian user should not expect on clicking a different component, (ie blog or docs), to flip the language to English, all the more so when the documents do exist in Italian.

Some adjustments have been made to the template to use the current version/language with the components in the explorer menu. This maintains the user’s language. If the language does not exist for a particular component, the user will just get a not found error. That’s just fine.

<div class="nav-panel-explore{{#unless page.navigation}} is-active{{/unless}}" data-panel="explore">
  {{#if page.component}}
  <div class="context">
    {{#with @root.page.componentVersion}} (1)
    <span class="title">{{./title}}</span>
    {{/with}}
    <span class="version"><img class=icon src="../../_/img/globe.svg">&nbsp;{{page.componentVersion.displayVersion}}</span> (2)
  </div>
  {{/if}}
  <ul class="components">
    {{#each site.components}}
    <li class="component{{#if (eq this @root.page.component)}} is-current{{/if}}">
    {{#with @root.page.componentVersion}}
      <a class="title" href="{{{@root.site.url}}}/{{./name}}/{{{@root.page.version}}}">{{{./title}}}</a> (3)
    {{/with}}
      <ul class="versions">
        {{#each ./versions}}
        <li class="version
          {{~#if (and (eq .. @root.page.component) (eq this @root.page.componentVersion))}} is-current{{/if~}}
          {{~#if (eq this ../latest)}} is-latest{{/if}}">
          <a href="{{{relativize ./url}}}">{{./displayVersion}}</a>
        </li>
        {{/each}}
      </ul>
    </li>
    {{/each}}
  </ul>
</div>
1 Translates the component title to the current language
2 Language icon (globe)
3 Uses the current page language for the selected component

3. SEO and metadata

3.1. <html lang="??">

Proper language metadata is extremely important for SEO. See: https://www.w3.org/International/techniques/authoring-html#language.

While Asciidoctor will give you an <html lang="YourDocumentISOLangCode"> tag in a standalone document, Antora out of the box will give you <html lang="en"> for every and any document.

If you use an ISO language code as 'version', the following can go into the default layout template in order to ensure each document gets its correct language identifier.

<!DOCTYPE html>
{{#with (or page.attributes.lang page.version)}}
<html lang="{{this}}">
{{/with}}
  <head>
{{> head defaultPageTitle='Untitled'}}
  </head>
  <body class="article{{#with (or page.attributes.role page.role)}} {{{this}}}{{/with}}">
{{> header}}
{{> body}}
{{> footer}}
  </body>
</html>

With the above code, if you have a :page-lang: attribute set in your document this can override the 'version' language, and allow you to have an exceptional English page in the French 'version' if necessary.

3.2. Alternate metadata translations

Ideally there should also be <link> metadata elements pointing to all translated pages available for any given page. I have this partially working, but it still needs some work. Ideally you would need links like this in the header metadata of the English page:

<link rel="alternate" hreflang="dyu" href="https://coastsystems.net/blog/dyu/" title="Kunafoni Julankan">
<link rel="alternate" hreflang="fr" href="https://coastsystems.net/blog/fr/" title="French blog post">

The following code in the head-meta.hbs will insert a link for the alternate page languages with the correct url, but not the title. Any solutions for this issue welcome!

{{#with page.versions}}
  {{#each this}}
  {{~#if (ne ./version @root.page.version)}} (1)
    {{#unless ./missing }}
      <link rel="alternate" hreflang="{{./version}}" href="{{{@root.site.url}}}{{{url}}}" title="{{component.version.page.title}}">
    {{/unless}}
  {{/if~}}
  {{/each}}
{{/with}}
1 Get a link for any language that is not the current page language
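The filtering the template performs can be sketched in plain JavaScript to make the logic easier to check. This is only an illustration: alternateLinks and the sample pageVersions data are hypothetical, and the sketch assumes each entry carries the version, url and missing fields that the template above reads. Like the template, it does not solve the title problem.

```javascript
'use strict'

// Build <link rel="alternate"> tags for every translation of the current page.
// Skips the current language and any language where the page is missing.
function alternateLinks (siteUrl, currentVersion, versions) {
  return versions
    .filter((entry) => entry.version !== currentVersion && !entry.missing)
    .map((entry) =>
      `<link rel="alternate" hreflang="${entry.version}" href="${siteUrl}${entry.url}">`
    )
}

// Hypothetical data shaped like the entries the template iterates over.
const pageVersions = [
  { version: 'fr', url: '/blog/fr/' },
  { version: 'en', url: '/blog/en/' },
  { version: 'dyu', url: '/blog/dyu/' },
  { version: 'wo', url: '/blog/wo/', missing: true },
]

for (const link of alternateLinks('https://example.org', 'en', pageVersions)) {
  console.log(link)
}
```

Only the fr and dyu links are emitted for the English page here, since the current language and the missing translation are both filtered out.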

4. i18n function needed…​

For the few issues below, I believe the best solution would be to use one of the many i18n libraries to keep a yaml file for each language with translatable strings. I did try to add http://i18njs.com/#multiple_languages to my Antora site. The package seems to promise to do what is needed and is not bloated.

The problem is there is practically not a single line of the code that passes ESLint. And I haven’t explored further as yet. But something like this would be a solution to the JavaScript "Table of Contents" problem and facilitate translating Handlebars template strings as well. If anyone has some ideas on this for right now, I would love to hear about them.

Here are the issues that an i18n function could address:

4.1. header content (toolbar)

There are some things like menu items that will always need to be translated. For now, I have a header-content-lang.hbs for each language. The header.hbs template picks these up as follows:

{{> header-scripts}}
{{#if (eq page.version 'fr')}}
{{> header-content-fr}}
{{/if}}
{{#if (eq page.version 'en')}}
{{> header-content-en}}
{{/if}}
{{#if (eq page.version 'dyu')}}
{{> header-content-dyu}}
{{/if}}
{{#if (eq page.version undefined)}}
{{> header-content-fr}}
{{/if}}

If this stays in multilingual templates it could undoubtedly be improved upon. But Handlebars does not have an abundance of documentation with examples and howtos. I did find that it won’t load a dynamic header-content-{{page.version}} template. Not sure how/if this could be done, but the above works for now. An i18n function could perhaps eliminate the need for multiple templates for each language.
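One possible way around the chain of #if blocks is a small UI helper that returns the partial name, combined with Handlebars' dynamic partial syntax, {{> (helperName argument)}}. The following is an untested sketch under assumptions: the helper name header-partial, its file location (src/helpers/header-partial.js) and the language list are hypothetical, and French is used as the fallback only because that is what the template above does for an undefined version.

```javascript
'use strict'

// Hypothetical helper file: src/helpers/header-partial.js
// Languages for which a header-content-<lang>.hbs partial exists (assumption).
const SUPPORTED = ['fr', 'en', 'dyu']
const DEFAULT_LANGUAGE = 'fr'

// Return the name of the header partial for the given version/language,
// falling back to the default when the version is undefined or unknown.
function headerPartial (version) {
  const lang = SUPPORTED.includes(version) ? version : DEFAULT_LANGUAGE
  return 'header-content-' + lang
}

module.exports = headerPartial

console.log(headerPartial('en'))      // header-content-en
console.log(headerPartial(undefined)) // header-content-fr
```

If this works in the UI bundle, header.hbs could shrink to {{> header-scripts}} followed by {{> (header-partial page.version)}}, and adding a language would only mean adding a partial and one entry in the list.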

4.2. hard coded stuff

There are also some English strings that are hard coded into Antora in various spots.

  • For example the header for the Table of Contents on the right column, is hard coded in 02-on-this-page.js line 43 with ( title.textContent = sidebar.dataset.title || 'Contents'). Someone with javascript skills could find a means to hack their own languages into the script. On this site to the right it is hard coded in French: "Sur cette page…​" (The English techies can learn French!)

Alright this is ok for my personal blog, but for a serious multilingual site this won’t work. So my workaround for the Multilingual UI was to remove it.

  • Another example is the 'Prev' and 'Next' labels for the pagination links.

    These are hard coded in CSS. This is easy to change of course. I have removed them as the links are pretty self explanatory, and actually already include left and right chevrons. The terms could also be replaced in CSS with unicode or icons. But if you want or need to translate this you would probably want to do some kind of i18n function in the Handlebars templates.

  • There are also tooltips and messages in hbs templates that need translation.

    I haven’t touched these at this point. Although it would be possible to make all necessary translations in language specific templates, it would be nice to have the choice of using an i18n function to grab these from an i18n yaml file.
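As a rough idea of what such an i18n function could look like without pulling in a library, here is a minimal sketch of a lookup helper. Everything in it is an assumption for illustration: the helper name t, the inline STRINGS table (which could just as well be loaded from per-language yaml files) and the French labels for the pagination links. Only "Sur cette page" is taken from this site.

```javascript
'use strict'

// Hypothetical helper file: src/helpers/t.js
// Illustrative string tables; a real setup might load these from yaml files.
const STRINGS = {
  en: { contents: 'Contents', prev: 'Prev', next: 'Next' },
  fr: { contents: 'Sur cette page', prev: 'Précédent', next: 'Suivant' },
}
const DEFAULT_LANGUAGE = 'en'

function has (table, key) {
  return Object.prototype.hasOwnProperty.call(table, key)
}

// Look up a UI string for a language, falling back to the default language
// and finally to the key itself so a missing translation stays visible.
function t (key, lang) {
  const table = typeof lang === 'string' && has(STRINGS, lang) ? STRINGS[lang] : {}
  if (has(table, key)) return table[key]
  if (has(STRINGS[DEFAULT_LANGUAGE], key)) return STRINGS[DEFAULT_LANGUAGE][key]
  return key
}

module.exports = t

console.log(t('contents', 'fr')) // Sur cette page
console.log(t('next', 'dyu'))    // Next
```

A template could then call something like {{t 'contents' page.version}}. Since the script quoted above reads sidebar.dataset.title before falling back to 'Contents', putting that output into a data-title attribute on the sidebar element might be enough to translate the Table of Contents header without touching the JavaScript.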

5. URLs

Most multilingual web sites have a url structure like https://example.com/en/docs, https://example.com/fr/posts, etc. However the Antora url is https://example.com/component/version, which is totally normal for a software version, but can look a little weird and non-standard for content translated in different sections of a web site. So the url for this post is: https://coastsystems.net/blog/en/blog/2022-01-04-antora/. I just have to live with it.

6. Search

The lunr search extension is the only extension added to the Antora Multilingual playbook below. It works great and supports many major languages! Even if your language is not officially supported, lunr may well just work if the language uses a Latin-based script.

7. Fonts

This is a personal preference in most cases. But for me personally Roboto does not contain the extended latin characters I need. The font substitution is good, but why settle for good when you can have a font that contains all the characters you need? Noto Sans seems a much better choice for wider latin and non latin support. Of course the fonts are easily changed with Antora.

But just one other problem here. The npm typeface-* packages do not (generally?) provide latin extended versions even where they exist.

It’s really convenient to just run npm i typeface-your-preferred-font --save-dev in the UI. But that won’t provide me the characters I need. Even more so if you are working in Vietnamese or Tagalog.

In the supplementary UI, this is going to require manual install in any case. npm @fontsource seems to be very up to date and as soon as Google puts out a new font they will have it available. If you need a particular font, look for it @fontsource. Then follow the Antora instructions for Adding fonts manually.

You will probably need to rename the font.css files and copy the woff2 files to the font directory. Remember that the path in the font.css file pointing to the actual woff2 font file is relative to where that css file resides.

Then be sure to set the font family in i18n/vars.css and import the css files in i18n/site.css.

8. Conclusion

Antora is the best platform for technical documentation and writing in AsciiDoc. It can currently be used in a multilingual environment with some hiccups and workarounds. Sites like Fedora docs use Antora and somehow manage to support a 'lot' of languages. (It’s also a very plain, utilitarian looking site. Fedora design dept??? Are you there?) It would be IMO a huge asset to Antora to be readily adaptable and usable out of the box for the international world of educational material and more. And yes, even technical docs are and must be translated. In the global village, the web site generator needs to totally accommodate multiple languages!

As aptly stated in the Linux Journal in 2010:
Just as information exchange standards such as XML allow systems to be more interoperable, at its core, I18N allows applications to be more usable - by a broader, more global user base. I’m not suggesting that every trivial shell script necessarily warrants I18N, but because all commercial software is potentially a global commodity, language independence is something that needs to be considered - and considered early in the design/development process. The lack of such planning would be quite shortsighted in 2010.
— Luis Icanona on September 27, 2010

Hopefully full on i18n and multilingual support is on its way.

In the mean time, if I have missed something or misunderstood something or you have some suggestion, please let me know!

I originally had a playbook with everything (modified Handlebars templates, CSS, etc.) in a supplemental UI. However, after getting gettext mostly working, I was not able to translate the 'Contents' title from the supplemental UI, because of the JavaScript noted above. I’m not sure if there is a way to do this. So for now the playbook contains lunr search configured for French and English. All other i18n modifications are in the antora-ui-i18n repo.

9. Usage

  1. Ensure your docs are organized by language code.

    For example:

    docs
      en/
        antora.yml
        modules/
          ROOT/
            pages/
              index.adoc
      fr/
        antora.yml
        modules/
          ROOT/
            pages/
              index.adoc

    In the antora.yml file, the component name should be identical across languages. The version should be the iso symbol for the language ie 'en'. Optionally, the display_version should be the language name ie 'English'.

    name: docs
    title: Docs
    version: en
    display_version: English
    start_page: index.adoc
    nav:
      - modules/ROOT/nav.adoc

    Store your documents and configuration in a git repository.

  2. Clone the Antora i18n UI.

    At a minimum, you will *need* to customize header.hbs according to the languages you will be using. Then you will need to add the appropriate header-content-* files for each language. These files are located in the src/partials directory.

  3. Build a custom UI bundle as per the usual Antora howto in the readme.

    1. npm install

    2. npm install -g gulp-cli

    3. gulp bundle

  4. Clone (or make your own): playbook demo repo.

    1. Edit the playbook.yml file and set the document source to the correct location.

    2. Point the UI bundle source to your customized version (either local or in a hosted git repo).

  5. Run npm install to install Antora.

  6. Run npm run clean-build to build your site

    (This includes the lunr supplementary UI.) Otherwise if you remove the lunr extension you can run the command below.

  7. Run npx antora playbook.yml to build your site.