The Basics of Markdown & Machine Translation
by Denis Augsburger
Markdown is a plain text format commonly used as an alternative to HTML. It is used on platforms like Slack, Trello, Stack Overflow and GitHub. The format is popular because its compact syntax makes it easy to read and efficient to write. It comes in several flavors, which add extra features. A common one is GitHub Flavored Markdown (GFM), which is based on the CommonMark spec and used by GitHub.
Here is an example with a heading, a paragraph and a code snippet:

    # Example documentation about Simpleen

    This makes it possible to easily translate locales (i.e. JSON) as well as documentations in markdown.

    With yarn: `yarn install packageName --frozen-lockfile`
Markdown is mainly used to write (blog) posts, documentation and README.md files. More and more content managers are starting to use Markdown instead of WYSIWYG editors. Why? Because of
- the growing popularity of static site generators like GatsbyJS, Hugo and Jekyll
- the use of headless content management systems like Strapi, Contentful and Sanity
- more traditional CMSs that support Markdown directly or via a plug-in
Markdown is also the preferred format of many developers and designers because it can easily be transformed into HTML. This gives you additional flexibility to reuse your content. For example, if you write your blog posts in Markdown and want to redesign your website a few months or years later, you don't need to go over your text again. Instead, you just adapt the transformation step, e.g. use a different CSS class for all of your headings.
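To make the transformation step concrete, here is a minimal sketch in Python that turns ATX headings (`#` to `######`) and plain paragraphs into HTML, with the CSS class for headings passed in as a parameter. The function name `render_markdown` and the class name `post-title` are hypothetical, and the sketch deliberately ignores everything else (lists, links, inline code), so it is not a CommonMark-compliant renderer. In practice you would likely use an existing Markdown library and only customize how it outputs headings.

```python
import html
import re

# An ATX heading: 1-6 '#' characters, whitespace, the text, and an optional
# closing sequence of '#' characters that must be preceded by whitespace.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def render_markdown(text: str, heading_class: str) -> str:
    """Render headings and plain paragraphs only; other Markdown stays literal text."""
    blocks = []
    paragraph = []

    def flush_paragraph() -> None:
        # Consecutive non-blank lines are joined into one <p> element.
        if paragraph:
            blocks.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    css_class = html.escape(heading_class)
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush_paragraph()
            level = len(match.group(1))
            content = html.escape(match.group(2))
            blocks.append(f'<h{level} class="{css_class}">{content}</h{level}>')
        elif line.strip():
            paragraph.append(line.strip())
        else:
            # A blank line ends the current paragraph.
            flush_paragraph()
    flush_paragraph()
    return "\n".join(blocks)


if __name__ == "__main__":
    doc = "# Example documentation about Simpleen\n\nThis makes it possible to translate Markdown."
    print(render_markdown(doc, "post-title"))
```

Running it prints `<h1 class="post-title">Example documentation about Simpleen</h1>` followed by `<p>This makes it possible to translate Markdown.</p>`. A redesign then only means calling the renderer with a different class name; the Markdown source itself stays untouched.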
Although English is often the go-to language, the globalization of services, online and offline, calls for multilingual blogs and documentation.
Machine translation services like Google Translate or DeepL do not specifically support Markdown, so translating your Markdown files with them can be troublesome, especially when the files contain code snippets such as shell commands or JSON structures. Parts of those snippets get translated, which breaks your documentation, blog posts or README.md files. Translated to Dutch with DeepL, our example from above becomes:

    # Voorbeeld documentatie over Simpleen

    Dit maakt het mogelijk om zowel locales (d.w.z. JSON) als documentaires in markdown gemakkelijk te vertalen.

    Met garen: GareninstallatiepakketNaam -bevroren-lockfile`.
The translation leaves our code snippet syntactically broken and unrecognizable.
Another common problem is the incorrect translation of brand or product names (in our example, yarn became garen, the Dutch word for yarn). If your brand is not world-known, or if its name has another meaning in your source language, the translation service will likely try to translate it.
Simpleen is a web tool that translates your Markdown without breaking your code snippets. It integrates machine translation services (currently DeepL) and improves the translation result by automatically handling the Markdown format. With Simpleen, our example looks as follows:

    # Voorbeeld documentatie over Simpleen

    Dit maakt het mogelijk om zowel locales (d.w.z. JSON) als documentaires in markdown gemakkelijk te vertalen.

    Met yarn: `yarn install packageName --frozen-lockfile`
In the glossary, I've added yarn to the ignore list so that it's not translated. Our code snippet stays valid and is no longer translated. Both inline code (single backticks) and separate code blocks (triple backticks) are supported, so the application handles the format for you and you can easily translate your files.
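The general idea of keeping code and ignored terms away from a translation service can be sketched in Python. This is not Simpleen's implementation, just a common masking technique under stated assumptions: fenced blocks (triple backticks), inline code (single backticks) and the terms on an ignore list are swapped for numbered placeholders such as `[[0]]` before translation and put back afterwards. It assumes the translation service returns those placeholders unchanged, which real services do not guarantee. `toy_translate` is a hypothetical stand-in for a call to DeepL or another service; it only swaps a few English words for Dutch ones.

```python
import re
from typing import Callable, List, Tuple

# Fenced blocks (```...```) come first in the alternation so their backticks
# are not consumed as inline spans; inline code is `...` on a single line.
CODE_RE = re.compile(r"```.*?```|`[^`\n]+`", re.DOTALL)
PLACEHOLDER_RE = re.compile(r"\[\[(\d+)\]\]")


def protect(text: str, ignored_terms: List[str]) -> Tuple[str, List[str]]:
    """Swap code spans and ignored terms for numbered placeholders."""
    saved: List[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"[[{len(saved) - 1}]]"

    # Mask code first, so terms inside code are stored as part of the code span.
    masked = CODE_RE.sub(stash, text)
    for term in ignored_terms:
        # \b assumes the term starts and ends with a word character.
        masked = re.sub(rf"\b{re.escape(term)}\b", stash, masked)
    return masked, saved


def restore(text: str, saved: List[str]) -> str:
    """Replace each placeholder with the original code span or term."""
    # A single pass: restored code containing "[[n]]" is not expanded again.
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], text)


def translate_markdown(
    text: str, translate: Callable[[str], str], ignored_terms: List[str]
) -> str:
    masked, saved = protect(text, ignored_terms)
    return restore(translate(masked), saved)


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for a machine translation call (English to Dutch)."""
    words = {"With": "Met", "yarn": "garen", "install": "installeren"}
    return re.sub(r"[A-Za-z]+", lambda m: words.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    source = "With yarn: `yarn install packageName --frozen-lockfile`"
    # Without masking, the code and the product name get translated:
    print(toy_translate(source))
    # -> Met garen: `garen installeren packageName --frozen-lockfile`
    print(translate_markdown(source, toy_translate, ["yarn"]))
    # -> Met yarn: `yarn install packageName --frozen-lockfile`
```

Text outside code that already contains a token like `[[1]]` would be corrupted by the restore step, so a production version would need a placeholder format that cannot collide with the content and that the chosen service is known to leave alone.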
Note: Simpleen currently provides machine translation with DeepL. Let us know if you would like to use another translation service.