Neural Machine Translation – Has the Future Finally Arrived?

Overcoming the language barrier is a dream as old as humanity, and every era has responded to the challenge in its own way, according to its abilities, needs, and technological means: multilingualism, lingua francas, artificial languages, standardised national languages, the eradication of languages, etc. Even miracles have been brought into play, as in the case of the Septuagint, when, according to legend, seventy-two scholars translating the Torah from Biblical Hebrew into Greek independently ended up producing exactly the same text. (Compared to that, immaculate conception must have been a walk in the park.) And, of course, there has always been common or garden translation (more blasted than blessed, perhaps). So it comes as no surprise that in the age of computers we have been trying to tackle the Babylonian dilemma with automation.

The first serious attempts at automating translation started in the 1950s with rule-based machine translation (RBMT) systems. These used various combinations of grammatical rules and dictionaries, and their developers predicted a translator-free future within a few years, but in the end they didn’t even get close to that lofty goal. Language and communication, messy, unpredictable, creative – and, oh yes, rule-based, too – proved to be a code too hard to crack and translate into neat rules and algorithms.

The next surge in research, in the 1980s, coincided with the beginning of personal computing and office automation, and also with the birth of the translation industry as we know it today. When the world went global and online, the single- or multi-practitioner agencies serving a local clientele and advertising in the Yellow Pages morphed into Language Service Providers (LSPs). Thanks first to fax, then to email and the internet, these could now draw on talent from all over the world and offer their services to clients anywhere.

The increased amount of content available then inspired statistical machine translation (SMT), which drew on large volumes of analysed data for specific language pairs and prompted hopes that “crowd sourcing” might provide the answer where linguistics had failed. In the famous words of the information theory researcher Frederick Jelinek: "Every time I fire a linguist, the performance of the speech recognizer goes up." Franz Josef Och used the SMT approach to build Google Translate in the mid-2000s. This approach, along with various SMT-RBMT hybrids and fine-tuned combinations, seemed much more promising. Since 2007, sophisticated platforms based purely on statistical machine translation have been built for related language pairs, along with hybrids for language pairs with little grammatical relation. One example is Moses, a statistical machine translation system funded by the European Commission that allows translation models to be trained automatically for various language pairs. Training is based on a collection of translated texts, or parallel corpora. Once trained, the system can use an efficient search algorithm to quickly find the highest-probability translation among an exponential number of choices.
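
The idea of finding the “highest-probability translation” can be sketched in a few lines. The toy below combines a translation-model score (how well each word matches) with a language-model score (how natural the English sounds), in the spirit of the classic noisy-channel formulation of SMT. The word probabilities, bigram table and strict word-for-word alignment are invented assumptions for illustration only; Moses itself uses phrase-based, log-linear models trained on real parallel corpora.

```python
import itertools
import math

# Hypothetical word-translation probabilities (toy values, German -> English).
TRANSLATION_MODEL = {
    "das": {"the": 0.7, "that": 0.3},
    "haus": {"house": 0.8, "home": 0.2},
    "ist": {"is": 1.0},
    "klein": {"small": 0.6, "little": 0.4},
}

# Hypothetical bigram language model: P(word | previous word) in English.
LANGUAGE_MODEL = {
    ("<s>", "the"): 0.5, ("<s>", "that"): 0.2,
    ("the", "house"): 0.4, ("the", "home"): 0.05,
    ("that", "house"): 0.2, ("that", "home"): 0.1,
    ("house", "is"): 0.5, ("home", "is"): 0.3,
    ("is", "small"): 0.3, ("is", "little"): 0.1,
}
UNSEEN = 1e-4  # crude fallback probability for bigrams not in the table

def score(source, candidate):
    """Log translation-model score plus log language-model score (monotone, word for word)."""
    log_p = 0.0
    prev = "<s>"
    for f_word, e_word in zip(source, candidate):
        log_p += math.log(TRANSLATION_MODEL[f_word][e_word])
        log_p += math.log(LANGUAGE_MODEL.get((prev, e_word), UNSEEN))
        prev = e_word
    return log_p

def translate(sentence):
    source = sentence.lower().split()
    options = [list(TRANSLATION_MODEL[w]) for w in source]
    # Exhaustive search is fine for 8 candidates but grows exponentially with
    # sentence length, which is why real decoders use smarter strategies such as beam search.
    best = max(itertools.product(*options), key=lambda c: score(source, c))
    return " ".join(best)

print(translate("Das Haus ist klein"))  # the house is small
```

Neither model alone decides: “the home” loses on both scores, while “little” is a plausible word translation but is outweighed by the language model’s preference for “is small”.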

Still, even these elaborate and costly systems were far from delivering the hoped-for quantum leap to full automation, or even a “good enough” quality of translation, as many a frustrated translator asked to “post-edit” clumsy or even incomprehensible machine-translated texts can testify. Also, with SMT, humans are still needed to build and tweak the multi-step statistical models. The repeated rash promises to rid the world of expensive translators and interpreters within a few years became something of a joke among translators (even if the laughter was perhaps tinged with a sigh of relief), and further research into SMT stagnated somewhat. Apparently here to stay, however, as the most fruitful technological innovation in the translation industry, were the much less ambitious CAT tools (computer-assisted translation tools) developed at the same time. These could (and can) handle a variety of file formats, document formats, and tags, and MT options can be incorporated into them. Their translation memories and automated glossaries speed up the translation process, particularly for repetitive texts and updates, while making terminology management and quality control easier (even though competitive pressures soon demanded that the initial gains from those technologies be passed on to clients).
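
The translation-memory lookup at the heart of a CAT tool can be sketched as a fuzzy match: compare a new segment with previously translated source segments and offer the stored translation when they are similar enough. This is a minimal sketch assuming Python’s standard difflib similarity ratio, a hypothetical two-entry memory and a hypothetical 75% threshold; commercial CAT tools use their own matching algorithms and scoring.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously translated source/target segment pairs.
TRANSLATION_MEMORY = {
    "Click the Save button to store your changes.":
        "Klicken Sie auf die Schaltfläche Speichern, um Ihre Änderungen zu speichern.",
    "The warranty period is two years.":
        "Die Garantiezeit beträgt zwei Jahre.",
}

def best_match(segment, memory, threshold=0.75):
    """Return (similarity, stored source, stored target) for the closest entry, or None."""
    best = None
    for source, target in memory.items():
        ratio = SequenceMatcher(None, segment, source).ratio()  # 0.0 .. 1.0
        if best is None or ratio > best[0]:
            best = (ratio, source, target)
    # Only offer the match if it is similar enough to be worth editing.
    if best is not None and best[0] >= threshold:
        return best
    return None

# An updated sentence differs by one word, so the old translation is offered for editing.
match = best_match("Click the Save button to store all your changes.", TRANSLATION_MEMORY)
print(match[2])
```

The translator still has to adapt the suggestion (“all” is missing), which is why such matches speed up repetitive texts and updates without removing the human from the loop.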

Now there is a new kid on the block, further proof not only of the impressive ingenuity of IT minds, but also of the enormous desire (and real or perceived need) of humanity to automate itself out of existence. The buzzword is “neural machine translation” (NMT) or – to be up to date, because processes and processors are heating up real fast again – “deep NMT”. The finer conceptual and technological minutiae of the process are somewhat beyond this translator’s mental capacity, but basically the starting point, as in SMT, is again parallel corpora. The difference this time around is that the “learning” the machine does in order to be able to translate between two languages is said to happen largely without human intervention – SMT meets AI. The magic words that make this “self-learning translation system” possible are “deep learning”, “black box system”, and above all “recurrent neural networks” (RNNs) and “encoding”. Here is a (very, very) simplified explanation of how that works:

With simple neural networks, the same input always results in the same output.
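
A minimal sketch of that behaviour, assuming a single layer with hand-picked, hypothetical weights: because the network keeps no state between calls, feeding it the same numbers twice must give the same result.

```python
import math

# A tiny one-layer network with fixed, hypothetical weights (not trained).
WEIGHTS = [[0.5, -0.2], [0.3, 0.8]]
BIAS = [0.1, -0.1]

def feedforward(x):
    """output_i = tanh(sum_j W[i][j] * x[j] + b[i]); nothing is remembered between calls."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(WEIGHTS, BIAS)]

first = feedforward([1.0, 0.0])
second = feedforward([1.0, 0.0])
print(first == second)  # True: same input, same output
```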

For some applications, that’s exactly what you want. Recurrent neural networks are a bit more “clever”. They don’t just use an input to produce an output; they also keep information about previous inputs and use it as additional input alongside new inputs, thus learning patterns. If, for example, you enter the word sequence “How” “are” “you?” once, the next time you enter “How” and “are” the machine can predict that the next word could be “you?”, because it uses the previous input as additional input.
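
The “memory” of a recurrent network is usually a hidden state that is updated at every step: h_t = tanh(W_x·x_t + W_h·h_(t-1)). The sketch below shows that update with tiny, untrained, hypothetical weights and word vectors. It does not predict words, but it shows that the same word (“are”) leaves the network in a different state depending on what came before it, which is what lets a trained network pick up patterns in sequences. Real NMT systems use trained and far larger variants of this cell.

```python
import math

# Hypothetical 2-number vectors for each word and hand-picked, untrained weights.
EMBEDDINGS = {"how": [1.0, 0.0], "are": [0.0, 1.0], "you": [1.0, 1.0]}
W_X = [[0.6, -0.4], [0.2, 0.5]]   # input-to-hidden weights
W_H = [[0.3, 0.7], [-0.5, 0.4]]   # hidden-to-hidden weights: the "memory"

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_step(hidden, word):
    """h_t = tanh(W_x x_t + W_h h_(t-1)): the new state depends on the old one."""
    x = EMBEDDINGS[word]
    return [math.tanh(a + b) for a, b in zip(matvec(W_X, x), matvec(W_H, hidden))]

def run(words):
    hidden = [0.0, 0.0]  # empty memory before the first word
    for word in words:
        hidden = rnn_step(hidden, word)
    return hidden

# Same last word, different history, different internal state.
print(run(["are"]))
print(run(["how", "are"]))
```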

That way, the system processes not only the new input but previous inputs as well; it is said to “learn from itself”. If you now add word-by-word encoding into the mix, turning every sentence you feed into the machine into a unique set of numbers, as well as a second RNN that decodes those numbers into the words of a sentence in another language, you get basic NMT.

(Diagrams adapted from Adam Geitgey’s text on the Medium website)
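
The encoder-decoder arrangement can be sketched with two small recurrent networks: the encoder squeezes a source sentence of any length into a fixed number of values, and the decoder starts from those values and emits target words one at a time until it produces an end-of-sentence marker. This is a structural sketch only, assuming random, untrained weights, a hypothetical three-word English and five-token German vocabulary, and greedy word choice (real systems are trained on parallel corpora and typically use beam search), so the German words it prints are arbitrary.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible
HIDDEN = 4
SOURCE_VOCAB = ["how", "are", "you"]                  # hypothetical
TARGET_VOCAB = ["wie", "geht", "es", "dir", "<eos>"]  # hypothetical

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def step(w_x, w_h, x, h):
    """One recurrent step: tanh(W_x x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]

# Untrained random weights: the structure is the point, not the output.
ENC_X, ENC_H = rand_matrix(HIDDEN, len(SOURCE_VOCAB)), rand_matrix(HIDDEN, HIDDEN)
DEC_X, DEC_H = rand_matrix(HIDDEN, len(TARGET_VOCAB)), rand_matrix(HIDDEN, HIDDEN)
DEC_OUT = rand_matrix(len(TARGET_VOCAB), HIDDEN)

def encode(words):
    """Encoder RNN: turn a sentence of any length into HIDDEN numbers."""
    h = [0.0] * HIDDEN
    for w in words:
        h = step(ENC_X, ENC_H, one_hot(SOURCE_VOCAB.index(w), len(SOURCE_VOCAB)), h)
    return h

def decode(encoding, max_len=6):
    """Decoder RNN: start from the encoding and greedily emit target words."""
    h, prev, output = encoding, [0.0] * len(TARGET_VOCAB), []
    for _ in range(max_len):
        h = step(DEC_X, DEC_H, prev, h)
        scores = matvec(DEC_OUT, h)
        best = max(range(len(TARGET_VOCAB)), key=scores.__getitem__)
        if TARGET_VOCAB[best] == "<eos>":
            break
        output.append(TARGET_VOCAB[best])
        prev = one_hot(best, len(TARGET_VOCAB))  # feed the chosen word back in
    return output

sentence_vector = encode(["how", "are", "you"])
print(len(sentence_vector))     # 4, whatever the sentence length
print(decode(sentence_vector))  # arbitrary German words: the weights are untrained
```

Training would adjust all five weight matrices so that the decoder’s output matches the reference translations in the parallel corpus.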

Neural networks themselves are not a new concept. What is new is their use for MT, made possible by GPUs (graphics processing units), which perform the massively parallel calculations neural networks require far faster than regular processors. The initial results are quite impressive! Here is one example:

German original:

Sollten sich Unklarheiten betreffend der Auslegung einer oder mehrerer der nachstehenden Vertragsbestimmungen ergeben, gilt im Zweifel die für den Auftraggeber günstigere Auslegung.

SMT translation:

Should ambiguities concerning the interpretation of one or more of the following provisions of the Treaty, is in doubt for the customer more favorable interpretation.

NMT translation:

In the event of any uncertainties concerning the interpretation of one or more of the following provisions of the contract, the interpretation which is more favorable to the contracting entity shall apply.

What stands out immediately is the quality of the syntax of the neural translation. NMT seems to cope significantly better with syntax and produces more “natural”-sounding, and therefore easier-to-edit, sentences than SMT, at least in the context of medium-sized translation companies that don’t have the capacity to get involved in high-end SMT training. The reason is that the system is designed to translate the meaning of entire sentences rather than working with individual words and phrases. This, at least at first sight, makes post-editing much more like an editing process than a retranslation in disguise. According to assessments by a number of researchers, NMT also looks more promising for grammatically complex and highly inflected languages and needs less corpus data for its training.

From a translator’s point of view, this represents an amazing starting point for a new translation technology. Evaluations by the MT industry, however, have thrown up some predictable issues, although it has to be said that these evaluations, too, are still at a very early stage, considering that as recently as 2014 NMT was not yet their focus. One of the main issues is specialised terminology, an area on which SMT developers have spent a lot of time, money and effort by incorporating glossaries and other linguistic information. The potential for further developments and refinements with NMT in this area, which is critical for translation providers, remains unknown at this stage. Another very common scenario in a translation business environment – tags, strings, and marked-up content – also awaits evaluation.

More generally, NMT still processes only one sentence at a time and therefore can’t take the wider context – let alone knowledge of the world – into account. Here is a simple example that illustrates this particular limitation:

1st example

The water was cold and the current strong. I went to the bank. My feet sunk into the mud.

2nd example

I had run out of cash. I went to the bank. It had just closed.

In both examples, NMT renders the English word “bank” as “Bank” in German, i.e. the financial institution. Also, the pronoun “it” in the second example is translated as “es” when it should be “sie” (because “Bank” in German is feminine).
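
The limitation follows directly from a sentence-by-sentence pipeline, which can be sketched as follows. The “engine” here is a hypothetical lookup table standing in for a sentence-level NMT system, with outputs modelled on the behaviour described above, and the sentence splitter is deliberately naive. Because “I went to the bank.” is the same string in both texts and nothing else is passed to the engine, it has to come out the same way, whatever the surrounding sentences say.

```python
import re

# Hypothetical sentence-level "engine": a lookup table standing in for an NMT system
# that only ever sees one sentence at a time.
SENTENCE_ENGINE = {
    "The water was cold and the current strong.": "Das Wasser war kalt und die Strömung stark.",
    "I went to the bank.": "Ich ging zur Bank.",
    "My feet sunk into the mud.": "Meine Füße versanken im Schlamm.",
    "I had run out of cash.": "Ich hatte kein Bargeld mehr.",
    "It had just closed.": "Es hatte gerade geschlossen.",
}

def split_sentences(text):
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())

def translate_document(text):
    # Each sentence is translated in isolation; the previous sentence is never passed in.
    return [SENTENCE_ENGINE[s] for s in split_sentences(text)]

river = "The water was cold and the current strong. I went to the bank. My feet sunk into the mud."
money = "I had run out of cash. I went to the bank. It had just closed."

# Identical middle sentence, so identical output in both contexts.
print(translate_document(river)[1] == translate_document(money)[1])  # True
```

Taking context into account would mean changing what the engine receives, for instance more than one sentence at a time, rather than changing how any single sentence is looked up.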

Changing the first example slightly, on the other hand, illustrates the strength of NMT compared to SMT, in terms of its ability to encode all the semantic qualities of the words within the sentence, rather than operating by memorizing phrase-to-phrase translations:

The water was cold and the current strong, when I went to the bank. My feet sunk into the mud.


Das Wasser war kalt und die Strömung stark, als ich zum Ufer ging. Meine Füße versanken im Schlamm.

This time “bank” is translated correctly as “Ufer”, i.e. river bank.

The same type of “correction” occurs in the second example, too, when we change it slightly:

I had run out of cash, so I went to the bank, but it had just closed.


Ich hatte kein Bargeld mehr, also ging ich zur Bank, aber sie hatte gerade geschlossen.

Again, within a single sentence NMT is able to keep linguistic features consistent, this time the grammatical agreement in gender between noun and pronoun. A real breakthrough with NMT in this area, it seems, will depend on whether it can take larger chunks of text, or even whole texts, into account.

There are other, more mysterious problems with NMT. For example, it sometimes leaves out whole chunks of sentences or adds chunks for no apparent reason. This brings us to the peculiar fact that we don’t actually know exactly how NMT works yet. By this I don’t mean that linguists like me don’t fully understand the technology, but that nobody fully understands what this technology does linguistically. Until this becomes a bit clearer, it will be hard to fix existing problems technically. (Having said that, who fully understands what’s going on in a translator’s mind?)

NMT is undoubtedly still in its infancy. Even Google’s, Microsoft’s and Facebook’s forays into NMT are barely a year old, and the existing NMT engines are so far only general-purpose solutions. Other issues to consider, particularly for LSPs, are data security (NMT systems generally run in the cloud), the evaluation of quality across different text types and purposes, and the incorporation of NMT into existing technology and business models. On the other hand, things are moving fast. Google Neural Machine Translation (GNMT) has already been integrated into SDL Trados Studio 2017 via an API and is available for USD 20 per million characters (including spaces), while DeepL (formerly Linguee), a new, smaller player in the NMT market, is hot on the giant’s heels. The company launched its DeepL Translator in August 2017, boldly claiming “a new standard in neural machine translation”. It, too, has announced the release of an API in the coming months.
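
At the quoted rate, estimating the cost of a job is simple arithmetic: characters ÷ 1,000,000 × USD 20. A minimal sketch, assuming that every character including spaces is billed (as stated above) and ignoring any rounding, minimum charges or later price changes the provider may apply:

```python
# Rate quoted above for GNMT (2017); the provider's current price list may differ.
PRICE_PER_MILLION_CHARS_USD = 20.0

def estimate_cost(text, price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Estimated cost in USD; len() counts every character, spaces included."""
    return len(text) / 1_000_000 * price_per_million

document = "x" * 250_000  # stand-in for a 250,000-character document
print(estimate_cost(document))  # 5.0
```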

So what will this new technology mean for LSPs? Should LSPs who have invested heavily in SMT cut their losses and start over, or will the two technologies complement each other? Is this a new chance for smaller LSPs, who have never had the resources for high-end SMT technology, to become competitive again? At this point in time, pending further tests and developments, there are few certainties regarding the future of NMT. One is that, given the more sophisticated starting point of this new technology, translation providers big and small can’t afford to ignore it: they should be trying out the options already available and keeping a close eye on the rapidly developing situation, because NMT may well bring yet another change in what and how we translate, perhaps in a very radical and fundamental way. The other certainty is that it is not going to replace human translators anytime soon. With NMT, the future of automated translation may indeed have arrived, but the future of a world without translators – automatic translation requiring no human input – most definitely has not.
