For personal reasons, Kevin Lossner was unable to attend “the conference on the bubbly, bright future of MT post-editors and why all good translators should be eager to hop on that gravy train” hosted by the Dutch Association of Translation Agencies (ATA) on September 9, 2011. I didn’t have a blog at the time, so we agreed that I’d take the notes, and if they were worth reading, Kevin would kindly post them on his blog. The notes were extensive and resulted in a three-part post. After 9 years and a walk down memory lane, I decided it was time to share those notes on my own blog. This is the second part of my notes.

In my opinion, Dr. O’Brien’s presentation was the highlight of the day. I still have a hard time believing that someone who doesn’t believe in MT as it is being shoved down our throats had been invited to the conference. But boy, I’m sure everyone in the room was glad they had chosen this particular workshop. It was so encouraging to see that research is being done by people who are interested in the research and not in selling vaporware.

She started with a short introduction of MT and its history. MT, we learned, has only really taken off in the last ten years when rule-based systems and statistic-based systems were married to create a hybrid paradigm. Rule-based systems consist of coding dictionaries, creating rules and ensuring the rules do what they’re supposed to do. TMs, so-called data-driven corpora, are used to create a statistic-based data-driven engine. The quality of the MT’s “training,” which is done by editing translated segments, is crucial to the quality of the output.

Symantec, which uses Systran, funded the MT research at Dublin City University, but as Dr. O’Brien says herself, she was not there to blow the Symantec or Systran horn but to give us a picture of MT that is based on a real scenario in a real, live environment. Symantec uses Systran because it enables them to quickly translate virus alerts. An engineer in Latvia, for example, doesn’t need a highly polished translation but a set of understandable instructions he needs to carry out. Here, the accuracy of the translation outweighs its style. This is a perfect example of “fit for purpose,” which is taught in translation theory and implies that a translation has to be accurate rather than polished. Symantec uses MT successfully because they know what they want, have taken the pre-processing steps, have involved the engineers and translators in the process and have implemented guidelines for writing for machine translation.

Dr. O’Brien ran her post-editing test in French and Spanish and used the LISA QA metric to assess it. The test was run with a good terminology database and a good MT. The results for French and Spanish were very similar, but would have varied if other, not so well-prepared, MT engines had been used. She pointed out that quality may be subjective but that we would probably all agree that “good quality” generally means a translation that accurately reflects the meaning of the source text and that one could rely on if one’s life were in danger. She also pointed out that Asian languages will produce different errors than Western European languages because the markers are different.

Quality being the hot topic of the day, she overtly disagreed with Renato’s statement and explained that she would talk a lot about quality. According to the research, the highest quality is achieved when there is a fit between the source text and the contents of the MT. Domain-driven engines are more successful than engines based on generic data. The assumption used to be the more data, the higher the quality, but new research has shown that the quality rather than the quantity of the data is crucial and that pre-processing steps are essential! If she didn’t have our full attention, she sure had it now!

So what does the post-editing challenge consist of? It consists of, well, trained bilingual translators fixing errors in a combined MT environment. MT developers are talking about monolingual post-editing, but no one really thinks that is a good idea because there is no way of checking the accuracy of a translation if the person reviewing the text doesn’t speak the source language. Throughout her presentation, Dr. O’Brien points out time and time again that tight control is the key in every area that touches on MT and that quality issues can and should be tackled at the source.

We also learned that there are in fact several levels of post-editing. Fast post-editing, which is also referred to as gist post-editing, rapid post-editing and light post-editing, consists of essential corrections only and therefore has a quick turnaround time. Conventional post-editing, which is also referred to as full post-editing, consists of making more corrections, which results in higher quality but a slower turnaround time.

These levels are problematic because there are no standard definitions for the terms and no agreement on what each level means, and this creates a mismatch of expectations. A good way of defining which level of post-editing a customer needs is to discuss:

  • Volume – How many words/pages?
  • Turnaround time – How much time has been planned for post-editing?
  • Quality – How polished does the translation have to be?
  • User requirements – Who are the readers and why will they be reading it?
  • Perishability – Time in the sense of when the translation is really needed
  • Text function – What is the purpose of the text?

The distinction between light and full post-editing is in fact useful. The level of post-editing needed depends on the effort involved, meaning the quality of the initial MT and the level of output quality expected. However, the customer may not know what they want themselves and may therefore be disappointed by what they get. It should at least be clear whether the customer wants “good enough” quality, or quality that is similar or equal to human translation.

The nature of the post-editing task will vary depending on whether the quality of the output is good. If the quality is good, post-editing will consist mainly of minor changes, such as capitalization, numbers, gender, style and maybe a few sentences that need retranslating. If the quality is bad, the situation is reversed and post-editing will consist mainly of major changes, meaning more sentences that need retranslating and a few minor changes such as capitalization, numbers, gender etc.

There are many ways of measuring the quality of MT, some of which are more useful for post-editing and localization processes than others. The quality metric example in Dr. O’Brien’s presentation is that used by Symantec. There are, however, other metrics such as General Text Matcher (GTM) and Translation Edit Rate (TER). The post-task edit distance is measured by comparing raw MT output to the post-edited segment and gives a score based on the number of insertions, deletions, shifts, etc. Whichever metric is used, it is important to remember that quality issues can be tackled at the content creation and pre-processing stages.
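
To make the idea of a post-task edit distance a bit more concrete, here is a minimal Python sketch of my own; it was not part of the presentation. It counts word-level insertions, deletions and substitutions between a raw MT segment and its post-edited version and divides by the length of the post-edited segment. It is deliberately simplified: real TER also counts shifts of word sequences, which this sketch ignores. The function names and the sample sentences are made up for illustration.

```python
def word_edit_distance(raw_mt: str, post_edited: str) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions).

    Simplification: block shifts, which TER counts as a single edit,
    are NOT modelled here.
    """
    src = raw_mt.split()
    tgt = post_edited.split()
    # prev[j] = edits needed to turn the first i-1 source words into the first j target words
    prev = list(range(len(tgt) + 1))
    for i, s_word in enumerate(src, start=1):
        curr = [i]
        for j, t_word in enumerate(tgt, start=1):
            cost = 0 if s_word == t_word else 1
            curr.append(min(prev[j] + 1,          # delete a source word
                            curr[j - 1] + 1,      # insert a target word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(raw_mt: str, post_edited: str) -> float:
    """Edits per word of the post-edited segment (hypothetical normalisation)."""
    length = max(len(post_edited.split()), 1)  # avoid division by zero
    return word_edit_distance(raw_mt, post_edited) / length


# Made-up example segment: one substitution ("are" -> "is") and one deletion ("now")
raw = "the virus are removed now"
edited = "the virus is removed"
print(word_edit_distance(raw, edited), edit_rate(raw, edited))
```

A low rate suggests the post-editor left most of the raw output alone; a high rate suggests the segment was largely retranslated.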

In order to get around the cost and subjectivity of evaluating translation output, IBM developed BLEU scores. This metric consists of taking a raw MT sentence and comparing it to a human translation, which serves as the gold standard. The metric, however, only determines the similarity between the two, not the quality, and it only works in conjunction with a reference translation. MT providers all have BLEU scores and compare them with each other, but the scores are only useful for system development and comparison; they are not meaningful for the post-editing effort.
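
The point that such a score measures similarity rather than quality is easier to see with a small example. The following Python sketch is my own illustration, not something shown at the conference. It computes only the clipped n-gram precision that BLEU is built on; the full BLEU score also combines several n-gram orders and applies a brevity penalty, which I leave out here. The function names and the sample sentences are invented.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, reference: str, n: int) -> float:
    """Share of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as often as it appears
    in the reference ("clipping"). This is one building block of BLEU,
    not the full score.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Invented example: the candidate is a perfectly usable instruction,
# yet it shares only half of its words with the reference.
reference = "delete the infected file immediately"
candidate = "remove the infected file at once"
print(clipped_precision(candidate, reference, 1))
print(clipped_precision(candidate, reference, 2))
```

The candidate says the same thing as the reference in different words and is penalised for it, which is why a score like this tells a post-editor very little about how much work a segment will need.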

Confidence scores are an alternative to BLEU scores. They are generated by the MT system itself, which uses its knowledge of its own probabilities to estimate how confident it is of having produced a good-quality translation.

In terms of productivity, research has shown that post-editing is faster than translating and that the throughput rates vary between 3,000 and 9,000 words a day. However, comparisons are often made on first-pass translation versus post-editing, i.e. there is no revision. There will always be individual variations in speed that will differ across systems and languages. Post-editing experiments using keyboard logging software show that post-editing involves less typing than translation, which probably matters more in terms of RSI than speed because translators are generally fast typists.

The cognitive effort required by translation and editing is rarely considered in research. However, translators report being more tired after post-editing and find post-editing more tedious, probably because they have to correct something they wouldn’t have written in the first place.

Dr. O’Brien didn’t spend much time on pricing, but she did make it clear that a whole new pricing model will have to be developed for MT post-editing. In her opinion, structured feedback to the system owner should be paid for and translators should be involved in the development of the system, terminology management, dictionary coding etc.

New generations of translators will benefit the most from post-editing because they will have grown up with technology and social networks and will be more flexible in terms of quality. Research suggests that students can learn about translation through post-editing.


Diane McCartney was born in California and raised in Germany where she attended a French-German school. She set up the translation department at ASK Computer Systems, where she used a UNIX program to prepare text for translation and review. Today she is based in the Netherlands and has been running her own company since 1997.