As machine translation becomes more and more popular in subtitling, with leading companies now using MT in projects for the most important streaming services out there, I felt the need to test it and see how helpful it can be for freelance AVT professionals. Since I’ve been collaborating with AppTek in recent months, I decided to test their engine against Google’s to see whether there are any differences between an MT service specialized in subtitling (AppTek) and a general-purpose one (Google). Both services will soon be available on translation platforms that freelance subtitlers use every day, so I think we need to be prepared to choose the right one for our projects.
For this test, I chose an episode of the Mexican soap opera “Te doy la vida,” which is currently airing in Argentina with great success. My experiment was quite simple: I took a Spanish SRT transcript of one episode (814 subtitles) and translated it with Google Translate using Subtitle Edit. I also asked AppTek to machine translate the same file for me. Then I randomized the two translations and put them together with the source text in an Excel file.
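You could script the randomization step for a similar blind comparison instead of doing it by hand. What follows is only a minimal Python sketch, not the procedure used for this test. It assumes the subtitle text has already been extracted from the SRT files into three aligned lists (source, engine A, engine B). The function name `build_blind_sheet`, the CSV layout, and the "A"/"B" labels are all hypothetical. The function writes a CSV that Excel can open and returns a hidden key recording which engine landed in the "Option 1" column for each subtitle.

```python
import csv
import random


def build_blind_sheet(source, mt_a, mt_b, out_path, seed=None):
    """Write source text plus two MT outputs in random column order.

    Hypothetical helper: assumes the three lists are aligned, i.e. index i
    refers to the same subtitle in all of them. Returns the hidden key, a
    list with "A" or "B" per subtitle telling which engine is in "Option 1".
    """
    if not (len(source) == len(mt_a) == len(mt_b)):
        raise ValueError("all inputs must have exactly one entry per subtitle")

    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    key = []
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["#", "Source", "Option 1", "Option 2"])
        for i, (src, a, b) in enumerate(zip(source, mt_a, mt_b), start=1):
            # Flip a coin per subtitle so the reviewer can't learn a pattern.
            if rng.random() < 0.5:
                first, second, label = a, b, "A"
            else:
                first, second, label = b, a, "B"
            key.append(label)
            writer.writerow([i, src, first, second])
    return key
```

The review stays blind because the key is kept out of the sheet. You pick Option 1 or Option 2 for each subtitle, and only afterwards map those picks back to the engines with the key.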
Then I watched the whole episode with my Excel file at hand and went through every single subtitle, choosing which translation would be the better starting point for post-editing. Below I share some of the most relevant findings (and the final head-to-head results!), but first, let me explain the criteria behind my decisions.
When you’re post-editing, what matters most is actually saving time compared with translating from scratch. We might believe there’s only one way post-editing saves time (choosing the best translation, right?), but I focused not only on which service better grasped the meaning of the original, but also on which one produced better-segmented subtitles. Getting help with segmentation is useful, and that’s something only a specialized service like AppTek can give you, right? Well, that’s what I wanted to find out. Let’s dive into my findings and results.
Since we’re talking about subtitling, let’s see some of the differences in segmentation that I was able to find in the comparison.
Although having two lines in example 1 is not exactly a crime, we can see that Google doesn’t keep the sense units together and splits “tells/me” across two lines. That said, it’s particularly helpful when the MT does something clever here and merges the two lines into one for the translation, as AppTek did.
Just like in example 1, the sense unit “to watch/over…” is broken up in Google’s version. We also see something that occurred several times in the comparison: in some subtitles, Google does not use a contraction, while AppTek does. We would probably have to dig deeper to understand why this happens, but suffice it to say that in a very informal audiovisual text like this soap opera, it’s always better to use contractions.
Here we have an example of segmentation across subtitles: Google’s version is wrongly segmented, while AppTek’s could be used as is in a professional subtitle.
Throughout the file, there were some cases in which Google did a better job with line breaks, both within and across subtitles, but overall AppTek was the winner here.
- Translation of names and places
Another interesting point in my comparison was how both MT services handled the (unnecessary) translation of people’s names and place names. In a project like this one, a drama, there’s no need to translate the characters’ names or the names of the places where they eat, for instance. However, I found that some names were translated anyway.
Names with no clear English equivalent, like Elena, Nico, Nelson, Gabriela, Isabel, Samuel, and Gina, were never translated. The name Ernesto, which could have become Ernest, was also always left in Spanish. On the other hand, the name Pedro did cause some trouble: for some reason, AppTek’s MT always translated it as Peter (10 occurrences), while Google kept the original Pedro. Something similar happened with the name Agustín, which AppTek translated as Augustine, and with Domingo, which was translated as Sunday. An interesting case was the word doctor, which Google never capitalized but AppTek properly capitalized whenever it was used as a form of direct address.
The funniest translation of all came from AppTek, for this subtitle: – Oye, ¿ya llegó Catita?/– ¿Quién es Catita? (Catita is an affectionate form of Catalina). AppTek translated it as – Hey, is catty here yet?/– Who’s pussy? Google came much closer: Hey, is Catita here?/– Who is Catita?
As for place names (mostly names of restaurants in “Te doy la vida”) that shouldn’t be translated, there was Cazuela de Lola, which Google translated as Lola’s Cazuela and AppTek as Lola’s casserole, which I thought was pretty funny. Then there was Pierangelo, in the sentence ¿Qué tal le parece el Pierangelo?, which AppTek translated as How’s that for the pears? and Google, more properly, as How do you like the Pierangelo?
- Formal and informal treatment
As a general observation on the use of MT in subtitling, formal language seems to be easier for the machine to handle. This means these services could be more helpful for a documentary, for instance, than for very informal audiovisual texts. In “Te doy la vida,” we have some serious characters (yeah, you guessed it, the villains) and some goofy ones, whose lines posed nearly impossible puzzles for both Google and AppTek.
Let’s see some examples of useful translations in formal contexts:
Both translations are useful, but AppTek’s has better segmentation.
Same as before, both translations are useful, but AppTek’s has better segmentation.
AppTek’s “under the name” conveys the meaning better, but both subtitles are useful.
I would prefer “have left” over “remains” here.
Struggling with informal texts:
The phrase “rendida a tus pies” could be translated as “she’ll be at your feet” or “you’ll have her at your feet”. Also, Google properly translates the word “ánimo” as “cheer up” here.
The phrase “metele enjundia a la labia” is a very Mexican expression that, in this case, means “work on your words”. A possible translation here would be: Just work on your words/because you’re ignorant, okay? Also, donkey is awfully out of context here.
Funnily enough, the Spanish source uses an expression that is common in English and should have been translated as: Because you’re ugly with a capital U. Neither Google nor AppTek recognized the word efe as the letter F, which is why both left it in Spanish.
In this last example, AppTek’s screwed does work as a translation of the informal fregados.
There are many other things we could analyze in detail, but here are some specific cases that caught my attention.
Sadly, there were many cases in which the MT would have needed the context to translate properly. In this example, the source is one character’s answer to a dinner invitation. Without context, AppTek produced a reasonable translation, Nice to meet you, but Google failed catastrophically with Haunted. The correct translation here would have been I’d love to.
Gender was also a big problem throughout the translation, with several cases of the masculine pronoun him being used where it should have been the feminine her. Here the character is talking about a woman, so the proper translation would be we started sending her messages. Although this problem appeared across the whole file, I found that AppTek did a better job of grasping these situations and using the correct pronoun.
In some cases, the source dialogue itself contained errors, which led to confusion in the translation. In this example, the original should have been: Pero tú tenías/que habérselo dicho,*y juntos hubieran buscado una solución. Only one word is wrong, but it leads to incorrect translations from both services. A possible version could be: But you should have told her*and together,/you could have found a solution.
In this case, Google misread the context and translated She’s only…, while AppTek got the second subtitle right but not the second line of the first subtitle. A proper translation would have been: At this point, even if she knows it,/it won’t do any good.*It’s just gonna destroy Gina’s life.
- Results and conclusions
So, you want to know the results? Here they are: out of 814 subtitles, I chose AppTek for 464 (57.0%), Google for 211 (25.9%), and both AppTek/Google for 139 (17.1%). That last group mostly covers cases in which both translations were identical, but also cases in which both were useless.
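The percentages are each count divided by the 814 subtitles, rounded to one decimal place. Here is a small Python sketch of that tally as a quick check. The function name `tally` and the labels "AppTek", "Google", and "Both" are hypothetical names for the three categories. The choice list is rebuilt from the counts above rather than read from the original spreadsheet.

```python
from collections import Counter


def tally(choices):
    """Return {label: (count, percentage rounded to 1 decimal)} for a list of picks."""
    total = len(choices)
    counts = Counter(choices)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Rebuild the per-subtitle choices from the reported counts (814 subtitles in total).
choices = ["AppTek"] * 464 + ["Google"] * 211 + ["Both"] * 139

for label, (n, pct) in tally(choices).items():
    print(f"{label}: {n} subtitles ({pct}%)")
```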
The most important conclusion I drew from this experiment is that MT can be helpful for translating soap operas, unless they contain an extremely large amount of informal language. That was not the case in this series (maybe next time we’ll analyze a different type of content). I also find that an integral part of the post-editing process is making the translation less artificial. I noticed that both MT engines struggle to move away from the original text, producing translations that stay close to the source in both vocabulary and syntax.
As for my choice of MT service, I think it’s clear that AppTek did a better job, and I believe that’s mostly because it is an MT engine specialized in subtitling. I’m looking forward to seeing it integrated into the subtitling platforms that we freelance translators use for our projects.