There's more than one way to skin a CAT

Updated: 14 January 2015
Today SDL is all about SDL Language Cloud and not BeGlobal, but I hope the article is still as relevant today.  There are more ways to look at how you use Machine Translation so if you’re interested take a look at these two more recent articles as well.
The ins and outs of AutoSuggest
Language Cloud… word-counts… best practice?
The title of this post could be quite tricky to translate into many languages because not everyone uses the expression in the same way, and certainly not with the same words.  I chose it deliberately because I thought I’d write a little about using Machine Translation in SDL Trados Studio.
I’m not going to talk about properly trained Machine Translation engines such as SDL BeGlobal, which can be configured and improved to provide remarkably good translations in a short period of time for very large numbers of words… so achieving economies of scale that would be unthinkable with human resources alone.  Instead, I’m going to talk about how a Translator can make use of the growing number of Machine Translation resources in a way that might make sense for them.
We can all have a good laugh at the way a Machine Translation can often make a mess of a translation, so a sentence like this:
“There are several ways to skin a CAT, and one of them is to use Machine Translation.”
becomes a translation like these:
“Der er flere måder at huden en kat, og en af dem er at bruge maskinoversættelse.”
“Er zijn verschillende manieren om de huid van een kat, en een van hen is het gebruik van Machine Translation.”
The Machine Translation engine would need to understand more about the context, or the sentence would need to be authored in language that is easier to interpret. In my single sentence example the Danish fares better, probably because this expression is also used in Danish.  One of SDL’s Freelance users, Eunike Hanson, gave me a better translation where we won’t actually skin a cat:
“Der er mange måder at flå en CAT på. Maskinoversættelse er en af dem.”
The Dutch on the other hand, courtesy of an SDL Business Consultant, Fleur Schut, needed a trip to Rome to explain the same thing:
“Bij het gebruik van CAT tools zijn er meerdere wegen die naar Rome leiden. Een ervan is de integratie van automatische vertaling.”
Certainly this illustrates just one of the difficulties Machine Translation has to overcome. However, not all documentation is written in language like this, so we’re presented with a useful opportunity for Translators who have access to this kind of technology.
So first of all let’s take a look at what technology we’re talking about here.  Out of the box Studio 2011 is installed with these Machine Translation Providers:

  1. SDL BeGlobal Community
  2. SDL BeGlobal Enterprise
  3. Google Translate
  4. SDL Automated Translation

Of these the second one requires you to have access to a BeGlobal setup that has probably been trained specifically for a customer using Machine Translation for large volume translation.  But the other three are readily accessible, although Google does charge for this service.
Now imagine you have translated this sentence before:
“There are several ways to use a translation tool, and one of them is to use Machine Translation.”
as:
“Een vertaaltool kan op verschillende manieren worden gebruikt, en een ervan is het toepassen van Machine Translation.”
If I then open a document, obviously one I made up, that contains this sentence and then variants of it, how long would it be before a Machine Translation of the fuzzy matches becomes more useful to you than the fuzzy match from your own Translation Memory?

The answer to this question is of course subjective, and in some cases it may even be faster to simply clear the translation and write it manually.  However, after testing this for a while you may find that, for example, anything less than a 50% match is usually faster to translate if the Machine Translation is used instead.  In this case you could configure Studio to pre-translate your files when you create a Project or run a Pre-translate Batch Task, so that only matches of 50% or more are presented from your Translation Memory and your chosen Machine Translation provider does the rest.  In effect you would be doing nothing more than improving your ability to get better matching, and for some texts the results could be good enough to use as they are.
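To make the rule concrete, here is a minimal sketch in Python of the decision described above: use the Translation Memory when its best match reaches the threshold, and fall back to Machine Translation otherwise. This is not how Studio is implemented and it does not call any Studio API. The names `TmMatch`, `pretranslate` and `fake_mt` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TmMatch:
    """Best match found in a Translation Memory (hypothetical structure)."""
    score: int   # fuzzy match percentage, 0 to 100
    target: str  # translation stored for the matched segment


def pretranslate(source: str,
                 best_tm_match: Optional[TmMatch],
                 machine_translate: Callable[[str], str],
                 min_match: int = 50) -> Tuple[str, str]:
    """Return (origin, target) for a single segment.

    The TM match is used when it scores at least `min_match`;
    otherwise the segment is sent to the MT provider.
    """
    if best_tm_match is not None and best_tm_match.score >= min_match:
        return ("TM", best_tm_match.target)
    return ("MT", machine_translate(source))


def fake_mt(text: str) -> str:
    """Stand-in for a real MT provider; it only tags the source text."""
    return "[MT] " + text


# A 100% match always comes from the TM, a 35% match goes to MT.
print(pretranslate("There are several ways to use a translation tool.",
                   TmMatch(100, "Een vertaaltool kan op verschillende manieren worden gebruikt."),
                   fake_mt))
print(pretranslate("There are several ways to skin a CAT.",
                   TmMatch(35, "Een vertaaltool kan op verschillende manieren worden gebruikt."),
                   fake_mt))
```

The 50% default is only my example value; the point is that a single threshold decides which resource supplies the draft for each segment.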
Setting Studio up like this is simple.  To do this for all subsequent work you use Tools – Options, and for the current Project you use Project – Project Settings; in both cases you first add your chosen Machine Translation provider.  I’ve used the SDL BeGlobal Community account and added it as I would a normal Translation Memory here:

Next I change my settings to use the Machine Translation provider when there is no match found, or one that is less than 50% (in my example).  So first I change the minimum match value for a search to 50%:

Then I go to my batch processing options and choose the Pre-translate Files node.  In here I set the minimum match value that I wish to be used for a pre-translation from my Translation Memory, and also tell Studio to apply the automated translation from my chosen Machine Translation provider (BeGlobal) if there are no matches meeting the above 50% criterion:

That’s it.  I save this and then run the batch task on my file and this time I see this:

So the last three segments that were all below 50% are now translated by Machine Translation.  Now, I’m sure you could debate how good or bad the Machine Translation really is, but maybe the question should really be “Can I fix this faster than I could fix a low fuzzy match or translate it from scratch?”  The answer won’t always be the same, but I hope you can see the potential for some texts that you may work with, particularly where you have nothing in your Translation Memory at all.
To finish off I wanted to add a quick note on where to find Machine Translation providers for Studio.  This is all done through the SDL OpenExchange (now RWS AppStore).  A growing number are already available, which just shows how widespread Machine Translation is becoming.  I have these currently installed:

The MyMemory plugin is a little debatable as a resource for this type of provider, but they do use Google and so do draw on Machine Translation for their results.
I left out TAUS because this is really a Translation Memory Provider based on Translation Memories uploaded by members of the TAUS Data community.
LetsMT! and LucyLT are currently in Beta so you won’t find them on the OpenExchange just yet, but they are due very soon.
Certainly I think it’s clear that with eleven choices here the Translator, or organisation looking for the ability to plug into a Machine Translation provider, has plenty of choice if they use the Studio platform for translation.  I won’t be surprised to see this number growing considerably larger as time goes by.  The platform SDL have provided through the SDL OpenExchange (now RWS AppStore) is a game changer in terms of being able to create more opportunities for interoperability and synergy.
And finally… I’ll get there in the end… you can also use as many of these as you like at the same time.  Quite a funny exercise when you have source material like the one I used at the start of this post!

Interestingly only BeGlobal knew you had to go via Rome!

11 thoughts on “There's more than one way to skin a CAT”

  1. Hi, Paul. Thank you for your article. Sorry for delayed comments 🙂
    1. In the (nearest) future I would like to use my favorite MT system (PROMT) which is not available yet in your above list.
    Can you recommend me something :)? Please note, I do not work for PROMT company.
    2. Are solutions and plugins mentioned above based on any type of MT tools (desktop, server based, cloud…) or some of them only?
    Thank you in advance.

    1. Hello Oleg. The MT plugins, the easiest of all to create in fact, are developed by the vendors themselves. So if Promt want to develop one there is nothing stopping them. The list you see there is already way out of date as we have many more… some on the OpenExchange already and some not. But I guess this isn’t interesting enough for Promt. Your best bet is to contact them and ask them.
      In the meantime you can use BeGlobal Community, which is free if you like… or one of the others available through the OpenExchange.

    1. Try disabling your TM? Or lower the match value you want so MT kicks in below 70% or 60%… whatever your preference is. If you have 100% matches in your TM then this will always take precedence and you won’t get an MT match from a pretranslate. Makes sense of course because you would probably not want an MT match to replace a 100% match from your TM.
