More useful resources… and multilingual TMs

In October 2012 the European Union (EU) agency ‘European Centre for Disease Prevention and Control’ (ECDC) released a translation memory covering 25 languages into the public domain… the 23 official EU languages plus Icelandic and Norwegian. It comes in a similar format to the DGT Multilingual Translation Memory of the Acquis Communautaire that I described here in this article, but this time it’s much smaller… so we can look at how to use Studio to handle a single TMX file that contains all of these languages.
Actually there are two ways to do this.  You can upgrade the TMX, or you can import a specific language pair.  I’ll look at the upgrade first…. but before I do you need to know where to find this TMX.  So go to the ECDC website here and download their translation memory from this section:

Once you download this file and open the zip file you’ll find eight files in there, but for this exercise the only one you need is the ECDC.tmx as this is a multilingual TM.  So if you open this in a text editor you’ll note a couple of things:

  1. It was created with “Trados Translator’s Workbench for Windows”… the latest… and last… build
  2. Each Translation Unit contains a translation for each of the 25 languages… for example:

So this is a little different to the TMs you might normally encounter that only contain the source and target languages but still a valid format for any tools that support a multilingual TMX.
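If you would rather check the languages without scrolling through the file in a text editor, a few lines of Python will do it. This is only a minimal sketch: the sample TMX below is a tiny stand-in I made up for illustration (the segments and header values are not copied from the ECDC file), and `count_languages` is a hypothetical helper name. I am also assuming the language sits in an `xml:lang` attribute on each `<tuv>`, with a fallback to a plain `lang` attribute as used by older TMX versions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Name ElementTree gives to the xml:lang attribute (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up stand-in for a multilingual TMX: one <tu> holds several <tuv> elements.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN" datatype="plaintext" segtype="sentence"
          adminlang="EN" o-tmf="example" creationtool="example"
          creationtoolversion="1.0"/>
  <body>
    <tu>
      <tuv xml:lang="EN"><seg>Public health</seg></tuv>
      <tuv xml:lang="FR"><seg>Santé publique</seg></tuv>
      <tuv xml:lang="DE"><seg>Öffentliche Gesundheit</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN"><seg>Vaccination</seg></tuv>
      <tuv xml:lang="FR"><seg>Vaccination</seg></tuv>
    </tu>
  </body>
</tmx>"""


def count_languages(tmx_text):
    """Count how many translation units carry a variant in each language."""
    root = ET.fromstring(tmx_text)
    counts = Counter()
    for tu in root.iter("tu"):
        for tuv in tu.findall("tuv"):
            # Assumption: xml:lang first, plain lang as a fallback.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            if lang:
                counts[lang.upper()] += 1
    return counts


counts = count_languages(SAMPLE_TMX)
print(sorted(counts.items()))  # [('DE', 1), ('EN', 2), ('FR', 2)]
```

To try it on the real thing you would pass in the contents of ECDC.tmx instead of the sample string. I have not checked which attribute or code casing the ECDC file uses, which is why the sketch normalises to upper case and accepts either attribute.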
Now that you have it, let’s consider the process of using this TMX in Studio.

Upgrading a Multilingual TMX

This is actually made really easy by Studio.  All you have to do is use the Upgrade Translation Memories… route here:

Selecting this brings up a window where you can select the TMX file, then you click on Next.  This brings up the screen where you can decide between three options… create a TM for each translation memory you are upgrading (you can do as many as you like in one go), group them together by language pair and create one TM for each language pair (you can have as many different types of supported TMs and different language pairs in each one) or a custom output.  So if you were a Project Manager you might find it useful to select the first option and have TMs for all 25 language pairs created for you:

But if you only want one… say English to Greek… then it would be faster and more appropriate to select the Custom option and choose only this pair… in fact I added one for each direction as these might be useful reference TMs for anyone specialising in public health material:

I can then click on Next -> Finish and I see that both TMs have been created:

I can now open the TMs in Studio and I see something like this, which looks OK and is ready for use… it actually takes me longer to explain it than it does to do it..!:

Now, upgrading always keeps English as one of the pairs as this was the original source… srclang=”EN”… but what happens if you decided that you actually wanted to extract a TM for a language pair that didn’t contain English?

Importing the TMX

This process is slightly different because you need to create the TM in Studio first… or use an existing one… and then import the TMX into it.  Only the language pairs that are in your Studio TM will be extracted and imported from the TMX.  However, this gives you even more flexibility because working this way allows you to create a language pair from this resource that does not contain English.  So for example, if I work with Bulgarian to Norwegian I can create my Studio TM first (or use an existing one to skip this step):

Then I right-click on the TM in Translation Memories View and select Import:

That’s it… now I have a Bulgarian – Norwegian Studio TM from the ECDC:

So there are two ways to use the new translation memory released into the public domain by the ECDC… both nice and simple.
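For anyone curious what pulling a non-English pair out of a multilingual TMX amounts to, here is a rough Python sketch of the idea. To be clear, this is not how Studio does it internally; it only illustrates the principle that each translation unit holds all the languages, so any two of them can be paired. The sample data is invented, `extract_pair` is a hypothetical helper, and the codes `BG` and `NO` are my assumption about how Bulgarian and Norwegian might be labelled in the file.

```python
import xml.etree.ElementTree as ET

# Name ElementTree gives to the xml:lang attribute (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented, trimmed-down sample; the language codes are assumptions.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN"/>
  <body>
    <tu>
      <tuv xml:lang="EN"><seg>First English segment</seg></tuv>
      <tuv xml:lang="BG"><seg>First Bulgarian segment</seg></tuv>
      <tuv xml:lang="NO"><seg>First Norwegian segment</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN"><seg>Second English segment</seg></tuv>
      <tuv xml:lang="BG"><seg>Second Bulgarian segment</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_pair(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs for units that contain both languages."""
    source_lang, target_lang = source_lang.upper(), target_lang.upper()
    wanted = {source_lang, target_lang}
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").upper()
            seg = tuv.find("seg")
            if lang in wanted and seg is not None:
                # itertext() also picks up text inside inline tags, if any.
                segments[lang] = "".join(seg.itertext())
        # Skip units that are missing either side of the requested pair.
        if wanted <= segments.keys():
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


pairs = extract_pair(SAMPLE_TMX, "BG", "NO")
print(pairs)  # [('First Bulgarian segment', 'First Norwegian segment')]
```

Units that lack one of the two languages are simply skipped here, which is why the second sample unit does not appear in the output.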

0 thoughts on “More useful resources… and multilingual TMs”

  1. Thanks for the clear instructions, Paul, and the news about this new .tmx.
    For some very strange reason the ECDC, like other EU agencies, likes to use the US variant of English. So anyone who wants to import the .tmx into a TM with an EN-GB variant will have to search & replace “EN” with “EN-GB” in a text editor before importing it. (Note that the .tmx doesn’t specify EN-US, but it is.)
    I’ve just added this useful new resource to my medical EU TM and used batch edit to add “ECDC” as a field value, so I’ll know where these new units came from.
    Emma

  2. Hi Emma… or just create an en-GB TM to import into. That should do the trick. Good idea with the field values… I was very tempted to add that into this description as I wrote the article but I already make these too long… one blog where two or three might be more digestible 😉

  3. An en-GB TM throws up the error message “The language pair of the import file [xx-xx->en-US] does not match the language pair of the Translation Memory [xx-xx->en-GB]”. That’s why I thought I’d mention it!
    Maybe you mean an en-US TM?

  4. Hi Emma… that’s bizarre. I just tested this by creating a new sdltm en(GB) – es(ES) and this worked fine… then tried the other way… that was ok too.
    Then I remembered… you are not using SP2R because you want the export comments in a fully formatted word document feature and there was a bug in the earlier build on TMX import that has been resolved in SP2R.
    My guess is this is the problem… maybe?
    Regards
    Paul

    1. Aha! In that case it won’t be a problem for anyone who has done the right thing and updated to SP2R, and will only affect diehards like myself 😉
      Thanks for the clarification!

  5. Dear Paul Really basic question – which program should I use to open the zip file? Thanks
    Lenne’s iPad

    1. Hi Lenne, in this case the file has a zip extension so if you are using Microsoft Windows you should be able to simply double click it and unzip the contents to a location you choose. I use Total Commander which is a separate tool similar to Windows Explorer but can do a lot more including looking into any kind of zip file.
      Regards
      Paul

  6. Hello
    I don’t have Studio but you can use free software like Olifant and import the TMX, stipulating which languages you want to import.
    I did it in French, Swedish and English and the whole operation took about one minute to have a trilingual TMX file.
    I then exported out the TMX file with a new name so that the 3 language TMX was created
    I’m not sure you could use it within Trados but Olifant has excellent search capabilities
    Regards

  7. Thank you so much for the information. Does anyone know how to import it into memoQ? Thanks in advance!
