Archive

Tag Archives: translation memory

Is English (Europe) the new language on the other side of the Channel that we’ll all have to learn if Brexit actually happens… and will Microsoft ever create a spellchecker for it now that they’ve added it to Windows 10?  Why are there 94 different variants of English in Studio, all coming from the Microsoft operating system, and only two Microsoft Word English spellcheckers?  Why don’t we have English (Scouse), English (Geordie) or English (Brummie)… they’re probably more distinct from each other than English (United States) and English (United Kingdom), which are the two variants Microsoft can spellcheck.  These questions, and similar ones for other language variants, are all questions I can’t answer and this article isn’t going to address!  But I am going to address a few of the problems that having so many variants can create for users of SDL Trados Studio.

Updating a translation memory

Emma Goldsmith wrote “SDL Trados Studio TM isn’t updating! My translation memory is empty!” in 2013 and it’s a very useful article helping users to answer the question about why their translation memories won’t update.  But there’s another one I have come across a few times recently and it’s all related to these pesky variants!  Let’s say you start using Studio and you have been working on projects using the “translate single document” approach for de(DE) into en(US), so you also have a translation memory with these variants.  Then one day you’re asked to translate into en(UK), so you create your project using de(DE) -> en(UK) thinking you’ll add the translation memory later since it didn’t appear in the way it usually does, only to discover you can’t use the translation memory you have because the language variants are different.  Fortunately someone tells you about AnyTM and you add your TM to the project anyway and all is well!

All is well until you are sent ten files and you decide to create a standard Studio project for this.  You haven’t done this before but as you work through the settings all seems well and you don’t notice that you have two target languages in the project wizard:

Even if you did notice the two languages you might not realise the implications since the one you want is there, English (United States).  You don’t even notice something could be wrong after adding your files since you see your translation memory all set up and ready to go:

You carry on working through the wizard and open the first file in the list by double clicking on it.  As you work through the translation you notice that your TM is not being updated, and you see this message in the translation results window:

You check your Project Settings and the translation memory is there and checked to update.  What’s going on?  You google the problem and review Emma’s blog… still can’t sort it, so you post into the SDL Community and nobody can help you there either!  So let’s examine the facts.

  1. You created a multilingual project that is for de(DE) -> en(UK) and de(DE) -> en(US).
  2. You have a translation memory for de(DE) -> en(US) but not for de(DE) -> en(UK)
  3. You opened the first file in the list without checking the language so alphabetically this is en(UK)

A multilingual project means you have one source language and at least two target languages.  You translate the files for each language by selecting the appropriate language from the drop down in the Files View shown above.  What this means is that the default language will be the first in the list and in this case it’s en(UK).  However you only have a translation memory for en(US) so opening the file in your project without changing the language in the drop down first means you are now translating the file without a translation memory at all and this of course explains why you get the message “No open translation memories…” and why your en(US) translation memory is not being updated.
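If it helps to see the logic laid out, here is a toy model of what just happened.  It is only a sketch and has nothing to do with Studio’s actual API: the names `targets`, `translation_memories`, `default_target` and `tm_for` are all invented for the illustration, the TM file name is hypothetical, and I’ve written en(UK) as en-GB because that is the code the variant really uses.

```python
# Toy model only: these names are illustrative and are not Studio's API.
SOURCE = "de-DE"

# Target languages by display name, which is how the drop-down lists them.
targets = {
    "English (United States)": "en-US",
    "English (United Kingdom)": "en-GB",
}

# The only translation memory we own, keyed by the exact (source, target) pair.
# The file name is hypothetical.
translation_memories = {("de-DE", "en-US"): "de-DE_en-US.sdltm"}


def default_target(display_names):
    """Return the first target language alphabetically, like the drop-down default."""
    return sorted(display_names)[0]


def tm_for(source, target):
    """Exact-variant lookup: returns None when no TM matches the pair."""
    return translation_memories.get((source, target))


opened = default_target(targets)                # the language you opened by accident
opened_tm = tm_for(SOURCE, targets[opened])     # no TM exists for that pair
wanted_tm = tm_for(SOURCE, "en-US")             # the TM you expected to be updating
```

The point of the sketch is simply that the lookup is by exact variant, so a file opened under the alphabetically first target language finds nothing even though a perfectly good TM exists for the other one.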

OK… all understood so far, but why does Studio show the translation memory in the Project Settings if it’s not there?  The reason is explained by the way the Language Pairs are organised in your settings:

The way Studio works is that it provides a way to apply settings and translation memories for all the languages in a multilingual project by adding them to “All Language Pairs”.  This is very handy and can save a lot of work on projects with 30 or 40 target languages.  You can also specify different settings for any one or all of the target languages by checking this box for the languages in question, in this case de(DE) -> en(UK):

If, after checking that box, I now go back to my “All Language Pairs” I would see that I am also told that the de(DE) -> en(UK) language pair is now using a different Translation Provider, followed up by the very helpful message that there are “No translation providers” because I only have a translation memory added for de(DE) -> en(US).  There are no memories available for an en(UK) target language because I have never added one.  If I applied this checkbox to the en(US) target language as well then I’d see this:

Now it starts to make sense because I can clearly see I don’t have a translation memory for one of the language pairs and it’s clear which one it is.  But unless I was always using the specific language pair settings it’s easy to miss until you understand what’s happening behind the scenes.  In fact unless you worked with multilingual projects on purpose you could be forgiven for not noticing this at all.

So, now we’ve hopefully cleared that up what are the options?  You have a few:

  1. The easiest solution if you don’t care about the differences between en(UK) and en(US) is to remove your translation memory from your settings and then add it back in again using AnyTM.  This way the same translation memory will be used by both language pairs, OR
  2. Add an existing translation memory with the right language pair to your project, OR
  3. Create a new translation memory with the missing language pair and add it to your project.  This would only really be sensible if you actually wanted to use en(UK) in the first place and it was important to have different translation memories, OR
  4. Close the file, switch to the correct target language (en(US)) in the Files View and import your translated SDLXLIFF files from the wrong language into your TM.  Make sure you don’t exclude the language variants:

    Then pretranslate your files to get back to where you were.

So not a huge problem, and there are a few ways to resolve it.  But I think it’s worth noting how this happens, and making sure you pay attention to the number of target languages to avoid something like this happening in the future.

Merging Translation Memories

Around five years ago I created a video showing how to merge translation memories which works really nicely and shows the power of SDL Trados Studio for things like this.  But what it doesn’t handle when merging is variants.  If I have three en(US) -> de(DE) translation memories and one en(UK) -> de(AT) translation memory then the best this feature can offer when I try to merge based on language pair is this:

Two translation memories instead of four because it can’t ignore the variants in the same way an import can.  So if you find yourself in this situation and wish to create a single English to German TM that you intend to use for all variants because you don’t worry about the differences, or you have some other way of identifying differences using Fields and Attributes for example, then the process would be this:

  1. Merge all your translation memories that are the same variants as shown in the video referred to above to get one translation memory
  2. Export the variant translation memories (I’m starting to feel as though I’m being racist!) to TMX
  3. Import the TMX files into your one SDLTM and make sure you don’t check the option to exclude variants as we did for the SDLXLIFF files earlier on
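To see why the merge gives you two translation memories instead of one, here’s a small sketch of the grouping logic.  This is my own illustration and not how Studio implements the merge: the file names are hypothetical, `base_language` and `group` are invented helpers, and en(UK) is written as en-GB.

```python
from collections import defaultdict

# Hypothetical file names; the language pairs are the ones from the example above.
tms = [
    ("tm_a.sdltm", "en-US", "de-DE"),
    ("tm_b.sdltm", "en-US", "de-DE"),
    ("tm_c.sdltm", "en-US", "de-DE"),
    ("tm_d.sdltm", "en-GB", "de-AT"),
]


def base_language(code):
    """Strip the variant part of a code: 'en-US' -> 'en'."""
    return code.split("-")[0]


def group(tms, ignore_variants):
    """Group TM names by language pair, optionally ignoring the variants."""
    groups = defaultdict(list)
    for name, src, tgt in tms:
        if ignore_variants:
            key = (base_language(src), base_language(tgt))
        else:
            key = (src, tgt)
        groups[key].append(name)
    return dict(groups)


by_exact_pair = group(tms, ignore_variants=False)  # what the merge feature sees
by_base_pair = group(tms, ignore_variants=True)    # what an import can do
```

Grouping on the exact pair leaves the en-GB -> de-AT memory on its own, while grouping on the base languages puts all four together, which is effectively what the TMX import does when you don’t exclude the variants.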

So this is also fairly straightforward to solve and it’s probably easier than editing the language codes in the TMX file.  If you do want to use field values to identify the variants and work with one TM then you need to export all of your TMs to TMX (after merging to reduce the number if you have a lot of them) and then create a new SDLTM containing the appropriate fields like this for example:

Then import your TMX files and choose the variant that is appropriate like this for example:

The same approach would then apply to your future translations… you would select the appropriate field values for your projects and stamp each translation unit accordingly.  If you don’t remember to do this religiously then the whole idea of using fields and attributes in this way becomes a nonsense… so think carefully before you decide to work this way:

My personal opinion is that this explanation of fields and attributes is more of a “what’s possible” than “what’s sensible” and if you wish to retain the ability to uniquely identify differences between variants then you should use multiple translation memories.  This is really simple and explained in this article.  What’s also interesting is that when I wrote that one there were apparently 16 variants of English, and here we are, over 5 years later and we have 94!  How did that happen!  More importantly, if you actually receive work with these variants and you use field values just imagine how complicated it could get!

And the last thing of course (already explained in the articles I’ve referenced) if you do decide to use one translation memory for all your work is that you’ll need to use AnyTM to add your translation memory in your options and your project templates.  This will ensure your translation memory will always be used irrespective of the variants you use.

Spellchecking

The last thing I wanted to cover in relation to these pesky variants is spell checking. When you complete your translation and look for the spell check you are going to see one of these things and it doesn’t matter whether you are using Microsoft Word as your default spellchecker or Hunspell… you can still have this problem, although there is potentially better coverage with Hunspell:

It’s pretty annoying when you get the “not supported” message, but let’s look at why this is, and I’ll use English as the target language for this illustration.  There are 94 variants and Microsoft Word covers two of them, en(US) and en(UK).  Having said that, the Microsoft Word spell checker will kick in with more varieties… in fact these ones: Australia, Belize, Canada, Ireland, India, Jamaica, Malaysia, New Zealand, Philippines, Singapore, South Africa, United Kingdom and United States.  Why is this?  I have no idea but perhaps there’s a quick win in there somewhere once we find out!

For now we know you can spellcheck 13 English variants using Microsoft Word while Hunspell doesn’t fare so well out of the box as it only supports 7 variants… Caribbean, Australia, Canada, New Zealand, South Africa, United Kingdom and United States.  But Hunspell does have a significant advantage over Microsoft Word and that is… it’s a doddle to add new ones!  Here’s how it’s done.

  1. Navigate to the Studio program folder and you’ll find the HunspellDictionaries folder.  In Studio 2017 it’ll be in here:
    c:\Program Files (x86)\SDL\SDL Trados Studio\Studio5\HunspellDictionaries\
    In Studio 2015 it’s in Studio4 and so on.
  2. In this folder you’ll find a pair of files for each language that is supported, an .AFF file and a .DIC file.
    For example en_GB.aff and en_GB.dic are the files relevant to English (United Kingdom), en_AU.aff and en_AU.dic are the files relevant to English (Australia).  The file names use the Windows culture name for the variant (the code that goes with its Language Code Identifier, or LCID), with an underscore in place of the hyphen.
  3. You will also find a config file, there’s only one of these, called spellcheckmanager_config.xml
  4. To add a new language variant you can just make a copy of the .AFF and .DIC files for the variant you consider most similar and name them with the appropriate LCID reference, and then add a couple of lines into the spellcheckmanager_config.xml so Studio can find them.
    1. Quick tip: if you’re not sure how to get the appropriate LCID then just open any file as a single file translation and select the language variant you want.  Studio will name the file using the language codes.  Sometimes you could guess them, so en-GI is English Gibraltar for example and you would name the files en_GI.aff and en_GI.dic, but I doubt you’d guess English Europe which is en-150, or English World which is en-001.
  5. Once you’ve got your .AFF and .DIC files you just make some space in the spellcheckmanager_config.xml using a decent text editor and add a few lines like this for example and save the file:
    <language>
    <isoCode>en-GI</isoCode>
    <dict>en_GI</dict>
    </language>

    1. Quick tip: pay attention to the differences in the way the codes are written in each line.  The isoCode element uses a hyphen to separate the language and variant, the dict element uses an underscore.
  6. You may find you need to have admin rights to make these changes to files in the Studio program folders.  I find it’s easier to copy them into a folder where I don’t need these rights, make the changes and save them, and then copy them back into the correct folder with admin rights afterwards.
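If you find yourself doing this for several variants, the steps above could be scripted.  What follows is only a rough sketch in Python and not anything SDL provides: `dict_name` and `add_variant` are names I’ve made up, and the script assumes the config file already contains at least one language element shaped like the snippet in step 5 so the new one can be added alongside it.  I haven’t checked the rest of the file’s layout, so run it on a copy of the folder (as in step 6) and keep a backup of the original config, because rewriting the XML this way will drop any comments in it.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def dict_name(iso_code):
    """'en-GI' -> 'en_GI': isoCode uses a hyphen, dict uses an underscore."""
    return iso_code.replace("-", "_")


def add_variant(folder, base_iso, new_iso,
                config_name="spellcheckmanager_config.xml"):
    """Copy the base variant's .aff/.dic pair and register the new variant.

    Assumption: the config already holds at least one <language> element
    with <isoCode> and <dict> children; the new one is added next to it.
    """
    folder = Path(folder)
    for ext in (".aff", ".dic"):
        shutil.copyfile(folder / (dict_name(base_iso) + ext),
                        folder / (dict_name(new_iso) + ext))

    config = str(folder / config_name)
    tree = ET.parse(config)
    root = tree.getroot()

    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    existing = root.find(".//language")
    if existing is None:
        raise ValueError("no <language> element found; check the config layout")
    container = parents[existing]

    # Do nothing to the config if the variant is already registered.
    for lang in container.findall("language"):
        if lang.findtext("isoCode") == new_iso:
            return

    language = ET.SubElement(container, "language")
    ET.SubElement(language, "isoCode").text = new_iso
    ET.SubElement(language, "dict").text = dict_name(new_iso)
    tree.write(config, encoding="utf-8", xml_declaration=True)
```

So a call such as `add_variant(r"C:\Temp\HunspellDictionaries", "en-GB", "en-GI")`, where the folder is a hypothetical working copy, would give you en_GI.aff and en_GI.dic based on the United Kingdom files plus the matching entry in the config, ready to copy back with admin rights.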

There is a knowledgebase article here which explains how to do this, but since I’ve seen many people still struggling here’s a short video demonstrating how it’s done using English Europe, or en-150.

Duration: 9 minutes 13 seconds

 

Using segmentation rules on your Translation Memory is something most users struggle with from time to time.  It’s not just the creation of the rules, which is often just a question of a few regular expressions and is well covered in posts like this from Nora Diaz and others, but rather how to ensure they apply when you want them, particularly when using the alignment module or retrofit in SDL Trados Studio where custom segmentation rules are being used.  Now I’m not going to take the credit for this article as I would not have even considered writing it if Evzen Polenka had not pointed out how Studio could be used to handle the segmentation of the target language text… something I wasn’t aware was even possible until yesterday.  So all credit to Evzen here for seeing the practical use of this feature and sharing his knowledge.  This is exactly what I love about the community: everyone can learn something, and in practical terms many of SDL’s customers certainly know how to use the software better than some of us in SDL do!

Read More

The handling of numbers and units in Studio is always something that raises questions and over the years I’ve tackled it in various articles.  But one thing I don’t believe I have specifically addressed, and I do see this rear its head from time to time, is how to handle the spaces between a number and its unit.  So I thought it might be useful to tackle it in a simple article so I have a reference point when asked this question, and perhaps it’ll be useful for you at the same time.

I have a background in Civil Engineering so when I think about this topic I naturally fall back to “The International System of Units (SI)” which has a clear definition on this topic:

Read More

“More power to the elbow”… this is all about getting more from the resources you have already got, and in this case I’m talking about your Translation Memories.  In particular I’m talking about enabling them for upLIFT.  upLIFT, in case you have not heard about this yet despite all the marketing activity and forum discussions since August this year, is a technology that is being used in SDL Trados Studio 2017 to enable some pretty neat things.  I’m not going to devote this article to what upLIFT is all about as Emma Goldsmith has written a really useful article today that does a far better job than I could have done.  You can find Emma’s article here, called “SDL Trados studio 2017 : fragment recall and repair“.  But a quick summary to get us started is that upLIFT enables things like this:

  • fragment matching
    • whole Translation Units
    • partial Translation Units
  • fuzzy match repair
    • from fragment matching
    • from your termbase
    • from Machine Translation

Read More

Back in July 2013 I wrote an article called “Fields and Attributes in Studio” which was all about adding different types of metadata to your Translation Units every time you confirmed a segment to make it easier, or more complex depending on what you’ve done, to manage your Translation Memories.  If you’re not sure what I mean by this take a look at the article as I won’t repeat a lot of that here… at least I’ll try not to!  This capability in Studio is probably quite familiar to most users of the old SDL Trados 2007 and earlier, and was even essential to some extent because you could only use a single Translation Memory at a time.

Read More

I’ve written about how to handle bilingual Excel files, CSV files and tab delimited files in the past.  In fact one of the most popular articles I have ever written was this one “Creating a TM from a Termbase, or Glossary, in SDL Trados Studio” in July 2012, over three years ago.  Despite writing it I’m still struggling a little with why this would be useful other than if you have been given a glossary to translate or proofread perhaps… but nonetheless it doesn’t really matter what I think because clearly it was useful!

So, why am I bringing this up three years later?  Well, the recent launch of Studio 2015 introduced a new filetype that seems worthy of some discussion.  It’s a Bilingual Excel filetype that allows you to handle Excel files with bilingual content in a similar fashion to the approach described in the previous article.  There are some interesting differences though, and notably the first would be that you won’t lose any formatting in the Excel file, which is something that happened if you had to handle files like these as CSV or Tab Delimited Text.  That in itself might be interesting for some users because this was the first thing I’d hear when suggesting the CSV filetype as a solution for handling files of this nature.  Most of the time I don’t think this is really an issue, but for those occasions where it is this is a good point.

Read More

This article is all about out with the old and in with the new in more ways than one!  In the last week I have been asked three times about converting Wordfast translation memories and Wordfast glossaries into resources that could be used in Studio and MultiTerm.  Normally, for the TXT translation memories I get I would go the traditional route and use a copy of Wordfast to export as TMX.  Then it’s simple, but what if you don’t have Wordfast or don’t want to have to try and use it?  Wordfast glossaries are new territory for me as I’d never looked at these before.  But on a quick check it looked as though they are also TXT files so I decided to take a better look.

Before I get into the detail I’ll just add that I’m not very familiar with Wordfast so I’m basing my suggestions on the small number of files I have received, or created, and the process I used to convert them to formats more useful for a Studio user.  I’ll start with the glossaries as this is where I got the idea from.  I’d better explain my opening statement too… this is because after I did an initial conversion using the Glossary Converter from the SDL Openexchange I was asked to explain how this would work with MultiTerm Convert.  This of course made me think about the old versus the new… I wouldn’t compare Wordfast and Studio in this way at all 😉

Read More
