Tag Archives: segmentation

Ever since the release of Studio 2009 we have had the concept of Language Resource Templates, and ever since the release of Studio 2009 I’d risk a bet that most users don’t know what they’re for or how to use them.  To be fair this is hardly a surprise since their use is actually quite limited out of the box and the goodies inside are pretty hard to get at.  Users used to complain about them a long time ago, but for some years now I have rarely seen them mentioned.  This article, I hope, might change that.

But before anything else I should explain what a Language Resource Template is.  If you open a Translation Memory in your Translation Memory Management View, and then go to your settings you’ll see something like this:

Variable List

In practice, “variables” in Studio are words or phrases that don’t change at all when you translate them.  So it’s useful to be able to ensure they are handled automatically in Studio by defining lists containing these “variables”.  I’ve covered these in the past… some 6 or 7 years ago!

Abbreviation List

By default Studio will segment text at the end of a sentence.  But if you use “abbreviations” in the middle of your sentences like acc. (according to), Cert. (certificate) or Pharmacol. (pharmacology, pharmacological) for example then you don’t want the sentences to break at these points.  For some languages there are default sets of “abbreviations”, but you can customise the list to suit your needs.
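
To make the idea concrete, here is a minimal Python sketch of a splitter that breaks after terminating punctuation unless the word in front of it is on an abbreviation list.  This is only an illustration of the principle, not how Studio implements it; the function name `split_sentences` and the sample sentence are my own assumptions.

```python
import re


def split_sentences(text, abbreviations=()):
    """Split after '.', '!' or '?' followed by whitespace, unless the word
    ending at that punctuation is in the abbreviation list.

    Hypothetical helper for illustration only; Studio's real rules differ.
    """
    abbrevs = {a.lower() for a in abbreviations}
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        words = candidate.split()
        last_word = words[-1].lower() if words else ""
        if last_word in abbrevs:
            continue  # known abbreviation: do not break here
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "The dose was set acc. to the Cert. provided. It was approved."
    print(split_sentences(sample))                      # four fragments
    print(split_sentences(sample, ["acc.", "Cert."]))   # two sentences
```

Without the list the sample breaks into four fragments; with acc. and Cert. on the list it stays as two proper sentences.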

Ordinal Follower List

In some languages ordinal numbers can be followed by a full stop, and when this happens you don’t want the sentence to break at this point.  In German the date 25th December would be written as 25. Dezember.  In order to prevent the sentence from segmenting after the 25. Studio allows you to enter Dezember.  Again there are defaults for some languages, but you can add anything you like to the ordinal follower list to control segmentation in some way.  For example in my sentence before last you could add the word Studio to the list if you wanted to prevent the sentence from segmenting between 25. and Studio.
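
The same principle can be sketched in Python for ordinal followers: a full stop after a number does not end the sentence if the next word is on the list.  Again this is just an assumption-laden illustration (the function name and the digits-only check are mine), not Studio's actual logic.

```python
import re


def split_with_ordinal_followers(text, followers=()):
    """Split at a full stop followed by whitespace, except where a number is
    followed by a word from the ordinal follower list.

    Hypothetical helper for illustration only.
    """
    follower_set = set(followers)
    sentences, start = [], 0
    for match in re.finditer(r"\.\s+", text):
        before = text[start:match.start()].split()
        after = text[match.end():].split()
        prev_word = before[-1] if before else ""
        # Ignore punctuation attached to the following word, e.g. "Dezember."
        next_word = after[0].strip(".,;:!?") if after else ""
        if prev_word.isdigit() and next_word in follower_set:
            continue  # ordinal number plus listed follower: keep together
        sentences.append(text[start:match.start() + 1].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Wir treffen uns am 25. Dezember. Bis dann."
    print(split_with_ordinal_followers(sample, ["Dezember"]))
```

With Dezember on the list the date stays in one segment; with an empty list the text breaks after 25.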

Segmentation Rules

There are default rules for every language that define how sentences are segmented.  They normally cover a “Full stop rule” and rules for “Other terminating punctuation”.  I’m pretty sure they are also exactly the same defaults used for every language (I only checked a few) so you might want to change these to suit.  Certainly there are many reasons for adapting these rules and I have addressed these on and off over the years:

All of these things are what you will find in a Language Resource Template and you can create a template without the Translation Memory from the same menu you might use to create a Translation Memory when you are working in the Translation Memories view:

It looks like this:

So that’s all interesting, and they are very useful.  But the problem with them is that they are applied on the Translation Memory.  They are not a separate resource that you could reuse on a different Translation Memory, unless you were creating a new one.  So if you had 20 Translation Memories and they all used the same Language Resources then making a change to just one of these resources would mean you have to make this change twenty times.  Painful, but reluctantly doable once.  But if you have more Translation Memories, and if you regularly update them then this probably becomes unworkable for most users pretty quickly.  I also didn’t mention that these rules control the source and the target segmentation, so if you do a lot of alignment then you can double the effort described; and if you handle more than one language pair then you can multiply all of that effort by each language pair you are responsible for maintaining!

Phew… is it any wonder many users are not familiar with Language Resource Templates?

applyTMTemplate

You may be familiar with the extremely useful Apply Studio Project Template application developed in 2015 which allows you to quickly apply the settings in a Project Template to any Studio project you like.  Well, we had an idea to take the same principle and make it possible to apply the settings in a Language Resource Template to any number of Translation Memories you like.  This way you can maintain at least one template and only have to change this one to be able to apply the changes to hundreds of others.  Pretty cool!  You can find the applyTMTemplate application on the SDL AppStore here.

After installing the plugin you’ll find it in the Add-Ins menu in all the Views in Studio, like this:

And if you’re a keyboard jockey then you can also set a custom shortcut for this feature in your File -> Options -> Keyboard Shortcuts… maybe useful if you do a lot of updating TMs:

To demonstrate how this app works I created 12 Translation Memories out of six different language pairs by also creating TMs in the opposite direction.  I then created one single Language Resource Template and edited all four of the resources for each of the four languages.  All I did then was select the Language Resource Template, tick all the resources I wished to be applied (these are actually checked by default), drag and drop all 12 Translation Memories into the window and click on Apply.  Almost instantly the resources I added into this single template were applied to all of the TMs:

You can see a visual confirmation that the operation was successful as all of the source and target languages have a tick against them.  If a language fails, and I did find one in testing (which will probably be resolved in due course) then the TM that failed will be displayed something like this:

Here the resources for Swahili (Congo) failed to update, but the rest did.  I’m not sure where the error comes from at the moment but it was only when I decided to try a slightly exotic language for fun that I discovered the problem.  I have no doubt it will be identified and resolved in due course… but for now it serves as a good example of what will happen if a language fails to update.  Now that I know this I could go in and manually update this one, but that’s a small price to pay compared to all the work I’d have to do for all of the languages before this application was made available!

Some friendly words of warning

As with all Translation Memory operations, always back up your Translation Memories before you carry out any kind of operation on them, and afterwards thoroughly test that they still work and can be exported.  Your Translation Memories are valuable assets!

And unlike Translation Memories, which show a pop-up telling you where they are when you hover over them in the Translation Memories view, Language Resource Templates don’t do that.  So don’t forget where you put them or you’ll be searching your computer for *.sdltm.resource files to find out where they are.  I only mention this because I had this very problem twice!  I can only think that the lack of awareness around Language Resource Templates is why nobody ever noticed!  Perhaps this will change now we have this great tool that gives the concept of templates for Translation Memories a new lease of life!
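
If you do lose track of them, a few lines of Python can do the searching for you.  This is a minimal sketch that assumes only the *.sdltm.resource extension mentioned above; the function name and the choice of starting folder are hypothetical.

```python
from pathlib import Path


def find_language_resource_templates(root):
    """Return every *.sdltm.resource file below the folder `root`, sorted.

    Hypothetical helper; `root` is whichever folder you want to search.
    """
    return sorted(
        path for path in Path(root).rglob("*.sdltm.resource") if path.is_file()
    )


if __name__ == "__main__":
    # Assumption: start from the user's home folder; adjust as needed.
    for template in find_language_resource_templates(Path.home()):
        print(template)
```

Searching a whole drive this way can be slow, so it is worth pointing it at the folder where you normally keep your Translation Memories first.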

If there’s one thing I firmly believe, it’s that all translators should learn a little bit of regex, or regular expressions.  In fact it probably wouldn’t hurt anyone to know how to use them a little bit simply because they are so useful for manipulating text, especially when it comes to working in and out of spreadsheets.  When I started to think about this article today I was thinking about how to slice up text so that it’s better segmented for translation; and I was thinking about what data to use.  I settled on lists of data as this sort of question comes up quite often in the community and to create some sample files I used this Wikipedia page.  It’s a good list, so I copied it as plain text straight into Excel which got me a column of fruit formatted exactly as I would like to see it if I was translating it, one fruit per segment.  But as I wanted to replicate the sort of lists we see translators getting from their customers I copied the list into a text editor and used regex to replace the hard returns (\r\n) with a comma and a space, then broke the file up alphabetically… took me around a minute to do.  I’m pretty sure that kind of simple manipulation would be useful for many people in all walks of life.  But I digress….
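
For anyone who would rather script it than use a text editor, the same replacement can be sketched in Python.  The function name is my own, and I have assumed you also want plain \n line endings handled alongside \r\n.

```python
import re


def lines_to_comma_list(text):
    """Replace hard returns (\\r\\n or \\n) with a comma and a space.

    Leading and trailing blank lines are dropped first so the result does
    not start or end with a stray comma.
    """
    return re.sub(r"(?:\r?\n)+", ", ", text.strip())


if __name__ == "__main__":
    column = "Apple\r\nBanana\r\nCherry\r\n"
    print(lines_to_comma_list(column))  # Apple, Banana, Cherry
```

Runs of empty lines collapse into a single separator, which is usually what you want when the list was pasted from a spreadsheet column.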

Using segmentation rules on your Translation Memory is something most users struggle with from time to time.  The difficulty is not so much creating the rules, which is often just a question of a few regular expressions and is well covered in posts like this from Nora Diaz and others, but rather how to ensure they apply when you want them, particularly when using the alignment module or retrofit in SDL Trados Studio where custom segmentation rules are being used.  Now I’m not going to take the credit for this article as I would not have even considered writing it if Evzen Polenka had not pointed out how Studio could be used to handle the segmentation of the target language text… something I wasn’t aware was even possible until yesterday.  So all credit to Evzen here for seeing the practical use of this feature and sharing his knowledge.  This is exactly what I love about the community, everyone can learn something and in practical terms many of SDL’s customers certainly know how to use the software better than some of us in SDL do!

The new alignment tool in Studio SP1 has certainly attracted a lot of attention, some good, some not so good… and some where learning a few little tricks might go a long way towards improving the experience of working with it.  As with all software releases, the features around this tool will be continually enhanced and I expect to see more improvements later this year.  But I thought it would be useful to step back a bit because I don’t think it’s that bad!

When Studio 2009 was first launched one of the first things that many users asked for was a replacement alignment tool for WinAlign.  WinAlign has been around since I don’t know when, but it no longer supports the modern file formats that are supported in Studio so it has been overdue for an update for a long time.

Cartoon by Martin Rowson

As I’m writing this I can hear the cry of “Use a CAT tool for translating literature, or prose… no way!”  This is a discussion I see from time to time and there are some pretty strong feelings on this subject for a number of reasons.  One of the reasons given is that you cannot take this type of material sentence by sentence and just do a literal translation.  Other reasons may be more detail around this same point, and also touch on the need for a creative flow because this type of translation requires a very creative writing style rather than literally translating the words.
