Studio Tips

If there’s one thing I firmly believe, it’s that all translators should learn a little bit of regex, or regular expressions.  In fact it probably wouldn’t hurt anyone to know how to use them a little bit, simply because they are so useful for manipulating text, especially when it comes to working in and out of spreadsheets.  When I started to think about this article today I was thinking about how to slice up text so that it’s better segmented for translation, and about what data to use.  I settled on lists of data, as this sort of question comes up quite often in the community, and to create some sample files I used this Wikipedia page.  It’s a good list, so I copied it as plain text straight into Excel, which got me a column of fruit formatted exactly as I would like to see it if I was translating it: one fruit per segment.  But as I wanted to replicate the sort of lists we see translators getting from their customers, I copied the list into a text editor and used regex to replace the hard returns (\r\n) with a comma and a space, then broke the file up alphabetically… took me around a minute to do.  I’m pretty sure that kind of simple manipulation would be useful for many people in all walks of life.  But I digress….

I now have three files created from my list and I’ll use these to try and explain the concepts of segmenting text in SDL Trados Studio through the use of segmentation rules in your translation memory.

  1. a comma plus space separated list
  2. a comma only separated list
  3. a tab delimited list
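
For anyone who wants to reproduce sample files like these outside a text editor, here is a minimal Python sketch of the same find-and-replace. The fruit names and the helper name `join_lines` are made up for illustration; the only thing taken from the article is the idea of replacing each hard return with a separator.

```python
import re

# Hypothetical sample: a column copied out of a spreadsheet, one fruit per line
column = "Apple\r\nBanana\r\nCherry\r\n"


def join_lines(text: str, separator: str) -> str:
    """Replace each hard return between items with `separator`."""
    # Drop trailing line breaks first so the list does not end with a separator
    trimmed = text.rstrip("\r\n")
    # \r?\n covers both Windows (\r\n) and Unix (\n) line endings
    return re.sub(r"\r?\n", separator, trimmed)


comma_space = join_lines(column, ", ")    # comma plus space separated list
comma_only = join_lines(column, ",")      # comma only separated list
tab_delimited = join_lines(column, "\t")  # tab delimited list
```

Each result is a single line of text, which is why a translation tool with only its default rules is likely to treat the whole list as one segment.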

In Studio without creating custom segmentation rules I’m going to see something like these:

comma plus space

comma only

tab delimited

I think these are the most common sorts of problem we see users dealing with, and hopefully they will allow me to explain the concepts you can use for segmenting on anything.  So, first of all let’s see where you create these rules.  Trados Studio segments files based on the structure of the file first and then using segmentation rules in your translation memory.  So if you have a Word file and you end a sentence with a full stop followed by a space, or you press the enter key which inserts a hard return (or a paragraph break), you will get a separate sentence (or segment) in Studio when you open the file.  Like this:

If you are working with markup files, like XML or HTML, then the segmentation is controlled by the parser rules for the separate elements.  For example, opening an XML file with the parser rules set incorrectly could lead to something like this where the file is segmented based on full stops at the end of each sentence, but also where the inline elements (<nice>, <info> and <not>) have been incorrectly set as structural elements (see this article for more information on this rather large topic):

But if the structure of the file is all set up correctly then the only thing you might have to address is the segmentation of text that is placed into Studio as a single segment when you’d prefer to see it broken down further, which brings me back to my sliced fruit.

Segmentation Rules

Segmentation rules are held on the translation memory or in language resource templates.  The options are exactly the same in both places.  Language resource templates are very similar to project templates in that they provide you with a way to create a new translation memory based on a configuration you use a lot.  So things like your Variable List, Abbreviation List, Ordinal Follower List and Segmentation Rules can all be set up in a language resource template and used as the basis of a new translation memory whenever you create one:

You create a new language resource template in the Translation Memories view by selecting New -> New Language Resource Template as shown above.  You can also create a new translation memory based on a previous one, but using these templates is handy because they take up little space and can easily be shared with others.  I think it would be handy to have an Apply Translation Memory Template application that worked in a similar way to Apply Project Template… and we might look at the feasibility of this in the near future.

To edit the segmentation rules in a translation memory you open it in the Translation Memory view and then select Settings from the ribbon, or right-click and select Settings from the context menu:

An important point to note is that you must select a translation memory to use the ribbon icon because otherwise it will be greyed out.  Once you’ve done this you’ll find the segmentation rules under the Language Resources node.  If you create a language resource template you’ll find the only settings in there are those listed under the Language Resources node shown below, so everything I’m about to explain is the same for both.

You then select Segmentation Rules and click on Edit.  There are two options in the next window:

  • Paragraph based segmentation
  • Sentence based segmentation

The first option, paragraph based, does not support customised segmentation rules.  It is just a way to segment your files based on paragraph as opposed to sentence, and a paragraph is determined by the filetype structure I explained earlier.  You can read a little more about paragraph segmentation in this article as it’s a useful option under the right circumstances.  For our sliced fruit we are going to be working with the Sentence based segmentation where you’ll find three default rules:

  • Full stop rule
  • Other terminating punctuation (question mark, exclamation mark)
  • Colon

Generally I’d leave these alone unless you have a very specific reason for wanting to change them and you understand why this will be necessary.  There are options when you edit them to add exceptions in addition to, or instead of, changing the rules and if you do play around in this area I’d recommend trying to use the exceptions first as this is usually a lot safer.  But we’re going to add new rules.

Adding Segmentation Rules

New segmentation rules work by defining three pieces of information:

  1. what characters are there before you break to start a new sentence
  2. what character do you actually want to break on
  3. what characters appear after the break

There are two views where you can apply these rules, a Basic View:

And an Advanced View:

You’ll notice that the Advanced View only has two places for information, whereas the Basic has three.  This is because the Advanced View uses regular expressions and the Before break pattern incorporates the first two pieces of information that you enter into the Basic View.
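
To make the two-pattern idea more concrete, here is a rough Python sketch of how a "before break" and an "after break" pattern can decide where text is split. This is only a model under my own assumptions: it is not Studio's segmentation engine, the function names `find_breaks` and `split_at` are hypothetical, and Studio's regex flavour may differ from Python's.

```python
import re


def find_breaks(text: str, before_break: str, after_break: str) -> list[int]:
    """Return offsets where `before_break` ends and `after_break` begins."""
    # The lookahead checks what follows without consuming it,
    # so the break lands exactly between the two patterns.
    pattern = re.compile(rf"(?:{before_break})(?={after_break})")
    return [match.end() for match in pattern.finditer(text)]


def split_at(text: str, offsets: list[int]) -> list[str]:
    """Cut `text` into segments at the given offsets."""
    segments, start = [], 0
    for offset in offsets:
        segments.append(text[start:offset])
        start = offset
    segments.append(text[start:])
    return segments


# A full-stop style rule: a full stop before the break, whitespace after it
sentences = "One. Two. Three."
print(split_at(sentences, find_breaks(sentences, r"\.", r"\s")))

# The same rule finds nothing to break on in a comma separated list
fruit = "Apple, Banana, Cherry"
print(split_at(fruit, find_breaks(fruit, r"\.", r"\s")))
```

In this simplified model the whitespace stays at the start of the following segment; a real tool would normally tidy that up for you.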

The rules you create are handled sequentially.  So if we wanted to segment the first fruit file which is a comma plus space between the fruits we have to do two things:

  1. break before the comma
  2. break after the space

The reason we want to do this is so we are able to handle the words on their own and just filter out the comma and space in the editor.  Doing this is not always obvious because the Basic View doesn’t give us all the options we need unless they are really basic.  Trying to add them in the Basic View and then switching to the Advanced View often leads to expressions that don’t work the way you expected, because they can be escaped twice: the expression then looks for a literal backslash instead of treating the backslash as the start of a particular pattern.  So to work around this I always enter the information that is correct in the Basic View and then use a capital X for the information that is not there.  This allows me to edit the Advanced View more easily as the basic requirements are there.  This is best explained by an example:

Before break – I want to identify that there are letters before the last letter which is the break character

Break characters – this will be the last letter before the comma. I can’t enter this with an expression so I use a capital X

After break – I want a comma and a space, so use \s for the space and check Regular Expression
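
The double-escaping pitfall mentioned earlier is easy to demonstrate in any regex engine. The sketch below uses Python's `re.escape` as a stand-in for whatever escaping happens when text is carried between the two views; that mapping is my assumption, not a description of Studio's internals.

```python
import re

# Intended pattern: a comma followed by any whitespace character
after_break = r",\s"

# Escaping the pattern a second time turns the backslash into a literal one
escaped_twice = re.escape(after_break)

sample = "Apple, Banana"

# The intended pattern finds the comma and space
print(bool(re.search(after_break, sample)))

# The double-escaped pattern only matches a comma, a real backslash and the letter s
print(bool(re.search(escaped_twice, sample)))
print(bool(re.search(escaped_twice, r"Apple,\sBanana")))
```

Typing a placeholder such as a capital X sidesteps this, because a plain letter has nothing to escape.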

When I switch to the Advanced View I see this:

It’s hard to read, so this is what I have:


It’s simple to see the capital X now and all I have to do is replace this with the regex for a word character.  For this I’ll use a \w and my expression looks like this:


When I open the file using the translation memory containing these rules Studio will open the file like the image below, with the text segmenting before the comma.
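
The effect of this first rule can be approximated in Python with a zero-width split: break wherever a word character is immediately followed by a comma and a space. The pattern below is my own approximation using lookarounds, not the exact expression from the Advanced View, and the sample text is invented.

```python
import re

# Break position: just after a word character, just before ", "
RULE_1 = re.compile(r"(?<=\w)(?=,\s)")


def apply_rule_1(text: str) -> list[str]:
    """Split before each comma-plus-space, leaving the comma on the next segment."""
    # Splitting on an empty match needs Python 3.7 or later
    return RULE_1.split(text)


print(apply_rule_1("Apple, Banana, Cherry"))
# → ['Apple', ', Banana', ', Cherry']
```

Every segment after the first now starts with the comma and space, which is the situation the second rule has to deal with.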

So all I have to do now is create the second rule to break after the comma and the space.  This one is quite interesting because I don’t have anything before the break as the comma is now at the start of the segment, so I delete this entry and leave it blank.  I then enter a capital X for the break characters so I can add the comma and the space in the Advanced View and select Text for after the break:

This gives me the following:


I add a regex for the comma and space to replace the X like this:


Then when I open the file in Studio this time I get exactly what I need and I can filter out the commas to see something like this:

Now working with the file for translation is a doddle and I’ll ensure only the words are entered into my translation memory, and it’s easier to add them to a termbase if I like and ensure term recognition as the commas won’t be in the way.
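
Putting the two rules together, here is a minimal Python sketch that applies them in sequence and then filters out the separators, with the delimiter passed in so the same idea covers the comma-only and tab-delimited files. The function names and sample data are hypothetical, the lookaround patterns are my approximation rather than the expressions used in Studio, and the delimiter has to be a fixed-width pattern because Python's lookbehind requires it.

```python
import re


def segment_list(text: str, delimiter: str) -> list[str]:
    """Apply two rules in order: break before the delimiter, then after it."""
    before_delimiter = re.compile(rf"(?<=\w)(?={delimiter})")
    after_delimiter = re.compile(rf"(?<={delimiter})(?=.)")
    segments = []
    for piece in before_delimiter.split(text):      # rule 1
        segments.extend(after_delimiter.split(piece))  # rule 2
    return segments


def words_only(segments: list[str], delimiter: str) -> list[str]:
    """Drop segments that consist solely of the delimiter."""
    return [s for s in segments if not re.fullmatch(delimiter, s)]


samples = {
    r",\s": "Apple, Banana, Cherry",  # comma plus space
    r",": "Apple,Banana,Cherry",      # comma only
    r"\t": "Apple\tBanana\tCherry",   # tab delimited
}

for delimiter, text in samples.items():
    segments = segment_list(text, delimiter)
    print(segments, words_only(segments, delimiter))
```

Keeping the separators as their own segments, rather than throwing them away during the split, mirrors the approach in the article: nothing is lost from the file, and the separators are simply filtered out of view.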

Finally, as this article got much longer than I originally intended (hopefully because I included things that are useful for you and not just because I thought it was a good idea!) I have created a video showing how to do this for all three fruit files in succession.

Duration: 15 min 16 seconds

Every time a new version of SDL Trados Studio is released there is usually a flurry of blogs and videos explaining what’s in it; some are really useful and full of details that will help a user decide whether the upgrade is for them or not, and others are written without any real understanding of what’s in the software or why the upgrade will help.  That’s really par for the course and always to be expected since everyone is looking for the things they would like to meet their own needs.  So for me, when I’m looking for independent reviews of anything, I find the more helpful reviews give me as much information as possible and I can make my own mind up based on the utility I’ll get from it, the fun in using it and the cost of upgrade.  I put a couple of what I would consider helpful reviews here as they both try to cover as many of the new features available as possible.  So if you are in the early stages of wondering at a high level what’s in it for you then you could do a lot worse than spending 10 or 20 minutes of your time to read/watch the contributions from Emma and Nora below.

Studio 2019 has arrived and it brings with it some nice features on the surface, and some important improvements under the hood… but it also brings with it a lot more upgrades than just Studio, and I don’t just mean MultiTerm!  The SDL AppStore is one of the unique benefits you get when you work on the SDL technology stack and there are hundreds of apps available that can provide additional resources, custom filetypes, file converters, productivity enhancements, manuals, etc.  When you upgrade your version of Studio you are also going to have to upgrade your apps.  Many of the apps are maintained by the SDL Community team and these have all been upgraded ready for use in Studio 2019, but the majority have been created and maintained by others.  I’ve written this article to explain what you need to look out for as a user of SDL Trados Studio or MultiTerm, and also as a reference guide for the developers who might have missed the important information that was sent out to help them with the process.

Is English (Europe) the new language on the other side of the Channel that we’ll all have to learn if Brexit actually happens… will Microsoft ever create a spellchecker for it now that they’ve added it to Windows 10?  Why are there 94 different variants of English in Studio coming from the Microsoft operating system and only two Microsoft Word English spellcheckers?  Why don’t we have English (Scouse), English (Geordie) or English (Brummie)… probably more distinct than the differences between English (United States) and English (United Kingdom), which are the two variants Microsoft can spellcheck.  These questions, and similar ones for other language variants, are all questions I can’t answer and this article isn’t going to address!  But I am going to address a few of the problems that having so many variants can create for users of SDL Trados Studio.

There’s always been the occasional question appearing on the forums about data protection, particularly in relation to the use of machine translation, but as of the 25th May 2018 this topic has a more serious implication for anyone dealing with data in Europe.  I’ve no intention of making this post about the GDPR, which came into force in May 2016 and now applies; you’ll have plenty of informed resources for this and probably plenty of opinion in less informed places too, but just in case you don’t know where to find reliable information on this here are a few places to get you started:

With the exception of working under specific requirements from your client, Europe has (as far as I’m aware) set out the only legal requirements for dealing with personal data.  They are comprehensive, however, and deciphering what this means for you as a translator, project manager or client in the translation supply chain is going to lead to many discussions around what you do, and don’t have to do, in order to ensure compliance.  I do have faith in an excellent publication from SDL on this subject since I’m aware of the work that has gone into it, so you could do worse than to look at this for a good understanding of what the new regulations mean for you.

I’m pretty sure that when we started to build the new Customer Experience Team in Cluj last year there was nothing in the job description about being competitive… but wow, they are!!!  I’d be lying if I said I wasn’t competitive, because I know I am, but it’s been a long time since I’ve had these kinds of feelings that keep me up at night.

To some extent I think the training requirements at SDL are the perfect fuel for this type of environment and I haven’t made up my mind yet whether it’s healthy or not.  But in their roles the team speak with customers through the online chat, in the community, via email… basically anywhere anyone comes in with a question because they don’t have a support contract or an account manager to ask and they didn’t know about the SDL Community which is of course the best place to go for help.  To be able to answer the variety of technical questions we see, all the team have either completed or are working through the various SDL Certifications available at a rate of knots and are learning more about the sort of problems faced by translators and project managers just by having to help people every day.  They are doing a fantastic job!

In the years that the SDL AppStore has been around I get asked one question on a fairly regular basis… “How can I find out about new apps or updates to existing apps?”.  A very reasonable question of course and one that has not been addressed particularly well, albeit there have been ways to keep yourself informed.  The ultimate solution we all want to see is the AppStore embedded into SDL Trados Studio, but as that isn’t going to happen for a while here are a couple of ways you can still keep yourself aware of the updates.  The first is via Twitter and this has been around for a while; the second is using an RSS feed which is brand new as of today!
