The final article (in this introductory series anyway) on regular expressions in Studio looks at how to use regex for search and replace. The ability to use regex to replace as well as search is only available from SDL Trados Studio 2011 SP2 onwards, and it’s a very welcome addition to the toolset provided within Studio.
So, what do I mean by this exactly? Well, I had a good example a few months ago where a user was provided with a source file containing dates like this:
The translation was supposed to return a date in this format:
12 March 2014
Studio cannot convert a short date to a long date. It can convert the short date format in one language to the short date format in another language… but not switch from short to long.
Using a regex replacement you can fix this after the event: translate the file, let Studio do its thing, and then replace the dates at the end. To do this you first have to find the dates.
In this example you could use this pattern because the source was yyyy-mm-dd and when going from de(DE) to en(GB) this short format can stay the same:
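As a rough illustration of the "find the dates first" step outside Studio, here is a minimal Python sketch. The pattern and the sample text are my own assumptions for a yyyy-mm-dd date, not necessarily the exact expression used in the article.

```python
import re

# Hypothetical pattern (an assumption, not the article's original):
# four digits, a hyphen, two digits, a hyphen, two digits.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Made-up sample text containing two short dates.
sample = "Delivery on 2014-03-12, review on 2014-11-05."

# List every match so the pattern can be checked before anything is replaced.
found_dates = DATE_PATTERN.findall(sample)
print(found_dates)
```

Listing the matches first is the same habit as using "Find Next" in Studio before committing to a replace.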
Now, in order to use a replacement regex we have to create what is called a back reference. This is simply a way of finding a pattern and remembering it. To do this you use brackets in this format (). So for example the first part of my regex that looks for the year is this:
If I put brackets around this then the regex engine in Studio will remember this as back reference number 1. So like this:
The next part I want to remember would be the day, and by putting brackets around the day part the regex engine will remember the day as back reference number 2. So my search expression would now look like this:
To replace my search string with these back references I can use a syntax like this:
So $1 represents the first back reference and $2 the second. You can have as many of these as you like. In this case a search and replace would first find the first matching string, like this:
If I now replace them all in one go I get this:
You can see what’s happened… the first back reference and the second back reference have replaced all the strings that the search pattern matched. So this gives me yyyy dd which is not what I wanted. But what I can do is swap the order of the back references around and replace with this instead:
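The effect of swapping the back references can be sketched in Python. Note that Python writes back references in the replacement as `\1` and `\2`, where Studio uses `$1` and `$2`. The search expression below is a hypothetical stand-in (year in group 1, day in group 2, month matched but not remembered), and the sample text is made up.

```python
import re

# Hypothetical search expression: group 1 = year, group 2 = day.
# The month digits are matched but not captured.
SEARCH = re.compile(r"(\d{4})-\d{2}-(\d{2})")

sample = "2014-03-12 and 2014-11-05"  # made-up sample text

# Studio replace syntax "$1 $2" corresponds to r"\1 \2" in Python.
year_then_day = SEARCH.sub(r"\1 \2", sample)

# Swapping the order: Studio "$2 $1" corresponds to r"\2 \1".
day_then_year = SEARCH.sub(r"\2 \1", sample)

print(year_then_day)
print(day_then_year)
```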
This should give me the dd yyyy. But this is not enough because I don’t have the month. Unfortunately it is not possible to create a replacement expression that will do all the months in one go, but I could use a maximum of 12 simple search and replace operations to handle all the changes I’m after, just in case every month of the year appears in the document. In my simple example I only need two, one for March and one for November. And this is the other important thing to know… you can mix plain text and back references in the replace syntax. So if I change my search pattern to this:
And my replace pattern to this:
$2 March $1
Then every date found with 03 as the month is replaced with one that spells out March.
I then amend the patterns to do November and repeat the exercise like this:
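The month-by-month passes can be sketched as a loop in Python, one search and replace per month. This is only an illustration under my own assumptions: the search expression is hypothetical, `expand_dates` is a name I made up, and Python uses `\1` and `\2` where Studio uses `$1` and `$2`.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def expand_dates(text):
    """Run one search and replace per month, mixing plain text with back references."""
    for number, name in enumerate(MONTHS, start=1):
        # Hypothetical search pattern, e.g. (\d{4})-03-(\d{2}) for March.
        search = rf"(\d{{4}})-{number:02d}-(\d{{2}})"
        # Equivalent of Studio's "$2 March $1": day, month name, year.
        replace = rf"\2 {name} \1"
        text = re.sub(search, replace, text)
    return text

result = expand_dates("Due 2014-03-12, final 2014-11-05.")
print(result)
```

A leading zero on the day (05 rather than 5) survives this replacement, so it would still need a separate pass or a manual check if the target style drops it.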
So you can see it wouldn’t take that long to do the entire year. A few interesting uses for this feature might be changing things like dd.mm.yy to mm/dd/yy for example, like this:
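As a minimal sketch of the dd.mm.yy to mm/dd/yy idea in Python, using three back references. The pattern is my own assumption, and in Studio the replacement would be written as `$2/$1/$3` rather than Python's `\2/\1/\3`.

```python
import re

# Hypothetical pattern: group 1 = day, group 2 = month, group 3 = two-digit year.
# The dots are escaped so they match a literal full stop, not "any character".
SEARCH = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{2})\b")

sample = "Due 12.03.14, paid 05.11.14"  # made-up sample text

# Swap day and month and change the separator in a single replacement.
converted = SEARCH.sub(r"\2/\1/\3", sample)
print(converted)
```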
Or, an interesting example provided by Daniel Brockmann, where it may be useful for correcting sentences that require a particular way of writing based on a styleguide. So if we have these two sentences for example:
Auf Hilfedateien im Internet zugreifen
Auf lokale Hilfedateien zugreifen
The styleguide may prefer you start with a verb and write them like this:
Zugreifen auf Hilfedateien im Internet
Zugreifen auf lokale Hilfedateien
You could do this with the following search and replace:
Search: Auf (.*) zugreifen
Replace: Zugreifen auf $1
If you only had two sentences it’s easy to do them manually… but if you had a lot throughout your project then using regex for something like this might be very timesaving.
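For anyone who wants to try this outside Studio, the same search and replace can be sketched in Python. The only difference is the back reference syntax: `\1` in Python where Studio uses `$1`.

```python
import re

sentences = [
    "Auf Hilfedateien im Internet zugreifen",
    "Auf lokale Hilfedateien zugreifen",
]

# Search:  Auf (.*) zugreifen
# Replace: Zugreifen auf $1   (written as \1 in Python)
rewritten = [re.sub(r"Auf (.*) zugreifen", r"Zugreifen auf \1", s) for s in sentences]

for line in rewritten:
    print(line)
```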
Incidentally… this last example introduced something new to this blog… the .* (dot star)
This is actually something to be very careful with. The dot means match anything apart from a line break as we know, and the star means repeat that between zero and unlimited times. So in this last example, if a segment matches the pattern but is not a sentence where the start and end can simply be swapped like this, then the result could be completely wrong. “Economy of accuracy” is a good thing for translators because it makes using regex easy, and the dot in particular is an easy meta character to remember… but please be careful and always make sure that the expression is only finding exactly the things you wish to replace before you click on “Replace All”!
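To make the risk concrete, here is a small Python sketch. The second segment is a made-up sentence that happens to contain the same words, and the stricter pattern anchored with `^` and `$` is just one possible safeguard I am assuming for illustration, not a recommendation from the article.

```python
import re

LOOSE = re.compile(r"Auf (.*) zugreifen")
# Assumption: only segments that consist entirely of the pattern should change.
STRICT = re.compile(r"^Auf (.*) zugreifen$")

segments = [
    "Auf lokale Hilfedateien zugreifen",
    # Made-up segment that matches the loose pattern but should not be rewritten.
    "Auf die Schaltfläche klicken, um auf Hilfedateien zugreifen zu können",
]

# Check what each pattern finds before replacing anything.
loose_hits = [s for s in segments if LOOSE.search(s)]
strict_hits = [s for s in segments if STRICT.search(s)]

# What "Replace All" with the loose pattern would do to the second segment.
mangled = LOOSE.sub(r"Zugreifen auf \1", segments[1])

print(len(loose_hits), len(strict_hits))
print(mangled)
```

The loose pattern finds both segments and turns the second into nonsense, while the anchored version only finds the first.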
If you came to this article first and wondered where parts 1 and 2 were…
And we also have a Part 4 now;