The final article (in this introductory series anyway) on regular expressions in Studio looks at how to use them in search and replace. This capability, to use regex to replace as well as search, is only available from the update release of SDL Trados Studio 2011 SP2 onwards, and it’s a very welcome addition to the toolset provided within Studio.
So, what do I mean by this exactly? Well, I had a good example a few months ago where a user was provided with a source file containing dates like this:
2014-03-12
The translation was supposed to return a date in this format:
12 March 2014
Studio cannot convert a short date to a long date. It can convert the short date format in one language to the short date format in another language… but not switch from short to long.
Using a regex replacement you can effect this after the event, so translate the file allowing Studio to do its thing and then at the end replace the date. To do this you first have to find the dates.
In this example you could use this pattern because the source was yyyy-mm-dd and when going from de(DE) to en(GB) this short format can stay the same:
\d{4}-\d{2}-\d{2}
Now, in order to use a replacement regex we have to create what is called a back reference. This is simply a way of finding a pattern and remembering it. To do this you use brackets in this format (). So for example the first part of my regex that looks for the year is this:
\d{4}
If I put brackets around this then the regex engine in Studio will remember this as back reference number 1. So like this:
(\d{4})
The next part I want to remember would be the day, and by putting brackets around the day part the regex engine will remember the day as back reference number 2. So my search expression would now look like this:
(\d{4})-\d{2}-(\d{2})
To replace my search string with these back references I can use a syntax like this:
$1 $2
So $1 represents the first back reference and $2 the second. You can have as many of these as you like. In this case a search and replace like this would first find the first date that matches the pattern.
If I now replace them all in one go, every matched date is reduced to its year and day.
You can see what’s happened… the first back reference and the second back reference have replaced all the strings that the search pattern matched. So this gives me yyyy dd which is not what I wanted. But what I can do is swap the order of the back references around and replace with this instead:
$2 $1
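To see the two replacements side by side, here is a minimal sketch in Python. It is only an illustration: Python's re module stands in for Studio's search and replace dialog, and it writes back references as \1 and \2 where Studio uses $1 and $2. The sample date is the one from the start of this article.

```python
import re

# Group 1 captures the year, group 2 captures the day.
# The month (\d{2} in the middle) is matched but not remembered.
pattern = re.compile(r"(\d{4})-\d{2}-(\d{2})")

text = "2014-03-12"

# Studio's "$1 $2" is written as "\1 \2" in Python.
year_day = pattern.sub(r"\1 \2", text)

# Studio's "$2 $1" swaps the order of the back references.
day_year = pattern.sub(r"\2 \1", text)

print(year_day)
print(day_year)
```

The search pattern stays the same in both cases; only the replacement string changes.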
This should give me dd yyyy. But this is not enough because I don’t have the month. Unfortunately it is not possible to create a replacement expression that will do all the months in one go, but I could use a maximum of 12 simple search and replace operations that would handle all the changes I’m after, just in case every month of the year appears in the document. In my simple example I only need two, one for March and one for November. This is where the other important thing to know comes in… you can mix plain text and back references in the replace syntax. So if I change my search pattern to this:
(\d{4})-03-(\d{2})
And my replace pattern to this:
$2 March $1
Then every date with 03 as the month is replaced with the long format for March.
I then amend the patterns to do November, searching for (\d{4})-11-(\d{2}) and replacing with $2 November $1, and repeat the exercise.
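The "up to 12 simple search and replace operations" idea can be sketched as a loop. This is a Python illustration, not something Studio runs: the function name short_to_long_dates and the sample sentence are made up for the sketch, and Python writes the back references as \1 and \2 where Studio uses $1 and $2.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def short_to_long_dates(text: str) -> str:
    """Run one search and replace per month, as described above."""
    for number, name in enumerate(MONTHS, start=1):
        # Builds e.g. (\d{4})-03-(\d{2}) for March.
        search = rf"(\d{{4}})-{number:02d}-(\d{{2}})"
        # Builds e.g. "\2 March \1": plain text mixed with back references.
        replace = rf"\2 {name} \1"
        text = re.sub(search, replace, text)
    return text

print(short_to_long_dates("Signed 2014-03-12, revised 2013-11-05."))
```

Each pass is the same operation with a different month number and month name, which is why doing the entire year by hand does not take long either.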
So you can see it wouldn’t take that long to do the entire year. A few interesting uses for this feature might be changing things like dd.mm.yy to mm/dd/yy for example, like this:
Search: (\d{2})\.(\d{2})\.(\d{2})
Replace: $2/$1/$3
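As a quick check of this swap, here is a small Python sketch (an illustration only; Python uses \1, \2 and \3 where Studio uses $1, $2 and $3, and the sample strings are made up). It also shows why the dots are worth escaping: an unescaped dot matches any character, not just a full stop.

```python
import re

# Escaped dots: only real full stops are accepted as separators.
strict = re.compile(r"(\d{2})\.(\d{2})\.(\d{2})")

# Unescaped dots: any character is accepted between the numbers.
loose = re.compile(r"(\d{2}).(\d{2}).(\d{2})")

swapped = strict.sub(r"\2/\1/\3", "Delivered on 25.12.13")
untouched = strict.sub(r"\2/\1/\3", "Part 25-12-13")
surprise = loose.sub(r"\2/\1/\3", "Part 25-12-13")

print(swapped)
print(untouched)
print(surprise)
```

The loose pattern rewrites the hyphenated part number as if it were a date, which is exactly the kind of match to look for before replacing everything.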
Or, an interesting example provided by Daniel Brockmann, where it may be useful for correcting sentences that require a particular way of writing based on a styleguide. So if we have these two sentences for example:
Auf Hilfedateien im Internet zugreifen
Auf lokale Hilfedateien zugreifen
The styleguide may prefer you start with a verb and write them like this:
Zugreifen auf Hilfedateien im Internet
Zugreifen auf lokale Hilfedateien
You could do this with the following search and replace:
Search: Auf (.*) zugreifen
Replace: Zugreifen auf $1
If you only had two sentences it’s easy to do them manually… but if you had a lot throughout your project then using regex for something like this might be very timesaving.
Incidentally… this last example introduced something new to this blog… the .* (dot star)
This is actually something to be very careful with. The dot means match anything apart from a line break as we know, and the star means match it between zero and unlimited times. So in this last example, if a segment contains anything between Auf and zugreifen that does not suit this simple swap of start and end, the result could be completely wrong. “Economy of accuracy” is a good thing for translators because it makes using regex easy, and the dot in particular is an easy meta character to remember… but please be careful and always make sure that the expression is only finding exactly the things you wish to replace before you click on “Replace All”..!
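A small Python sketch shows both sides of this. It is an illustration under assumptions: Python's re module stands in for Studio, \1 is Python's spelling of $1, and the third sentence is a made-up example of a segment that does not fit the simple swap.

```python
import re

pattern = re.compile(r"Auf (.*) zugreifen")
replacement = r"Zugreifen auf \1"

good_1 = pattern.sub(replacement, "Auf Hilfedateien im Internet zugreifen")
good_2 = pattern.sub(replacement, "Auf lokale Hilfedateien zugreifen")

# Hypothetical segment with "zugreifen" twice: .* is greedy, so it runs
# on to the LAST " zugreifen" and swallows the first one.
risky = "Auf Hilfedateien zugreifen und auf Updates zugreifen"
bad = pattern.sub(replacement, risky)

# Looking at what the group captured before replacing makes this visible.
captured = pattern.search(risky).group(1)

print(good_1)
print(good_2)
print(bad)
print(captured)
```

Checking the captured text first is the scripted equivalent of stepping through the matches before clicking “Replace All”.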
If you came to this article first and wondered where parts 1 and 2 were…
Regular Expressions – Part 1
Regex… and “economy of accuracy” (Regular Expressions – Part 2)
And we also have a Part 4 now;
DOGS and CATS… Regular Expressions Part 4!
Thanks Paul for the great series of posts. Very comprehensive and easy to digest!
The replace feature is not working for me though.
I’m trying to replace all instances of “[NAME] Beach” (say, Sandyhills Beach) with e.g. “Plage de…”
In the Find What, I have entered “(.*) (Beach)” and indeed it perfectly matched the bit of text I wanted to change (I could have refined my regex but this one works just fine, no pun intended!). However, when I insert Plage de $1 in the Replace With box, it replaces the initial text with literal “Plage de $1”, instead of the backreference…
Then I went ahead and attempted the same procedure in Notepad++
Find What: (.*) (Beach)
Replace With: Plage de \1
…and it worked just fine.
I looked this up somewhere else, and apparently there is an issue going on preventing Studio from recognising backreferences:
http://www.proz.com/forum/sdl_trados_support/197884-using_findreplace_to_change_commas_to_periods.html
If you have any thoughts about this, by all means let us know..
Thanks
Filipe
Hi Filipe,
I guess it’s easy to miss my opening statement… really my fault for not being more clear and for jumping the gun..! I was giving a sneak preview of the release due next week, which is an update to SP2. The update contains some really nice enhancements in addition to bug fixes, and making regex replacement possible is one of them.
So next week you will be able to download the update and do this as well… I’d recommend it.
Regards
Paul
This is excellent news. Is this a free upgrade if I already have SDL Trados Studio 2011? Also thank you for all of these very useful tips on your site, I’m finding them very useful as practical lessons on getting more from this application and I am very glad I didn’t spend my money on memoq.
Hi George,
Yes, if you already have Studio 2011 then this will be a free upgrade.
Regards
Paul
Thanks, Paul, for the great posts!
Can you refer to tags with regex? I’d like to add an exception to the segmentation rules that includes internal tags. Can you do that?
Thanks! (I have the latest Studio 2011 SP2)
Hi Kristzian, this isn’t possible yet, but it is on a list of things we want to do.
Hi Paul,
Thank you for your great posts on regex!
Is there any chance to get a commitment from SDL as to when exactly it will be possible to use regex in tags?
I know a lot of my colleagues would appreciate Krisztian’s idea (referring to tags with regex for complex segmentation rules).
Is it possible that this issue is also the reason why it’s not possible to search tag content anymore?
Thanks!
Christian
Hi Christian, you can search in tags using the Batch Search & Replace App from the OpenExchange (installed with 2011). Segmentation is a different question and you would not normally segment on tags, rather you decide if they should be included in the segment when you create the filetype (I’m assuming we are talking about XML here).
Perhaps you can share an example with me and we can take a better look? You can email me using pfilkin@sdl.com
Hi Paul,
Is there an expression to find all the segments in Studio with a certain font color?
Tommy
Hi Tommy, I’m afraid not as the colours are determined by the tags and we don’t (yet) allow searching within the tags. I guess depending on the filetype, and depending on what you want to achieve with this you might be able to do this in the native source application afterwards?
Hi Paul!
Back-reference doesn’t seem to work in OpenExchange SDL Batch Find and Replace. Is this correct or am I doing something wrong? Is there a tool to search and replace multiple sdlxliff files with regex?
thanks a lot!!
Kriszitán
Hi Kriszitán, I’m afraid the app you’re referring to doesn’t support this. I wish it did… we have no other way to search and replace across files unless they are merged. Would be nice enhancement for the app, and also to Studio generally. Maybe you could use something like EditPad Pro and make the changes in the sdlxliff files natively… you just need to watch you only make changes to the target text.
Thank you so much for your posts, Paul. I wonder if you still see the comments on this older post, but let me have a try.
Is there any possibility to have a non-breaking space in the replace string? Like between the number and the % character? I tried this but it only gets me “\u00A0”. Had success with the Regex match auto suggest provider as well.
Thanks for helping!
Hi Derk, I think this sometimes works and sometimes doesn’t… I have not investigated this in detail. So if you copy and paste the non-breaking character out of Word, or use Ctrl+Shift+Space in the Studio Editor, for example, then it may work. I did a quick test and replaced this:
(\d{2}) (cm)
With this:
$1 $2
And it worked. The non-breaking space was copy pasted from the Studio Editor into the replace field between the backreferences as you can’t insert the non-breaking space with the keyboard into this field.
Thanks, it works if there is already a non-breaking space, but if you have lots of numbers like EUR 1,000,000 to replace with 1 000 000 EUR (with non-breaking spaces) it does not work, and neither does RegexBuddy, it seems. Any other help is welcome.
Well… I created a very simple test like this:
(EUR) (\d{1}),(\d{3}),(\d{3})
Replace with:
$2 $3 $4 $1
The spaces between the backreferences are all pasted in non-breaking spaces as before and this worked fine. What version of Studio are you using… just so I can test in the same one? I’m using 2017 obviously.
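For anyone who wants to check the regex itself outside Studio, here is a minimal Python sketch of the same replacement. It says nothing about how Studio's replace field handles pasted characters; it only shows that the pattern and the back references do what is intended. Python writes the back references as \1 to \4, and the non-breaking space is written as the escape \u00A0 so that it is visible in the code.

```python
import re

NBSP = "\u00A0"  # non-breaking space

pattern = re.compile(r"(EUR) (\d{1}),(\d{3}),(\d{3})")

# Equivalent of "$2 $3 $4 $1" with non-breaking spaces between the parts.
replacement = f"\\2{NBSP}\\3{NBSP}\\4{NBSP}\\1"

result = pattern.sub(replacement, "EUR 1,000,000")

# repr() shows the non-breaking spaces as \xa0 so they can be told
# apart from ordinary spaces.
print(repr(result))
```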
And a last one. DONE!!! My mistake was that I pasted the nbsp from Word; when you take one from Studio it works 🙂 THANKS
Excellent. I do think this would be easier if the search/replace field supported adding different types of spaces in the same way you can add them into the editor.
Thanks so much for your help. I am using 2017 too, but it won’t work; I always get a breaking space instead of the nbsp. I can search for a nbsp with \u00A0, but that won’t work in the replace field, and neither does the pasted nbsp. (Windows 8.1 German)
Dear Paul
I don’t know if you still read this thread. You were right about everything, but the AutoSuggest provider still would not work. Why? I found an answer somewhere on the net. There is an incompatibility between Studio 2017 and the Regex Match AutoSuggest Provider when it comes to suggesting numbers. It just won’t work without a workaround. The workaround is hitting Shift+Control+F12 (that’s what I found on the net, thanks for that) and the AutoSuggest will appear. I hope that this will be solved some time.
Hi all,
Do you know if there’s any regular expression in Trados that can replace straight apostrophes with curly apostrophes?
Current: China's economy
Needed: China’s economy
Thanks!
Hi Vlad,
You can just copy paste a straight apostrophe into your search field, and a curly apostrophe into the replace and this would work. If you want to only get them in places like your example then search for:
(\w)\x27(\w\s)
And replace with this:
$1’$2
I used a unicode character in the search to avoid all doubt, but you will need to paste the curly apostrophe into the replace in between the back references.
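Here is the same search and replace as a short Python sketch, in case it helps to test it outside Studio. Python writes the back references as \1 and \2, the function name straight_to_curly is made up for the sketch, and the second sample is a made-up case that should be left alone.

```python
import re

# \x27 is the straight apostrophe. It must sit between a word character
# and a word character followed by whitespace, as in "China's economy".
pattern = re.compile(r"(\w)\x27(\w\s)")

def straight_to_curly(text: str) -> str:
    # \u2019 is the curly apostrophe (right single quotation mark).
    return pattern.sub("\\1\u2019\\2", text)

print(straight_to_curly("China's economy"))
print(straight_to_curly("a 'quoted' word"))
```

The quoted word is untouched because neither of its apostrophes sits between a word character and a word character followed by a space.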
Hi Dear,
Thanks for your great effort on this.
I have a different question; is there a way for filtering translated strings in Studio Editor by date?
I need to get specific strings translated in a certain period of time, unfortunately I’m using a TM server with limited access, so I’m wondering is there a way for using regular Ex in Studio editor to get these required strings?
Do you mean the strings contain a date?
Hi Paul
Is the full scope of regex supported by Trados Studio documented?
For example, the simple regex
(\w+)\s+\1
(which finds duplicate words) does not function correctly.
That looks like a bug to me Tony. If you test it in the Advanced Display Filter it works perfectly, but fails dismally in the search & replace. You can also use the SDLXLIFF Toolkit for a better Search & Replace where it also works and this has the added benefit of a source and/or target operation as well as a preview prior to applying the results. The out of the box Search & Replace is notoriously unreliable so I’d recommend you use the toolkit anyway. If you find bugs in that we can also fix them quickly.
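For reference, this is how the expression behaves in Python's re module, which is only a stand-in here and says nothing about the Studio bug itself. The sketch also shows a side effect of the pattern as written: without word boundaries it can match inside words. The \b variant is my own addition, not something from the thread, and the sample sentences are made up.

```python
import re

plain = re.compile(r"(\w+)\s+\1")
bounded = re.compile(r"\b(\w+)\s+\1\b")  # assumption: adds word boundaries

# A genuine duplicate is found by both patterns.
duplicate = plain.search("the the cat").group(0)

# Without boundaries, the "is" at the end of "this" pairs up with "is".
false_hit = plain.search("this is fine").group(0)
no_hit = bounded.search("this is fine")

# Removing duplicates: replace the pair with the single remembered word.
cleaned = bounded.sub(r"\1", "the the cat sat on on the mat")

print(duplicate)
print(false_hit)
print(no_hit)
print(cleaned)
```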
Thanks for the prompt reply. I tried the ADF, works OK. I will also try the Toolkit.
Regards
Tony
I have a gripe with the Search/Replace window in Studio : after using a second monitor it often “flees” away from the interface and can be made visible only through a complicated operation (right-click, move, use arrows). That window cannot be docked like others, can it ? That would solve the problem once and for all.
No it can’t. Perhaps post your question into the SDL Community for some better ideas around working with this. http://xl8.one
Hi Paul
In Danish negative numbers are written with a minus, eg “-123.456,00”. When translating to English we replace the “minus” with parentheses, eg “(123,456.00)”.
Is there a search and replace Regex to use?
Hi Susanne, based on this example you could use something like this:
Search for this:
-(\d{1,3}\.\d{3},\d{0,2})
Replace with this:
($1)
But I’d recommend you post more examples into the SDL Community as I expect that won’t be completely satisfactory for your complete needs.
To match larger (and smaller) numbers, also with an optional decimal part, the regex should be extended such as
-(((\d{1,3}\.)(\d{3}\.)*\d{3}|\d{1,3})(,\d+)?)
Nice addition… or perhaps enhance it further:
-(\d{1,3}(\.\d{3})*(,\d+)?)
Shorter expression and achieves the same thing… nice pastime 😉
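A quick way to compare the two expressions is to run them over the same samples. This is a Python sketch with made-up sample numbers; Python's re module stands in for Studio, and \1 is Python's spelling of $1.

```python
import re

longer = re.compile(r"-(((\d{1,3}\.)(\d{3}\.)*\d{3}|\d{1,3})(,\d+)?)")
shorter = re.compile(r"-(\d{1,3}(\.\d{3})*(,\d+)?)")

samples = ["-123.456,00", "-1.234.567,8", "-12,5", "-5"]

# Both expressions should capture the same number in group 1.
same = all(
    longer.search(s).group(1) == shorter.search(s).group(1)
    for s in samples
)

# Equivalent of replacing with ($1): wrap the number in parentheses.
wrapped = [shorter.sub(r"(\1)", s) for s in samples]

print(same)
print(wrapped)
```

Agreement on a handful of samples is not a proof that the expressions are equivalent, only a sanity check.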
True. Unfortunately, the usefulness and power of regexes is not sufficiently appreciated.
Hi Paul and Anthony – thank you both so much – it works and has made things a little easier.
Your regex finds negative numbers as well as “financial highlights for 2015-2018”, but that’s ok.
Would it be possible to include hits matching “-1.10%” and replace them with “1.10%)”?
Oops – should read: Would it be possible to include hits matching “-1.10%” and replace them with “(1.10%)”, sorry.
A “quick-and-dirty” regex is
-(\d{1,3}(\.\d{3})*(([,.]\d+)%?)?)
But this will also match some poorly formed numbers.
To avoid matches for terms such as 2015-2018, bounds would need to be specified, the correct bounds depend on the context.
As Paul mentioned in one of his previous “Multifaria”, consider “economy of accuracy” – how much effort do you want to spend writing a “perfect” regex (which actually may be impossible).
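To make the point about bounds concrete, here is a Python sketch using a made-up sentence. The bound chosen, a lookbehind that refuses a minus sign directly after a letter, digit, full stop or comma, is only one possible assumption; as noted above, the right bounds depend on the context, and whether a given search field accepts lookbehinds would need testing.

```python
import re

quick = re.compile(r"-(\d{1,3}(\.\d{3})*(([,.]\d+)%?)?)")

# Assumption: a minus that follows a word character, "." or "," is a
# range or hyphen rather than a negative sign, so it is skipped.
bounded = re.compile(r"(?<![\w.,])-(\d{1,3}(\.\d{3})*(([,.]\d+)%?)?)")

text = "highlights for 2015-2018: -1.10%"

without_bounds = quick.sub(r"(\1)", text)
with_bounds = bounded.sub(r"(\1)", text)

print(without_bounds)
print(with_bounds)
```

The first result damages the year range, while the second leaves it alone and still converts the percentage.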
Thank you, Anthony – and indeed economy of accuracy – but you have provided me with very useful tools for now.
Hi Susanne, can I suggest you post your questions in here – https://community.sdl.com/product-groups/translationproductivity/f/regex_and_xpath
That’s a much better place to ask your questions and much easier for people to help. One thing though… you said you want to include this format but now you have switched to a period decimal separator. Was that intentional?
Hi Paul – yes, thank you – will post on the community in future. And yes, it was intentional as the search and replace regex is to be used in our target language.