Search and replace with Regex in Studio – Regular Expressions Part 3

The final article (in this introductory series anyway) on regular expressions in Studio looks at how to use search and replace.  This capability, using regex to replace as well as search, is only available from the update release of SDL Trados Studio 2011 SP2 onwards, and it’s a very welcome addition to the toolset provided within Studio.

So, what do I mean by this exactly?  Well, I had a good example a few months ago where a user was provided with a source file containing dates like this:

2014-03-12

The translation was supposed to return a date in this format:

12 March 2014

Studio cannot convert a short date to a long date.  It can convert the short date format in one language to the short date format in another language… but not switch from short to long.

Using a regex replacement you can do this after the event: translate the file, allowing Studio to do its thing, and then replace the dates at the end.  To do this you first have to find the dates.

In this example you could use this pattern because the source was yyyy-mm-dd and when going from de(DE) to en(GB) this short format can stay the same:

\d{4}-\d{2}-\d{2}

Now, in order to use a replacement regex we have to create what is called a back reference.  This is simply a way of finding a pattern and remembering it.  To do this you use brackets in this format ().  So for example the first part of my regex that looks for the year is this:

\d{4}

If I put brackets around this then the regex engine in Studio will remember this as back reference number 1.  So like this:

(\d{4})

The next part I want to remember would be the day, and by putting brackets around the day part the regex engine will remember the day as back reference number 2.  So my search expression would now look like this:

(\d{4})-\d{2}-(\d{2})

To replace my search string with these back references I can use a syntax like this:

$1 $2

So $1 represents the first back reference and $2 the second.  You can have as many of these as you like.  In this case a search and replace like this would first find the first string that matched, for example 2014-03-12.

If I now replace them all in one go, every matched date is reduced to its year and day, so 2014-03-12 becomes 2014 12.

You can see what’s happened… the first back reference and the second back reference have replaced all the strings that the search pattern matched.  So this gives me yyyy dd which is not what I wanted.  But what I can do is swap the order of the back references around and replace with this instead:

$2 $1

This should give me dd yyyy.  But this is not enough because I don’t have the month.  Unfortunately it is not possible to create a replacement expression that will do all the months in one go, but I could use a maximum of 12 simple search and replace operations to handle all the changes I’m after, just in case every month of the year appears in the document.  In my simple example I only need two, one for March and one for November.  This is where the other important thing to know comes in… you can mix plain text and back references in the replace syntax.  So if I change my search pattern to this:

(\d{4})-03-(\d{2})

And my replace pattern to this:

$2 March $1

Then I can replace every date whose month is 03 with one that spells out March.
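The steps so far can be tried outside Studio too. What follows is only a minimal sketch in Python, not how Studio itself does it: Python writes back references as \1 and \2 where Studio uses $1 and $2, and the sample sentence is made up for illustration.

```python
import re

# Search pattern from the article: year and day are remembered,
# the month is matched but not captured.
pattern = re.compile(r"(\d{4})-\d{2}-(\d{2})")

# Made-up sample text, not taken from the original file.
text = "Delivered on 2014-03-12 and invoiced on 2014-11-05."

# Python spells the back references \1 and \2 instead of $1 and $2.
year_day = pattern.sub(r"\1 \2", text)
day_year = pattern.sub(r"\2 \1", text)

# Fixing the month in the search lets plain text stand in for it
# in the replacement.
march = re.sub(r"(\d{4})-03-(\d{2})", r"\2 March \1", text)

print(year_day)
print(day_year)
print(march)
```

Only the March date changes in the last line; the November date is untouched until its own search and replace is run.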

I then amend the patterns to do November and repeat the exercise like this:

Search:   (\d{4})-11-(\d{2})

Replace:   $2 November $1
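To show how little work the full year would be, here is a rough Python sketch that runs the same search and replace once per month. The function name expand_dates and the sample sentence are my own inventions for illustration, and Python uses \1 and \2 where Studio uses $1 and $2.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def expand_dates(text: str) -> str:
    """Run one search and replace per month, as twelve separate passes."""
    for number, name in enumerate(MONTHS, start=1):
        # Builds e.g. (\d{4})-03-(\d{2}) for March.
        search = rf"(\d{{4}})-{number:02d}-(\d{{2}})"
        # Builds e.g. \2 March \1, the Python spelling of $2 March $1.
        replace = rf"\2 {name} \1"
        text = re.sub(search, replace, text)
    return text


result = expand_dates("Signed 2014-03-12, renewed 2014-11-05.")
print(result)
```

Note that the day keeps any leading zero (05 stays 05), exactly as it would with the patterns in Studio.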

So you can see it wouldn’t take that long to do the entire year.  Other interesting uses for this feature might be changing things like dd.mm.yy to mm/dd/yy, like this:

Search:   (\d{2})\.(\d{2})\.(\d{2})

Replace:   $2/$1/$3
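As a quick check of that pair, here is a small Python sketch with a made-up sentence. Python writes the back references as \2/\1/\3 rather than $2/$1/$3, but the idea is the same.

```python
import re

# Made-up sample text with two dd.mm.yy dates.
text = "Valid from 12.03.14 until 05.11.14."

# Three back references: day, month, year. Day and month swap places.
swapped = re.sub(r"(\d{2})\.(\d{2})\.(\d{2})", r"\2/\1/\3", text)

print(swapped)
```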

Or there is an interesting example provided by Daniel Brockmann, where it may be useful for correcting sentences that must be written in a particular way according to a styleguide.  So if we have these two sentences for example:

Auf Hilfedateien im Internet zugreifen
Auf lokale Hilfedateien zugreifen

The styleguide may prefer you start with a verb and write them like this:

Zugreifen auf Hilfedateien im Internet
Zugreifen auf lokale Hilfedateien

You could do this with the following search and replace:

Search:   Auf (.*) zugreifen

Replace:   Zugreifen auf $1

If you only had two sentences it would be easy to do them manually… but if you had a lot throughout your project then using regex for something like this could save a lot of time.
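The same reordering can be sketched in Python, using the two sentences from the example. Again \1 is the Python spelling of $1, and this is only an illustration, not what Studio runs internally.

```python
import re

sentences = [
    "Auf Hilfedateien im Internet zugreifen",
    "Auf lokale Hilfedateien zugreifen",
]

# Everything between "Auf " and " zugreifen" is remembered as back reference 1.
reordered = [re.sub(r"Auf (.*) zugreifen", r"Zugreifen auf \1", s) for s in sentences]

for line in reordered:
    print(line)
```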

Incidentally… this last example introduced something new to this blog… the .* (dot star)

This is actually something to be very careful with.  The dot means match any character apart from a line break, as we know, and the star means repeat that between zero and unlimited times.  So in this last example, if a segment contains words that don’t fit this simple swap of the start and end, the result could be completely wrong.  “Economy of accuracy” is a good thing for translators because it makes using regex easy, and the dot in particular is an easy metacharacter to remember… but please be careful and always make sure that the expression is only finding exactly the things you wish to replace before you click on “Replace All”..!
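To make the risk concrete, here is a minimal Python sketch. The second segment is a hypothetical sentence I made up, and the preview helper is my own name for the habit of listing what the pattern captures before replacing anything.

```python
import re

pattern = re.compile(r"Auf (.*) zugreifen")


def preview(segments):
    """Return (segment, captured text) pairs for every hit, changing nothing."""
    return [(s, m.group(1)) for s in segments for m in pattern.finditer(s)]


segments = [
    "Auf lokale Hilfedateien zugreifen",
    # Hypothetical segment: the greedy .* runs on to the last " zugreifen".
    "Auf Hilfedateien zugreifen und dann auf Updates zugreifen",
]

for segment, captured in preview(segments):
    print(f"{segment!r} captures {captured!r}")

# Replacing without checking first produces a sentence nobody wanted.
broken = pattern.sub(r"Zugreifen auf \1", segments[1])
print(broken)
```

The second segment matches from the first word to the last, so the replacement moves far more text than intended. Checking the captured text first is the code equivalent of stepping through the matches before clicking “Replace All”.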

If you came to this article first and wondered where parts 1 and 2 were…

Regular Expressions – Part 1

Regex… and “economy of accuracy” (Regular Expressions – Part 2)

And we also have a Part 4 now:

DOGS and CATS… Regular Expressions Part 4!

22 comments
  1. fsamora said:

    Thanks Paul for the great series of posts. Very comprehensive and easy to digest!

    The replace feature is not working for me though.

    I’m trying to replace all instances of “[NAME] Beach” (say, Sandyhills Beach) with e.g. “Plage de…”

    In the Find What, I have entered “(.*) (Beach)” and indeed it perfectly matched the bit of text I wanted to change (I could have refined my regex but this one works just fine, no pun intended!). However, when I insert Plage de $1 in the Replace With box, it replaces the initial text with literal “Plage de $1”, instead of the backreference…

    Then I went ahead and attempted the same procedure in Notepad++
    Find What: (.*) (Beach)
    Replace With: Plage de \1

    …and it worked just fine.

    I looked this up somewhere else, and apparently there is an issue going on preventing Studio from recognising backreferences:

    http://www.proz.com/forum/sdl_trados_support/197884-using_findreplace_to_change_commas_to_periods.html

    If you have any thoughts about this, by all means let us know..

    Thanks

    Filipe

    • Hi Filipe,
      I guess it’s easy to miss my opening statement… really my fault for not being more clear and for jumping the gun..! I was giving a sneak preview of the release due next week, which is an update to SP2. The update contains some really nice enhancements in addition to bug fixes, and making regex replacement possible is one of them.
      So next week you will be able to download the update and do this as well… I’d recommend it.
      Regards
      Paul

  2. George said:

    This is excellent news. Is this a free upgrade if I already have SDL Trados Studio 2011? Also thank you for all of these very useful tips on your site, I’m finding them very useful as practical lessons on getting more from this application and I am very glad I didn’t spend my money on memoq.

    • Hi George,
      Yes, if you already have Studio 2011 then this will be a free upgrade.
      Regards
      Paul

  3. Krisztian said:

    Thanks, Paul, for the great posts!
    Can you refer to tags with regex? I’d like to add an exception to the segmentation rules which includes internal tags. Can you do that?

    thanks! (I have latest Studio 2011 SP2)

    • Hi Krisztian, this isn’t possible yet, but it is on a list of things we want to do.

      • Christian said:

        Hi Paul,

        Thank you for your great posts on regex!

        Is there any chance to get a commitment from SDL as to when exactly it will be possible to use regex in tags?
        I know a lot of my colleagues would appreciate Krisztian’s idea (refer to tags with regex for complex segmentation rules).
        Is it possible that this issue is also the reason why it’s not possible to search tag content anymore?

        Thanks!
        Christian

      • Hi Christian, you can search in tags using the Batch Search & Replace App from the OpenExchange (installed with 2011). Segmentation is a different question and you would not normally segment on tags, rather you decide if they should be included in the segment when you create the filetype (I’m assuming we are talking about XML here).
        Perhaps you can share an example with me and we can take a better look? You can email me using pfilkin@sdl.com

  4. Tommy said:

    Hi Paul,

    Is there an expression to find all the segments in Studio with a certain font color?

    Tommy

    • Hi Tommy, I’m afraid not as the colours are determined by the tags and we don’t (yet) allow searching within the tags. I guess depending on the filetype, and depending on what you want to achieve with this you might be able to do this in the native source application afterwards?

  5. Krisztian said:

    Hi Paul!

    Back references don’t seem to work in the OpenExchange SDL Batch Find and Replace app. Is this correct or am I doing something wrong? Is there a tool to search and replace across multiple sdlxliff files with regex?

    thanks a lot!!
    Kriszitán

    • Hi Kriszitán, I’m afraid the app you’re referring to doesn’t support this. I wish it did… we have no other way to search and replace across files unless they are merged. It would be a nice enhancement for the app, and also for Studio generally. Maybe you could use something like EditPad Pro and make the changes in the sdlxliff files natively… you just need to watch you only make changes to the target text.

      Like

  6. Derk von Moock said:

    Thank you so much for your posts Paul. I wonder if you still see the comments on this older post, but let me have a try.
    Is there any possibility to have a non-breaking space in the replace string? Like between the number and the % character? I tried this but it only gets me “\u00A0”. Had success with the Regex match auto suggest provider as well.
    Thanks for helping!

    • Hi Derk, I think this sometimes works and sometimes doesn’t… I have not investigated this in detail. So if you copy and paste the non-breaking space out of Word, or use Ctrl+Shift+Space in the Studio Editor, for example, then it may work. I did a quick test and replaced this:

      (\d{2}) (cm)

      With this:

      $1 $2

      And it worked. The non-breaking space was copy pasted from the Studio Editor into the replace field between the backreferences as you can’t insert the non-breaking space with the keyboard into this field.

      • Derk von Moock said:

        Thanks, it works if there is already a nbsp, but if you have lots of numbers like EUR 1,000,000 to replace with 1 000 000 EUR (with NBSPs) it does not work, nor does RegexBuddy as it seems. Any other help welcome.

      • Well… I created a very simple test like this:

        (EUR) (\d{1}),(\d{3}),(\d{3})

        Replace with:

        $2 $3 $4 $1

        The spaces between the backreferences are all pasted in non-breaking spaces as before and this worked fine. What version of Studio are you using… just so I can test in the same one? I’m using 2017 obviously.
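For anyone who wants to check the logic outside Studio, here is a rough Python sketch of the same test. It says nothing about how Studio’s replace field treats pasted characters; it only shows the back references being reordered with real non-breaking spaces (U+00A0) between them. The name NBSP and the sample string are just for this illustration, and \1 to \4 are the Python spelling of $1 to $4.

```python
import re

NBSP = "\u00A0"  # non-breaking space

search = r"(EUR) (\d{1}),(\d{3}),(\d{3})"
# Same order as $2 $3 $4 $1, with a non-breaking space between each part.
replace = rf"\2{NBSP}\3{NBSP}\4{NBSP}\1"

converted = re.sub(search, replace, "EUR 1,000,000")

print(repr(converted))
```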

      • Derk von Moock said:

        And a last one. DONE!!! My mistake was that I pasted the nbsp from Word; when you take one from Studio it works 🙂 THANKS

      • Excellent. I do think this would be easier if the search/replace field supported adding different types of spaces in the same way you can add them into the editor.

  7. Derk von Moock said:

    Thanks so much for your help. I am using 2017 too, but it won’t work; I always get a breaking space instead of the nbsp. I can search for a nbsp with \u00A0 but that won’t work in the replace field, nor does the pasted nbsp. (Windows 8.1 German)

    • Derk von Moock said:

      Dear Paul
      Don’t know if you still read this thread. You were right about everything, but still the AutoSuggest provider would not work. Why? I found an answer somewhere on the net. There is an incompatibility between Studio 2017 and the Regex Match AutoSuggest Provider when it comes to suggesting numbers. It just won’t work without a workaround. The workaround is hitting Shift+Control+F12 (that’s what I found on the net, thanks for that) and the autosuggest will appear. I hope that this will be solved some time.

  8. Vlad said:

    Hi all,

    Do you know if there’s any regular expression in Trados that can replace straight apostrophes with curly apostrophes?

    Current: China's economy
    Needed: China’s economy

    Thanks!

    • Hi Vlad,

      You can just copy paste a straight apostrophe into your search field, and a curly apostrophe into the replace and this would work. If you want to only get them in places like your example then search for:

      (\w)\x27(\w\s)

      And replace with this:

      $1’$2

      I used the character code (\x27) in the search to avoid all doubt, but you will need to paste the curly apostrophe into the replace in between the back references.
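Here is a small Python sketch of that search and replace, with a made-up sample sentence. Python writes the back references as \1 and \2 rather than $1 and $2, and the curly apostrophe is given as \u2019 so there is no doubt about which character is meant.

```python
import re

# \x27 is the straight apostrophe; it must sit between a word character
# and a single word character followed by whitespace.
search = r"(\w)\x27(\w\s)"
# \u2019 is the curly apostrophe, placed between the two back references.
replace = "\\1\u2019\\2"

curly = re.sub(search, replace, "China's economy isn't slowing")

print(curly)
```

Because the pattern expects whitespace after the letter that follows the apostrophe, an apostrophe at the very end of a segment would be left as it is.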
