The Studio Terminator… err Terminjector

The title of this article is only half joking… half because the Terminjector provides a mechanism for filling a neat hole in the armour of Studio… and the other half because this application takes advantage of exactly what the SDL OpenExchange was designed to do.  It was designed to provide a mechanism for any developer to develop and plug into the Studio product to introduce capabilities that give them an advantage over anyone else, or share with others so they can get the benefit too.

In this case the developer, Tommi Nieminen, has created this plugin for himself and then published it free of charge for others to enjoy as well.  The basic idea behind the plugin is that it can modify the output of the Translation Memory before it is passed to the editor for you to use.  Why is this useful you may ask?  Well consider the following example of a document (created to demonstrate the point of course) you may wish to translate in Studio:

The first two segments were confirmed into the TM and the rest of the document was allowed to pick up the best match for each segment.  The points of interest here are these:

The codes CO10 0DT and AB10 1AA are variables in this document and, as you can see (if it’s not too small), they have been reused throughout the document because the rest of the text is the same, apart from the dates.  The dates 20/03/1962 and 24/3/1962 have been autolocalised by Studio into the format demanded by the target language.  In this case I am going from en(GB) to en(US) just to keep this simple, so the dates were changed to 3/20/1962 and 3/24/1962 respectively.  The exception is segment #5, as here the date has a 3-digit year, 1/1/793, and this is not recognised at all, so the autolocalisation changed this to 3/20/1962 as the closest match… clearly something you don’t want as it’s easy to miss.

Dealing with the codes is something you could do in Studio by adding them as variables to the Language Resource template.  But what if there were hundreds, or thousands (as with the post codes in this example)?  The current version of the Studio desktop tool doesn’t allow for importing a list, so you would have to add them one by one… not really practical for a large list.  Studio GroupShare will allow the import of variable lists, but if you’re using the desktop tool this doesn’t help you.

Terminjector provides the ability to create regular expressions (here we go again… but you may be seeing how useful they can be now) that allow you to set up patterns to look for variables and dates and then replace the suggestion from the TM with the matching pattern replacement.  One word of caution though… I have mentioned before that Studio uses the .NET flavour of Regex… here Terminjector uses POSIX.  I don’t intend to go into this in a lot of detail as this is an exercise to show you the possibilities, but RegexBuddy can handle all the flavours of regex and can even convert the code between flavours… so another good reason to buy a copy!

To make Terminjector work you simply download the plugin from the OpenExchange, unzip it and run the msi.  Then start Studio and add the Terminjector as a new Translation Provider.  It should be added to the specific language pair rather than All Languages, so in here:

You will then be presented with the Options window where for this exercise you must do two things, create a txt file (this is just a click of a button and decide where to put it), and select the Translation Memory you wish to associate the plugin with.  The full details of how to configure this and how to use it are here so I won’t go into the details now.  Rather I’ll show you the potential effect on this file.

First of all I added a regular expression like this:

([A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z]) \1

This is to grab the postcode from the source and then replace the Translation Memory match with it.
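To see what the search pattern picks up outside Studio, here is a small Python sketch.  It is only an approximation: Python's `re` module is neither the .NET nor the POSIX flavour, although this particular pattern uses nothing flavour-specific, and the sample sentence is invented for the illustration.

```python
import re

# The postcode pattern from the rule above (the search part of the rule).
POSTCODE = re.compile(r"([A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z])")

# Hypothetical source segment, made up for this example.
source = "Deliveries to CO10 0DT and AB10 1AA are made on Mondays."

# findall returns the text captured by group 1 for every match;
# this is the text that \1 would carry into the replacement.
found = POSTCODE.findall(source)
print(found)
```

Each captured value is what `\1` refers to in the second part of the rule, which is why the postcode from the source ends up in the suggestion.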

Then I opened the txt file that was created for me and added the expressions for the dates directly into it like this:

([A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z]) \1
([0-9][0-9]?)/(01|1)/([0-9]{1,4}) \1 January \3
([0-9][0-9]?)/(02|2)/([0-9]{1,4}) \1 February \3
([0-9][0-9]?)/(03|3)/([0-9]{1,4}) \1 March \3
([0-9][0-9]?)/(04|4)/([0-9]{1,4}) \1 April \3
([0-9][0-9]?)/(05|5)/([0-9]{1,4}) \1 May \3
([0-9][0-9]?)/(06|6)/([0-9]{1,4}) \1 June \3
([0-9][0-9]?)/(07|7)/([0-9]{1,4}) \1 July \3
([0-9][0-9]?)/(08|8)/([0-9]{1,4}) \1 August \3
([0-9][0-9]?)/(09|9)/([0-9]{1,4}) \1 September \3
([0-9][0-9]?)/10/([0-9]{1,4}) \1 October \2
([0-9][0-9]?)/11/([0-9]{1,4}) \1 November \2
([0-9][0-9]?)/12/([0-9]{1,4}) \1 December \2

This may look odd to you, but this is the second part I wanted to show you.  Within Studio you can change a short date to another formatted short date, and a long date to another formatted long date.  But you cannot change a short date into a long date.  So I decided to have a go at changing the short date here into a long one, and also get around the problem of the 3-digits for the year at the same time.

So these expressions are intended to look for a date with a 3 (or 03) in the month position and change it to March, for a date with a 7 and change it to July, and so on.  I needed 12 expressions to do this, and you can store as many different regular expressions in here as you like, all used by the Terminjector provider when a match is found in the source text.
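The twelve-rule idea can be tried out in Python as a rough sketch.  The assumptions here are mine: the rules are generated in a loop instead of typed into a txt file, the month is written as `0?3` so that both 03 and 3 are accepted, word boundaries (`\b`) are added, and the helper names and the sample sentence are hypothetical.  This is not how Terminjector itself applies its rules.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def build_rules():
    """Build one (pattern, replacement) pair per month, 12 in total."""
    rules = []
    for number, name in enumerate(MONTHS, start=1):
        # "0?3" accepts both "03" and "3"; months 10-12 have no leading zero.
        month = f"0?{number}" if number < 10 else str(number)
        pattern = re.compile(rf"\b([0-9][0-9]?)/{month}/([0-9]{{1,4}})\b")
        rules.append((pattern, rf"\1 {name} \2"))
    return rules

def to_long_dates(text):
    """Apply every rule in turn, like a list of rules read from a file."""
    for pattern, replacement in build_rules():
        text = pattern.sub(replacement, text)
    return text

sample = "Issued 20/03/1962, amended 24/3/1962, founded 1/1/793."
print(to_long_dates(sample))
```

Because the year part is `[0-9]{1,4}`, the 3-digit year goes through the same rule as the 4-digit ones, which is the point of handling the dates with a pattern.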

The effect of this when I click on a segment in Studio is something like this: 

The second result in the TM results window here is coming from the Terminjector translation provider, as you can see; the first one is from the Studio TM.  What you can also see is that the Terminjector has correctly replaced the post code and the date with the details I needed.  Brilliant..!

If I now go through my document using these improved results then I now see this instead:

One drawback is that the TM results are all 0%.  I guess this is because there is nothing to match with in the Terminjector as it’s just a wrapper around the Studio TM.  But if it’s not important for you to show the matches in a completed bilingual file then this is certainly a good way to handle the problems of long variable lists and difficult date transitions with regular expressions.

I’d thoroughly recommend you review the help documentation with this plugin because it can do more than I have outlined here… but on this quick test I did this afternoon of the updated version I’m impressed.

Updated 4 September 2012

After writing this article I had an interesting set of questions from Mats (as you can see below), and I realised that I had not made it clear enough in the example I was showing… I also realised I had omitted some important details in order to achieve what Mats referred to which was to allow this:

To clarify the difference: in this screenshot I have translated the first two segments only, and then Terminjector has inserted the correct variables and dates into the target (the red underlines are spelling mistakes, as I am still using an English target language but with Welsh text)… this works during interactive translation or pre-translation, so it is a powerful alternative to variables or simple search and replace.

In the previous example I was actually converting a no match value into the target, hence the reason for this being the same as the source, but with correct use of the regular expressions to substitute the values I searched for.  I was not replacing the values found in a TM as a fuzzy match.

The missing parts were these.  First I needed to make sure that the Terminjector TM provider was activated for update, and then I didn’t even need the SDLTM I had also used before.  So this is what I did.  I deactivated the file-based TM I was using and checked the update box for the Terminjector:

So the Terminjector is still using the file-based TM I deactivated, but this is set through the Terminjector settings and not in here… first basic error on my part.

Next, in order to replace a value that is being presented as a replacement for the target coming from the TM you have to add a third component to the regex patterns… this is clearly documented in the help but I didn’t read it properly.  So this third regex, and this is quite hard to get your head around, “refers to the text in the source segment that the capturing group expressions of the first field matched.”  I put this in quotes because I needed help from Tommi to understand this and he explained it to me using these words… better than I could explain it!

So for the post codes for example I changed this:

([A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z]) \1

to this:

([A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z]) \1 [A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z]

So basically as I wanted to use the entire pattern for the post code I just needed to add it again at the end for the replacement when there is a fuzzy match from the TM.
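One way to picture the three fields is the Python simulation below.  It reflects only my reading of the description above, not Terminjector's actual implementation: the function `apply_rule`, the way matches are paired up by position, and the sample segments are all assumptions made for the illustration.

```python
import re

def apply_rule(source, tm_target, source_pattern, replacement, target_pattern):
    """Simulate a three-field rule on a fuzzy match (hypothetical helper).

    For the n-th match of source_pattern in the source, build the
    replacement text and write it over the n-th match of target_pattern
    in the target suggested by the TM.
    """
    new_values = iter(m.expand(replacement)
                      for m in re.finditer(source_pattern, source))

    def swap(match):
        # Keep the old text if the source has run out of matches.
        return next(new_values, match.group(0))

    return re.sub(target_pattern, swap, tm_target)

# Invented segments: the TM holds a fuzzy match with a different postcode.
source = "Send the parcel to AB10 1AA today."
tm_target = "Send the parcel to CO10 0DT today."

result = apply_rule(
    source,
    tm_target,
    r"([A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z])",  # field 1: find in source
    r"\1",                                      # field 2: what to insert
    r"[A-Z][A-Z][0-9]{1,2} [0-9][A-Z][A-Z]",    # field 3: find in TM target
)
print(result)
```

The third pattern is what locates the stale postcode inside the fuzzy match, which is why the postcode pattern has to be repeated at the end of the rule.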

The dates on the other hand were a little more tricky and lengthy because of the operation I decided to attempt… and I will add that I needed help with both of these concepts from Tommi.  So I changed this:

([0-9][0-9]?)/(01|1)/([0-9]{1,4}) Ionawr \1, \3

Which is a little different from the original because now I’m doing this properly and am translating to a different language to make sure it’s clear… to this:

([0-9][0-9]?)/(01|1)/([0-9]{1,4}) Ionawr \1, \3 ((Ionawr)|(Chwefror)|(Mawrth)|(Ebrill)|(Mai)|(Mehefin)|(Gorffennaf)|(Awst)|(Medi)|(Hydref)|(Tachwedd)|(Rhagfyr)) [0-9][0-9]?, [0-9]{1,4}

So basically I had to identify the pattern from the source with the first field:

([0-9][0-9]?)/(01|1)/([0-9]{1,4})

provide an option for a specific month in the case of a no match:

Ionawr \1, \3

and finally provide the ability to recognise the appropriate month from the possible twelve based on what was found in the source and matched, and then use it in the replacement regex for the date:

((Ionawr)|(Chwefror)|(Mawrth)|(Ebrill)|(Mai)|(Mehefin)|(Gorffennaf)|(Awst)|(Medi)|(Hydref)|(Tachwedd)|(Rhagfyr)) [0-9][0-9]?, [0-9]{1,4}

So quite lengthy and more intricate a process than I first described.  But this is incredibly powerful because I can now pre-translate the entire file and all the variable postcodes, and in addition transform the short English UK dates to long Welsh dates.  So probably not the best example to make this look easy..!
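The date case can be simulated in Python in the same spirit.  Again this is a sketch under my own assumptions and not Terminjector's logic: a single source pattern with a month lookup stands in for the twelve separate rules, the names `SHORT_DATE`, `LONG_DATE` and `inject_dates` are hypothetical, and the sample segments are invented.

```python
import re

WELSH_MONTHS = ["Ionawr", "Chwefror", "Mawrth", "Ebrill", "Mai", "Mehefin",
                "Gorffennaf", "Awst", "Medi", "Hydref", "Tachwedd", "Rhagfyr"]

# Field 1 equivalent: a short UK date (day/month/year) in the source.
SHORT_DATE = re.compile(r"\b([0-9][0-9]?)/(0?[1-9]|1[0-2])/([0-9]{1,4})\b")

# Field 3 equivalent: any long Welsh date already present in the TM target.
LONG_DATE = re.compile(
    r"\b(" + "|".join(WELSH_MONTHS) + r") [0-9][0-9]?, [0-9]{1,4}\b"
)

def welsh_long_date(match):
    """Field 2 equivalent: build 'Month day, year' from a source match."""
    day, month, year = match.groups()
    return f"{WELSH_MONTHS[int(month) - 1]} {day}, {year}"

def inject_dates(source, tm_target):
    """Overwrite each long date in the target with the matching source date."""
    new_dates = iter(welsh_long_date(m) for m in SHORT_DATE.finditer(source))
    return LONG_DATE.sub(lambda m: next(new_dates, m.group(0)), tm_target)

# Invented segments: the fuzzy match from the TM carries the wrong date.
print(inject_dates("Date: 1/1/793", "Dyddiad: Mawrth 20, 1962"))
```

The target pattern has to accept any of the twelve month names because the date sitting in the fuzzy match is not necessarily in the same month as the date in the new source segment.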

I think for most users the ability to handle huge variable lists like this is a real plus, and when you are only dealing with one kind of pattern it is not even difficult to set up.  So give it a try… the capabilities, once you get your head around the basics, are very impressive.  A really good application from Tommi Nieminen on the SDL OpenExchange.

16 comments
  1. Hi Paul,

    Thanks for a very instructive post on a very interesting application. A few comments:

    1. I believe the drawback you mention is addressed in the Advanced settings: “Constructed segment match percentage: The value of this field is the match percentage that is used when TermInjector constructs translation proposals for segments that have no translation memory matches. The reason this setting is adjustable is that some Studio functionalities require a certain minimum match percentage. Default is 0.”

    2. It’s still not possible to download the application via OpenExchange; the very last click gives an error message. (I got my copy by asking Tommi.)

    3. That last image — is that the result of direct insertions into the target segments, or did you have to copy TM hits? I have not been able to produce direct insertions. It would be a real boon if it was possible.

    Mats

    • Hi Mats,

      You certainly got me thinking this morning… and as a result I have updated the post. Hopefully answering your questions at the same time? But specifically:

      1. The drawback was because I was not updating through the Terminjector Provider. I only updated the SDLTM as a separate file-based TM.
      2. The download should be ok using Internet Explorer or Chrome. Firefox sometimes causes a problem that I hope will be resolved soon.
      3. Direct insertions through interactive translating or pre-translation are possible… if you set it up correctly.

      I hope this is a lot more clear now?

      Paul

  2. Giles Tilling said:

    Hi Paul,
    Would this be a good solution for us, where we have changed all our drawing designations from (1 – “V”) to (“V”/1), for example, or is there an easier way of making everything in brackets a variable?

    Many thanks

    Giles

    • Hi Giles, I guess so. If you searched for something like this:

      (\d+) – (“V”)

      Then configured the replace to be this:

      \2/\1

      Then it may do the trick. I haven’t tested but I see no reason why something like this would not work
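The search and replace pair suggested in this reply can be checked quickly in Python.  This is a sketch only: Python's `re` is not the flavour Terminjector uses, the pattern is widened slightly (my assumption) to accept a plain hyphen or an en dash and straight or curly quotes, and the sample sentence is invented.

```python
import re

# Search pattern from the reply, accepting "-" or "–" and either quote style.
pattern = re.compile(r'(\d+) [-–] (["“]V["”])')

text = 'See drawings (1 – “V”) and (12 - "V").'

# \2/\1 swaps the two captured parts; the brackets are outside the match
# and are therefore left where they are.
swapped = pattern.sub(r"\2/\1", text)
print(swapped)
```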

  3. Hi Paul,
    Is it possible to use unbreakable spaces? For example, I want 100,000 to be 100 000 instead, separated with an unbreakable space. How, if possible, can I do that? &nbsp; does not work – it’s displayed in the segment “as is”, as the literal text &nbsp;.

    • Probably… but it depends on the format of the file you are translating. A non-breaking space is represented in different ways depending on the file format, so if you use the Terminjector, which is inserting text characters, then you need to make sure the non-breaking space is inserted appropriately. I’d say the best thing to do is contact Tommi (the developer) and I’m sure he can help you with this question.
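As a general illustration of the point about inserting the character itself, the Python sketch below replaces the thousands separator with a real no-break space (U+00A0) and not with the text `&nbsp;`.  It says nothing about how Terminjector's rule file stores the character, which is the part that depends on the tool and the file format; the function name is hypothetical.

```python
import re

NBSP = "\u00a0"  # the actual no-break space character, not the text "&nbsp;"

def regroup_thousands(text):
    """Replace comma thousands separators with no-break spaces."""
    # A comma counts only when it sits between a digit and a group of
    # exactly three digits, so "1,000,000" is handled but "a, b" is not.
    return re.sub(r"(?<=\d),(?=\d{3}(?!\d))", NBSP, text)

converted = regroup_thousands("A total of 100,000 units")
print(repr(converted))  # repr makes the otherwise invisible \xa0 visible
```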

  4. Sarah said:

    Hi there, I am just experimenting with this great app. I was also interested in converting some number with a non-breaking space. The way this is working is that you type your regex in SDL Trados, including all non-breaking spaces necessary and that is automatically included in your regex rule when doing the concordance search (F3). However, typing a non-breaking space straight in an editor such as Notepad++ doesn’t work, so you won’t see them in the editor.

  5. Hi Paul,

    so I was going to try this plugin, which apparently even works with 2014, but when I go to the download page at Open Exchange and start the download, I get the following error message: “The requested URL /terminjector21.zip was not found on this server.” (the error message might differ a bit depending on the browser I use, I tried Chrome, Firefox and IE). Since I don’t know whom to contact for support, I’ll just write you here. 🙂 I did send a message to Tommi on his homepage as well, but this might be more of a SDL-problem… Anyway, hope you can help.

    Thanks,
    /Joseph

    • Never mind, just got a response from Tommi. He says he’s working on a new version and that he had already removed the old one from his site. Tommi said that the new version (2.2) should be available within a few weeks (with out-of-the-box compatibility with 2014). Guess I have to wait until then.

    • Hi Joseph, you might have timed this between Tommi updating and the app changing. There was a new version submitted this week but then withdrawn for some late changes so I guess this could have caused the problem. Best to wait until he resubmits as you surmised.

      • Hi Paul,

        Thanks for the quick response. Yes, I will wait for the new version. Should not be too long…

  6. David said:

    Dear Paul,

    First of all, thanks a lot for your great blog! I find it really useful, and I always look into it, so as to find out new features regarding Trados Studio.

    Now, I am looking into a way to pre-translate terms the same way as it was in Trados 2007. Using this feature, in Trados 2007, it was possible to connect a TM (even an empty one) to a termbase and configure it so that the found terms were inserted in the target segment automatically.

    As we all know, this is not possible in Studio unless external apps are used. I was researching this tool, TermInjector, which also seems to be compatible with Studio 2015 (also 2017?), and this could be a potential solution for this.

    However, the main drawback I could see for this is regarding RegEx, and the time it could take for us to prepare the files accordingly. Besides, as far as I can see, TermInjector might be useful for dealing with codes, dates, fund-names, etc., but you cannot take advantage of a proper glossary in which you can import automatically the terms you might need for translation.

    For this reason, I looked into another external application named FireballXL8 (https://fireballxl8.com/). This one looks pretty much easier to manage, although you can only use all the features if you have a premium account (id est, you pay for it, obviously), it works online (lack of confidentiality), and, unfortunately, there is no way to know which segments have been updated at a glance.

    Besides, Rainbow could be an alternative, but I have not been able to find out so far the way I could take advantage of their features for my goal.

    To sum things up, I would be really interested in knowing if there is a tool that works in a pretty similar way to the feature we used to have in Trados 2007, or something we can do, so as to import a CSV, or a glossary, and the Studio file gets automatically updated with the terms that are needed.

    Thanks a lot in advance for your time!

    I am looking forward to hearing from you.

    Kind regards,

    David

    • Hello David, I am not aware of one other than Terminjector. I think the old Trados approach was dropped in Studio because it was hit and miss from a termbase (not really a one-to-one relationship) and it was thought easier to do this now (back in 2009 “now”…) interactively because of the improved integration in the editor for terminology. Terminjector is a possible solution for you as this would do the job, although you’d need to maintain a separate data source for this.
      You should create an idea for this here if you think it should be in the product.

  7. Vladimir said:

    Hello, Paul, thank you for your blog, it is very informative and useful! Could you please tell me whether it is possible to make Terminjector work in Studio 2017?

    • Possibly… you might need to have 2014 installed too. But if you take the sdlplugin from the folder it installs into in 2014 and copy it into 2017 then it “might” work. There are plans to upgrade it properly, but probably not for a little while.

      • Vladimir said:

        Paul, this method works very well, thank you. I have installed 2014, taken the plugin from the relevant folder and installed it for 2017. It works properly.
