It’s been a while since I wrote anything about the SDLXLIFF Toolkit. In fact, I haven’t written about it since it was first released with the 2014 version of Studio.  Now that we have added a few new things, such as SDLPLUGINS so that apps are better integrated and can be more easily distributed with Studio, we have launched a new version of the toolkit for Studio 2017.  What’s new?  To be honest not a lot, but there are a couple of things that I think warrant this visit.

First of all, the app is now a plugin.  This means it loads faster and is always available, and there are a few tricks to getting the most from it.  Secondly, there are a few fixes to the search & replace features that make it possible to complete tasks that Studio will fail with, and to do this the API team completely rebuilt the regex engine.  So whilst you won’t see too many changes, there are a few under the hood.

The best way to illustrate this is to show you so I have created a short video below where I have tried to explain how best to use the toolkit now it’s a plugin and not a standalone application, and I used the problems described below to demonstrate how it works.  If you want to know what else it can do I have reproduced part of the original guide below the video as that seems to have been lost over the years.  This might be helpful for a few of the more obscure features you may not have realised were possible.

The problems…

The full list of woes is in a couple of posts here and here, but in a nutshell the following cannot be successfully completed with out of the box search & replace operations in Studio or the old version of the toolkit:

  • search & replace the full text in all the TUs when paragraph units contain more than one TU.  When attempting a full text replacement and adding a pair of square brackets to the start and end of the segments only the last TU in the paragraph unit is affected:
  • use reserved characters as replacement values for regex search & replace.  The replace operation should allow [[ as simple characters, but Studio expects you to handle them as reserved characters with a special meaning and warns you to close the open brackets:
  • take account of tags when using search & replace.  The first tag is skipped and the second deleted resulting in a ghost tag:
  • search with a lookahead and replace.  So searching for this lookahead:
    της(?= Thunderbolt)
    and replacing with this text:
    του
    The idea is to replace της with του, but only where it’s followed by a space and the word Thunderbolt.
    If you do this in Studio, της is found correctly but it is replaced with the same text it found.
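
For comparison, this is how the lookahead replacement is expected to behave in a general-purpose regex engine.  It’s only a minimal sketch using Python’s re module, not Studio or the toolkit, and the sample sentence is one I made up for illustration:

```python
import re

# Hypothetical target segment; only the first της is followed by " Thunderbolt".
segment = "Η θύρα της Thunderbolt και το καλώδιο της οθόνης"

# (?= Thunderbolt) is a lookahead: it has to match, but it is not consumed,
# so only της itself is replaced and the following text is left alone.
fixed = re.sub(r"της(?= Thunderbolt)", "του", segment)

print(fixed)
```

The second της is left untouched because the text that follows it does not satisfy the lookahead.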

These things also affected the toolkit previously, but we have resolved them all in the latest build for Studio 2017.  I show these in context in the video as they are not things you routinely come across so you might not have ever noticed.  But if you are doing full text search/replace operations using back references in your regular expressions then you may have been frustrated in your attempts.
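
To make the full text case concrete, here is a rough sketch of a replacement that uses a back reference and literal square brackets.  Again this is Python’s re module rather than the toolkit, and the two segments are invented, but the principle is the same: every segment should be wrapped, and the brackets in the replacement should be treated as ordinary characters.

```python
import re

# Two hypothetical segments from the same paragraph unit.
segments = ["First sentence.", "Second sentence."]

# ^(.*)$ captures the full text of a segment and \1 puts it back.
# In a Python replacement string the square brackets are plain characters.
# re.DOTALL lets "." also match any line break inside a segment.
wrapped = [
    re.sub(r"^(.*)$", r"[[\1]]", seg, count=1, flags=re.DOTALL)
    for seg in segments
]

print(wrapped)
```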

Approx. time: 12 minutes

What else?

To describe what else it can do I thought I’d reproduce parts of the original guide to help.  It is pretty cool!

The application itself has three main views and all of them provide a way for you to carry out various operations on an entire Project, a group of SDLXLIFF files or a single SDLXLIFF.  The files themselves can be added by using the buttons here:
But you can also drag and drop SDLPROJ or SDLXLIFF files into the space where the files are listed in the image above.

When you carry out operations on these files they will be applied to the files that are actually selected, which are the ones highlighted in blue as shown above.  You can select them all by pressing Ctrl+A after selecting one file so the pane is active.  Alternatively you can select multiple files by holding down the Ctrl key and selecting the files you want with your mouse.



The Sliceit! operation will create new SDLXLIFF files based on the selections you make, or the search criteria you provide.  The files will be created in a folder you specify after clicking on Sliceit!

The naming convention takes the original filename and then adds an underscore, a unique identifier and “._sliced” to the end like this:

Original filename.docx_7bab7841-8097-403d-ac03-ac6edd683bf2._sliced.sdlxliff
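
If you ever need to recognise or reproduce these names in a script, the pattern can be sketched like this.  The function name is my own invention, and I’m assuming the identifier is a standard GUID simply because that is what the example above looks like:

```python
import uuid
from pathlib import Path


def sliced_name(original, token=None):
    """Build '<original>_<token>._sliced.sdlxliff' (pattern inferred from the example)."""
    # Assumption: the token is a random GUID; pass one in to get a fixed name.
    token = token or str(uuid.uuid4())
    return f"{Path(original).name}_{token}._sliced.sdlxliff"


print(sliced_name("Original filename.docx", "7bab7841-8097-403d-ac03-ac6edd683bf2"))
```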

If you have selected several files, or an entire Project of files, before pressing Sliceit! then a new SDLXLIFF will be created for any content found in the original files. You would then create a Project in Studio with these files and use virtual merge to open them all together quickly and easily to handle the files and add the translations to your Translation Memory.

There is an option to merge them into one file rather than use the virtual merge in Studio, but using this option can lead to excessive processing times with some SDLXLIFFs.  How long depends on the complexity of the information in the files you are processing, so unless you have a very good reason for wanting them to be merged together maybe don’t do this and just use the virtual merge.

Also note that sliced files cannot be used to create target translations on their own, nor can they be previewed in any way other than using the Print Preview in a web browser.


The Changeit! operation will change the translation status AND/OR the lock status of all the segments selected based on the selections you make, or the search criteria you provide.  In addition the same selections can be used to Copy source to target for all the segments selected.  You cannot change translation status, lock status AND copy source to target in one go, but you can copy source to target first and then immediately change the translation status or lock status afterwards without changing any other settings.  So the process is still quick.


The Clearit! operation will quite simply clear all the target segments from your Project based on the selections you make, or the search criteria you provide.





The operation is only possible in the Replace Tab:


The basic idea is that you can carry out search and replace operations across an entire Project, a group of SDLXLIFF files or a single SDLXLIFF.  This operation can be applied in the source or the target.  There are more details on the use of this operation in the Views section below.



This is the first view you’ll see when you start the application and it allows you to carry out various actions on the statuses of segments in the SDLXLIFF files:


In this view you have several groups to choose from and the selections are based on OR only.  So you cannot make selections from multiple groups in one go… you have to apply the changes based on one group at a time.  The groups and their descriptions are as follows:

Translation Status

Using this group you can select the translation status of all segments in the selected sdlxliff files and then apply the operation you want.


Using this group you can select segments based on the score applied to them from a Translation Memory. Perfect Match and Context Match scores are obvious as you either check the boxes or you don’t, but the Match value option opens up a few interesting possibilities because you can build your own expressions.

The expressions are all standard boolean logic but I have added a few simple examples below to help give you some ideas of how these can be used.


SDLXLIFF Toolkit – expression builder

You can use:

Relational operators:

 = equal to
 != not equal to
 < less than
 > greater than
 <= less than or equal to
 >= greater than or equal to

Logical Operators:

 AND Requires all values to be true
 OR Requires one of the values to be true
 && Requires all values to be true
 || Requires one of the values to be true


 () Used to group operators together


< 95
 Select segments with match values less than 95%

>= 40
 Select segments with match values greater than or equal to 40%

(<95 AND >80) || (<50 && >30)
 Select segments that have match values between 80% and 95% OR between 30% and 50%

!= 100 AND != 60
 Select segments with any match value other than 100% or 60%

<=90 && != 60
 Select segments that are less than or equal to 90% but also don't equal 60%

((<90 && > 80) || (< 60 AND > 50)) OR < 10
 Select segments that have match values between 80% and 90% OR between 50% and 60%,
 OR just select match values less than 10%.

These particular examples may not be useful in practice, but they should give you an idea of how you can build expressions and select segments based on unusual criteria that would be impossible in Studio alone.
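
If you want to check what an expression will select before running it on real files, the logic is easy enough to sketch.  The little evaluator below is my own illustration in Python and not the toolkit’s code.  It assumes whole number match values and that AND binds more tightly than OR where there are no brackets, which may not be exactly how the toolkit resolves it, so use brackets if in doubt.

```python
import re

# One token: a relational operator, a bracket, a logical operator or a number.
TOKEN = re.compile(r"\s*(<=|>=|!=|<|>|=|\(|\)|&&|\|\||AND\b|OR\b|\d+)")

COMPARE = {
    "=": lambda value, n: value == n,
    "!=": lambda value, n: value != n,
    "<": lambda value, n: value < n,
    ">": lambda value, n: value > n,
    "<=": lambda value, n: value <= n,
    ">=": lambda value, n: value >= n,
}


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        found = TOKEN.match(expr, pos)
        if not found:
            raise ValueError(f"unexpected text: {expr[pos:]!r}")
        tokens.append(found.group(1))
        pos = found.end()
    return tokens


def matches(expr, value):
    """Return True if a segment with this match value satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() in ("OR", "||"):
            take()
            right = parse_and()  # always parse the right side to consume its tokens
            result = result or right
        return result

    def parse_and():
        result = parse_term()
        while peek() in ("AND", "&&"):
            take()
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        token = take()
        if token == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing closing bracket")
            return result
        if token in COMPARE:
            number = take()
            if number is None or not number.isdigit():
                raise ValueError("expected a number after " + token)
            return COMPARE[token](value, int(number))
        raise ValueError(f"unexpected token: {token!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing text")
    return result


# Which of these match values would the third example above select?
print([v for v in (25, 40, 60, 85, 99) if matches("(<95 AND >80) || (<50 && >30)", v)])
```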


Using this group you can select segments that are locked or unlocked in Studio and then apply the operation you want.




Translation Origin

This group allows you to select segments based on the translation origin of the segments as stored in the SDLXLIFF:

  • Translation Memory : results originating from a TM match
  • Interactive : results derived by the translator making changes
  • Automated Translation : Autolocalised segments
  • Auto-propagated : segments translated based on previously translated segments in the document



This group allows you to select segments based on the system attribute of the segments stored in the SDLXLIFF:

  • Machine Translation : results originating from an MT engine
  • Translation Memory : results originating from a TM match
  • Propagated : segments translated based on previously translated segments in the document
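
If you’re curious what this kind of selection looks like at file level, here is a rough sketch in Python.  Please treat the details as assumptions on my part: the namespace, the seg element and the origin attribute values reflect how I understand segment definitions to be stored in an SDLXLIFF, and the sample is a cut-down fragment I wrote by hand, so check a real file before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace used for the SDL-specific elements in an SDLXLIFF.
SDL_NS = "http://sdl.com/FileTypes/SdlXliff/1.0"

# Hand-written, simplified fragment; a real file contains far more than this.
SAMPLE = f"""<sdl:seg-defs xmlns:sdl="{SDL_NS}">
  <sdl:seg id="1" conf="Translated" origin="tm" percent="100"/>
  <sdl:seg id="2" conf="Draft" origin="mt"/>
  <sdl:seg id="3" conf="Translated" origin="interactive"/>
  <sdl:seg id="4" conf="Translated" origin="auto-propagated" locked="true"/>
</sdl:seg-defs>"""


def select_by_origin(xml_text, origins):
    """Return the ids of segments whose origin is in the given set (OR logic)."""
    root = ET.fromstring(xml_text)
    return [
        seg.get("id")
        for seg in root.iter(f"{{{SDL_NS}}}seg")
        if seg.get("origin") in origins
    ]


print(select_by_origin(SAMPLE, {"tm", "auto-propagated"}))
```

Selecting more than one value within a group works as OR, which is why the function takes a set of origins rather than a single one.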


Document Structure

This is an interesting use of the information in the SDLXLIFF. When you open a file in Studio you will see the right-hand column contains information relating to where the segment comes from in the document.  So for example, if the segment is a Paragraph segment it will say “P”, a heading will say “H”, or a list item will say “LI”.

So to use this you select the SDLXLIFF files you want to work with and then click Generate DSI.  This will then list all the DSI types that are available to work with in the selected files.


For example, in this file you can see various types of structure.  The “TC+” represents more than one type associated with that segment:


So you can click on the coloured types and this will open up a small window in Studio explaining what this information relates to.  If I click on the “TC+” you can see I have three different types of structure recognised, and all of these also become available for selection when I Generate DSI in the SDLXLIFF Toolkit:


Clicking Generate DSI makes the information in this file available for selection like this:

So I can now select all, or some of these items by holding down the Ctrl key and using the mouse.  Then I can apply whatever operation I want to the selected segments.





This next view allows me to search for anything I like inside the source AND/OR the target segments before applying any of the operations discussed in previous sections (Sliceit!, Changeit! or Clearit!).


There are some simple options that allow you to match the case of the expression, match whole words only, use regular expressions or search in tags.  The results of any search patterns found are shown in the results pane and you can also expand this to make it easier to see them by clicking on the expansion icon in the top right:
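
The way the first three options combine can be sketched in a few lines of Python.  This is only an illustration of the idea with option names I’ve chosen myself, not how the toolkit implements it, and it leaves out searching in tags altogether:

```python
import re


def build_pattern(text, match_case=False, whole_word=False, use_regex=False):
    """Turn a search box entry plus its options into a compiled pattern."""
    # Without the regex option the text is taken literally, so "." means a dot.
    body = text if use_regex else re.escape(text)
    if whole_word:
        body = rf"\b(?:{body})\b"
    return re.compile(body, 0 if match_case else re.IGNORECASE)


print(bool(build_pattern("cat").search("Concatenate")))                   # found inside a word
print(bool(build_pattern("cat", whole_word=True).search("Concatenate")))  # whole words only
```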


If you have results spanning multiple files then the column on the right also shows you which file the results come from.


This next view allows you to apply search and replace operations on an entire Project, a group of SDLXLIFF files or a single SDLXLIFF using plain text or regular expressions.  You can also use this to search and replace in source or target.


There are some great use cases for this, but in particular it’s handy for changing unrecognised placeables, such as dates that Studio does not see as being in the correct format because it uses the culture settings of your computer’s operating system. So in this example above the dates written as dd-mmm-yyyy are not recognised so I get this kind of thing in Studio:


Searching for these dates and replacing them without the hyphens in a single operation means I can work with the dates like this and have Studio not only correctly localise them for me, but also prevent unnecessary verification error messages as a result of the original numbering issue:
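
A regular expression for this sort of change might look something like the sketch below.  It’s written for Python’s re module, so the back reference syntax in the toolkit may differ, and I’m assuming the hyphens are replaced with spaces and that the month is a three letter abbreviation:

```python
import re

# dd-mmm-yyyy: one or two digits, a three letter month, a four digit year.
DATE = re.compile(r"\b(\d{1,2})-([A-Za-z]{3})-(\d{4})\b")


def unhyphenate_dates(text):
    """Replace the hyphens in dd-mmm-yyyy dates with spaces."""
    return DATE.sub(r"\1 \2 \3", text)


print(unhyphenate_dates("Delivered on 05-Mar-2014 and 17-Nov-2015."))
```

Because the pattern insists on letters in the middle group, hyphenated numbers such as part numbers are left alone.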


Furthermore, because getting this wrong could lead to serious inaccuracies in your source segments, you have the opportunity to preview what the changes would look like before you hit the Replace All button.  The Preview button applies your replacement in this view only, allowing you to scroll through the results in the expanded results window first so you can see very quickly if you’ve picked up anything wrong before you replace it:
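
The same “look before you leap” idea is worth copying if you ever script a replacement yourself.  Here is a minimal sketch in Python, with a function name and sample segments of my own, that reports what would change without touching the original text:

```python
import re


def preview(pattern, replacement, segments):
    """Return (index, before, after) for every segment the replace would change."""
    changes = []
    for index, before in enumerate(segments):
        after = re.sub(pattern, replacement, before)
        if after != before:
            changes.append((index, before, after))
    return changes


# Hypothetical source segments.
source = ["Valid until 05-Mar-2014.", "No date in this one.", "See 17-Nov-2015."]

for index, before, after in preview(r"(\d{1,2})-([A-Za-z]{3})-(\d{4})", r"\1 \2 \3", source):
    print(index, before, "->", after)
```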


The end!
