Revisiting the toolkit…

It’s been a while since I wrote anything about the SDLXLIFF Toolkit. In fact I haven’t written about it since it was first released with the 2014 version of Studio.  Now that we have added a few new things such as SDLPLUGINS, so that apps are better integrated and can be more easily distributed with Studio, we have launched a new version of the toolkit for Studio 2017.  What’s new?  To be honest not a lot, but there are a couple of things that I think warrant this visit.

First of all, the app is now a plugin, which means it loads faster and is always available, and there are a few tricks to getting the most from this.  Secondly, there are a few fixes to the search & replace features that make it possible to complete tasks that Studio will fail with; to do this the API team completely rebuilt the regex engine.  So whilst you won’t see too many changes, there are a few under the hood.

The best way to illustrate this is to show you so I have created a short video below where I have tried to explain how best to use the toolkit now it’s a plugin and not a standalone application, and I used the problems described below to demonstrate how it works.  If you want to know what else it can do I have reproduced part of the original guide below the video as that seems to have been lost over the years.  This might be helpful for a few of the more obscure features you may not have realised were possible.

The problems…

The full list of woes is in a couple of posts here and here, but in a nutshell it means the following cannot be successfully completed with the out-of-the-box search & replace operations in Studio or the old version of the toolkit:

  • search & replace the full text in all the TUs when paragraph units contain more than one TU.  When attempting a full text replacement and adding a pair of square brackets to the start and end of the segments only the last TU in the paragraph unit is affected:
    002
  • use reserved characters as replacement values for regex search & replace.  The replace operation should allow [[ as simple characters, but Studio expects you to handle them as reserved characters with a special meaning and warns you to close the open brackets:
    004
  • take account of tags when using search & replace.  The first tag is skipped and the second deleted resulting in a ghost tag:
    003
  • search with a lookahead and replace.  So searching for this lookahead:
    της(?= Thunderbolt)
    and replacing with this text:
    του
    The idea being that we replace της with του but only where it’s followed by a space and the word Thunderbolt.  Like this:
    005
    If you do this in Studio this is found correctly, της, but it replaces it with the same text it found.
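
For comparison, here is how two of the cases above behave in Python's `re` module. This is only a sketch of the expected behaviour, not the toolkit's own engine, and the sample sentences are made up for illustration.

```python
import re

# Made-up sample text: only the first "της" is followed by " Thunderbolt".
text = "της Thunderbolt και της USB"

# The lookahead (?= Thunderbolt) checks what follows without consuming it,
# so only "της" itself is replaced and " Thunderbolt" is left untouched.
fixed = re.sub(r"της(?= Thunderbolt)", "του", text)
print(fixed)

# Wrapping a whole segment in double square brackets. In a Python replacement
# string the brackets are plain characters, and \g<0> stands for the full match.
segment = "Connect the cable."
wrapped = re.sub(r"^.+$", r"[[\g<0>]]", segment)
print(wrapped)
```

The second `της` is left alone because nothing after it satisfies the lookahead.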

These things also affected the toolkit previously, but we have resolved them all in the latest build for Studio 2017.  I show these in context in the video as they are not things you routinely come across so you might not have ever noticed.  But if you are doing full text search/replace operations using back references in your regular expressions then you may have been frustrated in your attempts.

Approx. time: 12 minutes

What else?

To describe what else it can do I thought I’d reproduce parts of the original guide to help.  It is pretty cool!

The application itself has three main views and all of them provide a way for you to carry out various operations on an entire Project, a group of SDLXLIFF files or a single SDLXLIFF.  The files themselves can be added by using the buttons here:
006
But you can also drag and drop SDLPROJ or SDLXLIFF files into the space where the files are listed in the image above.

When you carry out operations on these files they are applied only to the files that are actually selected, that is the ones highlighted in blue as shown above.  You can select them all by pressing Ctrl+A after selecting one file so the pane is active.  Alternatively you can select multiple files by holding down the Ctrl key and selecting the files you want with your mouse.

Operations

Sliceit!

This operation will create new SDLXLIFF files based on the selections you make, or the search criteria you provide.  The files will be created in a folder you specify after clicking on Sliceit!

The naming convention takes the original filename and then appends an underscore, a hash code and “._sliced” to the end like this:

Original filename.docx_7bab7841-8097-403d-ac03-ac6edd683bf2._sliced.sdlxliff
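
If you ever need to map sliced files back to their originals in a script, a pattern along these lines might help. It is a minimal sketch that assumes the hash code always has the GUID shape seen in the example name above; `SLICED` and `original_name` are names I made up for the illustration.

```python
import re

# Assumption: the identifier is always a GUID (8-4-4-4-12 hex digits),
# as in the single example filename shown in this post.
SLICED = re.compile(
    r"^(?P<original>.+)_"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"\._sliced\.sdlxliff$"
)

def original_name(sliced_filename):
    """Return the original filename, or None if the name is not a sliced file."""
    m = SLICED.match(sliced_filename)
    return m.group("original") if m else None

print(original_name(
    "Original filename.docx_7bab7841-8097-403d-ac03-ac6edd683bf2._sliced.sdlxliff"
))
```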

If you have selected several files, or an entire Project of files before pressing Sliceit! then a new SDLXLIFF will be created for any content found in the original files. You would then create a Project in Studio with these files and use virtual merge to open them all together quickly and easily to handle the files and add the translations to your Translation Memory.

There is an option to merge them into one file rather than use the virtual merge in Studio, but using this option can lead to very long processing times with some SDLXLIFFs.  How long depends on the complexity of information in the files you are processing, so unless you have a very good reason for wanting them to be merged together maybe don’t do this and just use the virtual merge.

Also note that sliced files cannot be used to create target translations on their own, nor can they be previewed in any way other than using the Print Preview in a web browser.

Changeit!

008
This operation will change the translation status AND/OR the lock status of all the segments selected based on the selections you make, or the search criteria you provide.  In addition the same selections can be used to Copy source to target for all the segments selected.  You cannot change translation status, lock status AND copy source to target in one go, but you can copy source to target first and then immediately change the translation status and lock status afterwards without changing any other settings.  So the process is still quick.

Clearit!

This operation will quite simply clear all the target segments from your Project based on the selections you make, or the search criteria you provide.

Replace

This operation is only available in the Replace tab:

010

The basic idea is that you can carry out search and replace operations across an entire Project, a group of SDLXLIFF files or a single SDLXLIFF.  This operation can be applied in the source or the target.  There will be more details on the use of this operation in the Views section below.

Views

Statuses

This is the first view you’ll see when you start the application and it allows you to carry out various actions on the statuses of segments in the SDLXLIFF files:

011

In this view you have several groups to choose from and the selections are based on OR only.  So you cannot make selections from multiple groups in one go… you have to apply the changes based on one group at a time.  The groups and their descriptions are as follows:

Translation Status

Using this group you can select the translation status of all segments in the selected SDLXLIFF files and then apply the operation you want.

Score

Using this group you can select segments based on the score applied to them from a Translation Memory. Perfect Match and Context Match scores are obvious as you either check the boxes or you don’t, but the Match value option offers a few interesting possibilities because you can build your own expressions.

The expressions are all standard boolean logic but I have added a few simple examples below to help give you some ideas of how these can be used.

SDLXLIFF Toolkit – expression builder

You can use:

Relational operators:

 = equal to
 != not equal to
 < less than
 > greater than
 <= less than or equal to
 >= greater than or equal to

Logical Operators:

 AND Requires all values to be true
 OR Requires one of the values to be true
 && Requires all values to be true
 || Requires one of the values to be true

Brackets:

 () Used to group operators together

Examples:

<95 
 Select segments with match values less than 95%

>= 40
 Select segments with match values greater than or equal to 40%

(<95 AND >80) || (<50 && >30)
 Select segments that have match values between 80% and 95% OR between 30% and 50%

!= 100 AND != 60
 Select segments with match values apart from 100% and 60% matches

<=90 && != 60
 Select segments that are less than or equal to 90% but also don't equal 60%

((<90 && > 80) || (< 60 AND > 50)) OR < 10
 Select segments that have match values between 80% and 90% OR between 50% and 60%,
 OR match values less than 10%

These particular examples may not be useful in practice, but they should give you an idea of how you can build expressions and select segments based on unusual criteria that would be impossible in Studio alone.
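
To make the logic of these expressions concrete, here is a small Python sketch that evaluates them against a single match value. It is not the toolkit's parser; the names `tokenize` and `matches` are my own, and I am assuming that AND binds more tightly than OR when no brackets are used (the examples above are all bracketed, so they don't depend on that assumption).

```python
import re

# One token: a relational operator, a logical operator, a bracket or a number.
TOKEN = re.compile(r"\s*(<=|>=|!=|<|>|=|&&|\|\||\(|\)|AND\b|OR\b|\d+(?:\.\d+)?)")

COMPARE = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
}

def tokenize(expr):
    """Split an expression such as '(<95 AND >80)' into a list of tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected text: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def matches(expr, value):
    """Return True if the match value satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ended unexpectedly")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() in ("OR", "||"):
            take()
            right = parse_and()  # always parse the right side, then combine
            result = result or right
        return result

    def parse_and():
        result = parse_term()
        while peek() in ("AND", "&&"):
            take()
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing closing bracket")
            return result
        if tok in COMPARE:
            return COMPARE[tok](value, float(take()))
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result

print(matches("(<95 AND >80) || (<50 && >30)", 85))
print(matches("!= 100 AND != 60", 60))
```

Note that “between” in the descriptions above excludes the end values, because the examples use `<` and `>` rather than `<=` and `>=`.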

Locked/Unlocked

Using this group you can select segments that are locked or unlocked in Studio and then apply the operation you want.

Translation Origin

This group allows you to select segments based on the translation origin of the segments as stored in the SDLXLIFF:

  • Translation Memory : results originating from a TM match
  • Interactive : results derived by the translator making changes
  • Automated Translation : Autolocalised segments
  • Auto-propagated : segments translated based on previously translated segments in the document

System

This group allows you to select segments based on the system attribute of the segments stored in the SDLXLIFF:

  • Machine Translation : results originating from an MT engine
  • Translation Memory : results originating from a TM match
  • Propagated : segments translated based on previously translated segments in the document
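
If you are curious how this kind of selection could work against the file itself, here is a rough Python sketch. It runs on a simplified, hand-written sample rather than a real file, and it assumes that segment metadata sits on `sdl:seg` elements with attributes such as `conf`, `origin`, `percent` and `locked`, under the namespace URI shown. Treat those names and values as assumptions and check them against one of your own SDLXLIFF files before relying on them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URI for the sdl: prefix.
SDL_NS = "http://sdl.com/FileTypes/SdlXliff/1.0"

# Simplified, hand-written sample; a real SDLXLIFF contains far more structure.
SAMPLE = """<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0">
  <sdl:seg-defs>
    <sdl:seg id="1" conf="Translated" origin="tm" percent="100"/>
    <sdl:seg id="2" conf="Draft" origin="mt"/>
    <sdl:seg id="3" conf="Translated" origin="interactive" locked="true"/>
    <sdl:seg id="4" conf="Translated" origin="auto-propagated" percent="100"/>
  </sdl:seg-defs>
</xliff>"""

root = ET.fromstring(SAMPLE)
segments = root.findall(f".//{{{SDL_NS}}}seg")

# Count segments per origin, and list the ids of locked segments.
by_origin = Counter(seg.get("origin", "none") for seg in segments)
locked_ids = [seg.get("id") for seg in segments if seg.get("locked") == "true"]

print(dict(by_origin))
print(locked_ids)
```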

 

Document Structure

017
This is an interesting use of the information in the SDLXLIFF. When you open a file in Studio you will see the right-hand column contains information relating to where the segment comes from in the document.  So for example, if the segment is a Paragraph segment it will say “P”, a heading will say “H”, or a list item will say “LI”.

So to use this you select the SDLXLIFF files you want to work with and then click Generate DSI.  This will then list all the DSI types that are available to work with in the selected files.

For example, in this file you can see various types of structure.  The “TC+” represents more than one type associated with that segment:

018

So you can click on the coloured types and this will open up a small window in Studio explaining what this information relates to.  If I click on the “TC+” you can see I have three different types of structure recognised, and all of these also become available for selection when I Generate DSI in the SDLXLIFF Toolkit:

019

Clicking on Generate DSI makes the information in this file available for selection like this:

020
So I can now select all, or some of these items by holding down the Ctrl key and using the mouse.  Then I can apply whatever operation I want to the selected segments.

Search

This next view allows me to search for anything I like inside the source AND/OR the target segments before applying any of the operations discussed in previous sections (Sliceit!, Changeit! or Clearit!).

021

There are some simple options that allow you to match the case of the expression, match whole words only, use regular expressions or search in tags.  The results of any search patterns found are shown in the results pane and you can also expand this to make it easier to see them by clicking on the expansion icon in the top right:

022

If you have results spanning multiple files then the column on the right also shows you which file the results come from.
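
The case, whole word and regular expression options combine in a fairly standard way. The Python sketch below shows one plausible way to build a search pattern from them; `build_pattern` is a made-up helper, it is not how the toolkit is implemented, and the search in tags option is not covered.

```python
import re

def build_pattern(text, match_case=False, whole_word=False, use_regex=False):
    """Compile a search pattern from the search text and the chosen options."""
    # Without the regex option, the text is searched for literally.
    body = text if use_regex else re.escape(text)
    if whole_word:
        # Word boundaries stop "cat" from matching inside "concatenate".
        body = rf"\b(?:{body})\b"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(body, flags)

print(bool(build_pattern("cat", whole_word=True).search("concatenate")))
print(bool(build_pattern("Cat").search("a cat sat")))
print(bool(build_pattern("a.b", use_regex=True).search("axb")))
```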

Replace

This next view allows you to apply search and replace operations on an entire Project, group of SDLXLIFF files or single SDLXLIFF using natural language or regular expressions.  You can also use this to search and replace in source or target.

023

There are some great use cases for this, but in particular it’s handy for changing unrecognised placeables such as dates that Studio does not see as being in the correct format, because it uses the culture settings of your computer’s operating system. So in the example above the dates written as dd-mmm-yyyy are not recognised and I get this kind of thing in Studio:

024

Searching for these dates and replacing them without the hyphens in a single operation means I can work with the dates like this, and Studio will not only correctly localise them for me but also avoid unnecessary verification error messages caused by the original numbering issue:

025

Furthermore, and because getting this wrong could lead to serious inaccuracies in your source segments, you have the opportunity to preview what the changes would look like before you hit the Replace All button.  The Preview button will make the changes with your replacement in this view only, allowing you to scroll through the results in the expanded results window first so you can see very quickly if you’ve picked up anything wrong before you replace it:

026
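
The date clean-up and the preview-before-replace idea can be sketched in Python like this. The pattern and the sample segments are my own assumptions: I am guessing at dates such as 05-Jan-2017 and replacing the hyphens with spaces, which may not be exactly the expression used in the screenshots.

```python
import re

# Assumed shape of the unrecognised dates: dd-mmm-yyyy, e.g. 05-Jan-2017.
DATE = re.compile(r"\b(\d{1,2})-([A-Za-z]{3})-(\d{4})\b")

def preview(segments):
    """Return (before, after) pairs for the segments the replacement would change."""
    pairs = []
    for seg in segments:
        after = DATE.sub(r"\1 \2 \3", seg)  # back references keep day, month, year
        if after != seg:
            pairs.append((seg, after))
    return pairs

# Made-up sample segments.
segments = [
    "Delivered on 05-Jan-2017.",
    "No date here.",
    "Valid until 31-Dec-2018",
]

for before, after in preview(segments):
    print(f"{before!r} -> {after!r}")
```

Nothing is changed until you are happy with the pairs, which is the same safety net the Preview button gives you.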

The end!

6 thoughts on “Revisiting the toolkit…”

  1. Hi Paul,

    I’ve just started using the sdlxliff toolkit and I’m particularly interested in the sliceit feature. What I would like to do is the following: I have several files to be translated from de-DE into en-US which I already converted into sdlxliff. Then, I have several files already in English, so that I would like to perform an alignment. However, it’s a lot of files, so performing an alignment for each of them would be too time consuming. If I could merge the sdlxliff files of the source language into one sdlxliff file and the sdlxliff files of the target language into another sdlxliff file, I could then perform a single alignment. However, for this to be feasible, I would need the content order of the resulting merged file to match the order of the selected sdlxliff files to be merged. So, if I want to merge files named “1.sdlxliff”, “2.sdlxliff” and “3.sdlxliff”, it would be great to get a merged file where the segment order goes from those in file 1 to those in file 2 and then to 3. I hope I am being clear. The problem is, right now this does not happen. The toolkit seems to put the content of the selected files randomly together. Is it possible to influence the content order of the resulting merged sdlxliff file?

    Thanks in advance for your help.

    Cheers, Luciano
