A strange title, and a stranger image with a pair of zebras and a road, but in keeping with the current fascination with animals during the SDL Spring Roadshows I thought it was quite fitting. Nothing at all to do with the subject, other than that the zebras may be duplicated and they are hovering over a road to somewhere that looks cold!
The problem posed at the SDL Trados Roadshow in Helsinki by some very technical attendees, after the event was over, was about how to efficiently work on a Translation Memory (TM) so you could remove all the unnecessary duplicates.
The problem can be managed through Studio using the features available in the Translation Memory Maintenance View… but only if you already know which segments are duplicates so you can find them. That is not really helpful in this case, where we actually want to find the segments with the same source but different targets, and then remove the ones we don’t want with the aid of the better QA features in the Studio Editor.
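To make "same source but different target" concrete, here is a minimal Python sketch of that check. It is not part of the Studio workflow described in this post. It assumes the TM has been exported to TMX, and the function name `find_conflicting_units` and the language codes passed to it are purely illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_conflicting_units(tmx_text, source_lang, target_lang):
    """Return {source: [targets]} for sources that have more than one distinct target.

    Hypothetical helper: assumes a plain TMX document with <tu>/<tuv>/<seg> elements.
    """
    source_lang = source_lang.lower()
    target_lang = target_lang.lower()
    root = ET.fromstring(tmx_text)
    targets_by_source = defaultdict(set)

    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Older TMX files may use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also picks up text that sits around inline tags.
                segs[lang] = "".join(seg.itertext()).strip()
        if source_lang in segs and target_lang in segs:
            targets_by_source[segs[source_lang]].add(segs[target_lang])

    # Keep only the sources that were translated in more than one way.
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }
```

A listing like this only tells you where the conflicts are. Deciding which target to keep is still a job for a person, which is why we wanted to do the clean-up in the Editor.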
So the solution we came up with was to make use of two of the things we demonstrated during the events:
- SDLTmConvert – an OpenExchange application
- Frequently Occurring Units – a Project Management feature in Studio
To demonstrate how this works I took a TM from the DGT (just a sample of around 20k TUs), upgraded it to Studio and then using the SDLTmConvert application I converted it into XLIFF files. I don’t intend to work on these files directly so I just created 4 files with around 5000 TUs in each one.
I then created a project in Studio with these files and made sure that when I did this I applied the Frequently Occurring Units feature during the analysis:
This is a very cool feature in Studio that allows you to create an SDLXLIFF file containing only the segments that occur more than the number of times you set… I selected 2. If you use this when you are working on a Project with some colleagues, but you don’t have a TM Server (SDL GroupShare) where you can actively share the same TM as you work, then by translating this file first and then pre-translating the Project you can ensure consistency for these segments. Then you share the pre-translated files out for translation… so pretty neat.
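The idea behind the export can be sketched in a few lines of Python. This is only an illustration of the concept, not how Studio implements it. The function name `frequent_segments` is made up, and the sketch assumes the number you set is the minimum number of occurrences a segment needs across all the project files.

```python
from collections import Counter


def frequent_segments(files, min_occurrences=2):
    """Return source segments seen at least min_occurrences times across all files.

    `files` is a list of files, each given as a list of source segment strings.
    """
    counts = Counter(segment for segments in files for segment in segments)
    # Counter keeps first-seen order, so the result follows document order.
    return [segment for segment, n in counts.items() if n >= min_occurrences]


project_files = [
    ["Article 1", "This Regulation shall enter into force."],
    ["Article 1", "Done at Brussels."],
]
print(frequent_segments(project_files))  # ['Article 1']
```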
But for my purposes I’m interested in the TUs that occur more than once so I can find the duplicates in the TM and remove the ones I don’t want. So to do this I add the exported file (which is created in a folder called “Exports” in the Studio Project folder) to my Project as a translatable file and then open it with my TM attached. I now see things like this:
So 1. is my TM Results window and 2. is the active segment in my Frequently Occurring Units export. You can see I have 5 results, so I can now decide which ones I wish to keep and I can remove the rest one at a time, or by selecting more than one at a time, by right-clicking in the TM Results window and selecting “Delete Translation Unit”:
If I also make sure my TM is not set to update, so I don’t disturb any context information that may be on the TUs, then I can work through the file confirming segments as I go. That way I know exactly where I got to and can easily return to the task on another day if there are too many TUs to correct in one sitting.
I thought this was quite a neat solution using the Studio Platform to solve a problem that perhaps many people have come across and then resorted to other, perhaps more arduous means to resolve it.