“The badass is an uncommon man of supreme style. He does what he wants, when he wants, where he wants.” (Urban Dictionary by dougdougdoug). There are in fact many definitions of what a badass is, but I like this part of one of the definitions because it really reflects what this article is about and why it’s needed. No clues so far… but let’s think anonymization!
I covered this topic a couple of times in the past for various reasons and of course it comes up all the time in the forums, and via email, with users wanting to make changes to the underlying data that Trados Studio records.
The first article above discussed the rights and wrongs of removing the evidence that a machine translation engine has been used, and the second article provided information on a variety of tools that can be used to protect personal data as well as how to remove the traces of having used machine translation. Having done that already, why do I bring this up again?
Well, the reason is that we see more and more users still looking for ways to anonymize the usernames in their SDLXLIFF files, remove evidence of having applied machine translation, change the name of the translation memory that was used, and make all of this possible as a batch task. The latter point is important because users of the Professional version (yes… it’s not only Freelance translators that want to do these things!) want to be able to do all, or some, of these things automatically when they finalize their projects. Last but not least, Trados Studio operates a file-based solution, so the bilingual files are always accessible and can be changed with the use of a text editor. This is fine when you know what you’re doing… but we also see problems arising with corrupted bilingual files as a result of attempting to make these changes and removing important characters in error.
Yes… the translation industry is full of badass individuals who will do what they want, when they want, where they want, even if the advice is don’t! In my opinion there is absolutely no point in trying to prevent these badass users from doing something they can do anyway whether you like it or not. If they break things trying you’ll only be cutting your nose off to spite your face as it causes problems for their clients, additional work for support and leads to forum posts that often don’t tell the whole story. So, what better action to take than to make this all possible in a controlled manner?
The SDL Batch Anonymizer
The app itself is available through the SDL AppStore. It can be broken down into solutions for several things:
- one-click anonymization of your projects
- anonymization of usernames
- anonymization of machine translation
- anonymization of translation memory
one-click anonymization of your projects
If all you want to do is remove all evidence of having used machine translation and anonymize the usernames everywhere in your translation, then this is the way to go. Check the boxes “Anonymize all values” and “Use settings from All Language Pairs” (if your project is multilingual):
The result will be something like taking this:
To this, where the usernames have been removed altogether and all machine translation has been replaced with “Interactive” translation:
So a very simple way to completely anonymize the main things users are looking for across all files and all languages in your project.
anonymization of usernames
The one-click method is great, but if you use the usernames for anything and just want to replace real names with names that are anonymous but still meaningful for you, then you can do this too:
Here I just type in the username I wish to use and these names will be used everywhere Studio records them in the SDLXLIFF instead of whatever was used before.
anonymization of machine translation
There are a couple of things you can do here. First of all you can simply remove all evidence of machine translation and replace it with “Interactive translation”:
This gets you the same result as you would have using the one-click approach. But you can also be a little more sophisticated (read “badass”) and make it appear as though the results had come from a translation memory you name, and also with a specific fuzzy match value:
This would change your machine translated files to this (same example as before):
The NMT status has gone and has been replaced with an edited 69% match coming from an “en-de (Automotive)” translation memory.
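To make the logic concrete, here’s a minimal Python sketch of what this kind of rewrite amounts to. It is not the app’s code and it doesn’t read real SDLXLIFF files: the `Segment` record, its field names and the origin labels (`mt`, `nmt`, `tm`, `interactive`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumed labels for machine translation origins (illustrative only).
MT_ORIGINS = {"mt", "nmt"}


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified view of the origin data stored per segment."""
    origin: str      # e.g. "nmt", "tm", "interactive"
    provider: str    # name of the engine or translation memory
    percent: int     # match value, 0 when not applicable


def anonymize_mt(seg: Segment, tm_name: Optional[str] = None, fuzzy: int = 0) -> Segment:
    """Return a copy of seg with any machine translation origin hidden.

    Without a TM name the segment is marked as interactive translation.
    With a TM name it is presented as a TM match with the given fuzzy value.
    Segments that did not come from machine translation are returned unchanged.
    """
    if seg.origin not in MT_ORIGINS:
        return seg
    if tm_name is None:
        return replace(seg, origin="interactive", provider="", percent=0)
    if not 0 <= fuzzy <= 100:
        raise ValueError("fuzzy match value must be between 0 and 100")
    return replace(seg, origin="tm", provider=tm_name, percent=fuzzy)


# Usage: the same segment handled the simple way and the "badass" way.
nmt_segment = Segment(origin="nmt", provider="Some NMT engine", percent=0)
simple = anonymize_mt(nmt_segment)
badass = anonymize_mt(nmt_segment, tm_name="en-de (Automotive)", fuzzy=69)
```

Segments that were never machine translated pass through untouched, which is the behaviour you’d want from any approach like this.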
anonymization of translation memory
The final option is really a serious badass user’s feature. Let’s say you had a project manager sending you packages with their Project TM attached, and they asked you to only use their Project TM. Fair enough… but in this case you know that your own Translation Memories are far more useful because you’ve been working as an automotive translator for 30 years and insist on always providing a quality job. So in Studio you do what’s easy and add your own Translation Memories to the project. Easy enough.
However, the fact you used your own Translation Memories for any results is recorded in the SDLXLIFF and this particular project manager is very “smart” and he runs random checks in the SDLXLIFF to make sure you have adhered to his request. Don’t believe me… I know it happens and I’m not sure who the badass is in this case!
So using this option you can replace the name of the TM you used with the one you were asked to use for all results that come from a Translation Memory:
Note that when you use this option it’s not possible to change the match value. This is deliberate because the intention here is to change the provider name only and leave all other values the same.
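As a rough illustration of that “provider name only” rule, here’s a small Python sketch. Again, this is not how the app is implemented and it doesn’t touch real SDLXLIFF files; the `Segment` record, its fields and the `tm` origin label are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified view of the origin data stored per segment."""
    origin: str      # e.g. "tm", "nmt", "interactive"
    provider: str    # name of the translation memory or engine
    percent: int     # match value


def rename_tm(seg: Segment, new_name: str) -> Segment:
    """Swap the TM name on TM results; every other value stays as it was."""
    if seg.origin != "tm":
        return seg
    return replace(seg, provider=new_name)


# Usage: a result from my own TM now appears to come from the Project TM.
mine = Segment(origin="tm", provider="my_automotive_TM", percent=92)
theirs = rename_tm(mine, "Project TM")
```

The match value isn’t a parameter at all here, which mirrors the deliberate restriction described above.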
If you work with multilingual projects and choose to anonymize them it’s possible you’ll want to use different data for each language. This is possible because the settings page for the SDL Batch Anonymizer is replicated against each language pair like this:
In the screenshot above I set the username to de_trans and added a TM called de_TM with fuzzy match values of 69% for all machine translation results. For Italian I set these to it_trans, it_TM and 69%, and I made similar changes to the other languages.
Before you make these changes it’s important to tell Studio not to use the All Language Pairs settings so you simply uncheck the box here:
A cool feature here is that if you only want to make changes to one language you can do this too. Leave this box checked for each language pair you don’t wish to change, and set All Language Pairs to have no settings at all, with the option above unchecked. Then uncheck the box for the specific language pair you wish to change and make the settings you need there.
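If it helps to think of this as a lookup rule, here’s a minimal Python sketch of how the checkbox could decide which settings apply to a language pair. This is my own illustration of the behaviour described above, not the app’s code; `AnonymizerSettings`, its fields and `effective_settings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AnonymizerSettings:
    """Hypothetical settings page for one language pair."""
    username: Optional[str] = None
    tm_name: Optional[str] = None
    fuzzy: Optional[int] = None
    use_all_language_pairs: bool = True  # the checkbox discussed above


def effective_settings(
    pair: str,
    all_pairs: AnonymizerSettings,
    per_pair: Dict[str, AnonymizerSettings],
) -> AnonymizerSettings:
    """Pick the settings that apply to one language pair.

    A pair with the box checked (or with no page of its own) falls back to
    the All Language Pairs settings; a pair with the box unchecked uses its own.
    """
    own = per_pair.get(pair)
    if own is None or own.use_all_language_pairs:
        return all_pairs
    return own


# Usage: only German is changed; All Language Pairs is left with no settings.
all_pairs = AnonymizerSettings()
per_pair = {
    "en-de": AnonymizerSettings("de_trans", "de_TM", 69, use_all_language_pairs=False),
    "en-it": AnonymizerSettings(),  # box left checked
}
german = effective_settings("en-de", all_pairs, per_pair)
italian = effective_settings("en-it", all_pairs, per_pair)
```

With All Language Pairs left empty, every pair that still has the box checked ends up with nothing to change, and only the unchecked pair gets its own values.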
Very badass indeed!
For Studio Professional users
There is an additional benefit for users working with the Professional version of Studio. One of the features of this version is the ability to create custom batch tasks, so you can use the SDL Batch Anonymizer as part of this. One reason this could be interesting is that you could create Project Templates that contain custom tasks like this:
Here I can prepare my project and have it already anonymized after pre-translating with Machine Translation, so the project is ready to work on. I could also create custom Batch Tasks to do other things, such as adding the SDL Batch Anonymizer to the Export Files task and having the anonymization carried out in one go immediately prior to the export. So this is a very useful feature for users working with the Professional version of Studio.
However, there is a trick to setting this up for Project Templates because the settings page is only available when you run the batch task. So here are a few steps to explain how you do it:
- Create a Project Template in your preferred way (without the SDL Batch Anonymizer)
- Create a Project using this template
- Run the SDL Batch Anonymizer with the settings you require
- Right-click on the Project in the Projects View and create a template from this Project
- Select the Project Template you already created and overwrite it
This way the settings you selected for the Batch Anonymizer will be used in the Project Template when you create your Projects even though you can’t see them in the Project Settings.
Today, we are in an industry that has been at pains to provide continuing improvements in machine translation, and now that the quality is better than it has ever been we’re even willing to pay for it. So it’s almost amusing that now we want to hide the fact we used it! I say almost because the reason is clear… machine translation is misunderstood and this isn’t amusing at all. Is it because translators are embarrassed to come clean that they used it to enhance their productivity? Is it because clients don’t want to pay for it as they fail to understand the work that’s required to ensure the machine translation is actually the right translation? Is it because clients don’t realise this is just another tool used by translators, one they have paid for, to help them with their professional endeavours? Maybe a little of all of the above, and in all cases it’s important as an industry that our end customers understand what’s involved in delivering quality translations. But until that day comes we can at least try to manage the process with a badass approach… the SDL Batch Anonymizer!