“Tags” are something we normally like to avoid, whether it’s graffiti or documents prepared for translation in a CAT tool, and you can find articles and forum threads all over the internet about how to avoid them. But what if you want them… the ones in a CAT tool? Let’s say you receive a project from your client in a package, and they didn’t prepare the files as well as you would have liked, leaving you to deal with strings you’d rather have protected as tags, or even tags you don’t want to have to tackle at all. In a nutshell, if you’re using Studio you’re stuffed! You can prepare the files again as you like (possibly), translate them in your own project, and then pre-translate the real project afterwards from your TM, correcting any tag differences before returning the package to your client.
But how many people really do that? My guess is not too many. But judging by the number of times I’m asked whether it’s possible to tag up the files in Studio after the project has been prepared, or remove tags from a file after the project has been prepared, I think the number who’d like to do this is rather higher. Well, you still can’t do this in Studio out of the box, but you can do it with the help of a clever plugin developed by a translating developer, or developing translator (not sure which way around this should be), Jesse Good. The application Jesse developed does a lot more than just this, but I’m going to focus on a couple of interesting tagging components.
It’s called “Cleanup Tasks” and it’s freely available from the SDL AppStore (make sure you have at least version 1.2 if you already downloaded this). You can read all the details in a very useful blog article written by Jesse where he explains, “in a nutshell”, how you can handle the following:
- You can lock segments based on structure or content
- You can remove unwanted tags in the source
- You can modify the source or target text as you like and create “settings” files for easy reuse
- You can create placeholders for fixed words or phrases
- You can use StrConv from Visual Basic to perform many useful tasks such as:
  - case conversion,
  - full-width to half-width character conversion,
  - conversion between Hiragana and Katakana,
  - conversion between Simplified and Traditional Chinese characters
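To give a feel for what these StrConv-style conversions actually do to a string, here is a minimal Python sketch. It is not how the plugin works internally (the plugin uses StrConv from Visual Basic); the function names are my own, and the Chinese conversion is left out because it needs a mapping table rather than a simple rule.

```python
import unicodedata

# Hiragana (U+3041..U+3096) and Katakana (U+30A1..U+30F6) are laid out in
# parallel in Unicode, exactly 0x60 code points apart.
KANA_OFFSET = 0x60


def to_half_width(text: str) -> str:
    """Fold full-width letters and digits to their ASCII forms.

    Assumption: NFKC normalisation is an acceptable stand-in here. It also
    folds other compatibility characters, so it is broader than a pure
    width conversion.
    """
    return unicodedata.normalize("NFKC", text)


def hiragana_to_katakana(text: str) -> str:
    """Shift every Hiragana character up into the Katakana block."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift every Katakana character down into the Hiragana block."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def to_title_case(text: str) -> str:
    """Simple case conversion using the built-in string method."""
    return text.title()
```

The point is only that each of these is a mechanical, rule-based rewrite of the text, which is why it lends itself to a batch task run over a whole project.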
The plugin creates two new batch tasks in Studio that support you running a “cleanup” on your project file(s) before or after they’ve been translated. I’d really recommend you take a careful look at the article Jesse has written because he does a better job than I could of explaining how it works. But I just want to explain how it could benefit you when faced with a file like this for example:
Clearly I made the file up to show a couple of typical problems, but I think it works for this example. There are three things to note in my file.
- It’s a good example of an awful PDF conversion (segment #1)
- It contains embedded HTML in a file (segments #3 – #11)
- There are some product names I may wish to protect (segments #12 & #13)
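For the third point, the idea behind protecting product names is simply to find every occurrence of a fixed list of terms so they can be turned into placeholders. A rough Python sketch of that matching step follows; the product names and function names are hypothetical, and this is not the plugin’s own implementation.

```python
import re

# Hypothetical product names, for illustration only.
PRODUCT_NAMES = ["Acme Widget Pro", "Acme Widget"]


def build_pattern(names):
    """Build one expression matching any of the names as whole words."""
    # Longest first, so "Acme Widget Pro" is tried before "Acme Widget".
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b")


def find_protected_terms(text, names):
    """Return the matched names in the order they appear in the text."""
    return [m.group(0) for m in build_pattern(names).finditer(text)]
```

Sorting longest-first matters: without it the shorter name would match inside the longer one and leave “Pro” exposed for translation.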
You can find plenty of examples all over the internet on how to prepare files to tackle this properly. Well, when I say properly I mean how to deal with an improperly prepared file! For example:
How to get rid of a tag soup in Trados Studio – an article by Emma Goldsmith on dealing with the tag soup
Regex for Microsoft Word… is there no end? – an article by me on using regular expressions in Word to hide the tags in the embedded content
Both of these require work prior to bringing the files into Studio. Not too helpful if you get a package and the Project Manager didn’t prepare them well for you, or if the files in question require a separate product that you don’t own and don’t wish to purchase either! But now, thanks to this little plugin from Jesse, tackling these types of things can be a lot easier. Care should be taken with this because you could easily remove things that should really be there, but used carefully this application could become one of the most popular apps on the SDL AppStore. So read his blog article carefully.
The process is really simple… you just run a couple of new batch tasks created when you install the plugin. “Cleanup Source” to deal with the tags and other things, and “Cleanup Target and Generate Files” if you added tags to protect text from translation afterwards. If you don’t, and if the save target will work (this is where you must take care), then you can simply save the target files as normal. But you should have a play with this before using in anger to make sure you understand the implications for your work.
Running a single batch task on my test file above converted it to this in a couple of seconds and I can save the settings file I created to reuse in future files:
Now this is a very crude example, but the result is going to be much easier to handle, and as you can see the cleanup is even smart enough to retain some of the formatting you might want to keep, such as bold and italic tags. The HTML conversion was also pretty clever because with one expression it has even been able to distinguish between tag pairs and placeholders. Very smart… but remember the earlier warning, because if you have not prepared your expressions carefully the target file might not be what you expect. I know the HTML part isn’t the best. Let’s face it… this isn’t really the most appropriate way to handle HTML files in the first place, so we are just looking at different ways to tackle situations which are undesirable and could easily be avoided if a little more thought were given to the localisation process in the first place.
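To make the “tag pairs versus placeholders” idea a bit more concrete, here is a minimal Python sketch of how a single expression can match embedded HTML tags and sort them into opening tags, closing tags and standalone placeholders. This is not the expression I used in my settings file, nor what the plugin does internally; the pattern, the list of standalone elements and the function name are all my own assumptions for illustration.

```python
import re

# One expression for every tag: optional leading "/", a tag name,
# optional attributes, and an optional trailing "/" before the ">".
TAG_RE = re.compile(
    r"<(?P<close>/)?(?P<name>[A-Za-z][A-Za-z0-9]*)"
    r"(?P<attrs>[^<>]*?)(?P<selfclose>/)?>"
)

# Assumption: a small sample of HTML elements that never have a closing tag.
STANDALONE = {"br", "hr", "img", "input", "meta", "link"}


def classify_tags(text):
    """Return (tag, kind) pairs, where kind is start, end or placeholder."""
    result = []
    for m in TAG_RE.finditer(text):
        name = m.group("name").lower()
        if m.group("close"):
            kind = "end"
        elif m.group("selfclose") or name in STANDALONE:
            kind = "placeholder"
        else:
            kind = "start"
        result.append((m.group(0), kind))
    return result
```

A start and end tag with the same name would become a tag pair in the editor, while anything classified as a placeholder stands alone. An expression this loose will also happily match things that only look like tags, which is exactly why the warning about preparing your expressions carefully applies.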
I’m not going to reproduce the excellent article Jesse has written or explain how to do all the things it’s capable of, but I will show you a quick video using the file I created above. I think you’ll get the point… and I think you’ll like it!
Excellent job Jesse! I’m going to enjoy using this one to solve all kinds of interesting problems and here’s another link to his article in case you missed it on the way down:
Cleanup Plug-in Tool by Jesse Good.
… and the plugin itself!
Cleanup Plug-in on the SDL AppStore
0 thoughts on “Tackling a translators graffiti!”
I was wondering if you assist with Regular Expression file type setup on the side. We have a file that we just cannot figure out how to set up in Studio using regex…
Hi Jeff, I do it all the time, but not on the side. But first a good thing to try is asking in the SDL Community… these kinds of things are quite common in there and there are plenty of people who regularly like to help.
A gem of a tool. I don’t know what motivates such geniuses to devote their time to creating such tools, but we couldn’t do without them!
Now if Jesse can just put his mind to how to clean up unhelpfully segmented IDML files…
Absolutely agree. In this case I think it’s because Jesse is a translator and solves things he finds useful to make his own life easier… at least that would be the initial motivation. He and other developers are so good at adding features for others, and I love the whole community feel we get from being able to use this platform!
I have to say thanks to you, I did not know about the existence of this plugin. I agree with Jim and yourself, this translator Jesse did good work, maybe because, as you said, he is a translator and knows about the difficulties of his profession.
With your usual thoughtfulness you have made a relevant post just when I had a question. I hope you don’t mind a related question:
We are trying to figure out how to convert plain-text tags (, and so on) to the tags in Trados that would achieve the same effect (subscript, soft return etc.). Am I right in thinking that the Cleanup Tasks app you introduce here would take care of this for us? Also (for reference), is there another way to do so?
I think if you have a source file with plain text tags and need to convert them then your best bet would be to convert them with the filetype settings and not as a post processing step which this app provides. What type of file is your source file?
Sorry, I didn’t see your response before I posted my second comment. Thank you very much. As I said below, your other post has more or less answered my question for me, but I would like to know how to deal with this in a TM or TMX as well as in sdlxliff files…
If you need it in sdlxliff files then this app, Cleanup Tasks, will probably do the job you need. If you need it in a TM then the only way I am aware of would be to export to TMX and then manually put tags around the things you wish to be tagged, probably using some smart regex search and replace operations. But I would not recommend this because the tags will be useless in there if the source text is not also tagged… you’ll just lower your leverage. Better to prepare the source files and handle them that way.
Just another note… I’m now on leave for 10 days or so, so I won’t be able to approve any more posts until I get back. So if you want more information I recommend you take the conversation over to the SDL Community, where there are plenty of people willing to help you with these interesting questions.
Sorry for multiple comments! I’ve found another post of yours (at https://multifarious.filkin.com/2013/08/21/embedded_content_excel/) which is very helpful, but would still be glad to hear anything you have to say; also regarding whether the Cleanup app can be used for TMs/TMX files as well as sdlxliffs etc.