Introducing the multilingual XML… super filetype!

I was compelled to make a return to a previous theme around Marvel Comics because it’s the only way I can do justice to the amazing work the RWS AppStore team carry out on a daily basis.  There are some things you just can’t wait to get up in the morning for, and for me, one of these things is being able to work with this team on a daily basis.  The first meeting of every day for me is with this team and what a fantastic way to start the day it is!  I started this article by mentioning Marvel, but as you’ll see, the hero of this story is probably a Honey Badger!

The API (Application Programming Interface) documentation (recently updated here – https://developers.rws.com/), used by developers to help them create the knots that tie their solutions to the RWS products, contains a number of simple examples which can be used as a starting point.  One of these is something I’ve been waiting to see turned into a proper app suitable for anyone to use, and to date I’m only aware of a couple of developers who did this for their own use.  One developer released a bilingual XML filetype onto the appstore by mistake some years ago and I even wrote about it… but then had to remove the article and the app a few days later when they realised they’d made it public in error!  But now, I’m really happy to say we’ve made the time to address this for more general consumption via the appstore.  The solution the appstore team came up with is spectacular and worth waiting for, but it’s also simple and incredibly useful!

Bilingual XML Filetype

What am I talking about?  A bilingual XML filetype.  The relevant API documentation, for anyone who is interested, is the Filetype Support Framework, which already contains an example of how to build a bilingual XML filetype.  But before I go any further, what exactly do I mean by a bilingual XML filetype and why would it be useful?  Well, here’s an example of the sort of file I see from time to time… or rather a fabricated file containing some of the trickiest things to deal with:

This file is actually an example of a multilingual XML file supporting translations in multiple languages, but the problems of how to handle it in Studio are relevant.  In this small example of only four segments in Studio we have the following issues to deal with:

  1. The file is not monolingual and we have to be able to read one element and write the translation into another
  2. The file is partially translated so the workaround of using regex to copy source from the source elements into the target elements in the source file is not appropriate
  3. There are html tags and CDATA within the translatable text
  4. There are also non-translatable placeholders in the text.  So {0} and {1} for example
  5. The language codes are not anything Trados Studio can recognise

But wait… this isn’t just bilingual, it’s a

Multilingual XML Filetype

So there are two more problems!

  1. ideally we should be able to create a multilingual project from this single source file
  2. when we have completed the project we need to be able to rebuild the single multilingual XML target file as opposed to having multiple files, one for each target language

I started off by saying this was so simple, but in fact it’s not!  The problems facing anyone who has to deal with a file like this, especially a project manager having to handle all the target languages, are not trivial.  But despite this the appstore team have managed to create a solution that pretty much does address these problems by taking the Honey Badger approach!  This filetype doesn’t give a … hoot about standards, or specifications.  Files have to be well formed, but after that anything goes.  The app is designed simply to get the translatable content out so you can do the work!  After all, that’s what you get paid for!!

How does it work?

You can find a fairly detailed explanation of how to work with this filetype here in the RWS Community, including a video from the RWS Autumn Roadshow where the filetype was first introduced.  The app wasn’t completely finished at the time and there were still a few things we wanted to complete before releasing, but it is now available in the appstore and ready for use!

The user interface

The basic idea is you need to tell the filetype a few things:

  1. what’s the file extension (xml, xliff, tmx etc.)?
  2. where should the different translations be in the file?
  3. what languages should the translations be in?
  4. should an embedded content processor be used?
  5. do you have to handle placeables using regex because an embedded content processor won’t pick them up?
  6. do you need to handle entity conversions?
  7. do you want to provide support for any quick inserts?

These probably all sound fairly familiar apart from 2. and 5.

2. where should the different translations be in the file?

Trados Studio can only handle pre-defined bilingual files such as XLIFF, TTX, ITD and SDLXLIFF.  It cannot handle multilingual filetypes at all (unless you are just extracting a single language to work on with a custom XML filetype), and it certainly can’t support the creation of a multilingual project that makes proper use of all the languages in the file.  One of the reasons for this is that files like this could have been prepared with whatever structure the developer felt most appropriate to use for their own purposes.  So the interface needs to reflect this.  For example:

In this example I am managing a TMX file for translation with 25 languages.  One file to create a single multilingual project.  The Language Mapping interface has two parts:

  1. Languages Root
  2. Languages

In the root I need to specify where in the file the languages can be located.  So I do this using an absolute XPath query (an XML technology I have discussed before in case you’re new to this).  For a TMX which looks something like this:

The Languages Root XPath query would therefore be:

/tmx/body/tu

The languages are all contained within the //tuv/seg elements and defined by the use of an xml:lang attribute.  We can use this attribute to tell the app where each language goes.  So using this same example we have these relative XPath queries:

English
tuv[@xml:lang='EN']/seg

Bulgarian
tuv[@xml:lang='BG']/seg

And so on for all 25 languages in my file.  Incidentally, if you’d like a good explanation of absolute and relative XPath queries, as well as an introduction to working with XPath, then the W3Schools XPath tutorial is a good place to start.
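To make the two-level idea concrete, here is a minimal Python sketch of a Languages Root plus relative per-language paths.  It is only an illustration and not how the app itself is implemented: the sample TMX content and the names lang_path and read_units are made up for this example, and it uses the limited XPath subset in Python’s standard library (which exposes xml:lang under the reserved XML namespace URI).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# A tiny made-up TMX fragment, for illustration only.
TMX = """<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="EN"><seg>Hello</seg></tuv>
      <tuv xml:lang="BG"><seg>Здравей</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN"><seg>Goodbye</seg></tuv>
      <tuv xml:lang="BG"><seg></seg></tuv>
    </tu>
  </body>
</tmx>"""


def lang_path(code):
    """Relative path equivalent to tuv[@xml:lang='CODE']/seg."""
    return "tuv[@{%s}lang='%s']/seg" % (XML_NS, code)


def read_units(xml_text, languages):
    """Return one dict per translation unit, mapping language code to text."""
    root = ET.fromstring(xml_text)  # root is the <tmx> element
    units = []
    # Languages Root /tmx/body/tu, written relative to the root element.
    for tu in root.findall("./body/tu"):
        row = {}
        for code in languages:
            seg = tu.find(lang_path(code))
            # A missing or empty <seg> is treated as "not translated yet".
            row[code] = (seg.text or "") if seg is not None else ""
        units.append(row)
    return units


units = read_units(TMX, ["EN", "BG"])
for row in units:
    print(row)
```

The root query picks out each translation unit once, and every language is then found relative to that unit, which is why one root and a short list of relative paths are enough to describe the whole file.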

Once you start to work with this filetype you’ll see how logical and well thought out this interface is.  Every file is likely to be different and this provides the flexibility to handle them.

5. do you have to handle placeables using regex because an embedded content processor won’t pick them up?

This is something I expect every Trados Studio user working with embedded content in their files will be wishing was available in all the filetypes.  Frankly I have no clue why it isn’t!  If you don’t know what I mean then take the elements in this file for example:

Some years ago Trados Studio introduced the ability (for some filetypes) to handle CDATA sections using an embedded content processor, such as the html filetype.  This was great and it significantly cut down the work involved in creating regular expression rules for files containing content like this, which was the process before this feature was introduced.  However, you still get CDATA that contains not only html but also placeables.  You are then forced to look for workarounds (Data Protection Suite or Clean up tasks for example) or just manually handle them while translating.  This is sub-optimal.  So in the multilingual XML filetype the developer added some settings to allow you to tag up any content you like using regular expressions in addition to the use of embedded content processors.

There are some default rules to give you a head start and an idea of how to use this, but you can create as many as you need.  An important point to note is that you create the expressions to suit your content.  If one of the defaults works for you then that’s great… they do cover some common scenarios… but they are not intended to be the answer for all placeables!
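As a rough illustration of what a regex placeable rule does, the Python sketch below splits a string into translatable text and protected chunks.  The two patterns are hypothetical examples written for this sketch, not the default rules that ship with the filetype, and split_placeables is a made-up helper name.

```python
import re

# Hypothetical rules: each pattern marks text that should become a tag.
PLACEABLE_RULES = [
    r"\{\d+\}",  # positional placeholders such as {0} and {1}
    r"%[sd]",    # printf-style placeholders such as %s and %d
]


def split_placeables(text, rules=PLACEABLE_RULES):
    """Split text into ('text', ...) and ('tag', ...) chunks, in order."""
    combined = re.compile("|".join("(?:%s)" % rule for rule in rules))
    chunks = []
    position = 0
    for match in combined.finditer(text):
        if match.start() > position:
            chunks.append(("text", text[position:match.start()]))
        chunks.append(("tag", match.group(0)))
        position = match.end()
    if position < len(text):
        chunks.append(("text", text[position:]))
    return chunks


chunks = split_placeables("Welcome {0}, you have {1} new messages")
for kind, value in chunks:
    print(kind, repr(value))
```

Anything a rule matches is kept out of the translator’s way, which is exactly why the expressions need to be written for your own content rather than copied blindly.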

Such a simple solution though… should be available for every filetype in Trados Studio!

The batch tasks

It’s just a filetype so why do we need batch tasks?  It seems so far that everything in this article has two reasons… and this question is no exception!

  1. we need to be able to import the translations for each language in the project if the file is partially translated
  2. we need to be able to put the fully translated multilingual XML file back together again when the project is complete

When you install the plugin you will also find you have two new batch tasks:

  • Import Multilingual Translations, and
  • Generate Multilingual Translations

If you have the Freelance version of Trados Studio then you will have to run these batch tasks manually after creating your projects in Trados Studio.

Import Multilingual Translations

The “Import Multilingual Translations” task is run after the project is created.  The options on this task are straightforward:

You can run the task after pre-translating from your TM as part of your normal project creation process because the options allow you to overwrite any existing translations if they are already approved or preferred for example.  You can also set the “Origin System” and the segment status in Trados Studio to be used after import (Draft, Translated, Approved etc.), and you can also exclude segments from being updated based on a wider range of one or more selection criteria:

  • properties
    • locked
  • status
    • Draft, Translated, Approved etc.
  • type of match
    • Perfect Match, Context Match, Exact Match, Machine Translation etc.

So a decent amount of flexibility around whether you would prefer to use work already done with other resources or take the translations provided in the imported file.

Generate Multilingual Translations

The “Generate Multilingual Translations” batch task is needed to pull the final target file together.  Why?  Well, Trados Studio is a tool based on working with bilingual content created from either bilingual or monolingual source files.  Studio will create an SDLXLIFF file for each language pair and will recreate the target file with the translated content inserted into the right place for each one.  So if you have a multilingual file with 25 languages in it (one of them being the source) you will end up with 24 target files, one for each target language.  You now have to put all of these together into one file to be able to provide the fully translated multilingual file back to your customer.  That can be quite a task!

So this batch task does it for you.  It will create target files in each language folder containing only the translations for that language AND it will create a single file in a new folder called “Multilingual” which contains the single multilingual file with all the translations for your customer.  I don’t know if you’ve ever tried to do this before?  It is possible of course and you may, if you are an experienced user or localization engineer, have created scripts or processes to do this.  But it’s not simple and some files can be incredibly difficult to handle.  So for me this task is a stroke of genius 🙂
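If you have never had to do this merge by hand, the Python sketch below gives a feel for what is involved in the simplest possible case.  It is an assumption-laden illustration and not the batch task’s actual logic: the master file is a made-up TMX-style fragment, the per-language translations are plain lists standing in for the 24 separate target files, and merge_translations is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Made-up master file: English source plus empty German and French targets.
MASTER = """<tmx version="1.4"><body>
<tu><tuv xml:lang="EN"><seg>Hello</seg></tuv><tuv xml:lang="DE"><seg/></tuv><tuv xml:lang="FR"><seg/></tuv></tu>
<tu><tuv xml:lang="EN"><seg>Goodbye</seg></tuv><tuv xml:lang="DE"><seg/></tuv><tuv xml:lang="FR"><seg/></tuv></tu>
</body></tmx>"""

# Stand-in for the separate per-language target files, one list per language.
TRANSLATIONS = {
    "DE": ["Hallo", "Auf Wiedersehen"],
    "FR": ["Bonjour", "Au revoir"],
}


def merge_translations(master_xml, translations):
    """Write every language's translations back into one multilingual file."""
    root = ET.fromstring(master_xml)
    units = root.findall("./body/tu")
    for code, texts in translations.items():
        if len(texts) != len(units):
            raise ValueError("segment count mismatch for %s" % code)
        path = "tuv[@{%s}lang='%s']/seg" % (XML_NS, code)
        for unit, text in zip(units, texts):
            seg = unit.find(path)
            if seg is None:
                raise ValueError("no <seg> for language %s" % code)
            seg.text = text
    return ET.tostring(root, encoding="unicode")


merged = merge_translations(MASTER, TRANSLATIONS)
check = ET.fromstring(merged)
german = [s.text for s in check.findall("./body/tu/tuv[@{%s}lang='DE']/seg" % XML_NS)]
print(german)
```

Even in this toy case you need matching segment counts and a reliable way to find each language’s slot, and real files add namespaces, embedded content and partial translations on top.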

Professional Version of Trados Studio

If you have the Professional version of Trados Studio then of course you can create custom tasks.  So for example, I have one that does this:

When I create a multilingual project with this template I only do three things:

  • convert to translatable format
  • copy to target languages
  • import multilingual translations

So the project creation process is quick and I don’t need to run the batch task to import the translations afterwards.  A nice feature in the professional version.

I know we often share the “secret code” for this sort of customisation, so that Freelance users who want this and are prepared to manually edit their project templates can achieve a similar level of automation, albeit with a workaround every time they want to use it somewhere new… so here’s what you need for the example above:

“notsoSecret” code
  <InitialTaskTemplate Description="Used for multilingual projects using the new multilingual xml filetype" Name="multilingual" Id="70d9843e-78f7-463f-a4a2-785ec9622659">
    <SubTaskTemplates>
      <SubTaskTemplate TaskTemplateId="Sdl.ProjectApi.AutomaticTasks.Conversion" />
      <SubTaskTemplate TaskTemplateId="Sdl.ProjectApi.AutomaticTasks.Split" />
      <SubTaskTemplate TaskTemplateId="MultilingualXMLFileType_ImportBatchTask_Id" />
    </SubTaskTemplates>
  </InitialTaskTemplate>

You might only want to insert the “<SubTaskTemplate TaskTemplateId="MultilingualXMLFileType_ImportBatchTask_Id" />” into an existing template, but now you know what to use!

A Preview

I should also mention the preview.  You can’t create a preview that mirrors whatever the finished translations will look like because we don’t know what this is from a flat XML file.  We could create a preview showing all the other languages so you might get some inspiration if the file is partially translated.  But we questioned the value in that too.  So in the end we went for showing the XML itself, and where the translation you are working on sits in the file… so you get a preview like this for example:

If something else would be preferred we are always happy to look at the suggestion.

Interesting use cases

If you take a look at the video in this wiki you’ll see various use cases for this filetype such as:

  • the really common crappy XLIFF created by many non-localization friendly tools such as WordPress as they abuse the CDATA concept in XLIFF making content very difficult to handle
  • invalid XLIFF with incorrect language codes, non-recognised elements or attributes… all things which Trados Studio won’t like because it adheres to the XLIFF specification and expects nothing less.  The multilingual XML filetype is a bit like the Honey Badger in this respect as it doesn’t give a …. hoot, and couldn’t care less about standards or specifications!  As long as the file is well formed it’ll allow you to handle the translation which is all you really want!
    • a good example of this would be here in the RWS Community and I think this may be the first real life completed project using this new filetype!
  • .. and more

But I thought it would be interesting to tackle a more off-beat use case that came up in the RWS Community a week or so ago from a user looking for a solution to handling a bilingual requirement inside a Word file.  The user doesn’t seem to be too interested anymore as he never responded, but I was.  It was the perfect opportunity to try something quite complicated with this new Multilingual XML filetype!  To summarise, the problem was how to handle a Word document that looked like this:

What makes this tricky is three things:

  1. only content in the table cells needs to be translated
  2. the source is in the second column of each table and the target needs to be placed into the third column
  3. if the cell in the third column is shaded grey then that particular row should not be translated at all

At first glance, this is something that looks like so much work (especially if the file is large) that it’s probably easier to translate in the Word file itself.  However… we have the Multilingual XML filetype!  So we could do this:

  1. unzip the docx file
    1. a docx file, for anyone who didn’t know, is actually a zipped set of files and folders.  So if you add a .zip extension to the file name you can unzip it and get at the files inside
  2. Inside the files you’ll find something like this:

    And inside the “word” folder this:

    Gets exciting when we see the xml extension coming up 😉
  3. The document.xml in my file contains all the content I might need to translate in this file.
  4. translate the XML in Trados Studio
  5. Save the target file
  6. Put it back into the unzipped docx file and zip it up again.
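The unzip and re-zip steps can also be scripted, and you don’t even need to rename the file to .zip first.  The Python sketch below is one way to do it, under some assumptions: the function names are made up, the main content is taken to live in word/document.xml as in my file, and the demo at the end runs against a stand-in package created on the fly rather than a real Word document.

```python
import os
import tempfile
import zipfile

DOCUMENT_PART = "word/document.xml"  # main body part inside the docx package


def extract_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read(DOCUMENT_PART)


def replace_document_xml(docx_path, new_docx_path, translated_xml):
    """Copy the package to a new file, swapping in the translated XML."""
    with zipfile.ZipFile(docx_path) as src, \
            zipfile.ZipFile(new_docx_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == DOCUMENT_PART:
                data = translated_xml
            else:
                data = src.read(item.filename)
            dst.writestr(item, data)


# Demo on a stand-in package (not a real Word file), built in a temp folder.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "original.docx")
    translated = os.path.join(folder, "translated.docx")
    with zipfile.ZipFile(original, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr(DOCUMENT_PART, "<document>source</document>")

    extracted = extract_document_xml(original)
    replace_document_xml(original, translated, b"<document>target</document>")
    round_trip = extract_document_xml(translated)
    with zipfile.ZipFile(translated) as package:
        other_parts = [n for n in package.namelist() if n != DOCUMENT_PART]

print(extracted, round_trip, other_parts)
```

Writing to a new file rather than overwriting the original means you always keep the untouched source document if something goes wrong.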

Simple really…. if it wasn’t for the three tricky things I mentioned above.  The document.xml file in a docx is quite complicated.  It has 19 namespaces and a lot of structure:

I won’t lie to you… I did find it tricky to get to the bottom of what I needed here and actually built a simplified version just to get the XPath right first:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document>
	<body>
		<tbl>
			<tr>
				<tc>
					<p>
						<r>
							<t>Expert</t>
						</r>
					</p>
				</tc>
				<tc>
					<p>
						<r>
							<t>Problem Description</t>
						</r>
					</p>
				</tc>
				<tc>
					<tcPr>
						<shd fill="A6A6A6" />
					</tcPr>
					<p>
						<r>
							<t>GREY IN HERE</t>
						</r>
					</p>
				</tc>
			</tr>
		</tbl>
	</body>
</document>

So I removed the namespaces and all the stuff in the file apart from the main paths to the information I needed.  This allowed me to configure the filetype… putting the appropriate namespace prefix back in, which in this case is “w”:

Absolute XPath to the “Language root”:
/w:document/w:body/w:tbl/w:tr[not(w:tc[3]/w:tcPr/w:shd/@w:fill='A6A6A6')]

So I’m telling the app that the part of the document where the source and target languages will be is in the tr element, but not when the cell in the third column of the table row is grey.  The colour is held in the fill attribute of the shd element.  So this path selects only the table rows where the third column doesn’t contain a grey cell; the grey rows are filtered out.

Then I just need Relative XPath expressions for each language.  In this case:

Source:
w:tc[2]/w:p/w:r/w:t

Target:
w:tc[3]/w:p/w:r/w:t

So I’m just pulling out the text from the second and third table columns to insert into my project.  This gets me the following in Trados Studio with the one cell already pre-translated as this was in the Word file already:
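The same selection logic can be sanity-checked outside Studio with a few lines of Python.  This is only a sketch under stated assumptions: the sample XML is a cut-down, made-up document using the real WordprocessingML namespace for the “w” prefix, is_grey and extract_pairs are hypothetical names, and because the XPath subset in Python’s standard library has no not() function the grey-cell test is done in code rather than inside the path.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Cut-down, made-up stand-in for word/document.xml.
DOC = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:tbl>
      <w:tr>
        <w:tc><w:p><w:r><w:t>1</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>Problem Description</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t></w:t></w:r></w:p></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:p><w:r><w:t>2</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>Expert</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>Experte</w:t></w:r></w:p></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:p><w:r><w:t>3</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>Do not translate</w:t></w:r></w:p></w:tc>
        <w:tc>
          <w:tcPr><w:shd w:fill="A6A6A6"/></w:tcPr>
          <w:p><w:r><w:t>GREY IN HERE</w:t></w:r></w:p>
        </w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>"""


def is_grey(row):
    """True when the third cell of the row is shaded with fill A6A6A6."""
    shading = row.find("w:tc[3]/w:tcPr/w:shd", NS)
    return shading is not None and shading.get("{%s}fill" % W) == "A6A6A6"


def extract_pairs(xml_text):
    """Return (source, target) text for every row that is not greyed out."""
    root = ET.fromstring(xml_text)  # root is the w:document element
    pairs = []
    for row in root.findall("./w:body/w:tbl/w:tr", NS):
        if is_grey(row):
            continue  # same effect as the not(...) predicate in the root path
        source = row.find("w:tc[2]/w:p/w:r/w:t", NS)
        target = row.find("w:tc[3]/w:p/w:r/w:t", NS)
        pairs.append((
            (source.text or "") if source is not None else "",
            (target.text or "") if target is not None else "",
        ))
    return pairs


pairs = extract_pairs(DOC)
for source_text, target_text in pairs:
    print(repr(source_text), "->", repr(target_text))
```

A full XPath 1.0 engine would evaluate the original Language root expression directly, so this is just a convenient way to confirm which rows a path will and won’t pick up before configuring the filetype.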

Pretty sweet!  If the file was huge this would have saved me one hell of a lot of work.  I can now translate the file (machine translation!!):

I run the “Generate Multilingual Translations” batch task and put the target file back into my unzipped Word folder to replace the document.xml that was there before and Bob’s your Uncle!

And finally, just in case it’s easier to follow, I created a video of the whole process from start to finish.  Hopefully it’ll show you how well the filetype works as well as how to work through the steps I’ve been talking about above:

Length: 10 mins 33 seconds

I actually had a lot of fun doing this, and the exercise proved a useful test case for the developer because we discovered the logic we used for handling namespaces was inadequate for a file like this.  So you’ll see a new version of the filetype was released on the 1st December (today) to accommodate the fix.  Just another advantage of having filetypes as apps rather than in the core product… they can be fixed from one day to the next and you don’t have to wait for a release of the core product to enjoy the benefits!

All thanks to the genius of the RWS AppStore Honey Badgers!!
