Introducing the multilingual XML… super filetype!

I was compelled to make a return to a previous theme around Marvel Comics because it’s the only way I can do justice to the amazing work the RWS AppStore team carry out on a daily basis.  There are some things you just can’t wait to get up in the morning for, and for me, one of these things is being able to work with this team on a daily basis.  The first meeting of every day for me is with this team and what a fantastic way to start the day it is!  I started this article by mentioning Marvel, but as you’ll see, the hero of this story is probably a Honey Badger!

The API (Application Programming Interface) documentation (recently updated here – https://developers.rws.com/), used by developers to help them create the knots that tie their solutions to the RWS products, contains a number of simple examples which can be used as a starting block.  One of these is something I’ve been waiting to see turned into a proper app suitable for anyone to use, and to date I’m only aware of a couple of developers who did this for their own use.  One developer released a bilingual XML filetype onto the appstore by mistake some years ago and I even wrote about it… but then had to remove the article and the app a few days later when they realised they’d made it public in error!  But now, I’m really happy to say we’ve made the time to address this for more general consumption via the appstore.  The solution the appstore team came up with is spectacular and worth waiting for, but it’s also simple and incredibly useful!

Bilingual XML Filetype

What am I talking about?  A bilingual XML filetype.  The relevant API documentation, for anyone who is interested, is the Filetype Support Framework, and as you can see it already contains an example of how to build a bilingual XML filetype.  But before I go any further, what exactly do I mean by a bilingual XML filetype and why would it be useful?  Well, here’s an example of the sort of file I see from time to time… or rather a fabricated file containing some of the trickiest things to deal with:

This file is actually an example of a multilingual XML file supporting translations in multiple languages, but the problems of how to handle it in Studio are relevant.  In this small example of only four segments in Studio we have the following issues to deal with:

  1. The file is not monolingual and we have to be able to read one element and write the translation into another
  2. The file is partially translated so the workaround of using regex to copy source from the source elements into the target elements in the source file is not appropriate
  3. There are html tags and CDATA within the translatable text
  4. There are also non-translatable placeholders in the text.  So {0} and {1} for example
  5. The language codes are not anything Trados Studio can recognise

But wait… this isn’t just bilingual, it’s a

Multilingual XML Filetype

So there are two more problems!

  1. ideally we should be able to create a multilingual project from this single source file
  2. when we have completed the project we need to be able to rebuild the single multilingual XML target file as opposed to having multiple files, one for each target language

I started off by saying this was so simple, but in fact it’s not!  The problems anyone has to deal with when faced with a file like this, especially if they are the project manager having to handle all the target languages, are not trivial.  But despite this the appstore team have managed to create a solution that pretty much does address these problems by taking the Honey Badger approach!  This filetype doesn’t give a … hoot about standards, or specifications.  Files have to be well formed, but after that anything goes.  The app is designed simply to get the translatable content out so you can do the work!  After all, that’s what you get paid for!!

How does it work?

You can find a fairly detailed explanation of how to work with this filetype here in the RWS Community, including a video from the RWS Autumn Roadshow where the filetype was first introduced.  The app wasn’t completely finished at the time and there were still a few things we wanted to complete before releasing, but it is now available in the appstore and ready for use!

The user interface

The basic idea is you need to tell the filetype a few things:

  1. what’s the file extension (xml, xliff, tmx etc.)?
  2. where should the different translations be in the file?
  3. what languages should the translations be in?
  4. should an embedded content processor be used?
  5. do you have to handle placeables using regex because an embedded content processor won’t pick them up?
  6. do you need to handle entity conversions?
  7. do you want to provide support for any quick inserts?

So these probably all sound fairly familiar apart from 2. and 5.

2. where should the different translations be in the file?

Trados Studio can only handle pre-defined bilingual files such as XLIFF, TTX, ITD and SDLXLIFF for example.  It cannot handle multilingual filetypes at all (unless you are just extracting a single language to work on with a custom XML filetype), and it certainly can’t support the creation of a multilingual project that makes proper use of all the languages in the file.  One of the reasons for this is that files like this could have been prepared with whatever structure the developer felt most appropriate to use for their own purposes.  So the interface needs to reflect this.  For example:

In this example I am managing a TMX file for translation with 25 languages.  One file to create a single multilingual project.  The Language Mapping interface has two parts:

  1. Languages Root
  2. Languages

In the root I need to specify where in the file the languages can be located.  So I do this using an absolute XPath query (an XML technology I have discussed before in case you’re new to this).  For a TMX which looks something like this:

The Languages Root XPath query would therefore be:

/tmx/body/tu

The languages are all contained within the //tuv/seg elements and defined by the use of an xml:lang attribute.  We can use this attribute to tell the app where each language goes.  So using this same example we have these relative XPath queries:

English
tuv[@xml:lang='EN']/seg

Bulgarian
tuv[@xml:lang='BG']/seg

And so on for all 25 languages in my file.  Incidentally if you’d like a good explanation of absolute and relative XPath queries, as well as an introduction to working with XPath, then W3Schools is a good place to start.
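If you’d like to try queries like these outside Trados Studio, here is a minimal sketch in Python.  It is not how the app works internally.  The two-language TMX fragment is made up for illustration, and because Python’s standard ElementTree only supports a subset of XPath, the Languages Root query is written relative to the root element rather than as the absolute /tmx/body/tu.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-language TMX fragment, made up for illustration.
TMX_SAMPLE = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="EN"><seg>Hello</seg></tuv>
      <tuv xml:lang="BG"><seg>Здравей</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN"><seg>Goodbye</seg></tuv>
      <tuv xml:lang="BG"><seg></seg></tuv>
    </tu>
  </body>
</tmx>"""

# The xml: prefix is predefined by the XML specification, but ElementTree
# still needs it in the prefix map to resolve it inside a path expression.
NS = {"xml": "http://www.w3.org/XML/1998/namespace"}

# Relative queries, one per language, as entered in the Languages list.
LANGUAGE_PATHS = {
    "English": "tuv[@xml:lang='EN']/seg",
    "Bulgarian": "tuv[@xml:lang='BG']/seg",
}


def extract_units(tmx_text):
    """Return one dict per translation unit, mapping language name to text."""
    root = ET.fromstring(tmx_text)  # root is the <tmx> element
    units = []
    # Equivalent of the Languages Root query /tmx/body/tu, written relative
    # to the root element because ElementTree rejects absolute paths here.
    for tu in root.findall("./body/tu"):
        unit = {}
        for language, path in LANGUAGE_PATHS.items():
            seg = tu.find(path, NS)
            unit[language] = (seg.text or "") if seg is not None else ""
        units.append(unit)
    return units


units = extract_units(TMX_SAMPLE)
for unit in units:
    print(unit)
```

The second unit comes back with an empty Bulgarian entry, which is the partially translated situation described earlier.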

Once you start to work with this filetype you’ll see how logical and well thought out this interface is.  Every file is likely to be different and this provides the flexibility to handle them.

5. do you have to handle placeables using regex because an embedded content processor won’t pick them up?

This is something I expect every Trados Studio user working with embedded content in their files will be wishing was available in all the filetypes.  Frankly I have no clue why it isn’t!  If you don’t know what I mean then take the elements in this file for example:

Some years ago Trados Studio introduced the ability (for some filetypes) to handle CDATA sections using an embedded content processor, the html filetype for example.  This was great and it significantly cut down the work involved in creating regular expression rules for files containing content like this, which was the process before this feature was introduced.  However, you still get CDATA that contains not only html but also placeables.  You are then forced to look for workarounds (Data Protection Suite or Clean up tasks for example) or just manually handle them while translating.  This is sub-optimal.  So in the multilingual XML filetype the developer added some settings to allow you to tag up any content you like using regular expressions in addition to the use of embedded content processors.

There are some default rules to give you a head start and an idea of how to use this, but you can create as many as you need.  An important point to note is that you create the expressions to suit your content.  If one of the defaults works for you then that’s great… they do cover some common scenarios… but they are not intended to be the answer for all placeables!
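To give an idea of what “tagging up” placeables with regular expressions means, here is a small Python sketch.  The two rules are my own assumptions and not the app’s default expressions: one catches numbered placeholders like {0} and {1}, the other printf-style %s and %d.  The function name split_placeables is hypothetical too.

```python
import re

# Hypothetical rules; the app's default expressions are not reproduced here.
PLACEABLE_RULES = [
    re.compile(r"\{\d+\}"),  # numbered placeholders such as {0}, {1}
    re.compile(r"%[sd]"),    # printf-style placeholders such as %s, %d
]


def split_placeables(text):
    """Split text into (kind, value) pairs, kind being 'text' or 'placeable'."""
    combined = re.compile("|".join(rule.pattern for rule in PLACEABLE_RULES))
    parts = []
    position = 0
    for match in combined.finditer(text):
        # Keep any ordinary text that sits before this placeable.
        if match.start() > position:
            parts.append(("text", text[position:match.start()]))
        parts.append(("placeable", match.group()))
        position = match.end()
    # Keep whatever ordinary text follows the last placeable.
    if position < len(text):
        parts.append(("text", text[position:]))
    return parts


segment = "Page {0} of {1} for %s"
for kind, value in split_placeables(segment):
    print(kind, repr(value))
```

Anything marked as a placeable here is what you would expect to see protected as a tag in the editor, leaving only the ordinary text to translate.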

Such a simple solution though… should be available for every filetype in Trados Studio!

The batch tasks

It’s just a filetype so why do we need batch tasks?  It seems so far that everything in this article has two reasons… and this question is no exception!

  1. we need to be able to import the translations for each language in the project if the file is partially translated
  2. we need to be able to put the fully translated multilingual XML file back together again when the project is complete

When you install the plugin you will also find you have two new batch tasks:

  • Import Multilingual Translations, and
  • Generate Multilingual Translations

If you have the Freelance version of Trados Studio then you will have to run these batch tasks manually after creating your projects in Trados Studio.

Import Multilingual Translations

The “Import Multilingual Translations” task would be run after the project is created.  The options on this task are straightforward:

You can run the task after pre-translating from your TM as part of your normal project creation process because the options allow you to overwrite any existing translations if they are already approved or preferred for example.  You can also set the “Origin System” and the segment status in Trados Studio to be used after import (Draft, Translated, Approved etc.), and you can also exclude segments from being updated based on a wider range of one or more selection criteria:

  • properties
    • locked
  • status
    • Draft, Translated, Approved etc.
  • type of match
    • Perfect Match, Context Match, Exact Match, Machine Translation etc.

So a decent amount of flexibility around whether you would prefer to use work already done with other resources or take the translations provided in the imported file.

Generate Multilingual Translations

The “Generate Multilingual Translations” batch task is needed to pull the final target file together.  Why?  Well, Trados Studio is a tool based on working with bilingual content created from either bilingual or monolingual source files.  Studio will create an SDLXLIFF file for each language pair and will recreate the target file with the translated content inserted into the right place for each one.  So if you have a multilingual file with 25 languages in it (one of them being the source) you will end up with 24 target files, one for each target language.  You now have to put all of these together into one file to be able to provide the fully translated multilingual file back to your customer.  That can be quite a task!

So this batch task does it for you.  It will create target files in each language folder containing only the translations for that language AND it will create a single file in a new folder called “Multilingual” which contains the single multilingual file with all the translations for your customer.  I don’t know if you’ve ever tried to do this before?  It is possible of course and you may, if you are an experienced user or localization engineer, have created scripts or processes to do this.  But it’s not simple and some files can be incredibly difficult to handle.  So for me this task is a stroke of genius 🙂
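For anyone wondering what doing this by hand involves, here is a rough Python sketch of the merge step.  It is only an illustration under my own assumptions and not the logic the batch task uses: the element names, language codes and the TRANSLATIONS dictionary (standing in for the per-language target files) are all invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical multilingual source; element names and language codes are invented.
SOURCE = """<strings>
  <string id="greeting"><en>Hello</en><de/><fr/></string>
  <string id="farewell"><en>Goodbye</en><de/><fr/></string>
</strings>"""

# Hypothetical per-language results, standing in for the separate target files.
TRANSLATIONS = {
    "de": {"greeting": "Hallo", "farewell": "Auf Wiedersehen"},
    "fr": {"greeting": "Bonjour", "farewell": "Au revoir"},
}


def merge_translations(source_text, translations):
    """Write every language's translations into one copy of the source tree."""
    root = ET.fromstring(source_text)
    for string in root.findall("string"):
        key = string.get("id")
        for language, entries in translations.items():
            target = string.find(language)
            # Only fill elements that exist and have a translation available.
            if target is not None and key in entries:
                target.text = entries[key]
    return root


merged = merge_translations(SOURCE, TRANSLATIONS)
print(ET.tostring(merged, encoding="unicode"))
```

Even this toy version has to know where each language lives in the file, which is exactly what the language mapping in the filetype settings captures.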

Professional Version of Trados Studio

If you have the professional version of Trados Studio  then of course you can create custom tasks.  So for example, I have one that does this:

When I create a multilingual project with this template I only do three things:

  • convert to translatable format
  • copy to target languages
  • import multilingual translations

So the project creation process is quick and I don’t need to run the batch task to import the translations afterwards.  A nice feature in the professional version.

I know we often share the “secret code” for this sort of customisation so Freelance users who want to have this, and are prepared to manually edit their project templates can achieve a similar level of automation albeit with a workaround every time they want to use it somewhere new… so here’s what you need for the example above:

“notsoSecret” code
  <InitialTaskTemplate Description="Used for multilingual projects using the new multilingual xml filetype" Name="multilingual" Id="70d9843e-78f7-463f-a4a2-785ec9622659">
    <SubTaskTemplates>
      <SubTaskTemplate TaskTemplateId="Sdl.ProjectApi.AutomaticTasks.Conversion" />
      <SubTaskTemplate TaskTemplateId="Sdl.ProjectApi.AutomaticTasks.Split" />
      <SubTaskTemplate TaskTemplateId="MultilingualXMLFileType_ImportBatchTask_Id" />
    </SubTaskTemplates>
  </InitialTaskTemplate>

You might only want to insert the “<SubTaskTemplate TaskTemplateId="MultilingualXMLFileType_ImportBatchTask_Id" />” into an existing template, but now you know what to use!

A Preview

I should also mention the preview.  You can’t create a preview that mirrors whatever the finished translations will look like because we don’t know what this is from a flat XML file.  We could create a preview showing all the other languages so you might get some inspiration if the file is partially translated.  But we questioned the value in that too.  So in the end we went for showing the XML itself, and where the translation you are working on sits in the file… so you get a preview like this for example:

If something else would be preferred we are always happy to look at the suggestion.

Interesting use cases

If you take a look at the video in this wiki you’ll see various use cases for this filetype such as:

  • the really common crappy XLIFF created by many non-localization friendly tools such as WordPress as they abuse the CDATA concept in XLIFF making content very difficult to handle
  • invalid XLIFF with incorrect language codes, non-recognised elements or attributes… all things which Trados Studio won’t like because it adheres to the XLIFF specification and expects nothing less.  The multilingual XML filetype is a bit like the Honey Badger in this respect as it doesn’t give a …. hoot, and couldn’t care less about standards or specifications!  As long as the file is well formed it’ll allow you to handle the translation which is all you really want!
    • a good example of this would be here in the RWS Community and I think this may be the first real life completed project using this new filetype!
  • .. and more

But I thought it would be interesting to tackle a more off-beat use case that came up in the RWS Community a week or so ago from a user looking for a solution to handling a bilingual requirement inside a Word file.  The user doesn’t seem to be too interested anymore as he never responded, but I was.  It was the perfect opportunity to try something quite complicated with this new Multilingual XML filetype!  To summarise, the problem was how to handle a Word document that looked like this:

What makes this tricky is three things:

  1. only content in the table cells needs to be translated
  2. the source is in the second column of each table and the target needs to be placed into the third column
  3. if the cell in the third column is shaded grey then that particular row should not be translated at all

At first glance, this is something that looks like so much work (especially if the file is large) that it’s probably easier to translate in the Word file itself.  However… we have the Multilingual XML filetype!  So we could do this:

  1. unzip the docx file
    1. a docx file, for anyone who didn’t know, is actually a zipped set of files and folders.  So if you add a .zip extension to the file name you can unzip it and get at the files inside
  2. Inside the files you’ll find something like this:

    And inside the “word” folder this:

    Gets exciting when we see the xml extension coming up 😉
  3. The document.xml in my file contains all the content I might need to translate in this file.
  4. translate the XML in Trados Studio
  5. Save the target file
  6. Put it back into the unzipped docx file and zip it up again.
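If you’d rather script the unzip and re-zip steps than rename the file by hand, here is a minimal Python sketch using the standard zipfile module.  The function names are my own; the only thing taken from the file itself is the word/document.xml path inside the package.

```python
import zipfile


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml")


def replace_document_xml(docx_path, new_docx_path, document_xml):
    """Copy a .docx package, swapping in a new word/document.xml."""
    with zipfile.ZipFile(docx_path) as source, \
            zipfile.ZipFile(new_docx_path, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            if item.filename == "word/document.xml":
                data = document_xml  # the translated file from Trados Studio
            else:
                data = source.read(item.filename)  # everything else is copied as-is
            target.writestr(item, data)
```

You would call read_document_xml on the original Word file to get the XML for translation, then replace_document_xml with the translated bytes to write a new docx alongside it, leaving the original untouched.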

Simple really…. if it wasn’t for the three tricky things I mentioned above.  The document.xml file in a docx is quite complicated.  It has 19 namespaces and a lot of structure:

I won’t lie to you… I did find it tricky to get to the bottom of what I needed here and actually built a simplified version just to get the XPath right first:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document>
	<body>
		<tbl>
			<tr>
				<tc>
					<p>
						<r>
							<t>Expert</t>
						</r>
					</p>
				</tc>
				<tc>
					<p>
						<r>
							<t>Problem Description</t>
						</r>
					</p>
				</tc>
				<tc>
					<tcPr>
						<shd fill="A6A6A6" />
					</tcPr>
					<p>
						<r>
							<t>GREY IN HERE</t>
						</r>
					</p>
				</tc>
			</tr>
		</tbl>																	
	</body>
</document>

So I removed the namespaces and all the stuff in the file apart from the main paths to the information I needed.  This allowed me to configure the filetype… putting the appropriate namespace prefix back in, which in this case is “w”:

Absolute XPath to the “Language root”:
/w:document/w:body/w:tbl/w:tr[not(w:tc[3]/w:tcPr/w:shd/@w:fill='A6A6A6')]

So I’m telling the app that the part of the document where the source and target languages will be is in the tr element.  But not when the colour of the cell in the third column of the table row is grey.  The colour is held in the fill attribute of the shd element.  So this path will only select the table rows where the third column doesn’t contain a grey cell; the grey rows are filtered out.

Then I just need Relative XPath expressions for each language.  In this case:

Source:
w:tc[2]/w:p/w:r/w:t

Target:
w:tc[3]/w:p/w:r/w:t
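The same selection can be checked against a namespace-free file in Python.  This is a sketch under a couple of assumptions: the sample follows the shape of the simplified file, with a second, unshaded row that I have added for illustration, and because Python’s standard ElementTree has no not() function the grey test is done in code rather than inside the path.

```python
import xml.etree.ElementTree as ET

# Namespace-free sample in the shape of the simplified file, with a second,
# unshaded row added for illustration.
SAMPLE = """<document><body><tbl>
  <tr>
    <tc><p><r><t>Expert</t></r></p></tc>
    <tc><p><r><t>Problem Description</t></r></p></tc>
    <tc><tcPr><shd fill="A6A6A6"/></tcPr><p><r><t>GREY IN HERE</t></r></p></tc>
  </tr>
  <tr>
    <tc><p><r><t>1</t></r></p></tc>
    <tc><p><r><t>The printer is offline</t></r></p></tc>
    <tc><p><r><t></t></r></p></tc>
  </tr>
</tbl></body></document>"""

GREY = "A6A6A6"


def translatable_rows(xml_text):
    """Return (source, target) text pairs for rows whose third cell is not grey."""
    root = ET.fromstring(xml_text)
    pairs = []
    for row in root.findall("./body/tbl/tr"):
        # ElementTree has no not() function, so the grey test is done in Python.
        shading = row.find("tc[3]/tcPr/shd")
        if shading is not None and shading.get("fill") == GREY:
            continue
        source = row.find("tc[2]/p/r/t")
        target = row.find("tc[3]/p/r/t")
        pairs.append((
            (source.text or "") if source is not None else "",
            (target.text or "") if target is not None else "",
        ))
    return pairs


pairs = translatable_rows(SAMPLE)
print(pairs)
```

Only the unshaded row survives, with its source text from the second column and an empty target from the third.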

So I’m just pulling out the text from the second and third table columns to insert into my project.  This gets me the following in Trados Studio with the one cell already pre-translated as this was in the Word file already:

Pretty sweet!  If the file was huge this would have saved me one hell of a lot of work.  I can now translate the file (machine translation!!):

I run the “Generate Multilingual Translations” batch task and put the target file back into my unzipped Word folder to replace the document.xml that was there before and Bob’s your Uncle!

And finally, just in case it’s easier to follow, I created a video of the whole process from start to finish.  Hopefully it’ll show you how well the filetype works as well as how to work through the steps I’ve been talking about above:

Length: 10 mins 33 seconds

I actually had a lot of fun doing this, and the exercise proved a useful test case for the developer because we discovered the logic we used for handling namespaces was inadequate for a file like this.  So you’ll see a new version of the filetype was released on the 1st December (today) to accommodate the fix.  Just another advantage of having filetypes as apps rather than in the core product… they can be fixed from one day to the next and you don’t have to wait for a release of the core product to enjoy the benefits!

All thanks to the genius of the RWS AppStore Honey Badgers!!

Let’s learn about XML…

This year at the Spring Trados Roadshows the emphasis was firmly placed upon education.  Almost all the presentations were based on providing translators, project managers, localization engineers etc. with great material to help them as they work with the Trados toolsets.

I had a few presentations at this event and decided it might be useful to post a few of them here, especially the ones that might help with some of the common filetype questions we see in the communities from time to time.

What’s in a name?

“What’s in a name? That which we call a rose
By any other name would smell as sweet.”

In Shakespeare’s Romeo and Juliet, Juliet isn’t allowed to be with Romeo because his family name is Montague… sworn enemies of the Capulet family.  Of course she doesn’t care about his name, he’d still be everything she wanted irrespective of what he was called.  The rose would still smell as sweet irrespective of what it was called.  “Trados”, “SDL” and “RWS” have endured, or enjoyed, a feuding history as competitors in the same industry.  Our names are our brand and now that they’re changing do we still smell as sweet?  Sadly things don’t end well for poor Romeo and Juliet… but in our story we fare a little better!

A file with a view…

I started thinking about “A room with a view” by E. M Forster when I contemplated how to start this article.  But as you can see from the images on the left my mind wandered from this idea and was focused more on the “view”.  This is quite possibly because our R&D team started a “Working from home” distance challenge to cover as much distance as you can every day for a month by physically getting out of your office/home and taking some fresh air.  A great initiative in these days of working from home where it’s all too easy to never leave your desk!  Walking, running, cycling and even swimming were acceptable activities and you get the distance converted into points based on the type of exercise you are doing.  You do have to track the activity and you have to take a few pictures as evidence of your efforts… but that brings me back around to my topic for the article… the pictures, or more specifically the views.  Yes, this is a very tenuous link indeed with the actual topic which is studioViews!

Short term memories…

“Not only is my short-term memory horrible, but so is my short-term memory.”  I have no idea who this quote can be attributed to, and it’s certainly not original, but it is quite appropriate when I start to think about the evolution of Trados.  Ever since Trados Studio was launched you can be sure to find many “experts” in places like ProZ and even the SDL Community recommending you don’t upgrade because there is no difference compared to the last version.  To be fair, if you only use a fraction of the features despite having used the software for a decade, then it probably is like this.  The alternative being these “experts” have very short-term memories.

Psst… wanna know a few more things about file types?

I wrote under this title back in 2013 and provided a bit of information about the Word filetypes in Studio.  It was a pretty popular article and I always meant to circle back and do some more.  Seven is a lucky number so now we’re in 2020, seven years later, I thought I’d do it again… and it’s also just as long, so grab a coffee first!

A Private AppStore…

All the apps come in these places
And the apps are not the same
You don’t look at their faces
And you don’t ask their names
You don’t think of them as human
You don’t think of them at all
You keep your mind on the money
Keeping your eyes on the wall
I’m your private AppStore, I don’t cost no money
I’ll do what you want me to do…

Every time I think the words “Private AppStore” that song comes into my head and leaves me with an earworm for a while.  Funny, but true!

Some you win… some you lose

When we released the new Trados 2021 last week I fully intended to make my first article, after the summary of the release notes, something based around the new appstore integration.  The number of issues we are seeing with this release is very low which is a good thing, but nonetheless I feel compelled to tackle one thing first that has come up a little in the forums.  It relates to some changes made to improve the product for the many.

Not your usual stuff!

Time seems to be going faster as I’m getting older as it doesn’t seem that long ago since we saw the release of the 2019 version of SDL Trados Studio.  But here we are, it is that time again and many users will already have noticed they have a shiny new version in their account… SDL Trados Studio 2021.  Fast as it is, we don’t want to do these product launches too often because I can tell you it’s a major undertaking requiring no small amount of coordination between the product management teams, core development teams, AppStore team, support teams, customer success teams, marketing teams, sales teams, back office teams, IT teams, 3rd party developers who provide plugins and more.  In addition to this we often have other projects on the go and many of the teams worked on the new sdl.com website which also went live this week, AND everyone did all of this while having to work isolated from their colleagues while working from home.  Quite an achievement and I certainly feel proud to be part of this SDL team, and not just because of how well they all work together.

Lazy XLIFF…

The last few years have seen some chatter around the topic of “lights-out project management” which is an idea referring to the automation of tasks, particularly through the use of AI (Artificial Intelligence), so that human intervention is not required.  Ideally, of course, allowing project managers to concentrate their efforts on other, more productive and value-added activities.  The goal of reducing the time spent on administrative tasks is nothing new and some attempts to achieve this can be more of a false economy because of the “hidden” technical restrictions under the hood of the tools used.
