XLIFF 2.x… the translator’s panacea?

In the last year or so many articles have been written about XLIFF 2.0 explaining what’s so great about it, so I’m not going to write another one of those.  I’m in awe of the knowledge and effort the technical standards committees display in delivering the comprehensive documentation they do, working hard to provide a solution that meets the needs of as many groups as possible.  The very existence of a standard, however, does not mean it’s the panacea for every problem it may be loosely related to.  It’s against this background that I was prompted to write about this topic after reading this article questioning whether some companies were preventing translators from improving their lives.  The article makes a number of claims which I think are a little misguided… in fact this is what it says:

XLIFF 2.0 is a “new” bilingual format for translation that attempts to do a handful of important things for translators.

  • Improve the standard so that different translation tool makers, like SDL, don’t “need” to create their own proprietary versions that are not compatible with other tools
  • Create true interoperability among tools, so translators can work in the tool of their choice, and end-customers can have flexibility about who they work with too
  • Allow businesses to embed more information in the files, like TM matches, glossaries, or annotations, further enhancing interoperability

I say “new” because XLIFF 2.0 has been around for years now. Unfortunately, adoption of the XLIFF 2.0 standard has been slow, due to tool makers and other players deciding that interoperability is not in their interest. It’s one of those things where commerce gets in the way of sanity.

Does XLIFF attempt to do anything for translators specifically?  I have a character flaw that forces me to respond to things like this, and so I did, but as the commenting was tricky and the discussion unwieldy I thought I could address it better in my own article.  I’m not writing as an expert on XLIFF (because I’m not), but I am passionate about translation and the technology we use, so I’m simply giving my view on the usefulness of XLIFF 2.0.  In many ways standards are an essential part of modern-day life, but I don’t think they are always the answer to everything.  In fact sometimes, despite the best of intentions, I think a standard can actually have the opposite effect of what was originally intended.  So I’m going to share my own views here, and note that they may not reflect those of SDL in general.  I’d also note that this topic of interoperability is nothing new: to read an article, four years on from a GALA event in 2014 that had a bit of a focus on this topic, pushing XLIFF 2.0 as the solution to a translator’s interoperability problems is very disappointing, because it demonstrates how little things have moved on.  I put a copy of the presentation I delivered at that event here for interest:

SDLXLIFF

But first of all, a quick overview of the XLIFF standard (created under the OASIS nonprofit consortium) just so we know what we’re talking about.  XLIFF stands for XML Localisation Interchange File Format, and it was created to provide a reliable and consistent way to share translatable content between content management systems and translation tools.  That of course includes sharing between two different translation tools.  In fact this is a good point at which to refute the first claim, that SDL created their own proprietary version, because nothing could be further from the truth.
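
To make that a bit more concrete, here’s a rough sketch of what the core of an XLIFF 1.2 bilingual file looks like and how easily the translatable content can be read from it.  The file content and the little Python snippet are purely illustrative and not taken from any real project or tool:

```python
# Illustrative only: a stripped-down XLIFF 1.2 file and the few lines of
# Python needed to read its translatable content. Real files carry much
# more (inline tags, notes, states and so on).
import xml.etree.ElementTree as ET

XLIFF_12 = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="about.html" source-language="en" target-language="de" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Welcome to our website</source>
        <target>Willkommen auf unserer Website</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

ns = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
root = ET.fromstring(XLIFF_12)
for tu in root.findall(".//x:trans-unit", ns):
    target = tu.find("x:target", ns)
    print(tu.get("id"), tu.find("x:source", ns).text,
          "->", target.text if target is not None else "(untranslated)")
```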

SDL introduced XLIFF as the native bilingual filetype for SDL Trados Studio 2009, and it was, and still is, fully compliant with the XLIFF 1.2 standard.  It’s called an SDLXLIFF and uses the file extension sdlxliff, as opposed to plain xliff or xlf.  The simple reason for this is that Studio works natively on the SDLXLIFF file, which means you can see it in Windows Explorer any time you like without having to export a file.  Using this extension also helps the application associate itself with the file more easily, because it doesn’t treat an SDLXLIFF in the same way as an XLIFF that could have come from any system capable of generating one.  Why doesn’t it treat them the same way?  This is one of the flaws (and strengths) of a standard that allows for customisation through the use of extension points… an SDLXLIFF, an MXLIFF (Memsource), an MQXLIFF (memoQ), a TXLF (Wordfast) etc. are all versions of XLIFF that use extension points.  I can’t and won’t comment on how valid all these variants are against the XLIFF standard, but an SDLXLIFF is fully compliant with XLIFF 1.2.  The extension points are required because the core components of an XLIFF 1.2 file are not sufficient to share the additional information that is part of a translation workflow between users of the same tool.  I mentioned earlier that this ability to use extension points is also a strength, and of course it is.  I doubt SDLXLIFF would have existed as a replacement for TTX (which was a proprietary format created by Trados) if it wasn’t possible to use it for the things SDL considered necessary when sharing a bilingual file.  For example, things like:

  • tracked changes
  • the more sophisticated commenting that Studio provides
  • enhanced file recognition properties
  • additional properties for dependency files
  • text formatting properties
  • context information for improved translation memory matching
  • document structure, a feature unique to Studio
  • etc.

These additional pieces of information are useful to Studio users when sharing an SDLXLIFF file, whilst other tools can either map the information to their own capabilities if they support it, or just ignore it and concentrate on the core XLIFF components for translation.  The same thing works the other way around: Trados Studio will either provide some mapping capability or simply ignore the extension points of other tools, so the XLIFF file can still be handled.
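
As a rough illustration of that “ignore what you don’t understand” behaviour, the sketch below parses an XLIFF 1.2 file containing a made-up vendor extension and keeps only the elements in the core namespace.  The extension namespace and element are invented for the example; they are not taken from the real SDLXLIFF (or any other vendor’s) schema:

```python
# A sketch of reading just the XLIFF 1.2 core and skipping vendor extension
# points a tool doesn't understand. The "ext" namespace is invented purely
# for illustration.
import xml.etree.ElementTree as ET

CORE = "urn:oasis:names:tc:xliff:document:1.2"

doc = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:ext="urn:example:vendor-extension">
  <file original="demo.txt" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <target>Bonjour le monde</target>
        <ext:review-comment author="reviewer">Check the greeting</ext:review-comment>
      </trans-unit>
    </body>
  </file>
</xliff>"""

root = ET.fromstring(doc)
for tu in root.iter(f"{{{CORE}}}trans-unit"):
    kept = {"id": tu.get("id")}
    for child in tu:
        # keep only elements in the XLIFF core namespace, ignore extensions
        if child.tag.startswith(f"{{{CORE}}}"):
            kept[child.tag.split("}")[1]] = child.text
    print(kept)  # {'id': '1', 'source': 'Hello world', 'target': 'Bonjour le monde'}
```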

True interoperability from a file?

This is all great, so why was there a need to create an XLIFF 2.0?  I believe it was created to try and improve the ability of an XLIFF to be used as a reliable and consistent way to share translatable content.  I think the original idea might have been to include more features that translation tool providers could support, and to encourage them to support those features in the same way, thereby improving interoperability beyond simple translation.  There were other changes improving the format to make it more robust, but in essence I believe the idea was to remove the use of custom extensions so that everyone worked with the same file.  This approach has two problems which prevent it from achieving its aim and working in practice:

  1. XLIFF 2.0 did not prevent the use of extension points
  2. The “optional” modules it introduced require a translation tool vendor to do significant work to use them, and this may simply duplicate something they already support with an existing custom extension point in XLIFF 1.2

On this basis alone the idea that XLIFF 2.0 will now give translators the ability to work in their tool of choice is questionable.  They can already work in their tool of choice, and XLIFF 2.0 won’t change anything in this regard; if anything it might make things even more complicated than they already are, because of the “optional” modules I mentioned earlier mixed in with the ability to use your own extension points.  Perhaps a better idea, if you believe that “standards” are the way forward (I don’t when it comes to interoperability), was the Interoperability Now initiative, which seems to have been dormant for some years.  I didn’t agree with all the additional things the proposed XLIFF:DOC wished to include, and it was a little light on what it did choose to include, but at least it was more rigid about what was allowed, and as a pure exchange file between translation tools it was quite a good idea.  Some SDL Trados Studio users take a similar approach using TTX as the exchange file.  Even though it was a proprietary format it has been around for a very long time and most translation tools can support it for translation… maybe even more than are able to handle the various flavours of XLIFF we see from every tool that can create one.  Rarely a week goes by in the SDL Community without a question about how to handle an XLIFF from some other application.  I’m not proposing we go back to TTX though, as XLIFF 1.2 is far better and does what we need for now.

Coming back to these “optional” modules: XLIFF 2.0 defines eight optional modules that extend the XLIFF Core, and these are:

  • Translation Candidates
  • Glossary
  • Format Style
  • Metadata
  • Resource Data
  • Change Tracking
  • Size and Length Restriction
  • Validation

It all sounds great, and the article I mentioned at the start thinks this is the answer to a translator’s problems.  In reality this is unlikely to provide any benefits for translators at all, because the tools they use either already have custom extensions to support these things (and 2.0 will still allow those to be used even if it is adopted), or they handle the same things in other ways.  How many tools today don’t already support a variety of glossary formats, for example, and how many provide this information to the translator through a different mechanism that is probably far richer and preferable to a simple glossary in the XLIFF file?  In fact anyone serious about terminology would either have to create extension points to the optional glossary module or use another solution alongside it in order to provide information that is already available in a TBX.  More importantly, if the tool a translator chooses doesn’t support one of these things yet, then just because it’s in the XLIFF doesn’t guarantee that the tool will ever support it.  Adding the ability to make use of the information in the additional XLIFF modules is not a trivial task and could even be beyond the capability of some translation platforms without significant work.  Change tracking is a good example of this… SDL Trados Studio was the first to support this really well, and even today only a few tools can support it at all, never mind properly, in their editing environment.  Furthermore, populating the XLIFF is no trivial task either, and I’ve yet to see any CMS delivering XLIFF 2.0 files with fully populated optional modules.  The current standard is XLIFF 2.1 and I’ve never seen one of these… in fact I don’t even think the use of XLIFF 2.0 is widespread.
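
For what it’s worth, here’s a rough sketch of what a glossary entry carried by the XLIFF 2.0 Glossary module looks like and how it could be read.  The element names follow my reading of the 2.0 specification, but treat this as an illustration rather than a spec-complete example:

```python
# Illustration only: a cut-down XLIFF 2.0 unit carrying a Glossary module
# entry, and the code needed just to pull the term pair out of it.
import xml.etree.ElementTree as ET

ns = {
    "x": "urn:oasis:names:tc:xliff:document:2.0",
    "gls": "urn:oasis:names:tc:xliff:glossary:2.0",
}

doc = """<xliff version="2.0" srcLang="en" trgLang="de"
       xmlns="urn:oasis:names:tc:xliff:document:2.0"
       xmlns:gls="urn:oasis:names:tc:xliff:glossary:2.0">
  <file id="f1">
    <unit id="u1">
      <gls:glossary>
        <gls:glossEntry>
          <gls:term source="termbase">TAB key</gls:term>
          <gls:translation source="termbase">Tabstopptaste</gls:translation>
        </gls:glossEntry>
      </gls:glossary>
      <segment>
        <source>Press the TAB key.</source>
        <target>Drücken Sie die Tabstopptaste.</target>
      </segment>
    </unit>
  </file>
</xliff>"""

root = ET.fromstring(doc)
for entry in root.findall(".//gls:glossEntry", ns):
    term = entry.find("gls:term", ns)
    translation = entry.find("gls:translation", ns)
    print(term.text, "->", translation.text if translation is not None else "?")
```

Even a trivial example like this makes the point: the tool still has to do all the work of reading the module and presenting it usefully, which is work tools already do today with a TBX or a live termbase connection.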

Great for businesses?

The last point the author of the blog I started with makes is that XLIFF 2.0 “will allow businesses to embed more information in the files like TM matches” (these were already in XLIFF 1.2 by the way!) “glossaries, or annotations” (commenting in a basic form is already in XLIFF 1.2), “further enhancing interoperability“.  Many businesses can already do these things in greater detail than XLIFF 2.0 is capable of supporting, and if they don’t, they are unlikely to have the information needed to populate the optional modules anyway.  I will be very surprised to see a company that is interested in terminology, for example, downgrade its solution in order to send out the information it wants a translator to see by embedding it into an XLIFF.  I also think that if the business is a Language Service Provider then it has already invested in a technology solution to suit its needs, probably because the features of that solution provide additional benefits when everyone in the translation supply chain uses the same solution.  When translators start “CAT hopping” the business is at risk of losing the benefits it would like, and the risks are completely out of its control and in the hands of a translator who may well be an expert in handling file exchanges between translation tools, but equally may not be.  They risk having to do a lot of rework because verification rules were not followed, or because the exchange of the XLIFF changed the statuses of the translation units incorrectly, or because incorrect terminology was used as there was no connection to their online terminology solution… the list can go on.  One of the drivers for online working is to remove these problems, and with them the need to exchange files at all, as everything is carried out on the server… where does that leave the idea of XLIFF 2.0 giving translators the choice to work in the tools they wish?

In my opinion the solution to true interoperability lies in the use of APIs.  If you don’t know what an API is, this video provides a very good explanation, and you might also enjoy this simple ebook from SDL on the use of APIs in our industry.  Most software applications today provide an API, and if they don’t they should.  An API can allow a developer to create a connection to an XLIFF with custom extensions and “translate” them into something they can use; it can allow a developer to connect to a terminology solution and use all of the information it can provide rather than just a simple subset of the data in a file.  An API is normally built to withstand changes in the applications that use it, so every provider of an API is free to develop their tools with all the features they like to meet the needs of their customers, and yet still be able to provide information to anyone using other tools.  If you base everything around a standard for a file, then even simple changes not only have to be agreed by a committee whose members may have conflicting interests, they also take a long time to be implemented.  Supporting interoperability through the use of APIs avoids these sorts of problems: you do give translators the ability to work in their tool of choice, and businesses benefit because they don’t have to adopt new ways of exporting their data into a flat file in restrictive ways; they can expose their data via an API instead.  The use of APIs to support interoperability is, for me, a no-brainer!  The sticking point is where vendors see the competition as a threat, and this is one reason why we don’t see more integration between translation tools already, and why many translation tool vendors don’t expose their APIs publicly at all, or expose only a limited subset of what could actually be possible.  The problem of not being afraid of the competition and embracing an API economy is partially solved by partnering, and we do see this happening when a customer chooses a solution involving competing technologies.  File exchange isn’t an appropriate solution at all, but APIs are.  For the avoidance of doubt, SDL is very open with its APIs and we make them all publicly available for use… certainly solutions around the APIs are the most common topic in my blog.
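
To show what I mean, here’s a hypothetical sketch of a tool asking a terminology server for matches over an API rather than relying on whatever happened to be embedded in the file.  The endpoint, parameters and response shape are all invented for illustration and don’t represent any real SDL (or other vendor’s) API:

```python
# Hypothetical sketch of API-based interoperability: instead of shipping a
# glossary snapshot inside the bilingual file, the tool queries the
# terminology server directly. Endpoint and response shape are invented.
import requests

def lookup_term(base_url: str, api_key: str, term: str, target_lang: str) -> list:
    """Ask a (fictional) terminology service for approved translations of a term."""
    response = requests.get(
        f"{base_url}/terms",
        params={"q": term, "targetLang": target_lang},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("matches", [])

# Example call against the fictional server:
# for match in lookup_term("https://terminology.example.com/api/v1", "TOKEN", "TAB key", "de"):
#     print(match["term"], match["status"])
```

The few lines of code aren’t the point; the point is that the tool is always looking at live data, so nobody has to flatten their terminology, translation memories or project metadata into a file whose format first has to be agreed by committee.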

mis-XLIFF

Something I haven’t mentioned at all is the misuse of XLIFF.  Just because there is a standard doesn’t mean it will be used in the way it was intended.  How many of you have ever worked with a WordPress XLIFF, for example?  Not the ideal format to handle, and we see users having problems with them at least once a month… and they are not the only ones!  Part of the problem is again the lack of a more rigid requirement for XLIFF, and a lack of awareness of the localization process, all leading to the creation of XLIFF files that might even be valid, but are certainly not written in line with the spirit of the standard.  I’m all for making XLIFF easier by reducing its complexity as an exchange file and tightening up the rules to ensure extension points are not allowed.  This could truly help interoperability at the bilingual file level, as long as translation tool vendors were prepared to add the ability to export a simplified format for users of other tools.  We have seen this sort of approach in the SDL AppStore, where a developer created the Legacy Converter, which allows you to convert all your SDLXLIFF files to a TTX or BilingualDOC and then import them back in again to update your SDLXLIFF after they were translated in other tools.  All these formats are well known and supported by probably every translation tool, so adding the ability to export to an XLIFF that could truly be used by everyone seems a better idea.  Perhaps adopting the core XLIFF 2.0 standard for this, with no extension points and no optional modules, would be helpful… but then how many translation tools even support XLIFF 2.0 in the first place?
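
Just to illustrate the idea (and only the idea… a real converter, like the Legacy Converter, has to take care of segmentation, inline tags, statuses and much more), a simplified “core only” export could be as basic as stripping everything outside the XLIFF 1.2 namespace from the bilingual file:

```python
# A very rough sketch of a "core only" export: drop every element and
# attribute outside the XLIFF 1.2 namespace so that any tool can open the
# result. Illustration only; the file names are placeholders and a real
# converter must preserve far more than this.
import xml.etree.ElementTree as ET

CORE = "urn:oasis:names:tc:xliff:document:1.2"

def strip_extensions(element: ET.Element) -> None:
    """Recursively remove child elements and attributes not in the XLIFF core."""
    for child in list(element):
        if not child.tag.startswith(f"{{{CORE}}}"):
            element.remove(child)
        else:
            strip_extensions(child)
    for attr in list(element.attrib):
        if attr.startswith("{") and not attr.startswith(f"{{{CORE}}}"):
            del element.attrib[attr]

def export_core_only(in_path: str, out_path: str) -> None:
    tree = ET.parse(in_path)
    strip_extensions(tree.getroot())
    tree.write(out_path, encoding="UTF-8", xml_declaration=True)

# export_core_only("project.sdlxliff", "project_core.xlf")  # placeholder paths
```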

Existing support for XLIFF 2.0

I read a thread on ProZ a few weeks ago about XLIFF 2.0.  It was started so the original poster could see how many translation tool vendors supported XLIFF 2.0, simply in terms of being able to open it for translation, never mind supporting all the possible variables I’ve already mentioned.  I don’t know how valid this is (I did think Swordfish supported XLIFF 2.0, but I’m not sure), but in the absence of the normally very quick to correct ProZ community disagreeing, it looks like this:

  • SDL Trados Studio (version 2015 and above)
  • Memsource
  • CafeTran Espresso (experimental only)
  • OmegaT and Okapi (experimental only)

Not exactly rich in providing translators with the ability to use their tools of choice, and probably because there isn’t a lot of value in adding this support, for many of the reasons I have already covered.  But this is only the ability to read an XLIFF 2.0… not one of them can create one, so if you take the normal scenario an LSP or a translator is faced with, where they receive a Word, Excel, IDML, XML, HTML etc. file, they would not be able to convert it to XLIFF 2.0 to share with another person anyway.  Trados Studio doesn’t need to convert anything because it works natively on the basis of an XLIFF 1.2 compliant file, but most of the major translation environments export their own flavour of XLIFF, as I mentioned right at the start.  Based on this scenario, where’s the value in developing an export to XLIFF 2.0 when only two translation tools appear to be able to support it for production use?  The only use case for this, based on the information above, would be to give the XLIFF file to another translator so they could work on it and send it back.  In effect we are talking about enabling SDL Trados Studio and Memsource to exchange files, and these products can already handle each other’s XLIFF, so why would we need to be able to create XLIFF 2.0 to deliver the same thing?

Notwithstanding all of this, what’s the thing most translators do when they receive their Word, Excel, IDML, XML, HTML etc. files?  I doubt converting to XLIFF 2.0 is ever on their minds… they just translate the files and send them back.  Perhaps they just don’t know what they’re missing!

15 thoughts on “XLIFF 2.x… the translator’s panacea?”

  1. That guy who wrote the article is clueless… he simply doesn’t know what he is talking about…
    One single sentence in that article is pure truth – when he wrote “I’m the last person that should be writing this”.
    The rest is just crap… no point in spending a single minute discussing it with such people. Period.

    1. I wouldn’t be so harsh Evzen. I think the article was probably written with the best of intentions based on what he thought he knew. That’s why I responded because the theme of the article is exactly what many people would like to believe too. But there’s always another side to these things…

  2. As usual, a very good read. The only thing I missed was one piece of info – will the native sdlxliff format be upgraded in time, or will there be an sdlxliff2 to keep separate filetypes for compatibility?

  3. Evzen, this isn’t Twitter over here! Like Paul, I’m guilty of responding to “things like this.” It’s worth noting that my hedging with “I’m the last person that should be writing this” is more or less the equivalent of Paul’s “I’m not writing as an expert on XLIFF.” 😛

    I think Paul’s reply is thoughtful, and as they say, all publicity is good publicity. Zingword is launching soon as the new best way to find a translator. I think you’ll find that it is technically and conceptually one of the most promising products in localization today.

    I think next time, hopefully, my technical co-founder gets involved in the discussion. I’m the master of other domains. 😛

    I basically restated exactly what XLIFF 2.0 is supposed to be, which is an interchange format to improve interoperability among tools, and tried to get support for translators to join the committee since it’s clear that the standard is not doing well. Otherwise, the description I provided in the article is pretty much a stock overview of what the standard purports to do, and it’s not very different from what the co-secretary of the TC told me on our podcast.

    There are several people who want interoperability to move forward via the standard, and who are working on it. Are they all clueless too? Is it clueless to think that translators should be on the TC, or try to participate?

    Paul makes some compelling points in this article, most of which I conceded in the comments on our post. The most compelling points are those that point out the potential gradient in the richness of the features. The least compelling points on the fantasy-land level are those that say “this is going to be expensive.”

    I maintain that the glossary module is super cool and would meet the needs of lots of projects. This paper looks promising to me (https://www.localisation.ie/sites/default/files/publications/LFV14-1-4.pdf).

    I suppose the whole shebang comes down to really two things:

    1. Who gets to define what interoperable means. For some, interoperability is just what’s in the core part of the standard. For others, interoperable means that all these other things should also happen. When Paul says “loosely related,” that’s an attempt to define what interoperability should mean. Which is totally fine. If anything, what I am saying is that translators need to get involved and contribute THEIR definition of interoperability.

    2. Why would tools makers ever invest in something like this? Paul brings up money, and that’s probably the most important part. For tools makers to invest in this, there has to be either ROI or some other pressure. There’s very little ROI in an expanded definition of interoperability if you are a market leader. That’s just how it is.

    Good interoperability is good for translators. There are incentives for tools makers to maintain as minimum a definition of interoperability as is feasible. If there are translators who want interoperability, it would be good for them to try to put their thumb on the scale. This is strategy 101. It’s business; try not to take it personal.

    On that note, I shouldn’t have name-checked SDL. I’m going to redact it. Lesson learned. I have no beef with SDL: I just think that we need to find a way to get translators more involved.

    1. Hello Robert, Can you elaborate here?

      The least compelling points on the fantasy-land level are those that say “this is going to be expensive.”

      Which points are you referring to and why do you think it won’t be expensive at all?
      I’m afraid I totally disagree with you on the glossary module, simply because nobody can read it. That means work would be required to read the content and then display it in a way that would be useful. All of this is already possible using traditional and more capable means. If you think it’s fantasy to say that it will be expensive to extract this information and render it in a way that makes it useful for translators, then you should probably think again; and would it get any priority over more useful development work for the respective vendors? I doubt it.

      There’s very little ROI in an expanded definition of interoperability if you are a market leader. That’s just how it is.

      How wrong you are. It’s in the expanded definition of interoperability that the real value lies. The idea of sharing filetypes is hardly interoperability by any means. Today’s online solutions will see this method of working decline in any event. The need for an XLIFF standard at all could well disappear completely in favour of more instant means of communication.

      I just think that we need to find a way to get translators more involved.

      I totally agree… but in this case maybe you’re a little late to the table.

  4. FWIW, current version of Swordfish allows you to convert XLIFF 2.0 to 1.2 and back.

    If you receive an XLIFF 2.0 file, convert it to 1.2 and translate. When you finish, convert the translated XLIFF 1.2 back to 2.0 and deliver.

    Regards,
    Rodolfo

    1. Thanks Rodolfo, is that the same way Studio does it (XLIFF 2.0 to SDLXLIFF (which is 1.2) and then back to XLIFF 2.0 when you save the target), or is this a specific feature in Swordfish that does the conversion first so that you can work on a 2.0 file?

      1. File menu has two conversion options: 1.2 -> 2.0 and 2.0 -> 1.2 intended for general use.

        Like Studio, Swordfish works on XLIFF 1.2 files. If you receive an XLIFF 2.0, create a 1.2 version before translating and then generate a 2.0 one from the translated document.

        If you want to send out an XLIFF 2.0 file, take your existing 1.2 version and generate a 2.0 one (this is what Fluenta has been doing for two years now). If later you receive a translated XLIFF 2.0, you can generate an equivalent 1.2 for converting back to original format.

        1. That’s quite interesting Rodolfo. Do you insert more information into the 2.0 when you generate it from 1.2? What’s the reason for doing this?

          1. The information is exactly the same. Nothing extra is added.

            Fluenta is a tool that generates XLIFF 1.2 from DITA and back using Swordfish filters. Two very large clients asked me to generate XLIFF 2.0 because it is “newer”. I added an option that transforms the XLIFF 1.2 to the equivalent 2.0 and silently converts it back to 1.2 when importing the translated XLIFF. The clients were happy until their translators said that they were unable to process XLIFF 2.0 with their old CAT tools. They went back to using XLIFF 1.2.

            I added that transformation to XLIFF 2.0 in Swordfish because some people think it is important to have the newer version number. Swordfish itself is still based on XLIFF 1.2 and doesn’t use 2.0 files at all.

            Transforming XLIFF 2.0 to 2.1 is trivial, just a matter of changing a couple of numbers. No one has asked for it yet.

  5. The transformation deals with the XLIFF Core plus the Candidates and Metadata modules, otherwise there is no way to have equivalent files when there are TM matches and properties.

    Modules for features that don’t exist in XLIFF 1.2 are ignored.
