In the last year or so many articles have been written about XLIFF 2.0 explaining what’s so great about it, so I’m not going to write another one of those. I’m in awe of the knowledge and effort the technical standards committees display in delivering such comprehensive documentation, working hard to deliver a solution that meets the needs of as many groups as possible. The very existence of a standard, however, does not mean it’s the panacea for every problem it may be loosely related to. It’s against this background that I was prompted to write about this topic, after reading this article questioning whether some companies were preventing translators from improving their lives. The article makes a number of claims which I think are a little misguided… in fact this is what it says:
XLIFF 2.0 is a “new” bilingual format for translation that attempts to do a handful important things for translators.
- Improve the standard so that different translation tools makers, like SDL, don’t “need” to create their own proprietary versions that are not compatible with other tools
- Creating true interoperability among tools, so translators can work in the tool of their choice, and end-customers can have flexibility about who they work with too
- Allow businesses to embed more information in the files, like TM matches glossaries, or annotations, further enhancing interoperability
I say “new” because XLIFF 2.0 has been around for years now. Unfortunately, adoption of the XLIFF 2.0 standard has been slow, due to tools makers and other players deciding that interoperability is not in their interest. It’s one of those things where commerce gets in the way of sanity.
Does XLIFF attempt to do anything for translators specifically? I have a character flaw that forces me to respond to things like this, and so I did, but as commenting was tricky and the discussion unwieldy I thought I could address it better in my own article. I’m not writing as an expert on XLIFF (because I’m not), but I am passionate about translation and the technology we use, so I’m simply giving my view on the usefulness of XLIFF 2.0. In many ways standards are an essential part of modern life, but I don’t think they are always the answer to everything. In fact, despite the best of intentions, a standard can sometimes have the opposite effect to the one originally intended. So I’m going to share my own views here, and note that these are my own views and may not reflect those of SDL in general. I’d also note that this topic of interoperability is nothing new, and in many ways reading an article four years on from a GALA event in 2014 (which had something of a focus on this topic) that pushes XLIFF 2.0 as the solution to translators’ interoperability problems is very disappointing, since it shows how little things have moved on. I put a copy of the presentation I delivered at the event here for interest:
But first of all, a quick overview of the XLIFF standard (created under the OASIS nonprofit consortium) just so we know what we’re talking about. XLIFF stands for XML Localisation Interchange File Format, and it was created to provide a reliable and consistent way to share translatable content between Content Management Systems and translation tools. This can of course include sharing between two different translation tools. In fact, this is a good point at which to refute the first claim, that SDL created its own proprietary version, because nothing could be further from the truth.
SDL introduced XLIFF as the native bilingual filetype for SDL Trados Studio 2009, and it was, and still is, fully compliant with the XLIFF 1.2 standard. It’s called SDLXLIFF and uses the file extension sdlxliff rather than plain xliff or xlf. The simple reason for this is that Studio works natively on the SDLXLIFF file, which means you can see it in Windows Explorer any time you like without having to export anything. Using this extension helps the application associate itself with the file more easily, as it doesn’t treat an SDLXLIFF in the same way as an XLIFF that could have come from any system capable of generating one. Why doesn’t it treat them the same way? This is one of the flaws (and strengths) of a standard that allows customisation through extension points… an SDLXLIFF, an MXLIFF (Memsource), an MQXLF (memoQ), a TXLF (Wordfast) etc. are all versions of XLIFF that use extension points. I can’t and won’t comment on how well all these variants comply with the XLIFF standard, but an SDLXLIFF is fully compliant with XLIFF 1.2. The extension points are needed because the core components of an XLIFF 1.2 file are not sufficient to share the additional information that forms part of a translation workflow between users of the same tool. I mentioned earlier that this ability to use extension points is also a strength, and of course it is. I doubt SDLXLIFF would have existed as a replacement for TTX (a proprietary format created by Trados) if it hadn’t been possible to use it for the things SDL considered necessary when sharing a bilingual file. For example, things like:
- tracked changes
- the more sophisticated commenting that Studio provides
- enhanced file recognition properties
- additional properties for dependency files
- text formatting properties
- context information for improved translation memory matching
- document structure, a feature unique to Studio
These additional pieces of information are useful to Studio users when sharing an SDLXLIFF file, while other tools can either map the information to their own features if they support it, or simply ignore it and concentrate on the core XLIFF components for translation. The same applies the other way around: Trados Studio will either provide some mapping capability for another tool’s extension points or just ignore them so the XLIFF file can still be handled.
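To make the “map it or ignore it” point concrete, here is a minimal Python sketch, using only the standard library’s xml.etree.ElementTree, of how a tool might read an XLIFF 1.2 file while ignoring another vendor’s extension points. The `acme` prefix, the `urn:example:acme-cat` namespace and the `tracked-change` element are made up for illustration and do not represent the real SDLXLIFF schema or any other vendor’s format; only the `trans-unit`, `source` and `target` elements come from XLIFF 1.2 itself.

```python
import xml.etree.ElementTree as ET

XLIFF12_NS = "urn:oasis:names:tc:xliff:document:1.2"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Made-up XLIFF 1.2 file carrying a hypothetical vendor extension ("acme").
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:acme="urn:example:acme-cat">
  <file original="demo.docx" source-language="en-US" target-language="de-DE" datatype="plaintext">
    <body>
      <trans-unit id="1" acme:origin="tm">
        <source>Hello world</source>
        <target>Hallo Welt</target>
        <acme:tracked-change author="editor">Hallo, Welt</acme:tracked-change>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def core_units(xml_text):
    """Return (id, source, target) for each trans-unit using only core XLIFF 1.2 elements.

    Extension elements and attributes are never looked up, so they are silently ignored.
    """
    root = ET.fromstring(xml_text)
    ns = {"x": XLIFF12_NS}
    units = []
    for tu in root.iterfind(".//x:trans-unit", ns):
        source = tu.find("x:source", ns)
        target = tu.find("x:target", ns)
        units.append((
            tu.get("id"),
            "".join(source.itertext()) if source is not None else "",
            "".join(target.itertext()) if target is not None else None,
        ))
    return units


def extension_namespaces(xml_text):
    """List the non-core namespaces (i.e. extension points) used anywhere in the file."""
    root = ET.fromstring(xml_text)
    found = set()
    for el in root.iter():
        for name in (el.tag, *el.attrib):
            if name.startswith("{"):
                uri = name[1:].split("}", 1)[0]
                if uri not in (XLIFF12_NS, XML_NS):
                    found.add(uri)
    return found


print(core_units(SAMPLE))            # [('1', 'Hello world', 'Hallo Welt'), ('2', 'Goodbye', None)]
print(extension_namespaces(SAMPLE))  # {'urn:example:acme-cat'}
```

Anything the sketch doesn’t look up, such as the tracked change, is simply dropped, and that is precisely the information that makes a vendor’s own flavour useful to its own users.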
True interoperability from a file?
This is all great, so why was there a need to create XLIFF 2.0? I believe it was created to improve XLIFF as a reliable and consistent way to share translatable content. I think the original idea might have been to include more features that translation tool providers could support, and to encourage them to do so in the same way, thereby improving interoperability beyond simple translation. There were other changes that made the format more robust, but in essence I believe the idea was to remove the use of custom extensions so that everyone worked with the same file. This approach has two problems that prevent it from achieving its aim in practice:
- XLIFF 2.0 did not prevent the use of extension points
- The “optional” modules it introduced require a translation tool vendor to do significant work to use them, and they may duplicate something the vendor already supports through an existing custom extension point in XLIFF 1.2
On this basis alone, the idea that XLIFF 2.0 will now give translators the ability to work in their tool of choice is questionable. They can already work in their tool of choice, and XLIFF 2.0 won’t change anything in this regard, other than perhaps making things even more complicated than they already are because of the “optional” modules I mentioned earlier mixed in with the ability to use your own extension points. Perhaps a better idea, if you believe that “standards” are the way forward (I don’t when it comes to interoperability), was the Interoperability Now initiative, which seems to have been dormant for some years. I didn’t agree with all the additional things the proposed XLIFF:DOC wanted to include, and it was a little light on what it did choose to include, but at least it was more rigid about what was allowed, and as a pure exchange file between translation tools it was quite a good idea. Some SDL Trados Studio users take a similar approach, using TTX as the exchange file. Even though it was a proprietary format, it had been around for a very long time and most translation tools can handle it for translation… maybe even more of them than can handle the various flavours of XLIFF we see from every tool that can create one. Rarely a week goes by in the SDL Community without a question about how to handle an XLIFF from some other application. I’m not proposing we go back to TTX, though, as XLIFF 1.2 is far better and does what we need for now.
Coming back to these “optional” modules: XLIFF 2.0 also defines eight optional modules that extend the XLIFF Core, and these are:
- Translation Candidates
- Glossary
- Format Style
- Metadata
- Resource Data
- Change Tracking
- Size and Length Restriction
- Validation
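As a rough illustration of why these modules add work for tool vendors, the Python sketch below only detects which of them (and which custom extensions) an XLIFF 2.0 file uses; actually supporting each module in an editor is the hard part. The namespace URIs are taken from the XLIFF 2.0 specification as best I know, so check them against the spec before relying on them. The sample file, the `acme` extension and the function name are hypothetical, and the sample has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

CORE_NS = "urn:oasis:names:tc:xliff:document:2.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Module namespaces as listed in the XLIFF 2.0 specification (verify against the spec).
MODULE_NAMESPACES = {
    "urn:oasis:names:tc:xliff:matches:2.0": "Translation Candidates",
    "urn:oasis:names:tc:xliff:glossary:2.0": "Glossary",
    "urn:oasis:names:tc:xliff:fs:2.0": "Format Style",
    "urn:oasis:names:tc:xliff:metadata:2.0": "Metadata",
    "urn:oasis:names:tc:xliff:resourcedata:2.0": "Resource Data",
    "urn:oasis:names:tc:xliff:changetracking:2.0": "Change Tracking",
    "urn:oasis:names:tc:xliff:sizerestriction:2.0": "Size and Length Restriction",
    "urn:oasis:names:tc:xliff:validation:2.0": "Validation",
}

# Made-up XLIFF 2.0 file using the Glossary module plus a hypothetical custom extension.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0"
       srcLang="en" trgLang="fr"
       xmlns:gls="urn:oasis:names:tc:xliff:glossary:2.0"
       xmlns:acme="urn:example:acme-cat">
  <file id="f1">
    <unit id="u1" acme:origin="tm">
      <gls:glossary>
        <gls:glossEntry>
          <gls:term>file</gls:term>
          <gls:translation>fichier</gls:translation>
        </gls:glossEntry>
      </gls:glossary>
      <segment>
        <source>Open the file.</source>
        <target>Ouvrez le fichier.</target>
      </segment>
    </unit>
  </file>
</xliff>"""


def classify_namespaces(xml_text):
    """Split the non-core namespaces in an XLIFF 2.0 file into known modules and custom extensions."""
    root = ET.fromstring(xml_text)
    modules, extensions = set(), set()
    for el in root.iter():
        for name in (el.tag, *el.attrib):
            if not name.startswith("{"):
                continue  # unqualified attribute, part of the core element
            uri = name[1:].split("}", 1)[0]
            if uri in (CORE_NS, XML_NS):
                continue
            if uri in MODULE_NAMESPACES:
                modules.add(MODULE_NAMESPACES[uri])
            else:
                extensions.add(uri)
    return modules, extensions


print(classify_namespaces(SAMPLE))  # ({'Glossary'}, {'urn:example:acme-cat'})
```

Nothing in the format stops a file from carrying a custom extension alongside a module, which is exactly the mixture a receiving tool then has to cope with.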
All sounds great, and the article I mentioned at the start thinks this is the answer to a translator’s problems. In reality this is unlikely to provide any benefit for translators at all, because the tools they use either already have custom extensions to support these features (and 2.0 will still allow those to be used even if it is adopted), or they handle the same things in other ways. How many tools today don’t already support a variety of glossary formats, for example, and how many provide this information to the translator through a different mechanism that is probably far richer and preferable to a simple glossary in the XLIFF file? In fact, anyone serious about terminology would have to either create extension points for the optional glossary module or use another solution alongside it in order to provide information that is already available in a TBX. But more importantly, if the tool a translator chooses doesn’t support one of these things yet, then simply having it in the XLIFF is not going to guarantee that the tool will ever support it. Adding the ability to use the information in the additional XLIFF modules is not a trivial task and could even be beyond the capability of some translation platforms without significant work. Change tracking is a good example of this… SDL Trados Studio was the first to support it really well, and even today only a few tools can support it at all in their editing environment, never mind properly. Further to this, populating the XLIFF is no trivial task either, and I’ve yet to see any CMS delivering XLIFF 2.0 files with fully populated optional modules. The current standard is XLIFF 2.1 and I’ve never seen one of these… in fact I don’t think even XLIFF 2.0 is in widespread use.
Great for businesses?
The last point made by the author of the blog I started with is that XLIFF 2.0 “will allow businesses to embed more information in the files like TM matches” (these were already in XLIFF 1.2, by the way!), “glossaries, or annotations” (basic commenting is already in XLIFF 1.2), “further enhancing interoperability”. Many businesses can already do these things in greater detail than XLIFF 2.0 is capable of supporting, and if they can’t, they are unlikely to have the information needed to populate the optional modules anyway. I would be very surprised to see a company that is interested in terminology, for example, downgrading its solution to send the information it wants a translator to see embedded in an XLIFF. I also think that if the business is a Language Service Provider, it has already invested in a technology solution to suit its needs, probably because the features of that solution bring additional benefits when everyone in its translation supply chain uses the same solution. When translators start “CAT hopping”, the business is at risk of losing those benefits, and the risks are completely out of its control and in the hands of a translator who may well be an expert in handling file exchanges between translation tools, but equally may not be. It risks having to do a lot of rework because verification rules were not followed, because exchanging the XLIFF changed the statuses of the translation units incorrectly, or because incorrect terminology was used as there was no connection to its online terminology solutions, etc. The list goes on. One of the drivers for online working is to remove these problems, and with them the need to exchange files at all, as everything will be carried out on the server… so where does that leave the idea of XLIFF 2.0 giving translators the choice to work in the tools they wish?
In my opinion the solution to true interoperability lies in the use of APIs. If you don’t know what an API is, this video provides a very good explanation, and you might also enjoy this simple ebook from SDL on the use of APIs in our industry. Most software applications today provide an API, and if they don’t, they should. An API can allow a developer to connect to an XLIFF with custom extensions and “translate” them into something they can use; it can allow a developer to connect to a terminology solution and use all of the information it can provide rather than just a simple subset of the data in a file. An API is normally designed to remain stable as the application behind it changes, so every API provider is free to develop its tools with all the features it likes to meet the needs of its customers, and still be able to provide information to anyone using other tools. If you base everything around a standard for a file, then even simple changes not only have to be agreed by a committee whose members may have conflicting interests, but they also take a long time to be implemented. Supporting interoperability through APIs means you avoid these sorts of problems, you do give translators the ability to work in their tool of choice, and businesses benefit because they don’t have to adopt new, restrictive ways of exporting their data into a flat file; they can expose their data via an API too. The use of APIs to support interoperability is, for me, a no-brainer! The sticking point is that vendors see the competition as a threat, and this is one reason why we don’t see more integration between translation tools already, and why many translation tool vendors don’t expose their APIs publicly at all, or expose only a limited subset of what could actually be possible.
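A minimal Python sketch of the idea: the CAT tool codes against an interface rather than a file layout, so the terminology provider can return richer data (status, definitions) than a glossary embedded in an XLIFF would carry. `TerminologyService`, `TermEntry`, `InMemoryTermbase` and `terms_for_segment` are hypothetical names for illustration, not part of any SDL or other vendor API, and the substring matching is a deliberate simplification of real term recognition.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TermEntry:
    """One termbase hit, carrying more than a bare source/target pair."""
    source: str
    target: str
    status: str            # e.g. "approved" or "forbidden"
    definition: str = ""


class TerminologyService(Protocol):
    """Hypothetical contract a terminology provider could publish as an API."""

    def lookup(self, text: str, src_lang: str, trg_lang: str) -> list[TermEntry]:
        ...


class InMemoryTermbase:
    """Hypothetical stand-in for a vendor's server-side termbase."""

    def __init__(self, entries: dict[tuple[str, str], list[TermEntry]]):
        self._entries = entries

    def lookup(self, text: str, src_lang: str, trg_lang: str) -> list[TermEntry]:
        lowered = text.lower()
        # Naive substring match; real term recognition is far more sophisticated.
        return [e for e in self._entries.get((src_lang, trg_lang), [])
                if e.source.lower() in lowered]


def terms_for_segment(service: TerminologyService, segment: str,
                      src_lang: str, trg_lang: str) -> list[str]:
    """What a plug-in in any CAT tool could call per segment, whoever hosts the data."""
    return [f"{e.source} -> {e.target} ({e.status})"
            for e in service.lookup(segment, src_lang, trg_lang)]


tb = InMemoryTermbase({("en", "de"): [TermEntry("file", "Datei", "approved"),
                                      TermEntry("folder", "Ordner", "approved")]})
print(terms_for_segment(tb, "Open the file.", "en", "de"))  # ['file -> Datei (approved)']
```

Swapping `InMemoryTermbase` for a client that calls a real server changes nothing in `terms_for_segment`, which is the property that lets each side evolve independently.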
Overcoming this fear of the competition and embracing an API economy can be partially achieved through partnering, and we do see this happening when a customer chooses a solution involving competing technologies. File exchange isn’t an appropriate solution at all in that case, but APIs are. For the avoidance of doubt, SDL is very open with its APIs and we make them all publicly available for use… indeed, solutions built around the APIs are the most common topic on my blog.
Something I haven’t mentioned at all is the misuse of XLIFF. Just because there is a standard doesn’t mean it will be used in the way it was intended. How many of you have ever worked with WordPress XLIFF, for example? It’s not the ideal format to handle, and we see users having problems with these files at least once a month… and they are not the only ones! Part of the problem is, again, the lack of more rigid requirements in XLIFF and a lack of awareness of the localization process, both of which lead to XLIFF files that might even be valid but are certainly not written in the spirit of the standard. I’m all for making XLIFF easier by reducing its complexity as an exchange file and tightening up the rules so that extension points are not allowed. This could truly help interoperability at the bilingual file level, as long as translation tool vendors were prepared to add the ability to export a simplified format for users of other tools. We have seen this sort of approach in the SDL AppStore, where a developer created the Legacy Converter, which lets you convert your SDLXLIFF files to TTX or BilingualDOC and then import them back again to update your SDLXLIFF files after they have been translated in other tools. These formats are well known and supported by probably all translation tools, so adding the ability to export to an XLIFF that could genuinely be used by everyone seems a better idea. Perhaps adopting the core XLIFF 2.0 standard for this, with no extension points and no optional modules, would help… but then how many translation tools even support XLIFF 2.0 in the first place?
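To give a feel for how small such a simplified export could be, here is a Python sketch that writes a core-only XLIFF 2.0 file from plain-text segment pairs, with no optional modules and no extension points. The function name, the `u1`, `u2` unit ids and the assumption that segments arrive already extracted as plain text (no inline codes) are illustrative assumptions, and the output has not been validated against the official schema.

```python
import xml.etree.ElementTree as ET

CORE_NS = "urn:oasis:names:tc:xliff:document:2.0"
ET.register_namespace("", CORE_NS)  # serialise core elements without a prefix


def _q(tag):
    """Qualify a tag name with the XLIFF 2.0 core namespace."""
    return f"{{{CORE_NS}}}{tag}"


def export_core_xliff(segments, src_lang, trg_lang, file_id="f1"):
    """Build a core-only XLIFF 2.0 document from (source, target) pairs.

    No optional modules and no extension points are written; target may be None.
    Inline markup is not handled: source and target are treated as plain text.
    """
    root = ET.Element(_q("xliff"), {"version": "2.0", "srcLang": src_lang, "trgLang": trg_lang})
    file_el = ET.SubElement(root, _q("file"), {"id": file_id})
    for i, (source, target) in enumerate(segments, start=1):
        unit = ET.SubElement(file_el, _q("unit"), {"id": f"u{i}"})
        segment = ET.SubElement(unit, _q("segment"))
        ET.SubElement(segment, _q("source")).text = source
        if target is not None:
            ET.SubElement(segment, _q("target")).text = target
    return ET.tostring(root, encoding="unicode")


print(export_core_xliff([("Hello", "Bonjour"), ("Goodbye", None)], "en", "fr"))
```

Because nothing outside the core namespace is written, any tool that can read core XLIFF 2.0 should, in principle, be able to open the result; the open question remains how many tools can read XLIFF 2.0 at all.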
Existing support for XLIFF 2.0
I read a thread on ProZ a few weeks ago about XLIFF 2.0. It was started so the original poster could see how many translation tool vendors supported XLIFF 2.0 in terms of simply being able to open it for translation, never mind supporting all the possible variables I’ve already mentioned. I don’t know how accurate this is (I thought Swordfish supported XLIFF 2.0, but I’m not sure), but in the absence of any disagreement from the normally very quick-to-correct ProZ community, it looks like this:
- SDL Trados Studio (version 2015 and above)
- Memsource
- CafeTran Espresso (experimental only)
- OmegaT and Okapi (experimental only)
Not exactly rich in providing translators with the ability to use their tools of choice, and probably because there isn’t a lot of value in adding this support, for many of the reasons I have already covered. But this is only the ability to read an XLIFF 2.0 file… none of them can create one, so in the normal scenario an LSP or translator faces, where they receive a Word, Excel, IDML, XML, HTML etc. file, they would not be able to convert it to XLIFF 2.0 to share with another person anyway. Trados Studio doesn’t need to convert anything because it works natively with an XLIFF 1.2 compliant file, but most of the major translation environments export their own flavour of XLIFF, as I mentioned right at the start. Based on this scenario, where’s the value in developing an export to XLIFF 2.0 when only two translation tools appear able to support it for production use? The only use case for this, based on the information above, would be to give the XLIFF file to another translator so they could work on it and send it back. In effect we are talking about enabling SDL Trados Studio and Memsource to exchange files, and these products can already handle each other’s XLIFF, so why would we need to be able to create XLIFF 2.0 to deliver the same thing?
Notwithstanding all of this, what do most translators do when they receive their Word, Excel, IDML, XML, HTML etc. files? I doubt converting to XLIFF 2.0 is ever on their minds… they just translate the files and send them back. Perhaps they just don’t know what they’re missing!