XML… unravelling chaos

Image of a ball of wool unravelling around the letters XML

Whilst I would definitely not claim to be an expert, writing this blog has allowed me to learn a reasonable amount about XML over the years.  Most of the articles I’ve written have been about explaining how to manage the many amazing features in the filetypes that are supported by Trados Studio… and of course how to deal with the many changes over the years as the filetypes have become more and more sophisticated, catering for the demands of our customers and the changes in the technologies applied to XML in general.  These changes have led to some… let’s say… less than user-friendly interfaces and features, and you’d certainly be forgiven if you thought things were becoming a little chaotic!

Another long article warning (XML does this to me!).  For example, when Trados Studio 2019 SR2 was released some years ago we could see this:

Four different types of XML to choose from when you came to create your filetype!  Fortunately today we only have one:

It turns out the chaos was simply the result of having to provide many enhancements over the years while still supporting legacy workflows, because we had so many customers still using older technology.  At the same time we were also building for the future… supporting both desktop and working in the cloud:

Screenshot showing the same new filetype options in the cloud

Everything the same and only one XML to choose from!  It’s all starting to make sense and it supports users who are choosing to work more in the cloud by ensuring they have the same experience whether working with Trados Studio or any of the cloud products such as Trados Enterprise and Trados Team.

But there are still a few things I’ve not discussed in my articles around the use of XML so I decided to put my recent forays into the world of AI to one side and see if I can unravel a little more of the XML chaos that’s still hanging around.

Parser Rules Type

When you create your new XML filetype the second thing you’ll do, after providing the standard Filetype Information, is create your parser rules.  But before you get there you have to make a decision about the Parser Rule Type.  It’s the same question in Trados Studio as it is in the cloud:

Screenshot showing the Trados Studio and Cloud UI for this feature together

I won’t keep doing this… but it’s definitely worth noting that most of all this cool stuff we can do in Trados Studio is there in the cloud too.  This takes a lot of work and it adds a lot of complexity that I bet most users won’t really appreciate, and the engineering teams definitely deserve a lot of credit for trying to keep the right balance over the years in supporting the past while forging ahead into the future.

But let’s talk about these two types and why we have them… and more importantly what it means when you choose one over the other.

XPath rules

If you choose to go with XPath rules you are giving yourself complete flexibility to do two things:

  1. create XPath rules to manage the data in the XML in pretty much any way you wish.  Here the rules for XPath can be as complex as you like allowing you to manage some very specific requirements such as the ones I covered in this article, and
  2. work with namespaces to further customise your XML filetype through being able to uniquely identify element and attribute names, especially when working with multiple XML files in a project where only the namespace provides the ability to avoid any ambiguity between the files.
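To make those two points a little more concrete, here is a minimal sketch in Python rather than in Trados Studio itself.  It uses the standard library's ElementTree, which only supports a subset of XPath, and everything in the sample (the tools and docs prefixes, their URIs, the item element and the translate attribute) is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <item> elements share a name but live in two namespaces.
SAMPLE = """\
<root xmlns:tools="http://example.com/tools" xmlns:docs="http://example.com/docs">
  <tools:item translate="yes">Ball of wool</tools:item>
  <tools:item translate="no">SKU-1234</tools:item>
  <docs:item translate="yes">User guide</docs:item>
</root>
"""

# Prefix-to-URI map, playing a similar role to the namespace settings of a filetype.
NAMESPACES = {
    "tools": "http://example.com/tools",
    "docs": "http://example.com/docs",
}

root = ET.fromstring(SAMPLE)

# A conditional rule: only tools:item elements flagged translate="yes".
translatable = [
    el.text for el in root.findall(".//tools:item[@translate='yes']", NAMESPACES)
]

# The namespace alone is what separates docs:item from tools:item.
docs_items = [el.text for el in root.findall(".//docs:item", NAMESPACES)]

print(translatable)
print(docs_items)
```

The first query shows the kind of condition you can only express with XPath, and the second shows how the prefix removes any ambiguity between elements that share a name.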

XML Settings Import

Here you have two choices:

  1. Create an XML filetype based on the default settings, or
  2. Define settings based on XML, sdlftsettings or XSD rule

Looks like this:

Screenshot showing the XML settings page in Trados Studio

If you choose to create an XML filetype based on the default settings then you’ll be taken through the filetype wizard and will have to create all the information needed to handle your XML file yourself.  If you decide to define settings based on XML, sdlftsettings or XSD rules then some, or all, of the work in defining the rules is done for you.

XML: this is going to import the elements that are found in the XML file and make them all “Translatable (except in protected content)”.  They will appear as XPath rules.  It will not import the attribute names or make them available to you at any stage, so you will need to write your own XPath to select them.
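Since the attribute names are not offered to you, one way to find them is to scan a sample file yourself.  The following is only a rough sketch in Python using the standard library, and the catalogue, product and note elements and their attributes are made up for the example.  It prints candidate XPath expressions that you would still need to review before using them as rules.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample file content (no namespaces, to keep the output readable).
SAMPLE = """\
<catalogue>
  <product id="p1" title="Knitting needles">
    <note lang="en">Bamboo</note>
  </product>
  <product id="p2" title="Wool">
    <note lang="en" audience="retail">Merino</note>
  </product>
</catalogue>
"""

def attributes_by_element(xml_text):
    """Return {element name: sorted list of attribute names seen on it}."""
    found = defaultdict(set)
    for el in ET.fromstring(xml_text).iter():
        found[el.tag].update(el.attrib)
    return {tag: sorted(names) for tag, names in found.items()}

attrs = attributes_by_element(SAMPLE)
for tag, names in attrs.items():
    for name in names:
        # Candidate XPath for an attribute rule; check it against your real files.
        print(f"//{tag}/@{name}")
```

With namespaced files ElementTree reports tags as {uri}name, so the printed paths would need the prefix you have defined for that URI instead.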

sdlftsettings: unless you need to make changes this is probably going to contain all the rules you need because it’s essentially the settings for a Trados Studio custom xml filetype.  When you import it the new filetype will be created and all the rules and filetype settings will be there already.

XSD: this is going to import all the rules provided, which could support multiple XML filetypes based on the same schema, so it’s useful for ensuring you have all the elements possible.  However, there are two gotchas to be aware of.  First, if you have multiple root elements only the first one will be imported into your new filetype, so you’ll have to add any others manually.  Second, the attributes are not going to be available to you, so you’ll have to add any rules for these manually by inspecting the XSD or the actual XML files.
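If you want a quick way to inspect an XSD for both gotchas, something like the sketch below could help.  It is plain Python with the standard library, not anything Trados Studio provides, and apart from ChaosTools the element and attribute names in the sample schema are invented.  It lists the global elements (each one a possible root) and every named attribute declared anywhere in the schema.

```python
import xml.etree.ElementTree as ET

# The XML Schema namespace in the "{uri}" form ElementTree uses for tags.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema with two possible root elements.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ChaosTools">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Tool" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="version" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="ChaosNotes">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="author" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)

# Global elements are direct children of xs:schema; each can be a document root.
root_candidates = [el.get("name") for el in schema.findall(f"{XS}element")]

# Every xs:attribute declared with a name, wherever it sits in the schema.
attribute_names = sorted(
    {a.get("name") for a in schema.iter(f"{XS}attribute") if a.get("name")}
)

print(root_candidates)
print(attribute_names)
```

Any root candidate after the first is one you would expect to add by hand, and the attribute list gives you the names to write rules for.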

Once imported you’ll see the rules similar to this:

Screenshot showing the parser rules after importing an XML file or XSD under XPath rules.

Note that the namespaces are all there, and under the namespace node you’ll see the namespaces and their prefix similar to this:

Screenshot showing the namespace settings for the filetype.

So with the XPath rules you really do have complete flexibility to manage the extraction of translatable text for all sorts of complex criteria.  But it comes with the price of having to define rules based on your own inspection of the definition files for the attribute names, as they are not made available to you.  A small price to pay I think.

Element Rules

These are essentially there to make things easier for the translator who doesn’t need to do anything complex that might require the use of XPath. It handles the namespaces for you, and it only presents you with the names of the elements and the attributes so you can easily select them.  Perfect for simple scenarios where there is no complex logic required.

XML Settings Import

Here you also have two choices, but the second choice is different:

  1. Create an XML filetype based on the default settings, or
  2. Define settings based on XML, XSD or DTD rule

Looks like this:

Screenshot showing the XML settings page in Trados Studio for Element Rules

If you decide to define settings based on XML, XSD, or DTD rules then some, or all, of the work in defining the rules is done for you.  I’ll just mention DTD.

DTD: this is going to import the elements that are found in the DTD file and make them all “Translatable (except in protected content)”.  They will appear as Element names.  It will also import the attribute names and make them available to you during the creation process only.  For example, if I edit ChaosTools I can also see the three attributes that are associated with this element:

Screenshot showing the ability to select an element and then the corresponding attribute values for that element.

After the filetype has been created these attribute values are no longer available for selection and you would have to type them in manually.  But I think this is probably a bug, or a simple omission, as this always used to be possible with the legacy XML filetypes.
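If you do end up typing the attribute names in by hand, you can at least pull them out of the DTD first.  The sketch below is a rough regular-expression approach in Python and definitely not a real DTD parser: it ignores parameter entities and comments, and it assumes no default value contains a ">" character.  The three attribute names on ChaosTools are invented for the example.

```python
import re

# Hypothetical DTD fragment; only the ChaosTools element name comes from the article.
DTD = """\
<!ELEMENT ChaosTools (Tool+)>
<!ATTLIST ChaosTools
  version CDATA #REQUIRED
  status (draft|final) "draft"
  owner CDATA #IMPLIED>
<!ELEMENT Tool (#PCDATA)>
"""

# A token is a quoted string, a parenthesised group, or a run of other characters.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\([^)]*\)|[^\s()"\']+')

def attlist_names(dtd_text):
    """Return {element name: [attribute names]} from simple ATTLIST declarations."""
    result = {}
    for element, body in re.findall(r"<!ATTLIST\s+(\S+)\s+([^>]*)>", dtd_text):
        tokens = TOKEN.findall(body)
        names, i = [], 0
        while i < len(tokens):
            names.append(tokens[i])  # attribute name
            i += 1                   # move to the type
            if i < len(tokens) and tokens[i] == "NOTATION":
                i += 1               # NOTATION is followed by a (group)
            i += 1                   # move to the default
            if i < len(tokens) and tokens[i] == "#FIXED":
                i += 1               # #FIXED is followed by a value
            i += 1                   # move to the next attribute name
        result.setdefault(element, []).extend(names)
    return result

dtd_attributes = attlist_names(DTD)
print(dtd_attributes)
```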

Interestingly, despite DTD being largely superseded by XSD, I have found that in practice a DTD actually makes creating the filetype settings far easier than an XSD, despite not supporting namespaces.  This is especially true when you start combining support for multiple XML files with different roots and namespaces into one DTD/XSD.

Once imported you’ll see the rules similar to this:

Screenshot showing the parser rules after import for Element Rules.

Note you don’t see the namespaces this time; simpler columns for the Element name and the Attribute name are displayed instead.  If you look under the namespace node, which is still available, you’ll see this:

Screenshot showing the namespace options... none!

Why is it still available?  Well, if you took the simpler approach to creating your XML filetype and later realised you needed a little more flexibility, because your client introduced some rules around what you needed to extract for translation that could only be handled with XPath, then you do still have this option to convert your rules to XPath!

Screenshot showing how to convert Element Rules to XPath rules.

This will convert all your rules to XPath like this:

Screenshot showing the converted Element Rules to XPath expressions.

All good except now you have local-name, which is useful when dealing with XML documents that use namespaces and you want to select elements based on their local name, regardless of the namespace prefix:

//*[local-name() = 'ChaosTools']

as opposed to the use of the namespace:

//tools:ChaosTools
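The difference between those two expressions is easier to see with two elements that share a local name.  Here is a small sketch in Python, with invented namespace URIs and an invented legacy prefix.  Python's ElementTree doesn't support local-name(), so the first query imitates it by stripping the namespace from each tag; the second uses a prefix map much as the namespace settings in the filetype would.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same local name in two different namespaces.
SAMPLE = """\
<root xmlns:tools="http://example.com/tools" xmlns:legacy="http://example.com/legacy">
  <tools:ChaosTools>Current tools</tools:ChaosTools>
  <legacy:ChaosTools>Old tools</legacy:ChaosTools>
</root>
"""

NAMESPACES = {"tools": "http://example.com/tools"}
root = ET.fromstring(SAMPLE)

def local_name(tag):
    """Strip the '{uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]

# Same effect as //*[local-name() = 'ChaosTools']: the namespace is ignored.
by_local_name = [el.text for el in root.iter() if local_name(el.tag) == "ChaosTools"]

# Same effect as //tools:ChaosTools: only the element in the tools namespace.
by_namespace = [el.text for el in root.findall(".//tools:ChaosTools", NAMESPACES)]

print(by_local_name)
print(by_namespace)
```

The local-name version picks up both elements, while the namespace version only picks up the one you actually asked for.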

So the other important point to note is that if you wish to use the namespaces to distinguish between elements with the same name that have different namespaces, then you will also have to add them in manually.  This is because the “Convert to XPath” feature also makes namespaces available to you, but as the use of Element Rules doesn’t pay attention to namespaces the detail isn’t there:

Screenshot showing the empty namespace node after converting to XPath.

So you would have to define the Prefix and the Uri yourself… or recreate the filetype using XPath rules.  Really depends on your level of comfort, although when you play around with these options you’ll soon learn how to do it!
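If you're not sure which Prefix and Uri values to type in, the XML file itself will tell you.  As a minimal sketch, assuming Python with only the standard library and a made-up sample file, the "start-ns" event of ElementTree's iterparse reports each namespace declaration as a (prefix, uri) pair.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample with a default namespace and one prefixed namespace.
SAMPLE = b"""\
<root xmlns="http://example.com/default" xmlns:tools="http://example.com/tools">
  <tools:ChaosTools>Ball of wool</tools:ChaosTools>
</root>
"""

def declared_namespaces(xml_bytes):
    """Return the unique (prefix, uri) pairs declared anywhere in the document."""
    pairs = []
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=["start-ns"]):
        if (prefix, uri) not in pairs:
            pairs.append((prefix, uri))
    return pairs

namespaces = declared_namespaces(SAMPLE)
for prefix, uri in namespaces:
    # An empty prefix is the default namespace; in XPath 1.0 you need to
    # choose a prefix of your own for it before you can use it in a rule.
    print(repr(prefix), uri)
```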

All in all you are going to have to do some work when creating your filetypes because only the sdlftsettings import is going to do everything for you.  Everything else will require you making sure that the rules and settings have been appropriately set up for the XML files you are translating.

Moving from XML1 to XML2

I wanted to add this in here because recently I have seen many users migrating from 2019 or 2021 where they had been using custom XML filetypes created with the older XML1 filetype.  Now that Trados Studio 2022 has simplified the versions of XML available, the XML1 filetype may not work correctly anymore.  For example, if you created custom stylesheets for previewing your XML they may not work at all anymore.

The solution to this problem is to export the sdlftsettings file from your XML1 filetype using Trados Studio 2021 or earlier (so this means you have to have had the foresight to back up your custom filetypes to an sdlftsettings file in the first place) and then create a new XML2 filetype in Trados Studio 2022 by importing the sdlftsettings.

If however you didn’t have your crystal ball available before upgrading, then not to worry as the RWS technical support team have you covered as described in this KB article.

What about the legacy?

I need to ask this question because these had been around forever.  In the old filetypes, defining settings used to be based on INI, ANL, XML, XSD, ITS or DTD rule files.  So what’s with this… why have we dropped them?

INI: these were used by Trados 2007 and earlier.  INI files played a role in configuring the handling of various file formats and filters within Trados and determined how different types of documents were processed and segmented for translation.

ANL: these were used by SDLX to define how the software should handle different file types during the translation process.

ITS: these were filetype settings based on the Internationalization Tag Set (ITS) defined by the W3C.

You may have never heard of any of these and this would be one of the reasons why they are no longer supported… they are old!  Trados 2007 has been out of support since 2013.  ITS was last updated in 2013 as well.  SDLX… to be honest I’m not sure when this was no longer supported, but as a product it hung around to work with SDL TMS for a while and then eventually the filetype support in SDL TMS was adapted to match Trados Studio so SDLX really became redundant.  Although I’m sure we have a few old sweats left who can correct me on that!  However, the point is they are old and it was no longer considered necessary to support these old formats.  So if you do have any then I’m afraid you’ll just have to create them again!!
