XML Length Restrictions

This week I spent some time in Stockholm attending one of the SDL Roadshows.  As usual it was a great event, and there are more to come.  In fact this year I get to attend a fair few, so if you’re attending Copenhagen, Milan or Paris in May then I’ll look forward to seeing you there!

But I’m not writing about the roadshows.  The day before the roadshow I also enjoyed a small workshop with some of our very technical customers, and as usual they had lots of interesting questions to tax our software and my brain!  This time, though, I had reinforcements in the shape of Iulia, a QA Engineer from our Cluj office.  The team in Cluj never cease to amaze me, not only with their knowledge of our products but with their dedication to making them better and to supporting our customers.  The reason I want to mention Iulia in particular is that these technical sessions always involve questions about how we handle XML in Studio.  This time was no exception, and one question in particular had me dreaming up all kinds of workarounds…  They were interesting, I think, but unnecessary, because Studio has some clever features here that I’d never looked at before, but Iulia had.  Of course I don’t know why I’d expect anything less from the team that QAs our products, but I thought it would be good to share.

The question was this: “When will there be an easy way to check the number of characters for a translatable string in an XML file?”

Studio of course can check any segment against a fixed character count by using this option in the QA Checker:

[Screenshot: the fixed character count option in the QA Checker]

But if your XML file defines a different limit for each element through an attribute value, then a single fixed count is next to useless.  Take this simple example, with one translatable element called segment and a length attribute called length!

[Screenshot: sample XML file with segment elements, each carrying a length attribute]
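A minimal Python sketch makes the problem concrete.  The sample file below is hypothetical, modelled on the description above rather than copied from the screenshot, and the French target text is made up.  Checked against one document-wide limit of 50 characters, an 11-character translation of a segment limited to 10 characters passes, even though it breaks that element’s own limit.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the example file: one translatable element
# called "segment", each carrying its own limit in a "length" attribute.
SAMPLE = """<root>
  <segment length="10">Save</segment>
  <segment length="20">Open recent file</segment>
  <segment length="50">Your changes have been saved successfully.</segment>
</root>"""


def exceeds_fixed_limit(target: str, limit: int = 50) -> bool:
    """One limit for every segment, like a fixed character count check."""
    return len(target) > limit


def exceeds_own_limit(segment: ET.Element, target: str) -> bool:
    """The limit that really applies: this element's length attribute."""
    return len(target) > int(segment.get("length"))


root = ET.fromstring(SAMPLE)
save = root.find("segment")  # the first segment, limited to 10 characters
target = "Enregistrer"       # hypothetical 11-character translation

print(exceeds_fixed_limit(target))      # False: the fixed check lets it through
print(exceeds_own_limit(save, target))  # True: the element's own limit is broken
```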

Workaround #1

Create a custom XML filetype using the Studio SDK (Software Development Kit) that uses a length attribute value as a QA check by default.  It would read the value of whatever attribute you defined and apply it during interactive translation, or when you run a QA check.
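To illustrate the logic such a filetype would apply, here is a rough Python sketch.  It does not use the Studio SDK at all: `length_warnings` is a hypothetical helper that reads whichever attribute you name and compares each translation against it, roughly the kind of message a filetype-driven QA check might report.

```python
import xml.etree.ElementTree as ET


def length_warnings(xml_text: str, targets: list[str],
                    element: str = "segment", attribute: str = "length") -> list[str]:
    """Return one message per translation that exceeds its element's limit.

    `targets` holds one translation per matching element, in document order
    (zip stops at the shorter of the two sequences).  Elements without the
    configured attribute are skipped rather than guessed at.
    """
    root = ET.fromstring(xml_text)
    warnings = []
    for index, (node, target) in enumerate(zip(root.iter(element), targets), start=1):
        raw = node.get(attribute)
        if raw is None:
            continue  # no limit defined for this element
        limit = int(raw)
        if len(target) > limit:
            warnings.append(f"Segment {index}: {len(target)} characters, limit is {limit}")
    return warnings


if __name__ == "__main__":
    doc = """<root>
  <segment length="10">Save</segment>
  <segment length="20">Open recent file</segment>
</root>"""
    # Hypothetical translations, one per segment.
    for message in length_warnings(doc, ["Enregistrer", "Fichiers récents"]):
        print(message)  # Segment 1: 11 characters, limit is 10
```

Making the element and attribute names parameters is what “whatever attribute you defined” amounts to: the same check works for a file that calls its limit `maxlen` instead of `length`.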

Actually I kinda like this workaround and if it was implemented I think it would be a very smart solution.  But it’s not out of the box so I move on…

Workaround #2

Build the attribute value into a stylesheet for a visual check against the segment count shown in the Studio Editor:

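[Screenshot: stylesheet view in the Studio Editor showing the length attribute value alongside the segment]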

I quite like this too, because the visual is always nice to see and stylesheets are not that hard to create.  But the lack of any automated checking is an obvious drawback.  So I can’t call the next solution a workaround, because it uses out-of-the-box XML niftiness in Studio, as suggested by Iulia.

Solution #3

The idea here is to give each element a different structure based on the attribute value, and then use a built-in check for maximum or minimum lengths in the Advanced Tag Settings… brilliant!  I look at these settings quite a bit, but I had never used these options or paid attention to what they can do.  It proves the value of spending a little time reviewing the options in this product from time to time, just to see what’s possible.

Studio has the concept of Document Structure Information, which is used to give the translator additional information about the segment they are translating.  So if you open a Word file in Studio, for example, and look at the right-hand column, you might see abbreviations like these.  You can click on them to see more detailed information, but in short they tell you that you have:
– a Heading style
– a Paragraph style
– a List Item
– a Table Cell (the + symbol means there is more structure in here)
– a Text Box

This is very handy because knowing the context of a segment can help when you translate it.  Localisation Engineers will often define their own Document Structure when creating XML filetypes, both to improve the experience and to give the translator more context.  It’s this structure we’re interested in here.

In my example I have three different values in the length attributes (10, 20 & 50), so I need to create three parser rules to set a different length check for each one.  I do this with a short XPath expression like this:

//segment[@length="50"]

This simply means: extract the contents of the segment element when the value of its length attribute is 50.  I create three of these rules, one each for 50, 20 and 10 characters.  I then add my Structure to these rules by clicking on the Edit… button and then here:

[Screenshot: the Structure setting reached from the Edit… button]

Now I can add a new Structure property to the element by clicking on the Add… button here:

[Screenshot: the Add… button for a new Structure property]

I can call this whatever I like, but to keep it simple I have used the same name as the number of characters so I have some consistent rules around this process (at least this seems logical in my head!):

[Screenshot: the new Structure property, named after its character count]

I then click OK twice and stop at the Edit Rule window, because now I want to go to my Advanced… settings to add the length check.  I can also see the name of my new Structure property in the field now:

[Screenshot: the Edit Rule window showing the new Structure property]

In the Advanced… settings pane I can add the value I wish to check for, which in this case is 50 characters:

[Screenshot: Advanced settings with a maximum length of 50 characters]

After repeating this twice more, with the appropriate value for each length attribute, I now have four parser rules for my simple example:

[Screenshot: the parser rules for the example file]
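Outside Studio, the rule-per-value approach can be imitated with Python’s ElementTree, which understands attribute predicates like the one in `//segment[@length="50"]` (it needs a relative path, so `.//segment` stands in for `//segment`).  This is only a sketch with a hypothetical sample file; the extra segment with a length of 35 shows the weakness of one rule per value, because an element whose value has no rule is never checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three values that have rules, plus one (35) that doesn't.
SAMPLE = """<root>
  <segment length="10">Save</segment>
  <segment length="20">Open recent file</segment>
  <segment length="50">Your changes have been saved successfully.</segment>
  <segment length="35">Do you want to close this file?</segment>
</root>"""

# One rule per known value, mirroring //segment[@length="50"] and friends.
RULES = {limit: f".//segment[@length='{limit}']" for limit in ("10", "20", "50")}


def texts_by_rule(xml_text: str) -> dict[str, list[str]]:
    """Text of the segments each rule extracts, keyed by the rule's limit."""
    root = ET.fromstring(xml_text)
    return {limit: [seg.text for seg in root.findall(path)] for limit, path in RULES.items()}


def unmatched_values(xml_text: str) -> list[str]:
    """Length values that no rule covers, so those segments get no check."""
    root = ET.fromstring(xml_text)
    return [seg.get("length") for seg in root.iter("segment")
            if seg.get("length") not in RULES]


print(texts_by_rule(SAMPLE))
# {'10': ['Save'], '20': ['Open recent file'], '50': ['Your changes have been saved successfully.']}
print(unmatched_values(SAMPLE))  # ['35']
```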

All I need to do now is go back to my verification settings and choose the option that checks whether target segments are within file-specific limits (I don’t know how I missed this in the past, because I have been asked this question before and didn’t have a good answer!):

[Screenshot: the verification option that checks target segments against file-specific limits]

And now when I translate the file I can see several things.  First of all, I get an interactive warning in the form of a little yellow triangle.  Hovering over this triangle tells me exactly what the problem is:

[Screenshot: tooltip on the yellow warning triangle describing the length problem]

In addition to knowing immediately that there is a length restriction on this segment, thanks to the pinkish LN abbreviation in the right-hand column, I can read the Document Structure Information by clicking on the coloured LN+ abbreviation.  I only added the “10” as a structure here, but Studio added the LN for the length restriction imposed in the QA settings:

[Screenshot: Document Structure Information showing the LN and 10 entries]

Finally, I also have plenty of detail in the verification message details panel, so I know exactly what to aim for in order to satisfy the requirement to translate this segment in 10 characters or fewer:

[Screenshot: details of the length warning in the verification messages panel]

This is a really great, sophisticated capability, and I’m glad Iulia was able to share it.  Combined with the stylesheet it’s a really cool solution… and of course my initial workaround is still an option for anyone with a developer; it would keep all the benefits of this out-of-the-box solution while making it maintenance-free for a localisation engineer.  Studio truly is a versatile and capable localisation tool!

Solution #4 (Updated based on the comment from Sinan below)

In Solution #3 above I used an integer value to specify the length restriction.  This works, but now imagine you have 500 different lengths you have to work to in the file.  It would be quite some task to add them, and even then you may come across a new one that you hadn’t catered for.  So a far more economical solution is to use an XPath expression in the Advanced… settings pane like this:

[Screenshot: Advanced settings using the XPath expression @length as the maximum length]

In this case a single rule extracts the translatable content from every segment element, and the value of each element’s length attribute is used as its limit:

[Screenshot: the single parser rule that extracts the segment element]

This is much less effort, and it will pick up whatever value has been set without you having to think about it at all.  Using XPath also gives you a fair degree of flexibility if the attributes change, or if you want to use values specific to certain elements, or to the context of certain elements in the file.  On top of this, you could set both minimum and maximum rules this way, which would be almost unworkable with any of the previous solutions.
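The single-rule idea can be sketched in Python too: one rule picks up every segment element and reads the limit from the element itself, so a value nobody anticipated (35 below) is checked like any other.  The `minlength` attribute is hypothetical, included only to show a minimum driven from the file in the same way; the helper name and messages are illustrative as well, not what Studio reports.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; "minlength" is an invented attribute for the minimum.
SAMPLE = """<root>
  <segment length="10">Save</segment>
  <segment length="35" minlength="12">Do you want to close this file?</segment>
</root>"""


def check_segments(xml_text: str, targets: list[str]) -> list[str]:
    """One rule for every <segment>; limits come from each element's attributes."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, (seg, target) in enumerate(zip(root.iter("segment"), targets), start=1):
        maximum = seg.get("length")     # plays the role of @length as the maximum
        minimum = seg.get("minlength")  # hypothetical minimum attribute
        if maximum is not None and len(target) > int(maximum):
            problems.append(f"Segment {index}: more than {maximum} characters")
        if minimum is not None and len(target) < int(minimum):
            problems.append(f"Segment {index}: fewer than {minimum} characters")
    return problems


# Hypothetical translations: the first is too long, the second too short.
print(check_segments(SAMPLE, ["Enregistrer", "Fermer ?"]))
# ['Segment 1: more than 10 characters', 'Segment 2: fewer than 12 characters']
```

Reading the limit from the element is what removes the per-value maintenance: nothing in the code changes when a new length value appears in the file.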

Of course the QA checking mentioned in Solution #3 would still apply, and the stylesheet if you wanted to use one.

Thanks Sinan!

12 comments
  1. walkqisky said:

    Great! Have never encountered such request, but learned a good lesson here.

  2. walkqisky said:

    Again, this trick reminds me of an xml snippet in which studio 2014 incorrectly handled but well done by studio 2011, could you give some hint if the latest SP1 release solved this issue? Below is the screenshot fyi.

    • I think this is just invalid xml and you need to escape the less than symbol. What do you think is wrong here?

      • walkqisky said:

        I see this circumstance does not happen very often, but this kind of xml file is as it is from my client, just a little confusing why studio 2014 cannot handle it properly like 2011 does, neither html4 nor html5 filetype can.

      • I can’t open a file with this in 2011 either.

      • walkqisky said:

        I have sent you a mail for your investigation, thank you in advance for sparing a moment to this unusual issue.

  3. Nichola Colabella said:

    Dear Paul

    Thank you for your blog post on XML length restrictions.

    I just wondered, where and how you create style sheets and also where you find the “advanced tag settings” in SDL Trados Studio 2014.

    If you could point me in the right direction, that would be great! I had a look under Options and under File types but am possibly looking in the wrong place still.

    Thank you

    Nikki

    • Hi Nikki,

      Stylesheets are created using XSLT. See XSLT Tutorial for more info. Then to see how to apply it in Studio see this article: Translate with style…

      Advanced Tag Settings… not sure which ones you mean but in this article they would be in the parser rule. So select a parser rule, click on edit, and it’s the bottom option in the window.

  4. Sinan said:

    Can’t we just use another Xpath rule in maximum length part? This way we would not need to repeat same steps for different length values?

    I think we can just write @length in Maximum Length box in this case.

    • Yes you could, and this would be a much neater and easier solution. One of the best things for me about writing these articles is the number of things I learn from everyone else! I may update the article with this tip. Thank you.

  5. Vojta said:

    Hi Paul,
    Is there any way to set a length limitation that is not absolute but relative to the source segment? e.g. say the segment cannot be more than 50% longer than the source?
    Thank you

    • Hi Vojta, yes there is a way. In the verification settings you can check for a percentage less or more than the source segment to set min or max length. So go to File -> Options (or Project Settings) then Verification -> QA Checker -> Segments Verification and you’ll see these options in there.
