Getting a filetype preview…

One of my favourite features in Studio 2017 is the filetype preview.  It saves a lot of time when you are creating custom filetypes, and it’s fun to use as well.  I can fill out all the rules and switch between the preview and the rules editor without having to continually close the options, open the file, see if it worked and then close the file and go back to the options again… then repeat from the start… again… and again…   I guess it’s the little things that keep us happy!

I decided to look at this using a YAML file as this seems to be coming up quite a bit recently.  YAML, which rhymes with “camel”, stands for “YAML Ain’t Markup Language” and I believe it’s a superset of the JSON format, but with the goal of making it more human readable.  The format is defined in the YAML Specification, and to do a really thorough job I guess I could try and follow the rules set out there.  But in practice I’ve found that creating a simple Regular Expression Delimited Text filetype based on the sample files I’ve seen has been enough to handle this format.  Looking ahead I think it would be useful to see a filetype created either as a plugin through the SDL AppStore, or within the core product, just to make it easier for users not comfortable with creating their own filetypes.  But I digress…

Example YAML file

I’ve seen a few variants on YAML already but the basic principle for our needs (translatable text extraction) is very similar to JSON in that the text is held in constructs known as “scalars”.

 blog_title: "Bridging the divide, merging segments"
 blog_keywords: merging, paragraph breaks, SDL Trados Studio
 blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'

It’s apparently acceptable for the scalars to be contained within single quotes, double quotes or no quotes at all, and so far I have examples of each of these.  In fact the sample above uses all three variants.  Reading the specification tells me that it’s also possible to have them in a folded form denoted with >, but I have not come across an example like this yet.  So typically, supporting YAML using a custom regex based filetype to suit the examples provided by customers has been trivial.  I can get at the translatable text within the scalar using document structure rules like this (opening pattern followed by closing pattern):

^.*?:\s"
"$

Or this:

^.*?:\s
$

Or this:

^.*?:\s'
'$
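
To see what a single opening/closing pair does, here is a rough sketch in Python rather than in Studio itself.  It assumes the filetype treats whatever sits between the opening match and the closing match as the translatable text; the `extract` helper is hypothetical, and Python’s `re` module is only standing in for the regex engine Studio uses, which should behave the same for patterns this simple.

```python
import re

def extract(line, opening, closing):
    """Return the text between the opening and closing patterns, or None."""
    # Glue the two rules around a lazy capture group for the translatable text.
    match = re.search(opening + r"(.*?)" + closing, line)
    return match.group(1) if match else None

double_quoted = 'blog_title: "Bridging the divide, merging segments"'
unquoted = "blog_keywords: merging, paragraph breaks, SDL Trados Studio"

# The double-quote pair picks up the double-quoted scalar...
print(extract(double_quoted, r'^.*?:\s"', r'"$'))
# ...but finds nothing to anchor on in the unquoted line, so this is None.
print(extract(unquoted, r'^.*?:\s"', r'"$'))
# The no-quote pair picks up the unquoted scalar.
print(extract(unquoted, r"^.*?:\s", r"$"))
```

Running the no-quote pair against the double-quoted line would return the text with its quotes still attached, which is why each quoting style gets its own pair here.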

But then I occasionally came across a file where both single quotes and double quotes were used in the same file… so I added a non-capturing group, “(?:)“, offering the alternatives through the use of the pipe symbol “|“:

^.*?:\s(?:'|")
(?:'|")$

I haven’t yet come across a file that mixes unquoted scalars with quoted ones… but here’s how I could handle that eventuality:

^.*?:\s(?:'|"|)
(?:'|"|)$
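
As a quick check outside Studio, the sketch below runs this last pair over the three-line sample from earlier.  It is only an approximation: it assumes the rules are applied one line at a time and that the text between the two matches is what gets extracted, and the names `OPENING`, `CLOSING` and `translatable` are made up for the illustration.

```python
import re

SAMPLE = '''blog_title: "Bridging the divide, merging segments"
blog_keywords: merging, paragraph breaks, SDL Trados Studio
blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'
'''

# The final opening and closing rules, unchanged from the article.
OPENING = r"""^.*?:\s(?:'|"|)"""
CLOSING = r"""(?:'|"|)$"""

# Assumption: the translatable text is whatever lies between the two rules.
RULE = re.compile(OPENING + r"(.*?)" + CLOSING)

def translatable(text):
    """Collect the extracted text from every line that matches the rules."""
    found = []
    for line in text.splitlines():
        match = RULE.search(line)
        if match:
            found.append(match.group(1))
    return found

for segment in translatable(SAMPLE):
    print(segment)
```

One thing the sketch makes visible is that the opening and closing alternatives are independent of each other, so a line such as `k: "mixed'` would also be accepted.  For the files described here that seems a reasonable trade-off.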

So for the sort of files I’ve come across so far this last pair of opening and closing document structure rules would do the job.  I guess I could have gone straight to this final set, but I thought it might be interesting for anyone playing around with regex for the first time to see the iterations.  It also gives you an idea of the sort of testing you might go through in getting the filetype right… it can be a lot of to-ing and fro-ing.

The other interesting thing about YAML is that it can contain complete html files, or just text marked up with html, or even marked up with script.  The last scalar in my example contains translatable text containing html markup and script.  So I can handle these using Inline tags in Studio and just convert any markup to protected tags.  This is where the purpose of this article really comes in… using the new preview capability.
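
The inline tag rules themselves aren’t listed in this article, so the following is only a sketch of the idea in Python.  The `INLINE` pattern (HTML-style tags plus `%{name}` placeholders) and the `split_inline` helper are assumptions made for the illustration, not the rules used in the filetype.

```python
import re

# Hypothetical inline patterns: an HTML-style tag, or a %{name} placeholder.
INLINE = re.compile(r"</?[A-Za-z][^>]*>|%\{[^}]+\}")

def split_inline(text):
    """Split a scalar into ("text", ...) and ("tag", ...) pieces, in order."""
    parts = []
    pos = 0
    for match in INLINE.finditer(text):
        if match.start() > pos:
            # Anything before the match stays translatable.
            parts.append(("text", text[pos:match.start()]))
        # The match itself would become a protected tag.
        parts.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts

scalar = "For more information <a href=%{info_link}>[Click Here]</a>"
for kind, value in split_inline(scalar):
    print(kind, repr(value))
```

With these patterns the placeholder inside the anchor is swallowed by the tag match, so the whole opening tag is protected as one unit and only “For more information ” and “[Click Here]” are left as text.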

Preview File

When I go to my File Types options in Studio now I see this at the bottom of the screen:

002

This little addition means I can browse for my test file, in this case multifarious.yml, and with one click see whether the rules I’m creating are extracting the correct text, and also converting inline code/markup to tags.  This replaces this sequence of events:

  1. close the options screen
  2. open the test file for translation
  3. review the content
  4. close the test file
  5. open the options again and apply changes
  6. repeat as needed

In fact the process I outlined there is not even the way many translators/engineers did this in the past.  I have seen people not familiar with the single document process creating a new project each time just to test the filetype settings.  So having this one click preview is a serious timesaver if you are responsible for creating filetypes in your organisation, or even if you do the occasional one and find it requires a lot of to-ing and fro-ing to get it right.  The preview itself is very neat and concise… in my example it looks a little like this:

003

An important point to note is that you can use this feature to check the effects of the settings for any file supported by Studio.  So this is not just a tool for geeky regex loving translators/project managers… it’s really good for preparing files of any kind.

Video

I thought the best way to demonstrate this would be with a video, as this really shows the benefit and also how to work through my fabricated YAML filetype in detail.

Video: Length is approx. 8 minutes

For me this is one of the best improvements in this release.  It’s a small thing, but as I create quite a lot of custom filetypes for various types of files, this is a real timesaver.  In fact it’s also an absolute pleasure to use!!

12 thoughts on “Getting a filetype preview…”

  1. Great new feature, our supporters will love it. The tutorial is really easy to follow, too.
    Maybe it’s only a small detail, but wouldn’t it be preferable to introduce a segmentation hint for the tag? I believe most translators would prefer segmentation at that point.

    1. Thanks Andreas… and I agree with you on the segmentation hint. Studio is driven by the TM in this regard so you need to create a rule in there, but this is more complex, I agree.

  2. This works well, but I think you missed one point here – this way of handling files with software strings is fine when it comes to extracting text between brackets, but it may actually cause some mistranslations. The reason is that it’s missing one crucial piece of information, which is the name of the token (translation key). Just an example:

    oil-tank-not-m1-abrams: 'tank'
    german-phrase-for-was: "war"

    If you extract the text between brackets and skip the keys, without any additional context a translator would only see the “military” idea of the translation. However, if you display the key name next to the translatable string, it would allow avoiding mistranslations. Well, at least some of them, depending on the naming convention. Hopefully this silly example would give you a better view on what I’m referring to.

    Paul, may I trouble you with finding a good solution to this problem in Studio? This problem actually applies to many other file formats as well, where the keys are not structural tags. Even if we manage to display this information in the last column with tags it is not directly visible.

    1. Hi Agenor, I don’t think I missed this point here as the article was about the new preview feature and not really about the nuances of yml translation. The best solution is going to be Passolo, or create a custom filetype in Studio that provides an appropriate preview.

  3. Hello, and thanks for this interesting article.

    Question: what if you have some lines introduced (but not followed) by the > symbol? How can you amend the document structure regex so that those translatable strings would be picked by trados too?

    example:
    title: "this sentence will be picked"
    click-here: this sentence will be picked
    step: this sentence will be picked
    step-1-title: 'this sentence will be picked'
    step-1-content: >
    this sentence would not be picked
    step-2-title: this sentence will be picked
    step-2-content: >
    this sentence would not be picked

    Thanks Nicky

  4. Hi!
    Too bad the video is not available anymore. Would you have another video to point to?

    We’re trying to ‘protect’ some %{placeholders} in a yml file and could not achieve this goal yet (using SDL Trados 2021)
    I would really appreciate 🙂 (and if you know any resource/tips on translating Rails applications with SDL trados too!)

    1. Hi Arnaud, the video in the post is working for me. Which video are you referring to?

      On the “Rails applications”. You’d need to share the files you have. I’m not very familiar with them but it does look as though there could be many file formats created from Ruby and the ones I found looked as though they could be handled using the regex delimited filetype. Why don’t you share some samples in the RWS Community?

  5. Many thanks Paul. I was unable to use the YAML filetype in the Appstore so this really helped me out.

    It looks like the backslashes have been lost from all your regexes above though.
    ^.*?:s
    should be
    ^.*?:\s

    I prefer to use character classes so I went for these regexes in the end:
    ^[^:]+:\s*['"]*
    ['"]*\s*$

    Many ways to skin a cat with regex.

    Guy

    PS: Cannot play the video either. I see a black box saying “Dieses Video ist privat.”

    1. Thanks Guy… I have no idea how the backslashes disappeared. I put them back! The video thing is something I am slowly fixing as I come across them. Youtube changed a policy and now all videos have to be public and cannot just be unlisted. Hopefully that also works for you now?

  6. I have a YAML file now that has multiline content (including line breaks) when > precedes the text. I think we need a new way of creating the file settings? Example:
    news:
    list:
    title: blablabla
    acceptation:
    title: blablabla
    text: >
    blablabla
    blablabla.

    blablabla.
    buttons:
    accept:
    label: blablabla

    1. I wonder if this is really what you meant. Wouldn’t it be this?

      news:
      list:
      title: blablabla
      acceptation:
      title: blablabla
      text: |
      blablabla
      blablabla.
      blablabla.
      buttons:
      accept:
      label: blablabla

      Both your example and mine work for me using the YAML filetype that is out of the box in Trados Studio 2022.
