Getting a filetype preview…

One of my favourite features in Studio 2017 is the filetype preview. The time it can save when you are creating custom filetypes is matched only by the fun of using it. I can fill out all the rules and switch between the preview and the rules editor without having to continually close the options, open the file, see if it worked, then close the file and go back to the options again… then repeat from the start… again… and again… I guess it’s the little things that keep us happy!

I decided to look at this using a YAML file, as this format seems to be coming up quite a bit recently. YAML, pronounced to rhyme with “camel”, stands for “YAML Ain’t Markup Language”, and I believe it’s a superset of the JSON format, but with the goal of being more human-readable. The rules are set out in the YAML Specification, and to do a really thorough job I guess I could try to follow them. But in practice I’ve found that creating a simple Regular Expression Delimited Text filetype based on the sample files I’ve seen has been the key to handling this format. Looking ahead, I think it would be useful to see a YAML filetype created either as a plugin through the SDL AppStore or within the core product, just to make it easier for users who are not comfortable creating their own filetypes. But I digress…

Example YAML file

I’ve seen a few variants on YAML already but the basic principle for our needs (translatable text extraction) is very similar to JSON in that the text is held in constructs known as “scalars”.

 blog_title: "Bridging the divide, merging segments"
 blog_keywords: merging, paragraph breaks, SDL Trados Studio
 blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'

It’s apparently acceptable for the scalars to be contained within single quotes, double quotes or no quotes at all, and so far I have examples of each of these. In fact the sample above uses all three variants. Reading the specification tells me that it’s also possible to have them in a folded form denoted with >, but I have not come across an example like this yet. So supporting YAML with a custom regex-based filetype that suits the examples provided by customers has typically been trivial. I can get at the translatable text within the scalar using document structure rules like this (opening pattern followed by closing pattern):

^.*?:\s"
"$
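To make the opening/closing idea concrete, here is a minimal Python sketch that models a pair of document structure rules as one regex: the opening pattern, a lazy capture for the translatable text, then the closing pattern, applied one line at a time. That modelling is my own assumption for illustration, not necessarily how Studio evaluates the rules internally, and the `extract` helper is hypothetical.

```python
import re

# The three scalars from the example YAML file above.
SAMPLE = [
    'blog_title: "Bridging the divide, merging segments"',
    "blog_keywords: merging, paragraph breaks, SDL Trados Studio",
    "blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'",
]

# Opening and closing patterns for double-quoted scalars.
OPENING = r'^.*?:\s"'
CLOSING = r'"$'


def extract(line, opening, closing):
    """Hypothetical helper: return the text between the opening and closing
    patterns on a single line, or None if the rule pair does not match."""
    match = re.search(opening + r"(.*?)" + closing, line)
    return match.group(1) if match else None


for line in SAMPLE:
    print(extract(line, OPENING, CLOSING))
# Bridging the divide, merging segments
# None
# None
```

Only the double-quoted scalar is picked up, and the quotes themselves stay out of the translatable text.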

Or this:

^.*?:\s
$

Or this:

^.*?:\s'
'$
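Using the same hypothetical line-by-line model, this sketch runs the unquoted pair and the single-quote pair over the sample. It shows why the quote-specific rules are worth having: the unquoted pair matches every line, but it drags any quotes into the translatable text.

```python
import re

# The three scalars from the example YAML file above.
SAMPLE = [
    'blog_title: "Bridging the divide, merging segments"',
    "blog_keywords: merging, paragraph breaks, SDL Trados Studio",
    "blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'",
]

# (opening, closing) pattern pairs from the text above.
PAIRS = {
    "unquoted": (r"^.*?:\s", r"$"),
    "single": (r"^.*?:\s'", r"'$"),
}


def extract(line, opening, closing):
    """Hypothetical helper: text between the opening and closing patterns, or None."""
    match = re.search(opening + r"(.*?)" + closing, line)
    return match.group(1) if match else None


for name, (opening, closing) in PAIRS.items():
    for line in SAMPLE:
        print(name, repr(extract(line, opening, closing)))
```

The unquoted pair returns all three values, but the first and last still carry their surrounding quotes; the single-quote pair returns only the last value, cleanly.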

But then I occasionally came across a file that used both single and double quotes… so I added a non-capturing group, “(?:)”, offering the alternatives through the use of the pipe symbol “|”:

^.*?:\s(?:'|")
(?:'|")$
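A quick check of the combined pair, again under my hypothetical line-by-line model: both quoted scalars are now extracted without their quotes, while the unquoted one is still skipped. The last call uses an invented line to show a side effect of the alternation.

```python
import re

# The three scalars from the example YAML file above.
SAMPLE = [
    'blog_title: "Bridging the divide, merging segments"',
    "blog_keywords: merging, paragraph breaks, SDL Trados Studio",
    "blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'",
]

# Either quote character may open or close the scalar.
OPENING = r"""^.*?:\s(?:'|")"""
CLOSING = r"""(?:'|")$"""


def extract(line, opening, closing):
    """Hypothetical helper: text between the opening and closing patterns, or None."""
    match = re.search(opening + r"(.*?)" + closing, line)
    return match.group(1) if match else None


for line in SAMPLE:
    print(extract(line, OPENING, CLOSING))

# The two alternations are independent, so mismatched quotes also match.
print(extract("key: 'mismatched\"", OPENING, CLOSING))
# Bridging the divide, merging segments
# None
# For more information <a href=%{info_link}>[Click Here]</a>
# mismatched
```

That last result is a quirk worth knowing about rather than a real problem: because the opening and closing patterns are separate rules, nothing ties the closing quote to the opening one.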

I haven’t yet come across a file that mixes unquoted scalars with both kinds of quoted ones… but here’s how I could handle that eventuality, by adding an empty alternative after the last pipe so that “no quote at all” is accepted too:

^.*?:\s(?:'|"|)
(?:'|"|)$
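Here is the final pair run over the sample with the same hypothetical helper, plus a folded scalar of the kind the specification describes (the two folded lines are invented for illustration). Under this line-by-line model the pair handles all three quoting styles, but it does not cope with the folded form.

```python
import re

# The three scalars from the example YAML file above.
SAMPLE = [
    'blog_title: "Bridging the divide, merging segments"',
    "blog_keywords: merging, paragraph breaks, SDL Trados Studio",
    "blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'",
]

# The empty third alternative lets a scalar have no quotes at all.
OPENING = r"""^.*?:\s(?:'|"|)"""
CLOSING = r"""(?:'|"|)$"""


def extract(line, opening, closing):
    """Hypothetical helper: text between the opening and closing patterns, or None."""
    match = re.search(opening + r"(.*?)" + closing, line)
    return match.group(1) if match else None


for line in SAMPLE:
    print(extract(line, OPENING, CLOSING))
# Bridging the divide, merging segments
# merging, paragraph breaks, SDL Trados Studio
# For more information <a href=%{info_link}>[Click Here]</a>

# A folded scalar (">") puts its text on the following indented line.
FOLDED = ["step-1-content: >", "  this sentence would not be picked"]
for line in FOLDED:
    print(extract(line, OPENING, CLOSING))
# >
# None
```

On the folded scalar the pair offers the `>` indicator as translatable text and misses the sentence on the next line, so that form would need different rules. As an aside, `(?:'|"|)` behaves the same as the shorter `['"]?`.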

So for the sort of files I’ve come across so far, this last pair of opening and closing document structure rules would do the job. I guess I could have gone straight to this final pair, but I thought it might be interesting for anyone playing around with regex for the first time to see the iterations. It also gives you an idea of the sort of testing you might go through in getting the filetype right… it can be a lot of to-ing and fro-ing.

The other interesting thing about YAML is that it can contain complete HTML files, or just text marked up with HTML, or even marked up with script. The last scalar in my example contains translatable text with both HTML markup and script. So I can handle these using inline tags in Studio and just convert any markup to protected tags. This is where the purpose of this article really comes in… using the new preview capability.
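Here is a rough sketch of what inline tag rules do to that last scalar: the extracted text is split into translatable text and protected tags. The two patterns are my own assumptions for this sample (one for HTML start/end tags, one for `%{...}` placeholders, which I am treating as the “script” part); they are not Studio’s built-in rules, and `tokenize` is a hypothetical helper.

```python
import re

# Hypothetical inline-tag rules: HTML start/end tags, and %{...} placeholders.
INLINE_TAG = re.compile(r"</?[A-Za-z][^>]*>|%\{[^}]*\}")


def tokenize(text):
    """Split extracted text into ("text", ...) and ("tag", ...) segments."""
    segments = []
    pos = 0
    for match in INLINE_TAG.finditer(text):
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        segments.append(("tag", match.group(0)))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


for kind, value in tokenize("For more information <a href=%{info_link}>[Click Here]</a>"):
    print(kind, repr(value))
# text 'For more information '
# tag '<a href=%{info_link}>'
# text '[Click Here]'
# tag '</a>'
```

Because the whole `<a ...>` tag is matched before the scan reaches the placeholder inside it, `%{info_link}` ends up within that single protected tag rather than as a tag of its own. A standalone placeholder, as in `Hello %{name}!`, would become its own tag.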

Preview File

When I go to my File Types options in Studio now I see this at the bottom of the screen:

[Screenshot: the Preview File option at the bottom of the File Types options]

This little addition means I can browse for my test file, in this case multifarious.yml, and with one click see whether the rules I’m creating are extracting the correct text, and also converting inline code/markup to tags.  This replaces this sequence of events:

  1. close the options screen
  2. open the test file for translation
  3. review the content
  4. close the test file
  5. open the options again and apply changes
  6. repeat as needed

In fact, the process I outlined there is not even the way many translators/engineers did this in the past. I have seen people who are not familiar with the single document process creating a new project each time just to test the filetype settings. So this one-click preview is a serious timesaver if you are responsible for creating filetypes in your organisation, or even if you only do the occasional one and find it takes a lot of to-ing and fro-ing to get it right. The preview itself is very neat and concise… in my example it looks a little like this:

[Screenshot: preview of the text and tags extracted from multifarious.yml]

An important point to note is that you can use this feature to check the effect of the settings on any file type supported by Studio. So this is not just a tool for geeky, regex-loving translators/project managers… it’s really good for preparing files of any kind.

Video

But I thought the best way to demonstrate this would be with a video, as this really shows the benefit, and also how I worked through my fabricated YAML filetype in detail.

Video: Length is approx. 8 minutes

For me this is one of the best improvements in this release. It’s a small thing, but as I create quite a lot of custom filetypes for various types of files, it’s a real timesaver. In fact it’s also an absolute pleasure to use!

6 comments
  1. -a said:

    Great new feature, our supporters will love it. The tutorial is really easy to follow, too.
    Maybe it’s only a small detail, but wouldn’t it be preferable to introduce a segmentation hint for the tag? I believe most translators would prefer segmentation at that point.

    • Thanks Andreas… and I agree with you on the segmentation hint. Studio is driven by the TM in this regard, so you need to create a rule there, but I agree this is more complex.

      • Ankit said:

        Thanks Paul for sharing this article.

  2. Agenor Hofmann-Delbor said:

    This works well, but I think you missed one point here – this way of handling files with software strings is fine when it comes to extracting text between brackets, but it may actually cause some mistranslations. The reason is that it’s missing one crucial piece of information, which is the name of the token (translation key). Just an example:

    oil-tank-not-m1-abrams: 'tank'
    german-phrase-for-was: "war"

    If you extract the text between brackets and skip the keys, then without any additional context a translator would only see the “military” idea of the translation. However, if you display the key name next to the translatable string, it would help avoid mistranslations. Well, at least some of them, depending on the naming convention. Hopefully this silly example gives you a better idea of what I’m referring to.

    Paul, may I trouble you with finding a good solution to this problem in Studio? This problem actually applies to many other file formats as well, where the keys are not structural tags. Even if we manage to display this information in the last column with tags, it is not directly visible.

    • Hi Agenor, I don’t think I missed this point here as the article was about the new preview feature and not really about the nuances of yml translation. The best solution is going to be Passolo, or create a custom filetype in Studio that provides an appropriate preview.

  3. Nicola said:

    Hello, and thanks for this interesting article.

    Question: what if you have some lines introduced (but not followed) by the > symbol? How can you amend the document structure regex so that those translatable strings would be picked up by Trados too?

    example:
    title: "this sentence will be picked"
    click-here: this sentence will be picked
    step: this sentence will be picked
    step-1-title: 'this sentence will be picked'
    step-1-content: >
    this sentence would not be picked
    step-2-title: this sentence will be picked
    step-2-content: >
    this sentence would not be picked

    Thanks Nicky
