Getting a filetype preview…

One of my favourite features in Studio 2017 is the filetype preview. It can save a lot of time when you are creating custom filetypes, and it’s fun to use as well. I can fill out all the rules and switch between the preview and the rules editor without having to continually close the options, open the file, see if it worked and then close the file and go back to the options again… then repeat from the start… again… and again… I guess it’s the little things that keep us happy!

I decided to look at this using a YAML file as this seems to be coming up quite a bit recently. YAML, pronounced to rhyme with “camel”, stands for “YAML Ain’t Markup Language” and I believe it’s a superset of the JSON format, but with the goal of making it more human readable. The rules are set out in the YAML Specification, and to do a really thorough job I guess I could try and follow them. But in practice I’ve found that creating a simple Regular Expression Delimited Text filetype based on the sample files I’ve seen has been the key to handling this format. Looking ahead I think it would be useful to see a filetype created either as a plugin through the SDL AppStore, or within the core product just to make it easier for users not comfortable with creating their own filetypes. But I digress…

Contents

Example YAML file
Preview File
Video

Example YAML file

I’ve seen a few variants on YAML already but the basic principle for our needs (translatable text extraction) is very similar to JSON in that the text is held in constructs known as “scalars”.

 blog_title: "Bridging the divide, merging segments"
 blog_keywords: merging, paragraph breaks, SDL Trados Studio
 blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'

It’s apparently acceptable for the scalars to be contained within single quotes, double quotes or no quotes at all and so far I have examples of each of these. In fact the sample above uses all three variants. Reading the specification tells me that it’s also possible to have them in a folded form denoted with >, but I have not come across an example like this yet. So typically supporting YAML using a custom regex based filetype to suit the examples provided by customers has been trivial. I can get at the translatable text within the scalar using document structure rules like this (opening pattern followed by closing pattern):

^.*?:\s"
"$

Or this:

^.*?:\s
$

Or this:

^.*?:\s'
'$

But then I occasionally came across a file where both single quotes and double quotes were used in the same file… so I added a non-capturing group, “(?:)“, offering the alternatives through the use of the pipe symbol “|“:

^.*?:\s(?:'|")
(?:'|")$

I haven’t yet come across a real file that combined unquoted values with both kinds of quotes… but here’s how I could handle that eventuality, adding an empty alternative so the quote becomes optional:

^.*?:\s(?:'|"|)
(?:'|"|)$

So for the sort of files I’ve come across so far this last pair of opening and closing document structure rules would do the job. I guess I could have gone straight to this final set, but I thought it might be interesting for anyone playing around with regex for the first time to see the iterations. It also gives you an idea of the sort of testing you might go through in getting the filetype right… it can be a lot of to-ing and fro-ing.
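If you want to try the final pair of patterns outside Studio, here is a minimal sketch in Python. It rests on a few assumptions: Python’s re module stands in for the .NET regex engine that Studio uses, the helper name extract is made up for this illustration, and the way it applies the opening pattern at the start of the line and the closing pattern at the end is only my approximation of how the filetype treats document structure rules.

```python
import re

# The final opening and closing patterns from above, unchanged.
OPENING = re.compile(r"""^.*?:\s(?:'|"|)""")
CLOSING = re.compile(r"""(?:'|"|)$""")


def extract(line):
    """Return the text between the opening and closing patterns.

    Hypothetical helper: returns None when the line has no "key: " part.
    """
    line = line.strip()
    start = OPENING.match(line)
    if start is None:
        return None
    rest = line[start.end():]
    # The closing pattern can always match at the very end of the line
    # because of its empty alternative, so search() never returns None.
    end = CLOSING.search(rest)
    return rest[:end.start()]


sample = [
    'blog_title: "Bridging the divide, merging segments"',
    "blog_keywords: merging, paragraph breaks, SDL Trados Studio",
    "blog_ref: 'For more information <a href=%{info_link}>[Click Here]</a>'",
]

for line in sample:
    print(extract(line))
```

Each of the three sample lines should come back without its key and without its surrounding quotes, which is the text you would expect to see as translatable in the preview.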

The other interesting thing about YAML is that it can contain complete html files, or just text marked up with html, or even marked up with script. The last scalar in my example contains translatable text containing html markup and script. So I can handle these using Inline tags in Studio and just convert any markup to protected tags. This is where the purpose of this article really comes in… using the new preview capability.
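To give a feel for what converting markup to protected tags means, here is a rough Python sketch. The pattern and the split_inline helper are my own assumptions for illustration, not the inline tag rules from the actual filetype, and Python’s re module again stands in for the .NET engine in Studio. It treats anything shaped like an html tag or a %{…} placeholder as a tag and everything else as translatable text.

```python
import re

# Assumed inline pattern: an html-style tag, or a %{...} placeholder.
INLINE = re.compile(r"<[^>]+>|%\{[^}]+\}")


def split_inline(text):
    """Split text into ("text", ...) and ("tag", ...) pieces, in order."""
    parts = []
    pos = 0
    for match in INLINE.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts


scalar = "For more information <a href=%{info_link}>[Click Here]</a>"
for kind, value in split_inline(scalar):
    print(kind, repr(value))
```

In the last scalar of the example the placeholder sits inside the opening anchor tag, so the whole of that tag, placeholder included, is treated as a single protected tag and only the wording around it is left exposed.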

Preview File

When I go to my File Types options in Studio now I see this at the bottom of the screen:

This little addition means I can browse for my test file, in this case multifarious.yml, and with one click see whether the rules I’m creating are extracting the correct text, and also converting inline code/markup to tags. This replaces this sequence of events:

close the options screen
open the test file for translation
review the content
close the test file
open the options again and apply changes
repeat as needed

In fact the process I outlined there is not even the way many translators/engineers did this in the past. I have seen people not familiar with the single document process creating a new project each time just to test the filetype settings. So having this one click preview is a serious timesaver if you are responsible for creating filetypes in your organisation, or even if you do the occasional one and find it requires a lot of to-ing and fro-ing to get it right. The preview itself is very neat and concise… in my example it looks a little like this:

[Screenshot: filetype preview of the example YAML file]

An important point to note is that you can use this feature to check the effects of the settings for any file supported by Studio. So this is not just a tool for geeky regex loving translators/project managers… it’s really good for preparing files of any kind.

Video

But, I thought the best way to demonstrate this would be by a video as this really shows the benefit, and also how to work through my fabricated YAML filetype in detail.

Video: Length is approx. 8 minutes

For me this is one of the best improvements in this release. It’s a small thing, but as I create quite a lot of custom filetypes for various types of files this is a real timesaver. In fact it’s also an absolute pleasure to use it!!

12 thoughts on “Getting a filetype preview…”

-a says:

December 19, 2016 at 9:56 am

Great new feature, our supporters will love it. The tutorial is really easy to follow, too.
Maybe it’s only a small detail, but wouldn’t it be preferable to introduce a segmentation hint for the tag? I believe most translators would prefer segmentation at that point.

1. paulfilkin says:
  
  December 19, 2016 at 10:07 am
  
  Thanks Andreas… and I agree with you on the segmentation hint. Studio is driven by the TM in this regard so you need to create a rule in there, but this is more complex, I agree.
  
  1. Ankit says:
    
    January 5, 2017 at 6:19 am
    
    Thanks Paul for sharing this article.
    
Agenor Hofmann-Delbor says:

May 11, 2017 at 12:58 pm

This works well, but I think you missed one point here – this way of handling files with software strings is fine when it comes to extracting text between brackets, but it may actually cause some mistranslations. The reason for it is that it’s missing one crucial information, which is the name of the token (translation key). Just an example:

oil-tank-not-m1-abrams: ‘tank’
german-phrase-for-was: “war”

If you extract the text between brackets and skip the keys, without any additional context a translator would only see the “military” idea of the translation. However, if you display the key name next to the translatable string, it would allow avoiding mistranslations. Well, at least some of them, depending on the naming convention. Hopefully this silly example would give you a better view on what I’m referring to.

Paul, may I trouble you with finding a good solution to this problem in Studio? This problem actually applies to many other file formats as well, where the keys are not structural tags. Even if we manage to display this information in the last column with tags it is not directly visible.

1. paulfilkin says:
  
  May 11, 2017 at 9:40 pm
  
  Hi Agenor, I don’t think I missed this point here as the article was about the new preview feature and not really about the nuances of yml translation. The best solution is going to be Passolo, or create a custom filetype in Studio that provides an appropriate preview.
  
Nicola says:

May 20, 2017 at 10:42 pm

Hello, and thanks for this interesting article.

Question: what if you have some lines introduced (but not followed) by the > symbol? How can you amend the document structure regex so that those translatable strings would be picked by trados too?

example:
title: “this sentence will be picked”
click-here: this sentence will be picked
step: this sentence will be picked
step-1-title: ‘this sentence will be picked’
step-1-content: >
this sentence would not be picked
step-2-title: this sentence will be picked
step-2-content: >
this sentence would not be picked

Thanks Nicky

arnaud sellenet says:

November 3, 2021 at 4:04 pm

Hi!
Too bad the video is not available anymore. Would you have another video to point to?

We’re trying to ‘protect’ some %{placeholders} in a yml file and could not achieve this goal yet (using SDL Trados 2021)
I would really appreciate 🙂 (and if you know any resource/tips on translating Rails applications with SDL trados too!)

1. paulfilkin says:
  
  November 5, 2021 at 5:41 pm
  
  Hi Arnaud, the video in the post is working for me. Which video are you referring to?
  
  On the “Rails applications”. You’d need to share the files you have. I’m not very familiar with them but it does look as though there could be many file formats created from Ruby and the ones I found looked as though they could be handled using the regex delimited filetype. Why don’t you share some samples in the RWS Community?
  
Guy Knight-Jones says:

April 25, 2022 at 8:55 pm

Many thanks Paul. I was unable to use the YAML filetype in the Appstore so this really helped me out.

It looks like the backslashes have been lost from all your regexes above though.
^.*?:s
should be
^.*?:\s

I prefer to use character classes so I went for these regexes in the end:
^[^:]+:\s*[‘”]*
[‘”]*\s*$

Many ways to skin a cat with regex.

Guy

PS: Cannot play the video either. I see a black box saying “Dieses Video ist privat.”

1. paulfilkin says:
  
  April 25, 2022 at 11:08 pm
  
  Thanks Guy… I have no idea how the backslashes disappeared. I put them back! The video thing is something I am slowly fixing as I come across them. YouTube changed a policy and now all videos have to be public and cannot just be unlisted. Hopefully that also works for you now?
  
Vitor Hugo says:

August 11, 2022 at 5:07 pm

I have a YAML file now that has multiline content (including line breaks) when > is preceding the text. I think we need a new way of creating the file settings? Example:
news:
list:
title: blablabla
acceptation:
title: blablabla
text: >
blablabla
blablabla.

blablabla.
buttons:
accept:
label: blablabla

1. Paul Filkin says:
  
  August 19, 2022 at 5:58 pm
  
  I wonder if this is really what you meant. Wouldn’t it be this?
  news:
  list:
  title: blablabla
  acceptation:
  title: blablabla
  text: |
  blablabla
  blablabla.

  blablabla.
  buttons:
  accept:
  label: blablabla
  Both your example and mine work for me using the YAML filetype that is out of the box in Trados Studio 2022.
  

Published by Paul Filkin
