On the first day of Christmas my Studio gave to me…
12 Verification SDK and API!
11 QA Checker Profiles
10 Segments to exclude
9 Punctuation checks
8 Regular Expressions
7 Terminology Verification
6 Trademark checks
5 Number checks
4 Segment Verifications
3 Length Verification checks
2 Word Lists
… and no Inconsistencies by default
The Quality Assurance features in Studio are quite extensive, and they are often loved and hated all at the same time. Loved because when used correctly they can provide excellent assurance that you’ll have happy clients… hated because the automated recognition of numbers, dates etc. in Studio follows the settings of your computer, and sometimes these are not what you need.
Certainly if you haven’t used them before then I’d recommend you take a little time to review this fairly long list simply by opening the Options and taking a look through the out of the box options available to you.
For the most part you have an extensive range of simple to use checks that allow you to do things like check the length of your target segment, exclude specific segments from a QA check, report on inconsistent translations, check your punctuation, look up against a word list you define for incorrect usage, QA against a well-defined MultiTerm termbase and much more. You can also have different settings for different Projects… so watch out for this if you change the settings in the Options and can’t see the changes in your Project!
I’m not going to look at all of this extensive capability as it’s far too much for me to cover in this one article (The Product Help is useful for an overview). So I’m going to focus this effort on Regular Expressions. “Oh no… not again” you may be thinking! But I have a good reason for this and I’ll start with this example. Let’s assume I’m translating from English (GB) to French (FR), and let’s assume the author of the document I’m handling worked for a French Company and as a standard they always used the same number conventions to avoid confusion within their company irrespective of the language that documentation was written in. So they used numbering conventions in sentences like this for example:
2 823,56 €
The costs rose from 1 564,99 € to 1 779,95 €
This type of European convention is not handled kindly when the source language is English, and there are two main problems to work around. First, the numbers are incorrectly recognised in the source; second, you need a way to make sure the target version is correct. Using this text as an example the default settings for number checking might achieve something like this:
The red crosses in the centre column carry this message “Error: Number is missing in target segment or is not properly localized.” and in fact it’s repeated so it appears twice. The reason for this is that Studio expects the source to be written as € 2,823.56 because it’s en(GB), so it actually thinks that the source here contains two separate numbers. You can tell this by the blue underlined numbers in the source side of the editor, where Studio finds 9 and 174,91 as opposed to 9 174,91 €. The target in contrast is correctly written for fr(FR), and so the default number check in the QA settings reports a missing number for one, and an incorrect localization for the other… it just looks a little silly because both things are reported with the same comment 😉 Furthermore the second segment is a 79% match based on the first segment being confirmed to the TM. So auto-propagation here is non-existent, which means every segment needs to be carefully checked to make sure the numbers have been transposed correctly.
This is different to the experience you would have if the source numbers were written as expected for en(GB) where the first one is found automatically and inserted as an AT and the second is auto-propagated as soon as you confirm the first:
In these examples it would be simple to filter on the numbers first, copy source to target and then lock them. But if you wanted to QA your file realistically, or if the numbers were in the middle of a sentence like this, where you still need to get at the translatable text around the numbers, then it’s a different matter:
In practice I think the best approach is to edit the source numbers using the SDLXLIFF Toolkit and then you can benefit from auto-substitution and auto-propagation as you work, and the default QA checks would be fine. But if you don’t want to do this, then one way around this would be to disable the number check in the QA settings and create a custom regex rule to check the transposition of the numbers.
To do this we’ll use what’s referred to in Studio as a Grouped search expression – report if source matches but not target. So we can use something like this expression to find the source numbers:
And we can check this against the rearranged target like this:
All this does is say: look for any group of numbers followed by a space and a euro symbol, and remember them in a back reference by surrounding the pattern with round brackets. As there is only one back reference, it is $1. You can find more on regular expressions and how to use back references in Studio in these articles I’ve written before (so apologies for the lack of detail explaining this one… but drop a comment below if you need some help):
Regular Expressions – Part 1
Regex… and “economy of accuracy” (Regular Expressions – Part 2)
Search and replace with Regex in Studio – Regular Expressions Part 3
DOGS and CATS… Regular Expressions Part 4!
In Studio this regex rule would look like this, where I specified the error should report a Note rather than an Error… but this could have been an Error or a Warning:
Now if I run the verification (F8 is the default shortcut) I can see this sort of thing:
The two segments in the first file report no errors despite the numbers not being recognised in the source. The same two segments in the second file report an error (with the Note symbol) because the first segment has one of the numbers written in a different format to the source, and the second omitted the thousand value from the first number in the sentence. So I could now use the QA to check and correct my documents even though the numbers are non-standard. Pretty neat, and the same would go for dates, measurements… anything you liked.
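To make the logic of the grouped check a little more concrete, here is a minimal sketch in Python rather than in Studio itself. The pattern, the function name `missing_amounts` and the French sample sentences are all assumptions made up for this illustration, not the exact expressions used in the rule. Python's `re` engine is also not the .NET engine Studio uses, so the captured group is read with `match.group(1)` here instead of `$1`.

```python
import re

# Hypothetical pattern: digits in space-separated groups of three, a decimal
# comma and two decimals, followed by a space and a euro symbol. The round
# brackets capture the number itself (what Studio would expose as $1).
SOURCE_AMOUNT = re.compile(r"(\d{1,3}(?: \d{3})*,\d{2}) €")


def missing_amounts(source: str, target: str) -> list[str]:
    """Return every source amount that does not appear unchanged in the target."""
    missing = []
    for match in SOURCE_AMOUNT.finditer(source):
        expected = f"{match.group(1)} €"
        if expected not in target:
            missing.append(expected)
    return missing


source = "The costs rose from 1 564,99 € to 1 779,95 €"
good_target = "Les coûts sont passés de 1 564,99 € à 1 779,95 €"
bad_target = "Les coûts sont passés de 564,99 € à 1 779,95 €"  # thousand dropped

print(missing_amounts(source, good_target))  # []
print(missing_amounts(source, bad_target))   # ['1 564,99 €']
```

The check is deliberately literal: it only asks whether what was captured in the source turns up again in the target, which is the same idea as "report if source matches but not target".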
You can use the Regular Expressions custom checks for checking pretty much anything which gives you a lot of flexibility if you learn a little about regex… I may have said this once or twice in the past! The conditions you can apply are these:
Using the appropriate one you can check for things like these:
- Check whether a tabulator was deleted in the target
  - \t is the regex for a tabulator, so put this in the RegEx source and RegEx target
  - Report if source matches but not the target
- Check whether measurements have been entered in the target segment without a non-breaking space
  - As an example you could use something like \d (km|m|cm|mm) where you can add as many units as you like separated by the pipe symbol
  - Report if target matches (target check only)
- Another old favourite… check for the use of dumb quotes in the translation
  - Just put " into the RegEx target… not too tricky!
  - Report if target matches (target check only)
- Product codes… these will always be specific to your customer but often based on a pattern of some sort. So for example, if you had codes like these P234-89J, K187-12J, L882-65T and you want to make sure they are transposed correctly in the target
  - For these patterns you could use an expression in the RegEx source that captures the whole code in round brackets
  - Then check for $1 in the RegEx target
  - Grouped search expression – report if source and target matches
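The checks in this list can be tried out away from Studio with a short Python sketch. Everything named here is hypothetical: the three helper functions only imitate the conditions described above, the unit list assumes a digit followed by a normal space and a unit, and the product code pattern is just one guess that happens to fit P234-89J, K187-12J and L882-65T. Python's `re` module is not identical to the .NET engine in Studio, so treat this as a way to test ideas rather than as the rule itself.

```python
import re


def source_matches_but_not_target(pattern: str, source: str, target: str) -> bool:
    """Imitates 'report if source matches but not the target'."""
    return bool(re.search(pattern, source)) and not re.search(pattern, target)


def target_matches(pattern: str, target: str) -> bool:
    """Imitates 'report if target matches (target check only)'."""
    return re.search(pattern, target) is not None


def groups_missing_in_target(pattern: str, source: str, target: str) -> list[str]:
    """Imitates a grouped check: each captured source value must be in the target."""
    return [m.group(1) for m in re.finditer(pattern, source)
            if m.group(1) not in target]


TAB = r"\t"
UNIT_AFTER_PLAIN_SPACE = r"\d (km|m|cm|mm)"       # normal space, not non-breaking
DUMB_QUOTE = r'"'
PRODUCT_CODE = r"([A-Z]\d{3}-\d{2}[A-Z])"         # assumed shape of the codes

# A tab in the source that disappeared from the target is reported.
print(source_matches_but_not_target(TAB, "Name\tValue", "Nom Valeur"))   # True

# A normal space before the unit is reported, a non-breaking space is not.
print(target_matches(UNIT_AFTER_PLAIN_SPACE, "Distance : 12 km"))        # True
print(target_matches(UNIT_AFTER_PLAIN_SPACE, "Distance : 12\u00a0km"))   # False

# Straight quotes in the target are reported.
print(target_matches(DUMB_QUOTE, 'Il a dit "oui"'))                      # True

# A mistyped product code is listed as missing.
print(groups_missing_in_target(PRODUCT_CODE,
                               "Order P234-89J and K187-12J",
                               "Commandez P234-89J et K187-12I"))        # ['K187-12J']
```

One thing this shows up quickly is that a unit list containing plain `m` will also fire on a digit followed by a word starting with m, so a real rule may need tightening.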
A final point to note is that once you have built up your extensive library of custom QA checks you can save them somewhere safe, or share them with a colleague, by exporting your QA settings from here:
I could probably dream up a pretty long list of useful checks… and you could too. It would be quite interesting to see the sort of thing you use the QA Checker for in the comments. I’ve got examples of some fantastic checks from translators over the years, so maybe you’ll share a few ideas here… would make some good stocking fillers 😉
Have a great Xmas!
15 thoughts on “The 12 QA checks of Christmas…”
One of my colleagues wrote this regular expression, which filters out segments not containing any letters. So, you can have digits, of course, but also brackets and other signs like +, =, etc. Use it in the Display Filter toolbar, then you can select the filtered out segments, copy source to target for all of them, change their status to ‘Translated’, and finally lock them. If you have tables with lots of figures, it might make your life much easier.
I thought I would share this with you just to say ‘thank you’ for all the great ideas you provide us with.
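The expression itself is not shown here, so the following is only a sketch of the idea in Python, with a made-up pattern and function name. It treats a segment as "no letters" when every character is a digit, an underscore or a non-word character such as a bracket, a sign or a space. Studio's Display Filter uses .NET regular expressions, so an equivalent filter there may be written differently.

```python
import re

# Hypothetical pattern: one or more characters that are non-word characters,
# digits or underscores. Letters of any alphabet are word characters that are
# not digits, so they are excluded.
NO_LETTERS = re.compile(r"[\W\d_]+")


def has_no_letters(segment: str) -> bool:
    """True when the whole segment contains no letters at all."""
    return NO_LETTERS.fullmatch(segment) is not None


for segment in ["12 (3+4) = 19", "€ 1 564,99", "12 km", "Total: 40"]:
    print(repr(segment), has_no_letters(segment))
```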
Great stuff… thanks for sharing this Paula. These expressions really can be very powerful and have infinite flexibility, so it’s great to see examples of them being used by translators.
And thank you!
Hi I had a question about the “Word List”.
In Trados 2011, if I do not escape unmatched parentheses, for example “in)”, it throws a parser error.
However, with 2014, it does not throw this error. Was this a bug in 2011? Or should special characters be escaped in the Word List possibly due to regular expressions being on by default?
Hi, I can’t reproduce this problem in 2011 on the basis of the wordlist. Perhaps, if it’s important, you can provide more steps to help me?
Sorry my explanation was insufficient.
Steps to reproduce in 2011:
1) Add “in)” to “Wrong Form” and “in.)” to “Correct Form” in QA Checker Word List for any Project
* Note the quotes above should not be entered in Word List
2) Make sure you have “Enable verification of segment” turned on under Editor options
3) Open up a translation file for the Project and try to confirm segment by hitting “Ctrl + Enter”.
Interestingly I get the same error with 2014. The wordlist is not supposed to be based on regex. I’ll log it with development for investigation and let you know the outcome.
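As background on why an unmatched bracket can trip a parser at all, here is a small Python sketch. It says nothing about how the Word List is actually implemented in Studio, and Python's `re` is not the .NET engine. It only shows that text like in) is not a valid regular expression unless the bracket is escaped; the helper name `compiles` is invented for the example.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the text is accepted as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


wrong_form = "in)"
escaped_form = re.escape(wrong_form)   # the bracket is now preceded by a backslash

print(compiles(wrong_form))     # False: unbalanced parenthesis
print(compiles(escaped_form))   # True
print(re.search(escaped_form, "as shown (in) the table") is not None)  # True
```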
Thank you for the help!
And yes you are right, it occurs in 2014 also, I didn’t realize “Enable verification of segment” was turned off…
Also, in the future, what would be the best way to report bugs? I have a few others that I would like to report.
The best way is through a support contract. Not everything reported is a bug, and it takes time for someone to investigate and qualify reported issues; the support contract is the way to do this in an equitable way. If you don’t have a support contract then for now the best approach would be to go through TW_Users or ProZ, where SDL monitor and participate on a regular basis.
Is there any way to include inline tags as part of the RegEx check?
You may be wondering why I’m asking so here is an example:
I have an excel file that contains HTML content which I have marked up within the Excel Filter.
One of the translators didn’t understand the list tag … and maybe thought it was a formatting tag like italic, and has put commas after the tag like this:
some text , some more text , and so on…
So this will look like this when exported:
some more text
I tried looking for “,” in the Target but it didn’t work, and I’m guessing it’s because the list tag has been tagged as an inline tag in my Filter, so it’s probably ignored by the QA checks. Before I pull what’s left of my hair out, would you be able to confirm if this is possible or not?
Also on a somewhat related note :-), is there any way to prevent the Translator from using the Trados Bold/Underline/Italic Formatting tags from the Ribbon? I found some of them use these instead of using the correct inline tags that exist in the source.
I love your blogs by the way.
Hi Anthony, glad you like the blog! I like writing it!
On your first question… Can you save the target Excel file and then just search and replace in Excel? The tags are all text in there so it might not be too hard to achieve what you need?
On the second… only a big hammer which we may be able to arrange with every new license of Studio 2014 😉 Seriously, these are controlled by custom XML filetypes, but other recognisable filetypes do allow the translator this flexibility because sometimes they need to add in formatting that wasn’t present in the first place. But like you, I would like to see the ability to make this option “no option”!
Haha, I have wished for a big hammer before 🙂
On the first question, that’s what I had to do, perform some checks in Excel afterwards, but I would have liked to be able to do this within Studio QA checks or Segment Filter, or have the Translator do it. So I guess the answer is no, that I can’t include the Internal Tags as part of the RegEx checks.
On the second question, I was afraid of that. As you know clients tend not to follow the standard file formats, and they like to paste the entire contents of programming code, or properties files, or html content into Excel files for translation, instead of just sending the original source file. They may think it’s easier for us to translate or better for them to protect the contents, but whatever the reason it has knock-on effects, as Studio thinks, rightly so, that it’s acceptable to allow Excel formatting in an Excel file :-), but for the client’s content it may not be acceptable as only html formatting may be allowed. Anyway it’s just another one of those things we have to deal with.
By the way, for anyone else reading this blog I was able to check for these extra formatting tags by disabling the “Ignore Formatting Tags” option (which is on by default) in the Excel file filter “Tag Check” options.
Thanks for the quick reply Paul,
Is there a verification check where we could see if two different source segments were translated using the same target segment? For example, if you have two source segments such as “In the past 12 months, have you been prescribed…” and “In the past 12 months, have you been diagnosed with…”, we would like a check to make sure that the translations for these two segments are not the same. Sometimes, it seems as though a translator copies and pastes the same translation into another segment because they missed that the source segments were actually different.
There is an option in Studio to make sure that the same source is translated consistently, but I didn’t see the option for the situation I mentioned above. Any ideas?
In Studio I don’t think there is one. But you could use something like Verifika, which has a plugin on the OpenExchange for Studio, and this has the ability to provide Source AND/OR Target inconsistency checks. Xbench also offers this capability and there is a plugin via the OpenExchange for this too.
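The check being asked for is simple enough to sketch in Python if the segments can be exported as source and target pairs. The function name `shared_targets`, the input format and the placeholder translations are assumptions for this illustration; this is not how Verifika or Xbench implement their inconsistency checks.

```python
from collections import defaultdict


def shared_targets(segments: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each target used for more than one distinct source to those sources."""
    sources_by_target = defaultdict(set)
    for source, target in segments:
        sources_by_target[target].add(source)
    return {target: sorted(sources)
            for target, sources in sources_by_target.items()
            if len(sources) > 1}


segments = [
    ("In the past 12 months, have you been prescribed…", "TRANSLATION A"),
    ("In the past 12 months, have you been diagnosed with…", "TRANSLATION A"),
    ("Yes", "Oui"),
    ("Yes", "Oui"),  # same source repeated, so not reported
]

for target, sources in shared_targets(segments).items():
    print(target, "is used for", len(sources), "different source segments")
```

Using a set per target means that genuine repetitions of the same source are ignored and only different sources sharing one translation are flagged.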
I believe it would be a great idea if you could sort the Regex Expressions automatically. I have over 40 now and, when I discover one that was not implemented perfectly (discovering new exceptions to the rules, for instance) searching for them is a real pain!
I totally agree!