When I started to look at the subtitling industry little did I know just how fragmented it would be! For years we have talked about SRT and yet when I look at the filetypes that tools like Subtitle Edit claim to support I find over 200! Normally I’m not a big fan of standards but that’s probably because I live in a world where there is little variation and supporting different bilingual files is trivial in comparison. But if there was ever a good argument for one it would be here! Asking people what format they see most often does help to narrow it down, but as we often find when developing software, the interest usually comes after the event and not before! So what formats can a translation tool support today?
Subtitling in most translation environments has long been a context-free affair, limited to handling a subtitle file as a text-only exercise. There has been the odd exception to this: Star Transit have offered the ability to see a video and play it synchronously with the text in their translation editor for some years, slowly extending their support to include SRT, VTT, webVTT and TXT formats for the subtitle file (as far as I know). memoQ recently launched a video preview as well, I think with SRT support (I’m not sure here as it’s not easily obtained or installed). We (SDL) only offered support for SRT in terms of extracting the translatable text and giving you a static preview that showed you the timecodes and the text. Other tool vendors, and workflows for other file formats, often rely on the text in a subtitle file being copied into Microsoft Word where the time-codes and other meta information can be hidden, allowing the translator to focus on the translatable text… tools like Tortoise tagger for example can be helpful in preparing the files for translation. But none of them provide contextual previews of the video with embedded subtitles supporting positional and formatting information, and none of them provide any useful quality controls for subtitling other than line length, which is based on the standard QA checks in most translation tools. This week SDL released some new plugins onto their appstore that will no doubt kick off some innovation in this area as the need for better audiovisual localization tools increases.
Can you use these new plugins for the full Subtitling workflow?
What is a “full subtitling workflow”? I think a localization project for a translator/Agency could be all or part of the following basic workflow:
Very high level, but probably covers the scenarios relevant here as I’m not looking at video editing outside of the subtitling requirement. Spotting and Adaptation are outside the scope of these plugins. These activities involve creating/editing the time-codes, repositioning subtitles and things like that. SDL Trados Studio is a localization tool for traditional translation projects and the plugins that have been developed support that aspect. “TEP” is the standard Translation, Editing, Proofing workflow in most localization projects and “Simulation” is the process of being able to see the translation in context and review it appropriately. The plugins provide the following benefits:
- more subtitle filetype support
- translation with context
- dynamic QA
- translation quality assessment
Having said this, the development may not end here. If it makes sense for specialist subtitlers to be able to do more in the translation environment, and if we can sensibly support it, then I’d love to see the plugins evolve. But for now I think the objective is to get feedback and see what enhancements make the most sense and then try to address them.
What’s New for SDL Trados Studio?
The innovation here is available through plugins that are freely downloadable from the SDL AppStore and they will only work with SDL Trados Studio 2019. There are actually five downloadables which I’ll comment on below as well as providing a link to the SDL AppStore where they can be downloaded:
- Subtitling Preview plugin
- webVTT filetype plugin
- SBV filetype plugin
- STL (Spruce) filetype plugin
- TQA profile using the FAR methodology
Studio Subtitling : this is the core plugin that provides the ability to preview the video and can use the spotting, formatting and positional information (to some extent) in the new filetypes you can find below. It also supports the SRT filetype which is already available out of the box in Studio. It provides a view like this:
A little cramped on the screen, but you’ll get the idea and of course you can move this window and position it anywhere you’d like, including on another screen if you have two monitors. You have playback controls and real context while working with your Translation Memories and glossaries as well as all the other benefits of working with a translation tool rather than a subtitle editing tool. The preview itself provides a lot of information including the ability to read right to left in an inverted table view for BiDi languages. I don’t know if this will be desirable or not but we thought it might be when we were making sure we had proper Unicode support for all languages:
This particular filetype is actually SRT which is already available in Studio and the preview can use the allowable formatting for this filetype, including colours as you can see. The table itself provides information relating to the content of the file which is updated as you translate it: the words per minute (WPM) speed, characters per second (CPS), number of words in the segment, number of characters, the start and end times for the subtitle and the Studio segment ID. All these columns are sortable which is quite a handy feature since the Studio Editor also follows the active segment in the table if you navigate using the table instead. Probably not handy at all for movies (if you were translating one with this tool) but possibly useful for more technical videos… who knows! The question of whether you can translate movies with a tool like this isn’t one I’m going to comment on other than to say I think it’s horses for courses… if you prefer to work in a subtitle editing tool because you need to create the source subtitle file in the first place, change timecodes, merge subtitles etc. then you should do that. But I think once you have the source, and then have to manage a project into 50 languages, this could be a godsend irrespective of the type of content. Certainly the quality of subtitling in many of the films I see on Netflix and Amazon today leaves a lot to be desired and I doubt this has anything to do with the choice of tools being used.
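To make the WPM and CPS columns a little more concrete, here is a minimal Python sketch of how such figures could be derived from a single SRT entry. This is not the plugin’s code: the function names are made up for illustration, and the choices to strip inline tags and to count spaces as characters are assumptions, since the exact counting rules the plugin applies are not described here.

```python
import re

# HH:MM:SS,mmm as used in SRT (a "." separator is also accepted here)
TIMECODE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(timecode: str) -> int:
    """Convert an SRT-style time-code to milliseconds."""
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"unrecognised time-code: {timecode!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def strip_tags(text: str) -> str:
    """Remove inline markup such as <i>...</i> before counting."""
    return re.sub(r"<[^>]+>", "", text)


def reading_speed(start: str, end: str, text: str) -> dict:
    """Return character/word counts plus CPS and WPM for one subtitle.

    Assumptions (hypothetical, not taken from the plugin): tags are not
    counted, spaces are counted as characters, and a word is any
    whitespace-separated token.
    """
    duration_s = (to_ms(end) - to_ms(start)) / 1000
    if duration_s <= 0:
        raise ValueError("end time must be after start time")
    plain = strip_tags(text)
    chars = len(plain)
    words = len(plain.split())
    return {
        "chars": chars,
        "words": words,
        "cps": chars / duration_s,
        "wpm": words / duration_s * 60,
    }


stats = reading_speed("00:00:01,000", "00:00:03,000", "<i>Hello</i> world")
print(stats)
```

A two-second subtitle with eleven visible characters and two words therefore works out at 5.5 CPS and 60 WPM under these assumptions.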
There are also some useful settings for the preview which give you great control over the text format and the background colour/transparency that you see when working. These will not be changed in the target file, this is only for the preview:
I don’t think I need to explain these as they are all fairly obvious from the image, but I would make special mention of the Time-code part. The time-code in subtitle files can be written in milliseconds or by frame. It’s always displayed in milliseconds in the preview table and it’s always read as milliseconds unless the subtitle file contains information to suggest otherwise. But getting this wrong would reduce the effectiveness of the spotting checks and could affect the calculations for QA. For this reason it is possible to change the format to Frames instead of Milliseconds. To do this the plugin reads the frame rate of the video you have loaded and then checks to ensure that the fractional seconds (to be identified as frames) cannot exceed the frame rate for any of the time-code entries in the segments of the document. If you incorrectly try to make this change you’ll receive a warning like this:
If you do correctly change the time-code format then the preview window will update to display the new time-codes, this time converted into milliseconds from the frame-based values read from the file. If you want to know more about this it’s worth checking out the WIKI in the SDL Community for this plugin. You’ll find example calculations in there showing you exactly how this is done.
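The check and the conversion described above can be sketched roughly as follows in Python. The WIKI has the actual calculations; this is only an illustration under stated assumptions: the function names are hypothetical, frames are assumed to be numbered from 0 so the last field must be strictly less than the frame rate, and the conversion is assumed to be a simple frame / frame rate × 1000, rounded to the nearest millisecond.

```python
import re


def split_timecode(timecode: str) -> tuple:
    """Split HH:MM:SS plus a final field (milliseconds or frames)."""
    parts = re.split(r"[:,.;]", timecode.strip())
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognised time-code: {timecode!r}")
    return tuple(int(p) for p in parts)


def can_be_frames(timecodes, fps: float) -> bool:
    """True only if every final field could be a frame number.

    Assumption: frames run from 0 to fps - 1, so a value equal to or
    above the frame rate means the file is not frame-based.
    """
    return all(split_timecode(tc)[3] < fps for tc in timecodes)


def frames_to_ms(timecode: str, fps: float) -> int:
    """Convert a frame-based time-code to milliseconds (assumed formula)."""
    hours, minutes, seconds, frame = split_timecode(timecode)
    if frame >= fps:
        raise ValueError(f"frame {frame} is not valid at {fps} fps")
    whole_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000
    return whole_ms + round(frame * 1000 / fps)


# A value such as ",150" cannot be a frame number in a 25 fps video,
# which is the situation that would trigger the warning.
print(can_be_frames(["00:00:13,150"], 25))
print(can_be_frames(["00:00:13,15", "00:00:16,05"], 25))
print(frames_to_ms("00:00:13,15", 25))
```

So frame 15 at 25 frames per second becomes 600 milliseconds into the second, and a file containing a fractional value of 150 would be rejected as frame-based for the same video.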
Alongside the preview there are some additional QA checks provided through the verification feature in Studio. They look like this and I don’t think I need to explain a lot more than that:
Very handy while translating because you’ll be automatically notified if you exceed the boundaries of any of these checks. One thing I will note is that we added these checks as a minimum for now. We were not too sure what translators/project managers would like to see controlled. So if you have suggestions of other things it would be important to measure and control, take advantage of the new Subtitling forum we created today in the SDL Community. In fact we’d welcome any feedback on these new features for Trados Studio.
webVTT filetype : this filetype is based on the specification here but not all things are fully supported. For example, the webVTT format supports the use of CSS stylesheets and this is not supported at the moment. It also supports positional information that can control where the subtitles are displayed on the screen. This is partially supported, so something like this for example:
00:00:33.550 --> 00:00:36.700 position:50% line:30%
We realised quickly that we want to improve
would be displayed like this in the preview:
Some of the parameters are fully supported, some only partially, and Region is not supported at all. Formatting information for internal tags is partially supported: the formatting for the italics tag (<i></i>), the bold tag (<b></b>) and the underline tag (<u></u>) is fully supported and displayed as you’d expect. All other supported tags in this format are converted to tags in the editor but the formatting is not supported.
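For anyone curious how the positional settings and inline tags in a webVTT cue might be picked apart, here is a small Python sketch. It is not how the filetype plugin is implemented; the function names are hypothetical and the parsing is deliberately simplified (it does not validate values or handle regions). The set of tags treated as rendered formatting simply mirrors the three tags mentioned above.

```python
import re

# Tags described above as rendered with real formatting; everything
# else is assumed to be kept only as a protected tag in the editor.
RENDERED_TAGS = {"i", "b", "u"}


def parse_cue_header(line: str):
    """Split a cue timing line into start, end and a settings dict."""
    start, arrow, rest = line.partition("-->")
    tokens = rest.split()
    if not arrow or not tokens:
        raise ValueError(f"not a cue timing line: {line!r}")
    settings = {}
    for token in tokens[1:]:
        name, sep, value = token.partition(":")
        if sep and name and value:
            settings[name] = value
    return start.strip(), tokens[0], settings


def classify_tags(text: str):
    """Return (rendered, protected) tag names found in the cue text."""
    names = set(re.findall(r"</?([A-Za-z]+)", text))
    return names & RENDERED_TAGS, names - RENDERED_TAGS


header = "00:00:33.550 --> 00:00:36.700 position:50% line:30%"
print(parse_cue_header(header))
print(classify_tags("<v Roger>Hello <i>there</i></v>"))
```

The settings come back as plain strings ("50%", "30%") so that whatever consumes them can decide how much of each parameter it actually honours.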
SBV filetype : this is a somewhat simpler format and the filetype supports basic formatting such as the italics tag (<i></i>), the bold tag (<b></b>) and the underline tag (<u></u>). Anything else is seen as a tag and treated accordingly. For example:
Here you can see the source text with three tag types: bold, italic and font. The bold and italic tags are rendered as formatted text in the Studio Editor and in the video preview. The font tags are handled in the Studio Editor so they are protected, but they are not reflected as formatting in the video preview or the Studio Editor. There are no controls for positioning in this format as far as I am aware.
STL (Spruce) filetype : the STL format is a good example of the fragmentation in this industry at the moment. I’m aware of three completely different STL formats. The first is EBU STL, a binary format that was created by the European Broadcasting Union and, as far as I’m aware, has mostly been replaced by EBU-TT (Timed Text), which is an XML format. Having said that, the second post we saw on the SDL Community after launching these plugins yesterday related to an STL file that could not be opened. It was a binary EBU-STL file! The other one I’m aware of is DVD Pro STL, which I believe was based on Astarte’s DVDirector that was purchased by Apple around 2000… but I’m not sure. This format is a little closer to Spruce STL… well, it’s text based and has some similarities, but is closer in a loose sense of the word!
The Spruce STL format, which is supported here, can contain information controlling font type/size, character attributes, positional control, contrast and effects. However, none of this is supported in the plugin for Studio. It only supports extraction of the translatable text and tag recognition for any inline markup. The time-codes are recognised of course, but the different controls relating to the other things are ignored. I’m not sure where to find a proper specification for this format but there is an old archive here that seems to provide some idea.
TQA profile for subtitling : this is not really a plugin. TQA, or Translation Quality Assessment, is a feature of SDL Trados Studio Professional (Freelance users who receive a package containing TQA are able to use it). You can create your own TQA profiles or edit an existing one and you can also import or export a profile. Out of the box Studio comes with TQA profiles for TAUS DQF, SAE J2450, MQM Core and the old LISA QA model. As far as I’m aware there is no standard for TQA in a subtitling localization process, but with the help of an internet friend (Google) I found a paper written by Professor Jan Pedersen of Stockholm University that translates nicely into the TQA features for Studio. The abstract of the paper explains this nicely:
The FAR model assesses subtitle quality in three areas: Functional equivalence (do the subtitles convey speaker meaning?); Acceptability (do the subtitles sound correct and natural in the target language?); and Readability (can the subtitles be read in a fluent and non-intrusive way?). The FAR model is based on error analysis and has a penalty score system that allows the assessor to pinpoint which area(s) need(s) improvement, which should make it useful for education and feedback.
So I created the profile like this and made it available on the SDL AppStore for anyone to download and use:
There’s a little more detail than this but you’ll get the idea from these two images of how the error analysis and penalty score could fit into here quite nicely. When I first did this, several years ago, I visited Professor Pedersen at the University where he kindly went through the model with me and clarified anything I was unsure of. At that time Studio was missing context for the video itself, so it wasn’t possible to check for Readability. But now with the Studio Subtitling plugin this becomes a really useful feature for assisting with improving the process of managing quality for interlingual subtitling.
On merging/splitting segments
On a final note I wanted to mention merging and splitting segments because this came up a lot in a webinar we ran when we launched the plugin. Splitting segments is fine from a translation perspective and is handled nicely by the plugin, although it may still give an undesirable result. But merging correctly is not supported because we don’t combine timecodes to create a new one. So the first point to note is that you might come across this message if you attempt to activate merging across paragraph breaks:
If you ignore it and activate merging in this scenario anyway then you might end up with something like this where I merged segments #1 and #2, and then #3 and #4; and also split segments #5 and #6:
I can do it if I set the project up to allow merging to take place across paragraph breaks. But the resultant target file based on what you see above looks like this:
1
00:00:13,150 --> 00:00:16,050
<b><i>El equipo de experiencia del desarrollador</i></b> empezó a tomar forma: En torno a 2014, cuando me incorporé<u> a SDL</u> .

2
00:00:16,050 --> 00:00:19,350

3
00:00:19,350 --> 00:00:22,250
Esto es algo <font color="red"> completamente nuevo</font><font color="orange"> y único </font>

4
00:00:22,250 --> 00:00:23,550

5
00:00:23,550 --> 00:00:28,450
- so in the beginning it was more a series of doing experiments

6
00:00:28,450 --> 00:00:31,000
- and trying to understand where we want to go
You can see that the split segments are still in the same time-code in #5 and #6. But the merged segments have caused segments #2 and #4 to now be empty. In both cases this might not be what was desired for several reasons:
- if you wanted #1 to start at 00:00:13,150 and end at 00:00:19,350 it has not done. Similarly for #3. The original spotting applies.
- if you wanted to simply treat segments #1 and #2 as a single TU for the translation but not change the spotting then it has not done as #2 is now empty in the target file. Similarly for #3 and #4.
- if you wanted #5 and #5a to be created with new time-codes so you had two new subtitle entries it has not done. Similarly for #6 and #6a.
- if you wanted to split #5 and #6 for the purpose of translation and create two TUs, but still retain the original spotting then this is successful.
In the first three cases the result may not be what you were after. In the first two cases the result is definitely going to require additional work with subtitle editing software; in the third case it might require work. Since merging cannot be achieved without requiring work afterwards it is not supported. You can do it, but keep in mind what is likely to happen if you do.
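If you do merge segments anyway, one way to catch the side effect before delivery is to scan the target SRT for entries that have a time-code but no text. The following Python sketch shows the idea; it is a hypothetical helper rather than anything shipped with the plugins, and it assumes a well-formed SRT file in which entries are separated by blank lines.

```python
import re

CUE_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})"
)


def parse_srt(content: str) -> list:
    """Parse SRT text into a list of dicts (index, start, end, text)."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # not a subtitle entry; skipped in this sketch
        match = CUE_TIMING.fullmatch(lines[1].strip())
        if match is None:
            continue
        cues.append({
            "index": int(lines[0]),
            "start": match.group(1),
            "end": match.group(2),
            "text": "\n".join(lines[2:]).strip(),
        })
    return cues


def empty_cues(cues: list) -> list:
    """Return the index numbers of entries that carry no text."""
    return [cue["index"] for cue in cues if not cue["text"]]


# Shortened version of the merged target file shown above.
SAMPLE = """1
00:00:13,150 --> 00:00:16,050
El equipo de experiencia del desarrollador

2
00:00:16,050 --> 00:00:19,350

3
00:00:19,350 --> 00:00:22,250
Esto es algo completamente nuevo

4
00:00:22,250 --> 00:00:23,550

"""

print(empty_cues(parse_srt(SAMPLE)))
```

Any index it reports is a subtitle whose original spotting survived but whose text moved into the preceding entry, which is exactly the situation that needs fixing in a subtitle editing tool afterwards.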
I’d recommend you review the WIKI article I mentioned before as this explains more about this topic as well.