Language Cloud… word-counts… best practice?

Best practice!  This is a phrase I’ve had a love/hate relationship with over the course of my entire career… or maybe it’s just a love to hate!  The phrase is something that should perhaps be called “Best Suggestions” and not “Best Practice” because all too often I think it’s used to describe the way someone wants you to work as opposed to anything that represents the views of a majority of users over a long period of time, or anything that takes account of the way different people want to work.  In fact with new technology how can it be “Best Practice” when it hasn’t been around long enough in the first place?  I think for a clearly defined and well established process “Best Practice” has its place… but otherwise it’s often the easy answer to a more complex problem, or just a problem that is considered too hard to address.
So what has all this got to do with SDL Language Cloud?  In case you didn’t know, the first release of SDL Language Cloud in the last month or so introduced some very nice capabilities that replace the SDL BeGlobal Community machine translation feature for SDL Trados Studio 2014.  These features, in a nutshell, are:

  1. Free Machine Translation with 96 language pairs for up to 600,000 characters a month
  2. Paid Machine Translation as above but with some additional options
    1. Use simple glossaries in the form of TBX files to ensure terms are translated the way you wish
    2. Extend your allowance beyond 600,000 characters a month
    3. Use a trained Machine Translation engine for reduced post-editing effort with limited language pairs and currently only from English into something else

The features available through SDL Language Cloud will surely increase as time goes by, as the products that can take advantage of it mature further, and as the tools that third party developers start to create using the Language Cloud API bring more solutions and possibilities to the table.
Now we’ve got that out of the way what’s “Best Practice” got to do with all this?  Well the simple introduction of these new machine translation features has added a little complexity around the way some users work.  But before we get into that, let’s note a couple of KB articles that begin to put this into context, and we’ll talk Studio 2011 as well as Studio 2014 here.
KB #5287 |  Machine translation in SDL Trados Studio 2014 SP1 CU6 and higher
KB #4966 |  Machine translation in SDL Trados Studio 2014 SP1 CU5 and lower
The relevant parts here are related to word counting.  KB #5287 refers to Language Cloud and the free allowance of 600,000 characters a month.  KB #4966 refers to BeGlobal Community and the free allowance of 4,000 words per day.  Why is this important?  It’s important because the take up of machine translation by translators using Studio has been so significant that once a change that affects the way it’s delivered is implemented we hear about it very quickly, because so many translators rely on it as part of a productivity gain as they work.  So whilst we sometimes hear many negative comments from some users about machine translation, I think the reality is already very different.  It’s not for everyone and it may never be… but it is for a significant number.
The recent change to Language Cloud and the implementation of these word-count limits has caught many users by surprise, for a number of reasons related to their workflow.  So this is where “best practice” comes into it.  Not because I’m going to share “best practice”, but because many users have asked me how they can work without exceeding the limits, given that they are not abusing the free machine translation available through BeGlobal Community or Language Cloud by pushing tens of thousands of words a day or more through the facility.  So I’ve been asked “what’s the best practice?”.
I don’t like to give out advice without thoroughly checking it out myself and discussing the workflows some users are applying.  So first of all here’s the problem:

  • translating in Studio 2011, not exceeding the daily allowance, and yet BeGlobal stops working before the day is out – error message is “Error 2004: The user has reached the maximum number of words for this day”

How can this be?  The reason is most likely one, or all of these:

  • Attempting to pre-translate a whole file, or a complete project, with Machine Translation so you have something translated in your target segments already.
    • once the count reaches 4,000 words you get the error and Machine Translation is no longer offered.
    • what’s worse is that you don’t even get the first 4,000 words in your segments so you get no benefit at all if you work this way.
  • Working interactively with Machine Translation and regularly moving back and forth between segments as you work.
    • if you have Machine Translation enabled then every segment you enter will be sent to the server, translated and counted.
    • This is the case whether you have done this before or not, and also irrespective of whether there is an approved translation in there already or not.
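
To put some numbers on the second case, here is a minimal sketch of the counting behaviour described above. It is a simplified model and not BeGlobal's actual accounting: the function name, the 150-segment file and the 20 words per segment are all made up for illustration, and the only rule modelled is "every segment you enter is counted, whether you have been there before or not".

```python
DAILY_LIMIT_WORDS = 4000  # free BeGlobal Community allowance mentioned above


def words_counted(segment_word_counts, visit_order):
    """Return the total words counted when every segment entry is sent to MT.

    segment_word_counts: list of word counts, one per segment.
    visit_order: segment indexes in the order they are entered in the editor.
    """
    return sum(segment_word_counts[i] for i in visit_order)


# Hypothetical file: 150 segments of 20 words each (3,000 words in total).
segments = [20] * 150

# One pass from top to bottom.
single_pass = list(range(150))

# A second review pass re-enters every segment, so everything is counted again.
with_review = single_pass + single_pass

single_total = words_counted(segments, single_pass)
review_total = words_counted(segments, with_review)

print(single_total, single_total > DAILY_LIMIT_WORDS)  # 3000 False
print(review_total, review_total > DAILY_LIMIT_WORDS)  # 6000 True
```

So a 3,000 word file that sits comfortably inside the allowance on a single pass goes over it as soon as you review every segment once with Machine Translation still enabled.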

So it’s easy to see how you could exceed the daily limit like this when working with Studio 2011 and the BeGlobal Community Machine Translation provider.  But what about Language Cloud and Studio 2014?  Well this is different for a couple of reasons.  First of all it’s a monthly limit and not a daily one which means pre-translating a project at the start is less likely to hit the limit, and working interactively won’t reach the limit, at least not for the first few weeks anyway.  Theoretically if you pre-translate a lot of projects because you are preparing work for others then you are also likely to reach the limit here too.  Secondly Studio 2014 has an additional control on how Machine Translation is used.
This option, which is unchecked by default, means that characters are not being counted where you already have a match from your Translation Memory.  So the problems that you find in Studio 2011 should not occur here.  This is good, and I think the reason we have not seen as many Studio 2014 users being affected by the new controls over word-count is because of this option, and also because the monthly limit means pre-translating a single project and then working on it for a week is less likely to take you over the limit.  This is much more appropriate for a translator using Machine Translation to get some additional help as they work.
But let’s come back to the advice I wanted to provide on how to work with Machine Translation as a Translator, so you can take the benefits you need from the free allowance.  As I was thinking about this it became clear that a couple of things would make life much easier and allow me to make a “best suggestion”.

SDL BeGlobal Community Plus

The first thing was to find a way to prevent the word-count from racking up when working.  So our new OpenExchange Evangelist Developer Romulus Crisan spent a little time in his first couple of weeks at SDL tackling this problem, and he came up with a solution by adding some functionality to the existing BeGlobal Community Plugin, and this has been made available free of charge as an OpenExchange application called SDL BeGlobal Community Plus.
This works by adding a simple checkbox like this to the original provider:
It’s checked by default, and this means that if there is anything at all in the segment already then BeGlobal is not activated and you will not get a Machine Translation result.  But if your segment has nothing in it then you will.  So a simple but effective control for Studio 2011 users to help them with managing the daily allowance of 4,000 words.  So my “best suggestion” would first of all be to work interactively only and not pre-translate with Machine Translation.

  1. Open the file for translation
  2. If there is no text in the target then you will get Machine Translation alongside your Translation Memory result
  3. If you want a Machine Translation anyway then just clear the segment, or maybe cut it to your clipboard for easy replacement if it didn’t come from your TM
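
The rule behind the checkbox is simple enough to sketch in a few lines. This is only an illustration of the behaviour described above, not the plugin's code: `should_request_mt` and the sample segments are hypothetical, and I'm assuming "anything at all in the segment" means any character, including whitespace.

```python
def should_request_mt(target_text, only_when_empty=True):
    """Decide whether to ask the MT provider for a translation.

    only_when_empty mirrors the checkbox described above (checked by default):
    when it is on, MT is only requested for segments with an empty target.
    """
    if not only_when_empty:
        return True
    return len(target_text) == 0


# Hypothetical segments: the second one already has a target from the TM.
segments = [
    {"source": "Hello world", "target": ""},
    {"source": "Good morning to you", "target": "Bonjour à vous"},
    {"source": "See you soon", "target": ""},
]

# Only the sources of empty segments would be sent and counted.
sent = [s["source"] for s in segments if should_request_mt(s["target"])]
words_sent = sum(len(text.split()) for text in sent)

print(words_sent)  # 5
```

Only the two empty segments are sent, so five words are counted rather than nine.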

If you really want to pre-translate, perhaps you are working offline or you have a poor internet connection that takes too long to return the results, then perhaps adopt this workflow:

  1. Pre-translate with your Translation Memories first and disable BeGlobal Community Plus
  2. Enable BeGlobal Community Plus and pre-translate again
  3. If you have so many missing translations that you are worried you will still overdo it then open the file first and decide how many segments you will get to today.  Then copy source to target for the rest and then pre-translate so you don’t add to the count for these segments you won’t get to.
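
If it helps to see the order of operations, the pre-translate workflow above can be sketched roughly like this. Everything here is hypothetical (the `pretranslate` function, the dictionary standing in for a Translation Memory, and `fake_mt` standing in for the real provider); it only shows why copying source to target before the second pass keeps those segments out of the count.

```python
def pretranslate(sources, tm, machine_translate, segments_today):
    """Return (targets, words_sent_to_mt) for a list of source segments."""
    # Step 1: pre-translate from the Translation Memory only (MT disabled).
    targets = [tm.get(src, "") for src in sources]

    # Step 3, done before the MT pass: copy source to target for everything
    # beyond today's workload, so those segments are no longer empty.
    for i in range(segments_today, len(sources)):
        if targets[i] == "":
            targets[i] = sources[i]

    # Step 2: pre-translate again with MT enabled; only empty targets are sent.
    words_sent = 0
    for i, src in enumerate(sources):
        if targets[i] == "":
            targets[i] = machine_translate(src)
            words_sent += len(src.split())
    return targets, words_sent


def fake_mt(text):
    # Stand-in for a real MT call; it just tags the text.
    return "[MT] " + text


sources = ["one two", "three four five", "six", "seven eight"]
tm = {"one two": "un deux"}  # toy Translation Memory with a single match

targets, words_sent = pretranslate(sources, tm, fake_mt, segments_today=2)
print(targets)     # ['un deux', '[MT] three four five', 'six', 'seven eight']
print(words_sent)  # 3
```

Without the copy source to target step all three untranslated segments would have gone to the server, six words instead of three.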

A bit fiddly perhaps, but workable and hopefully you won’t see the “Error 2004” again.
In case you’re wondering what an Evangelist Developer is, Romulus explains this very nicely in his blog here, and if any of you are OpenExchange developers you’ll probably have seen him busily answering questions in our developer community and helping developers working on their own ideas.

Controlled MT

The second thought was to provide a way to turn off the MT at will, without having to go through the Project settings every time you need to do this.  So we created a plugin for Studio 2014 that could do just this.  The idea is that you add your MT providers through a new provider called “Controlled MT”:
Once they are added you will see something like this in the settings and then in the Translation Results window as you work in the Studio Editor:
You can add as many MT providers as you like “inside” the Controlled MT provider and these can all be enabled or disabled with a single click or keyboard shortcut from the Studio Editor:
This provides you with much better control over whether you wish to use MT as you are working or not, and as a consequence have far better control over the wordcount as you work.  In the case of free facilities this means you are less likely to use up your free allowance without realising it, and if you are paying for your MT then it should help to manage the costs.
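
For the developers reading, the idea of a provider that wraps other providers behind one switch can be sketched in a few lines. To be clear, this is not the Studio API and not the code Romulus wrote: the `ControlledMT` class, its methods and the two toy providers are hypothetical and only illustrate the pattern.

```python
class ControlledMT:
    """Wraps several MT providers behind a single on/off switch."""

    def __init__(self, providers):
        self.providers = list(providers)
        self.enabled = True
        self.requests_made = 0  # how many provider calls actually went out

    def toggle(self):
        """Flip the switch, as a button or keyboard shortcut might."""
        self.enabled = not self.enabled
        return self.enabled

    def translate(self, text):
        """Return one result per wrapped provider, or nothing when disabled."""
        if not self.enabled:
            return []
        self.requests_made += len(self.providers)
        return [provider(text) for provider in self.providers]


def provider_a(text):
    # Stand-in for a real MT provider.
    return "A:" + text


def provider_b(text):
    # Stand-in for a second MT provider.
    return "B:" + text


mt = ControlledMT([provider_a, provider_b])
first = mt.translate("hello")   # both providers are called
mt.toggle()                     # switch MT off
second = mt.translate("hello")  # nothing is sent, nothing is counted

print(first)             # ['A:hello', 'B:hello']
print(second)            # []
print(mt.requests_made)  # 2
```

While the switch is off nothing reaches any of the wrapped providers, which is exactly what keeps the count from moving.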
Now, this particular application has not been made available on the OpenExchange yet, but it is the first application we have made available as code as part of the new initiative we have set up to encourage developers to be more open with the work they are doing and share some of the ideas and code snippets they have created to provide useful functionality in Studio.  So if you are a developer and are interested to see how this was achieved then take a look at this Github site created by Romulus.  You can expect to see a lot more in here as we grow this community, and we’ll be sharing more of the initiatives we are working on over the next few months… so if you’re a developer stay tuned!  In the meantime perhaps some of you MT provider developers might like to take this idea and use the example to make your own MT plugins work in this way?
If you’re not a developer then don’t worry, we will make this application available on the OpenExchange as a downloadable plugin too.  I’ll update this post when it becomes available.

3 thoughts on “Language Cloud… word-counts… best practice?”

  1. Hi Paul,
    thanks, these are really interesting and helpful “best practices” and technical hints!
    Does the “Controlled MT” plugin also allow you to prevent the use of certain/all MT engines in your project and the packages you send to other translators?
    This would indeed be a relief for corporate and LSP users who are worried that their freelancers send their (confidential) texts to random MT providers…
    Kind regards

    1. Hi Christine,
      No it doesn’t, but that’s an interesting question. I think it would be possible to do this, but only at Project/package level. So the reality is that this would only be an artificial restriction in any case. I think attempting to restrict the use of MT these days is futile with all the available workarounds within the tools and outside the tools. Even online systems or encryption of the data files for use only in that project cannot prevent their use by an enterprising user.
      The only really secure way to ensure MT is not used is to have the translations done onsite in a controlled environment.
