The ins and outs of AutoSuggest

The AutoSuggest feature in Studio has been around since the launch of Studio 2009, and based on the questions I see from time to time I think it could use a little explanation.  In simple terms it's a mechanism that prompts you, as you type, with suggested target text based on the source text of the document you are translating.  Sometimes this might be a translation of some or all of the text in the source segment, and sometimes it might simply be an easy way to replicate source text in the target.  You type a character, Studio suggests suitable text, and you can apply it with a single keystroke.  In terms of productivity this is a great feature, and given how many other translation tools have copied it in one form or another I think it's clear it really works too!

AutoSuggest comes from a number of different sources, some out of the box with every version of the product, and some requiring a specific license.  The ability to create resources for AutoSuggest is also controlled by license for some things, but not for all.  When you purchase Studio, any version at all, you have the ability to use the AutoSuggest resources out of the box from three places:

  1. An AutoSuggest Dictionary (*.bpm file)
  2. A termbase (*.sdltb file)
  3. AutoText (*.txt file)

Update for Studio 2015: This version brings a new feature to AutoSuggest.  It can now optionally use your Translation Memories and Machine Translation for real-time suggestions.  There is a great article here from Nora Diaz on what this means in practice.

But if you want access to all the capabilities of AutoSuggest without limitation, then not just any license of Studio will do.  There are three more things to be aware of:

  1. AutoSuggest Dictionary creation
  2. Termbase creation
  3. Add SDL OpenExchange custom AutoSuggest Providers

Whether or not you can do any of these things depends on the type of license you own, and we’ll look at licensing at the end of this article.  First I want to consider the four main sources that can feed AutoSuggest and one special feature for creating AutoSuggest Dictionaries (just one of the sources).  When you look at the AutoSuggest options in File -> Options -> AutoSuggest in Studio 2014 you’ll see a list of the AutoSuggest Providers you have access to and you just enable them by ticking the box.  I have six sources which are made up of the three out of the box sources, and then three additional sources I have added via the SDL OpenExchange:


So before I get into the licensing requirements I'm going to take a quick look at what each of these four sources means: three out of the box, and one from the OpenExchange that in reality could be any number of providers (I have three, so we'll just look at the three currently available on the SDL OpenExchange).

AutoSuggest Providers

AutoSuggest Dictionary

The AutoSuggest Dictionary is created in Studio by extracting phrases from your Translation Memory and placing them in a bilingual dictionary file, based on a SQL database, and saving this file with the extension BPM.  When this was first released with Studio 2009 there was a requirement that the Translation Memory contained 25,000 Translation Units (TUs) to ensure that the statistical relevance of the extracted phrases to the source segment you are translating was high.  In this way the amount of irrelevant matches, or noise, was reduced which of course provides a better experience for the translator.  Over time this restriction has been lowered based on feedback and further testing and today the requirement is for your Translation Memory to have a minimum of 10,000 TUs.
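Studio TMs are stored as *.sdltm files rather than TMX, but if you have exported a TM to TMX, or you have a TMX from elsewhere, a few lines of Python can tell you whether it clears the 10,000 TU minimum before you commit to building a dictionary.  This is a rough sketch under my own assumptions: it just counts `<tu>` elements with the standard library's streaming XML parser, and it says nothing about whether the content will produce a useful dictionary.

```python
import xml.etree.ElementTree as ET

# Minimum size for AutoSuggest Dictionary creation, as described above.
MIN_TUS = 10_000


def count_tus(tmx_path: str) -> int:
    """Count <tu> elements in a TMX file without loading it all into memory."""
    count = 0
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        # Compare the local tag name, ignoring any "{namespace}" prefix.
        if elem.tag.rsplit("}", 1)[-1] == "tu":
            count += 1
            elem.clear()  # drop the finished TU's children to limit memory use
    return count


def big_enough(tmx_path: str, minimum: int = MIN_TUS) -> bool:
    """True if the TMX holds at least `minimum` translation units."""
    return count_tus(tmx_path) >= minimum
```

For example, `big_enough("my-memory.tmx")` returns True or False; the file name is just a placeholder.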

The AutoSuggest Dictionary cannot be edited (unless you risk fiddling with, and possibly corrupting, the BPM file in a SQL editor), so it's worth regenerating your BPM file from your Translation Memory every now and again.  I'm often asked how often I should do this.  It really depends on how fast your Translation Memory is growing, so you could do it once a day after finishing work, once a week, or even once a month.  There is no hard and fast rule… perhaps some of the experienced translators reading this article can share their experience in the comments!  Maybe someone will create an OpenExchange App that schedules this for you and automatically regenerates your Dictionaries as you work, or overnight to suit you!

Once you have generated your AutoSuggest Dictionary (which you can only do if you have the appropriate license), and if the material you are translating is relevant to the content of your Translation Memory then as you type you will get this kind of effect:


The suggested translation will pop up in front of you (with the green/gold icon) as you type allowing you to ignore it altogether, keep typing to narrow down the list, or just select the appropriate suggestion with the keyboard.  Many translators have reported as much as 30% to 40% productivity enhancements due to less typing as a result of this feature in Studio.  Clearly the amount of gain is going to be dependent on the relevance and type of source material but I think it’s fairly obvious this is advantageous when typing.

In the past I have documented a fairly exhaustive process for generating these dictionaries, and also provided information on where you can find additional resources to help with some language pairs based on EU languages.  You can find these posts here; they were written for Studio 2011, but I think they are still useful today:

Termbase

Here AutoSuggest is used to place the translations of any recognised source terms into the target text as you type.  This is particularly powerful because you can build simple glossaries with as many words as you like, which lets you take advantage of AutoSuggest from the moment you start work.  You just add terms to your termbase as you're translating or reviewing, and they immediately become available for placement via AutoSuggest.

The Termbase suggestion is presented alongside any other suggestions.  In the screenshot below I have included the Term Recognition window on the right so you can see where this is coming from.  The fact this is a termbase AutoSuggestion is also confirmed by the grey database icon to the left of the suggestion at the top of the list.  The other suggestions are from the AutoSuggest Dictionary (with the green/gold icon) and they take second priority since a recognised term is probably what you wish to enter here and you save a keystroke when it is already at the top of the list.
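To make the ordering concrete, here is a toy model of how suggestions from several providers could be combined: each provider is asked in a fixed priority order, candidates that don't continue what you've typed are dropped, duplicates are removed and the list is capped.  The provider functions, the cap of five and the simple prefix rule are all hypothetical, chosen for illustration; this is not how Studio is implemented internally.

```python
from typing import Callable, Iterable

# A provider takes the source segment and returns candidate target strings.
Provider = Callable[[str], Iterable[str]]


def merge_suggestions(source: str, typed: str,
                      providers: list[tuple[str, Provider]],
                      max_suggestions: int = 5) -> list[tuple[str, str]]:
    """Return (provider_name, suggestion) pairs in provider priority order."""
    seen: set[str] = set()
    results: list[tuple[str, str]] = []
    for name, provider in providers:  # earlier providers rank higher
        for candidate in provider(source):
            # Keep only candidates that continue what the translator has typed.
            if not candidate.lower().startswith(typed.lower()) or candidate in seen:
                continue
            seen.add(candidate)
            results.append((name, candidate))
            if len(results) == max_suggestions:
                return results
    return results


def termbase(source: str) -> list[str]:
    # Hypothetical termbase: recognises "Article" and offers its French term.
    return ["l'article"] if "Article" in source else []


def dictionary(source: str) -> list[str]:
    # Hypothetical AutoSuggest Dictionary output for the same segment.
    return ["l'article 56 du règlement", "l'article", "la Commission"]


print(merge_suggestions("Article 56 of the Regulation", "l'",
                        [("termbase", termbase), ("dictionary", dictionary)]))
```

The termbase term lands at the top simply because that provider is listed first, and the dictionary's duplicate "l'article" is dropped, so the translator sees the recognised term first.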


Obviously, to create your own termbase you need either SDL MultiTerm or one of several OpenExchange applications that can do this for you.  If you have a Freelance license then you already have SDL MultiTerm and the ability to use OpenExchange Apps, and you can also add/edit terms to build your glossaries/termbases as you work.  Some license types do not have this, as you will see at the end of this post.

If you’re looking for a head start on how to work with termbases or where you can find additional resources then perhaps review some of these posts (oldest first):

AutoText

This is perhaps the hardest resource to find a really good use for (hard for me… not hard for all the nice reasons posted in the comments below!).  It’s monolingual only and won’t work if any other AutoSuggest providers are present and can suggest a match.  It also takes four characters before kicking in so in this example I had to type “comp” before the suggestion was presented:


AutoText suggestions are identified by the little disk drive icon (at least I think this is what it is!) next to the suggestion.  To be honest I struggle to see the use case for this, because the best use I can come up with would be long lists of words that are not translated.  But in that case you'd probably be better off adding them to the variables list in your language resources.  I guess you could add lists of long technical words you think you might have to type a lot, or import a text file from a monolingual dictionary into this list, and it would be there just in case no other resource was used first… but I'd welcome suggestions from anyone who finds this a really useful feature.  Just post them in the comments and I can provide a better answer the next time I'm asked too!

If you don't know where to find this, just go to File -> Options -> AutoSuggest -> AutoText, where you select your target language and then either add the entries one at a time or import them via a text file.  If you want to know what format the file should be in, just add a few words manually, then export the list and save it on your desktop.  You can open it with a simple text editor and see what it looks like.  While you are working you can add terms via the ribbon, Advanced -> Add AutoText, or with a keyboard shortcut.  I think the default is Alt+F7, but you can customise this anyway.
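The two AutoText rules mentioned at the start of this section (nothing is offered until four characters have been typed, and nothing at all if another provider already has a suggestion) can be modelled in a few lines.  This is an illustrative sketch only; the function name and the idea of passing in the other providers' results as a list are my assumptions.

```python
AUTOTEXT_MIN_CHARS = 4  # AutoText waits for four typed characters


def autotext_suggestions(typed: str, autotext_entries: list[str],
                         other_suggestions: list[str]) -> list[str]:
    """Return AutoText entries only when no other provider has a match."""
    if other_suggestions:  # any other provider wins outright
        return []
    if len(typed) < AUTOTEXT_MIN_CHARS:
        return []
    prefix = typed.lower()
    return [entry for entry in autotext_entries if entry.lower().startswith(prefix)]


entries = ["compliance", "competition", "evidence-based medicine"]
print(autotext_suggestions("com", entries, []))                # []
print(autotext_suggestions("comp", entries, []))               # ['compliance', 'competition']
print(autotext_suggestions("comp", entries, ["compétence"]))   # []
```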

OpenExchange AutoSuggest Providers

At the time of writing I’m only aware of three AutoSuggest Providers available on the OpenExchange so here they are.

MT AutoSuggest

This is a pretty cool plugin that will offer Machine Translation suggestions for the entire sentence and/or parts of it so they can be placed with the click of a button.  This will use whatever Machine Translation plugin you happen to be using in your Project.  It won’t return suggestions from them all, so you only get one of them, and the speed at which it works is dependent on your internet connection.  However, I have spoken to many Translators who love this way of working with Machine Translation because it doesn’t get in their way.  It is only applied when you want, through a single keyboard shortcut as an AutoSuggestion.

There is no icon visible with this provider, so if you use multiple Machine Translation engines in your Project you won't know which one is providing the suggestion; this could be improved.  Where the suggestion represents the full translation of the source segment, you can see the full text in a slightly greyed window below the AutoSuggest box.


But it is a very useful feature, and one that many interested observers see as the future of Machine Translation for translators.

Google AutoSuggest

This application works in the same way as the MT AutoSuggest plugin described above, but with one interesting advantage.  You don’t even need to have Google MT enabled, nor do you require an account with Google to get an API key.  Once again there is no icon visible, and where the suggestion represents the full translation of the source segment you can again see the full text in a slightly greyed window below the AutoSuggest box.


You can watch a video on this one that was made when the plugin was created as part of the SDL OpenExchange Developers Competition in 2014.

Regex AutoSuggest

This is an excellent plugin that really deserves an article to itself, and will probably get one soon as there is so much you can do with it.  The basic idea is that you create a regular expression to match a pattern of text in the source and then either copy it exactly, or transpose it using another regular expression, into the target segment.  And if that wasn't enough, you can also create lists of variables and group them under a single match pattern for use in matching source text.  Sounds complicated, but it's not really.

The regular expression replacements can copy the source exactly as written, which is excellent for alphanumerics that Studio does not recognise in the source.  Prior to SP2 this happened a lot, and even with SP2 you may find some that are not picked up.  I discussed this a little in an article a few weeks ago when Studio 2014 SP2 was released… the example looked like this:


So those tricky-looking expressions simply find the same pattern and present it when you start typing the alphanumeric in the target segment.  You can also use this to apply a translation.  In the text I used for the other examples, expressions such as “Article 56” and “Article 39” come up a lot, so I could save myself some time by looking for this pattern and replacing it with “l’article 56”, “l’article 39” etc.


This would give me something like the following, where the Regex AutoSuggest result is identified by the little “R” icon to the left of the suggestion:


In this example I also used an AutoSuggest Dictionary and you can see the Regex AutoSuggest takes second place and is at the bottom of my list, but I use the arrow up key to select this in one go which still saves me keystrokes.
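The two uses above, copying an unrecognised alphanumeric as-is and transposing “Article 56” into “l’article 56”, can be tried outside Studio with any regex engine.  Here is a rough Python sketch.  The alphanumeric pattern is a made-up example rather than the expression from the SP2 article, and since Studio plugins are .NET applications the plugin's own syntax may differ in details (for instance, .NET replacements refer to groups as `$1` where Python uses `\1`).

```python
import re

# Copy as-is: a hypothetical pattern for codes such as "XR-2014-07B"
# that Studio might not recognise as a single token.
CODE = re.compile(r"\b[A-Z]{2,}-\d{4}-\d{2}[A-Z]?\b")

# Transpose: "Article 56" in the source becomes "l'article 56" in the target.
ARTICLE = re.compile(r"\bArticle (\d+)\b")


def suggestions(source: str) -> list[str]:
    """Collect regex-based suggestions for one source segment."""
    found = [m.group(0) for m in CODE.finditer(source)]                 # copied exactly
    found += [m.expand(r"l'article \1") for m in ARTICLE.finditer(source)]  # transposed
    return found


print(suggestions("See part XR-2014-07B and Article 56 and Article 39."))
```

Each rule is just a match pattern plus an optional replacement template, which is why a single list of rules can cover both copying and translating.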

The new version of this plugin adds variable lists.  These give you a place to enter all the variables, together with their translations, under a named list; in this example I called it “Month”:


Then you create a regular expression to manipulate the source but instead of requiring 12 expressions for each type of date manipulation you’ll ever require, you now only need one, like this for example:


So this would allow me to look for a source text such as “01 January 2014” and replace it via AutoSuggest into “01 Ionawr, 2014” for example.  Like this:


So by adding the name of the variable list enclosed with hash symbols (#Month#) into the regular expression I am able to transpose all 12 months with a single expression, and I can reuse this variable list in other expressions as the need arises.  Very cool!
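The variable list boils down to two steps: expand `#Month#` into an alternation of every entry in the list, then look up the matched entry's translation when building the target text.  The sketch below does both in Python.  The `#Name#` placeholder follows the plugin as described above, but the expansion code, the function names and the output format (chosen to reproduce “01 Ionawr, 2014”) are my own assumptions, not the plugin's internals.

```python
import re

# A variable list: English month names with their Welsh translations.
VARIABLE_LISTS = {
    "Month": {
        "January": "Ionawr", "February": "Chwefror", "March": "Mawrth",
        "April": "Ebrill", "May": "Mai", "June": "Mehefin",
        "July": "Gorffennaf", "August": "Awst", "September": "Medi",
        "October": "Hydref", "November": "Tachwedd", "December": "Rhagfyr",
    }
}


def expand(pattern: str) -> str:
    """Replace each #Name# with a named group matching any entry in that list."""
    def to_group(m: re.Match) -> str:
        name = m.group(1)
        alternatives = "|".join(re.escape(k) for k in VARIABLE_LISTS[name])
        return f"(?P<{name}>{alternatives})"
    return re.sub(r"#(\w+)#", to_group, pattern)


# One expression covers all 12 months: group 1 = day, Month, group 3 = year.
DATE = re.compile(expand(r"\b(\d{1,2}) #Month# (\d{4})\b"))


def translate_dates(source: str) -> list[str]:
    return [f"{m.group(1)} {VARIABLE_LISTS['Month'][m.group('Month')]}, {m.group(3)}"
            for m in DATE.finditer(source)]


print(translate_dates("Signed on 01 January 2014 and amended 15 March 2015."))
```

Adding another list is just another dictionary entry, and every pattern that mentions its `#Name#` picks it up, which is the reuse described above.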

Licensing

Finally, let's take a look at the licensing requirements for using these great features.  Not every version of Studio gives you access to them all, and some can be restricted in terms of how they are used.  There are five different license types available for Studio: Express, Starter, Freelance, Professional and WorkGroup, each with its own capabilities.  You can find a description of all the differences in KB #4939, but I'm hoping the table below will help explain what's possible in terms of AutoSuggest alone.


It's clear that if you have a WorkGroup or a Professional license you get everything here, but the other three need a little more explanation.  So let's go through them by license type.

Freelance

The only thing a standard Freelance license lacks is the ability to actually create the AutoSuggest Dictionary yourself.  So you can use one if someone provides the *.bpm file(s), or you download one from the SDL OpenExchange website where there are several to be found that have been created and shared by others.  But you can only create your own AutoSuggest Dictionary if you have a special license called the AutoSuggest Creator License.


This Add-on is often thrown in as part of a special deal so many users have this license and didn’t even realise it.  But if you purchase the Freelance License without this Add-on then you’ll need this if you want to be able to create your own AutoSuggest Dictionaries.  You can find this license at TranslationZone in the Shop.

Starter and Express

I’ve lumped these two together because the restrictions here are very similar.

Allows creating termbases (add/edit terms): neither of these versions comes with SDL MultiTerm.  So if you want to manage your own termbase for use with the AutoSuggest feature, you'll probably need SDL MultiTerm to create it in the first place.

Allows using termbases (read only): if someone provides you with a termbase then you will be able to place recognised terms using the AutoSuggest feature.  But as this is read only you won’t be able to build on this resource unless someone with the appropriate license adds all the terms you would like.

Can access Machine Translation from within the tool: If you wanted to use an AutoSuggest feature that applied Machine Translation then this won’t work if you are using the Starter Edition because you can’t add Machine Translation.  If you receive a Project Package that is set up to use Machine Translation then this is a different kettle of fish; in this scenario you can use it.  But you cannot add Machine Translation to your own Projects, or to a Packaged Project that did not contain the Machine Translation Provider in the first place.

AutoSuggest use: This is referring to the use of an AutoSuggest Dictionary. You cannot add your own *.bpm file to a Project that you have created, or to one created via a Project Package.  But if the *.bpm is provided as part of a Project Package then you can make use of it.

AutoSuggest Create: Neither of these versions can create an AutoSuggest Dictionary and there is no license Add-on for these versions either.

Runs 3rd party plug-in apps from the OpenExchange: The Starter Edition will not allow integrated applications to be used at all, and if you only have a valid license for the Starter Edition in your MySDL account then you won’t be able to download any apps from the OpenExchange at all.
The Express Edition is a little different in this regard.  This license is used by a company to give its translators access to Studio so they can work only on Project Packages the company has created, or connect to the company's SDL WorldServer, SDL TeamWorks or SDL TMS instance.  So applications created via the SDK/API can be used with the Express Edition.  The exceptions to this rule are most SDL-developed applications made freely available on the OpenExchange, as they often include license checks that look for Freelance, WorkGroup or Professional only.

35 comments
  1. Birgit Strauss said:

    Thank you, Paul, and I wish you all the best for 2015!

  2. Glad you're featuring the Regex plugin, which brings a bit of memoQ to Studio 😉
    As for MT AutoSuggest, I somehow like the BeGlobal source better; it tends to offer better suggestions in Polish than Google.

    • Yep… the regex plugin is an excellent tool for providing more flexibility over the transformation of recognised patterns. Interesting observation on the MT AutoSuggest… even the tables in the example I showed seem to contain more phrases from Language Cloud (not BeGlobal) than from the Google AutoSuggest plugin. I wonder if it would be the same if you added Google MT to your Project and then used the MT AutoSuggest with a Google MT Provider instead of the Google AutoSuggest plugin?

      • In my opinion Google MT in the English-Polish direction is so miserable in certain specific topics that it is not worth using. I also think the idea to allow users to contribute translations is backfiring. I have seen more useless translations from Google MT (English to Polish) recently than ever before.
        On the other hand, when I go from Polish to English, Google MT translations are often very helpful.

      • You wrote “LanguageCloud (not BeGlobal)” Has Beglobal been discontinued? I found this morning that it no longer works, and even the beglobal.com URL does not work in the browser.

      • Thanks, however I have a problem using the Language Cloud, and I sent you the details in an e-mail message.

  3. Wow, great blog post (and title) to kick off the new year, Paul!

    How often do I regenerate my Autosuggest Dictionaries?
    When I realise they’re not giving me suggestions that I think should be offered, and before I start a project that’s similar to other recent ones. That’s probably only every couple of months. No doubt it would be more effective if I recreated them more often but I find it hard to remember, and a bore to do. An OpenExchange app to regenerate AS dictionaries would be very good news.

    Do I use AutoText?
    Yes, quite a bit. It makes up for the lack of real-time autosuggestion-building that other tools have. It’s good for hard-to-spell words (e.g. drug names) and for long terms / short phrases (e.g. “evidence-based medicine”) that come up a lot in a particular project.

    Regex AutoSuggest
    I was already using this for proper names (thanks for that idea, Paul). And with the latest version (1.4), I’ve just set up an entry for a condensed date format (DDMMMYYYY) with abbreviated months in the variables list:
    (\d+)(#AbvMonth#)(\d+) Replace with: $1$2$3
    That’s a real time saver for one particular client!

    Cutting out the noise
    I like Studio’s AutoSuggest because you can control it. You can decide the order to call up your resources, the maximum number of suggestions and the minimum length of them. This all cuts out the noise that you get in other tools that don’t have this level of customisation.

    Happy New Year,
    Emma

    • Thanks Emma, I think that’s sound advice on the regeneration, and thank you for your feedback on auto-text. Is there a point where auto-text is preferable to a termbase for this?

      • 1- Autotext doesn’t depend on the source for a word/short phrase, so it sometimes offers words that aren’t picked up in a TB.
        2- I use Autotext as a quick-and-dirty list, without much thought. I like to keep TBs cleaner, so I often need to change default TBs to add specific words. Also, I prefer to keep TBs for terminology.
        3- It’s (even) quicker to add a word to an Autotext list because you don’t have to select a source word at all.
        However, the difference in speed between adding to AT list or TB used to be much bigger in pre-SP2 days;)

  4. I, too, like AutoText. One big advantage is that it is so easy to add words and expressions — takes about five seconds. And extending a termbase is somehow a bigger decision than adding an entry in the AutoText list.

    • I agree, Mats. Adding to a TB is a “bigger decision”, whereas an AutoText list is like a Post-It.

      • “… like a Post-it.” I like that analogy Emma!

  5. Thank you Paul for this post.
    AutoText is really nice, as Emma has explained. It can also serve as a “cheat sheet” for expanding acronyms, although it's not perfect for that, and text expansion software might be better suited.

    While AutoSuggest is great for what it is, I think SDL is missing out on its full potential. First, the fact that the creator is sold separately is confusing. People assume they don't have it because they don't remember paying for it, while in practice it is included in most(?) group/special sales that I can recall. If I'm not wrong about that, why not just include it as part of every Freelance license? Second, the TU limitation means that not everyone will be able to use it right away, so it doesn't become part of the workflow, gets classified as some obscure feature for “power users” only, and ends up underused. It can arguably improve the user experience more than other features that get more attention, so why not make it more readily available and prominent in the basic workflow?

    Furthermore, using AutoSuggest only as an autocomplete mechanism misses out on its potential, in my opinion. It would be nice if it were turned into a fuller contextual reference repository, with the ability to index various bilingual and monolingual documents that would be parsed for AutoSuggest suggestions as well as included in concordance search. For example, indexing entire dictionaries, or, what interests me most personally, indexing entire projects. A typical use case for the latter would be indexing all small and/or one-off projects, for which one doesn't want to create a dedicated TM, into the relevant contextual AS repository, while still being able to use them for concordance searches and AutoSuggestions. This could be expanded further to create more complex bodies of reference available from within the Editor environment.

    Just a thought.

    Happy New Year!

    • Thanks Shai.

      The acronym expansion idea is a good one, and I guess this could be a pretty simple thing for a developer to provide. Regex Match AutoSuggest demonstrates quite a bit of what's possible, and I reckon the ability to import lists of acronyms with their associated expanded text would be quite simple. Maybe we'll see something like this. Is there already a source for these types of lists, so that if it is developed the import feature could be sure to use them?

      On the licensing… I think this has been discussed in the past but I'm not sure where it got to; maybe it's worth bringing up again.

      AutoSuggest is an Autocomplete I think. Even if the input was monolingual/bilingual/projects, after the indexing it would still be an AutoComplete as you need to start typing in the first place. Having an additional search/lookup window for that type of source is a good idea (sounds familiar too ;-)) and AutoSuggest could be used to access it, but it would still be an AutoComplete. Aren’t most projects in your TM anyway?

      Final thing on AutoText. Would you find it better if the number of characters you typed were less, or would this introduce too much noise?

        The first acronym lists that come to mind are the ones used for medical transcription. I think I have something like that. I'll look it up.

        Reducing the number of characters needed before an AutoText entry is displayed could indeed introduce noise, and therefore needs experimenting with. I suspect it is very circumstance-specific, so perhaps offering the ability to increase/decrease the number of characters would be better than a one-size-fits-all setting. Leaving the safe default of 4 unchanged is also a good idea, I think, because it is quite a balanced setting for general use.

        As an autocompleter AutoSuggest is very good. However, I think its functionality could be expanded into a more robust reference system.
        While one can use TMs and termbases to do most of what I suggested, I think it creates unnecessary clutter. In my opinion TMs and termbases should be used to store project-specific information (i.e. knowledge), and not as a catch-all reference bin.
        For example, while I maintain client/project-specific TMs, I wouldn't mind having a larger point of reference in which I can archive my entire body of work in a certain field or sub-field. This is also useful for storing those odd little projects that don't necessarily deserve their own TM (which, even if created, might not be used again very often as a reference TM in a project), but should not go to waste either.
        You can also use it to archive things like country and city names, names of medications, long chemical formulas, and so forth; these can be stored in a reference TM or termbase, but in my opinion that is not the ideal way to add them as a reference.

        And on a side note, the new alphanumeric tokens in Studio 2014 SP2 coupled with the Regex AutoSuggest provider are a powerful tandem for automating the transfer/adaptation of specific types of information. A very robust and reliable way to enhance the autocompletion functionality even without an AutoSuggest dictionary.

      • Also worth looking at AutoHotKey as this is the perfect solution for text expansion in any application. Very simple to create lists of acronyms and their associated full text.

      • Oli Christ said:

        Creating per-customer/per-domain AutoSuggest dictionaries is a matter of workflow: once a project is finished, you can create a TM from the final bilingual files which you can then import into a customer/domain/any other TM (or import the bilingual files directly). That TM can then be used for a customer/domain specific AutoSuggest dictionary.

        That worked at least last time I looked at it 😉

        While adding workflow overheads, one benefit of the approach is that these “master TMs” (and derived resources, such as AutoSuggest dictionaries) are always clean and don’t contain noise TUs. Not a bad thing to have in any case, if you can’t keep the bilingual files around. Automating the workflow should be simple using Studio’s APIs.

        But I agree, having more “structure” in the dictionaries would be beneficial so that you could control the suggestions indirectly through e.g. project settings instead of having to switch between dictionaries manually.

      • Thanks Oli… great to hear from you, and good advice as usual on using the bilingual files. One question here though… will this method provide superior autosuggest dictionaries compared to maintaining domain specific TMs in the first place?

      • Oli Christ said:

        That’s hard to say, Paul – it depends on other aspects of the workflow. It may not make a big difference as far as the AutoSuggest dictionaries are concerned – rare terms (such as misspelled translations which may be contained in noisy TMs) probably won’t pass the frequency thresholds during dictionary extraction, so their impact on the dictionary is probably low. OTOH, you want to exclude any misspellings, and creating derived resources from the bilingual files ensures lowest possible noise and maximum quality, particularly if you use small TMs.

        TMs built from the final bilingual files can also be used for other purposes – for example, if you finish a project, create a TM from the bilingual files, and then batch-translate the source files in the project with that TM, a “multiple translation penalty” may indicate an inconsistent translation (or be intentional – to be determined). This can also be used for QA purposes by the project manager.

  6. Hi Paul,

    Thanks for the post! I understand now the Regex AutoSuggest plugin!

    Just to let you know that both links provided on the “AutoSuggest Dictionary” section are duplicated. I believe the first one is actually https://multifarious.filkin.com/2012/08/10/making-the-most-of-your-resources/.

    I don't use AutoText, as I prefer AutoHotkey because it works in all applications and even when I move between PCs.

    Regards,

    … Jesús Prieto …

  7. Christine Bruckner said:

    Hi Paul,

    just to add a few more use cases for AutoText: It is really helpful for long, recurring target language expressions that are not really terms.
    – Gender-specific language in German can be much longer than in the source language. Example: Spanish “campesin@s” needs to be translated into German as “Bäuerinnen und Bauern”.
    – Certain source language expressions that require more explicit translation or rephrasing in a specific text. Example: “la región” in a Latin American text which I am currently translating actually means, and needs to be translated as, “Lateinamerika und karibischer Raum”.
    So AutoText can be more than a quick and dirty list or post-it because it not only saves typing time but ensures consistency 🙂

    Regarding AutoSuggest dictionaries:
    For political and legal translations, I regularly convert the European Commission’s (DGT) TMs into AutoSuggest dictionaries. While the TMs are usually too large for concordance search/interactive lookup, AutoSuggestions come up very quickly.
    There are AutoSuggest dictionaries for a few language combinations available on OpenExchange, but they seem to be a bit outdated (2011 edition). But creating new ones from the resources and instructions available under https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory is quick and easy (although it takes some time for your PC to do so).
    Most statistical MT engines are trained on the DGT TMs anyway, but still I prefer an AutoSuggestion from the DGT dictionary to a MyMemory MT (AutoSuggest) proposal.

    Kind regards

    Christine

    • Christine Bruckner said:

      One more remark regarding the DGT TMs as AutoSuggest dictionaries: for some reason, it is not possible to generate an AutoSuggest dictionary directly from the TMX file generated by the TMXtract.jar application. You therefore need to import the TMX file into a Studio TM and generate the AutoSuggest dictionary from the TM. Maybe SDL could enhance the AutoSuggest creator to read the DGT TMX file directly; it would save quite a lot of processing time…

      Christine

      • Hi Christine, I actually think the process of upgrading to a Studio TM first is a good one, as it will clean up a lot of the crap that’s in the DGT TMX and that Studio doesn’t really need to be outputting as suggestions. However, I think it must depend on the language, as I have managed this several times without upgrading to Studio first when I’ve been testing. Which language pairs are you having problems with?

      • Christine Bruckner said:

        Hi Paul,
        I cannot create the AutoSuggest dictionaries directly from the DGT TMX files for the following language pairs:
        – enGB-deDE
        – esES-deDE

        I have tried with the full TMX from the 2013 and the 2014 DGT files. Maybe the files are just too large…

        You are right that the Studio TM import might clean up some waste from the DGT TMs, but on the other hand, it merges repeated (identical) translation units into one, and thus distorts the frequency information which AutoSuggest probably considers when ranking its suggestions? I am just guessing, but most information retrieval algorithms rely on frequency information.

        Kind regards
        Christine

      • Hi Christine, I think I must have been mistaken. I’m sure I did this before, but I can’t now! I can only achieve it the way you have. Maybe I’m confusing it with the import, as this also failed until we went to 2011 SP2, I think. So it looks like the workaround is to upgrade to Studio first and deal with the additional processing time. I’m not sure it’s worth the development time to enhance Studio for this, but perhaps something on the OpenExchange would make more sense.

      • Using Studio 2014 SP2, I can’t create AutoSuggest dictionaries from the DGT TMXs in these two combinations either:
        – enGB-esES
        – deDE-esES

        … Jesús Prieto …

      • The creation of the AutoSuggest Dictionary needs to be from a Studio TM (TMX or SDLTM) so an upgrade of the DGT format to Studio is required first.

    • Wow… I’m well and truly put in my place with regard to the use of AutoText! I take it all back! Thank you for the very explanatory examples. Good info on the DGT Dictionary versus something like MyMemory MT… something I had never thought about at all.

  8. Hannu Jaatinen said:

    If I activate Language Cloud just for MT AutoSuggest, will my translations get uploaded to Language Cloud or is it just oneway traffic to my computer in the form of AutoSuggestions?

    • One way traffic. We have no idea if your translations are good or not 😉 I’m sure they are though!

  9. Miki Ito said:

    Hi, Paul. Thank you for re-directing me here in response to my thread in SDL Community.
    Please bear with me for another (basic) question.

    It’s still related to TRADOS recognising repeated words in documents.
    I had a project package from an agency and translated almost the whole thing by typing it all myself, because it didn’t show me any of the words repeated numerous times in a drop-down list. The agency pointed out that there were some misspellings in words that had already been stored.

    I understand I have to store words in the Termbase manually, but is that still the case for words that came in a project package?
    I suspect that I didn’t install TRADOS properly or that some files are in the wrong paths?

    • If the package didn’t contain a Translation Memory, AutoSuggest Dictionary or Termbase then you won’t get anything from the project package by itself. It’s up to whoever created the package to provide these things… if they wish. It’s not mandatory.
