When close enough… isn’t!

Cartoon-style illustration of a man with a two-tone beard shouting "Context matters, you fuzzy tin can!" while blasting terminology like "Cloud computing" at a confused robot labeled "89%."

The RWS Community has always been a great place for me to learn more about the Trados products, especially with regard to how its use is perceived by others, and how the workflows the product was designed for can often be very different to reality.  More recently, since my own role at RWS has changed, I’m somewhat removed from these realities and I also tend to pay less attention to the release notes for the desktop tools than I used to.  But now I’m seeing that the community helps here as well and even triggers renewed interest in parts of the product I’d stopped paying a lot of attention to.

A recent post from Yulia Kovalskaya about fuzzy matches in terminology got me thinking about a topic I’d never given much thought to before… fuzzy matching of terms coming from your termbase.  In fact I hadn’t even noticed that Trados Studio 2024 introduced improvements in the terminology integration between Trados Studio and MultiTerm, all part of a bigger effort to pave the way for Studio’s integration with third-party terminology solutions… something I’ve long wished to see, as it would allow third-party developers to improve the terminology solutions available with Trados Studio through a deeper integration that could use many of the features previously only available to MultiTerm.  So it was great to read these four key improvements:

  • The Term Recognition window now displays the fuzzy match score next to each term, providing a clearer indication of the term’s relevance.
  • Terminology Verification has been enhanced to accurately identify and flag duplicate terms within the same segment.
  • Improved handling of source terms in non-Latin languages.
  • Cleaner integration points for third-party terminology app providers, allowing them to plug into Trados Studio more seamlessly.

The one of most interest here, mainly because Yulia referred to it and I wasn’t aware of it at all, was the Term Recognition window displaying a fuzzy match score for the terms found in the source segment.  I did reply to the question, but decided it might make a good article to explore a little more, as we often talk about translation memory matches, but not terminology matches… or at least I haven’t!

MultiTerm Matching

The exact mechanism behind how MultiTerm matches terms is a secret that, even if I knew it, I’d be shot for sharing!

A cartoon-style image of a man with a two-tone beard dodging a laser blast from an angry robot, while a sheep with a clipboard runs away in the background.

So we’ll keep this based around the question that Yulia asked, but expand a little in simple terms to clarify the difference between translation memory matching and terminology matching.

Translation Memory searching breaks down into two types (ignoring the more complex area of fragment matching): a normal TM search and a concordance search.  In some ways a concordance search could be seen as similar to a terminology search (MultiTerm), but they are not the same, and the match values you now see in Trados Studio 2024 could well raise questions similar to the one Yulia asked because of these differences.

TM Segment Matching (Normal TM Search)

This is the standard matching process Trados Studio uses when you move to a new segment during translation.  It compares the entire source segment you’re working on against all stored segments in your translation memory.  The match score is calculated using algorithms like Levenshtein distance, which measures how many edits (insertions, deletions, substitutions) are needed to turn your segment into one that already exists in the TM.

The result is a fuzzy match score (e.g. 100%, 85%, 70%) that reflects how similar the whole sentence is to previous translations… not just a part of it.  It’s designed for full-sentence reuse, and is less sensitive to the meaning and more focused on structure and word-level similarity.

Example:
A new source segment like “The neural network was trained last year” might return an 80% match against “The neural network was trained in 2020,” because there are only small differences as shown below:

Screenshot of Trados Studio displaying translation memory (TM) matches. Segment #1 in the editor reads: “The neural network was trained last year” and receives an 80% TM match from the entry “The neural network was trained in 2020”, with the Romanian translation “Rețeaua neuronală a fost antrenată în 2020,”. The match is shown in the Translation Results pane.
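
Just to make the principle concrete, here’s a minimal sketch in Python of edit-distance-based scoring.  To be clear, this is not the actual Trados algorithm… the real scoring is proprietary and certainly more sophisticated (tokenisation, tags, placeables and so on)… it simply illustrates the idea of similarity as 1 minus the edits needed, divided by the length of the longer segment:

```python
# A toy illustration of edit-distance-based fuzzy matching, NOT Trados
# Studio's actual (proprietary) scoring.

def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_score(new_segment: str, tm_segment: str) -> int:
    """Whole-segment similarity as a percentage."""
    dist = levenshtein(new_segment, tm_segment)
    return round(100 * (1 - dist / max(len(new_segment), len(tm_segment))))

print(fuzzy_score("The neural network was trained last year",
                  "The neural network was trained in 2020"))  # -> 80
```

Coincidentally this toy formula also lands on 80% for the example above, but don’t read too much into that… the point is only that the whole segment is compared, so a difference anywhere in the sentence costs you.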

TM Concordance Search Matching

This works a bit like a TM segment comparison, most likely using algorithms similar to those used for full TM matches, such as Levenshtein distance (character-level or token-level edit distance).

The fuzzy score tells you how close your selected phrase is to phrases in stored TM segments.  When you select a phrase and run a concordance search, Trados compares that phrase to parts of segments stored in the TM.  It doesn’t penalise for extra words before or after the matched portion because the match is not against the full segment, but rather how closely your selected phrase aligns with a portion of the TM segment.

Example:
Selecting “training the neural network” and running a concordance search on it might yield 75% against “neural network training was completed yesterday”, because the words match, but they are in a different order and missing the word “the”, so the alignment is imperfect as shown below:

Screenshot of a CAT tool showing a concordance search for the phrase “training the neural”. In segment #2 of the main translation window, the phrase “training the neural network” is highlighted. The Concordance Search pane above displays a 75% match against the entry “neural network training was completed yesterday”, whose Romanian translation is “antrenarea rețelei neuronale a fost finalizată ieri”.
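
For contrast, here’s the same idea adapted to a concordance-style search, reusing the fuzzy_score helper from the sketch above.  The windowed alignment is my assumption of how such a search could work, not a description of the real implementation… the key property is that only the best-matching portion of the TM segment is scored, so the surrounding words cost nothing:

```python
# A toy concordance-style search (again, not the real Trados mechanism):
# slide a window of roughly the phrase's length across the TM segment and
# keep the best-scoring window, so that words outside the matched portion
# don't drag the score down.  Reuses fuzzy_score from the previous sketch.

def best_window_score(phrase: str, tm_segment: str) -> int:
    p_tokens = phrase.lower().split()
    s_tokens = tm_segment.lower().split()
    best = 0
    # try windows slightly shorter and slightly longer than the phrase
    for size in range(max(1, len(p_tokens) - 1), len(p_tokens) + 2):
        for start in range(max(1, len(s_tokens) - size + 1)):
            window = " ".join(s_tokens[start:start + size])
            best = max(best, fuzzy_score(" ".join(p_tokens), window))
    return best

print(best_window_score("training the neural network",
                        "neural network training was completed yesterday"))
```

The absolute numbers from a toy like this won’t match the 75% in the screenshot, but the behaviour is the point: “was completed yesterday” sits outside the best window and is never penalised.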

MultiTerm Fuzzy Matching

MultiTerm uses a form of fuzzy matching to identify terms in your source segment, even when the match isn’t exact.  No single technique is the standard for this, but a hybrid approach is commonplace: for example, n-gram matching for scalability and flexibility, layered with edit distance for precision, and token heuristics for real-world robustness, all optimised via indexing.  The goal is to determine how closely a given term resembles a portion of the source text, based on several linguistic and structural factors.
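
Since n-gram matching gets a mention, here’s what one common flavour of it looks like… a character-trigram Dice similarity.  I’m assuming this purely for illustration; whether MultiTerm uses anything like it is exactly the secret I can’t share (because I don’t know!):

```python
# A toy character-trigram similarity of the kind often used for fast,
# index-friendly fuzzy lookup; one plausible ingredient of a hybrid
# approach, not MultiTerm's actual (unpublished) recipe.

def trigrams(text: str) -> set:
    text = f"  {text.lower()} "  # pad so word boundaries form trigrams too
    return {text[i:i + 3] for i in range(len(text) - 2)}

def dice_similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))

print(dice_similarity("language model", "language models"))  # high
print(dice_similarity("language model", "languish mode"))    # lower
```

Its appeal is that trigram sets can be pre-indexed, so a termbase can be searched quickly without comparing every term character by character.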

The fuzzy match score reflects the degree of similarity between the stored term and its possible appearance in the segment.  This similarity is typically influenced by:

  • Token presence: Are the key words in the term actually present in the segment?

  • Token order: Are the words in the same sequence as in the term?

  • Token proximity: Are the words close together, or scattered across the sentence?

  • Intervening content: Do additional words or modifiers interrupt the term’s structure?

  • Orthographic variation: Are there minor spelling differences, pluralisation, or inflections?

It’s important to note that this process is form-based, not meaning-based.  It doesn’t consider synonyms, paraphrasing, or surrounding sentence context.  It focuses purely on how similar the source text is to the stored term in terms of surface form.

Example:
If your termbase contains “language model”, and your source segment says:
“our model is trained on multiple language datasets” then MultiTerm might suggest a match with a lower score… 61% if you set the minimum threshold low enough, as shown below:

Screenshot of a CAT tool showing the third segment: “our model is trained on multiple language datasets”. The term “language model” is partially recognised with a 61% match, shown in the Term Recognition pane. The suggested Romanian equivalent is “model lingvistic”. The partial match is based on the words “language” and “model” appearing separately in the segment, but not as a fixed phrase.

This is the only example of the new feature in 2024 showing the fuzzy match value in the Term Recognition window that I’ll show in this article.  You can see the “61” between the source term (language model) and the termbase name (tb).  So if you want some more, I’d encourage you to visit the RWS Community and review the thread with Yulia, as you’ll find a good number of examples to illustrate the point.
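
If you’d like to play with the idea, here’s a toy scorer combining three of the factors listed earlier… token presence, order and proximity.  The weights are entirely invented and it won’t reproduce MultiTerm’s 61%, but it does show how a partial, out-of-order match ends up well below 100%:

```python
# A toy term scorer using invented weights; illustrative only, and not
# expected to reproduce the 61% MultiTerm shows for this example.

def term_match_score(term: str, segment: str) -> int:
    t_tokens = term.lower().split()
    s_tokens = segment.lower().split()

    # token presence: where does each term token first appear?
    positions = [s_tokens.index(t) for t in t_tokens if t in s_tokens]
    presence = len(positions) / len(t_tokens)
    if presence == 0:
        return 0

    # token order: do the found tokens appear in the term's sequence?
    order = 1.0 if positions == sorted(positions) else 0.5

    # token proximity: adjacent tokens score 1, scattered tokens less
    span = max(positions) - min(positions) + 1
    proximity = len(positions) / span

    return round(100 * presence * (0.5 + 0.25 * order + 0.25 * proximity))

print(term_match_score("language model",
                       "our model is trained on multiple language datasets"))
```

Run against the example above, “language” and “model” are both present but reversed and five words apart, so the order and proximity factors pull the score down… exactly the kind of signal the new fuzzy value in the Term Recognition window surfaces.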

Comparison

I’ve never looked in any real detail on this blog at how these different features carry out their matching, and it is interesting.  Reviewing this text, I’m wondering what the real difference is between concordance matching and terminology matching, since both of them aim to identify partial matches within larger segments.  I think it’s less about the algorithms, although there are differences, and more about the underlying goals, which affect how similarity is interpreted.

  • MultiTerm is about enforcing terminological consistency, and concordance is about offering contextual guidance from translation history.
  • MultiTerm asks “Does this sentence contain my predefined term?” whilst concordance asks “Where else have I used something like this phrase?”
  • MultiTerm expects a tight match to the term’s surface form whilst a concordance search allows for looser alignment… the goal being to show potentially useful matches, even if imperfect.
  • In MultiTerm, a fuzzy score is a signal of confidence that the exact term is being referenced whilst in concordance, a fuzzy score is more about resemblance, not correctness… it’s a hint, not a validation.

In essence, while both concordance and terminology searches leverage fuzzy matching to identify partial matches, their purposes diverge.  MultiTerm drives terminological precision with a focus on exact term validation, while concordance offers a broader, context-rich exploration of translation history.  This distinction shapes their fuzzy score interpretations… MultiTerm’s as a marker of correctness, concordance’s as a suggestion of relevance.

A good feature?

Ultimately, the introduction of this new feature in Trados Studio 2024 to display fuzzy match scores shines a light on a subtle truth: similarity isn’t sameness.  MultiTerm demands precision to ensure the terminology is controlled, while concordance thrives on loose resemblance to help shape the translation and provide some consistency… yet both remind us that in translation, ‘close enough’ isn’t always enough.  When consistency or context hangs in the balance, ‘When close enough… isn’t!’ captures the challenge of knowing when a match really does fit the bill.

So when I think about the concept of a fuzzy match for terminology lookup it almost seems insane without the ability to see whether your match is 100% or not!

What about AI?

I can’t leave this article without at least mentioning AI a little!!  The introduction of AI into translation workflows will undoubtedly influence the need for managed terminology, but will it eliminate it?  Or will it just reshape its role and how it’s implemented?

Managed terminology, as we do with MultiTerm, is all about ensuring consistency, accuracy, and compliance… especially in industries like legal, medical, or technical translation.  That kind of control is something AI doesn’t always prioritise.  Large Language Models (LLMs) and neural machine translation (NMT) are trained on massive amounts of data and can often figure out the right term from context.  So if you feed them “neural network,” they might correctly return “réseau neuronal” in French even without a termbase.  That sounds great in theory, and in some cases it probably works, but it’s also where problems can start.  The same model might decide to use “réseau nerveux” instead, which might be technically valid in a different context… but just not the one you intended.  So while AI might reduce the need for manually curated termbases in more general content, it still can’t guarantee the precision you get from managed terminology.

In fact, I think AI might actually help strengthen the case for termbases, not replace them.  For example, I can imagine tools evolving (if there aren’t any around already) that use AI to scan your content… manuals, websites, old TMs… and automatically suggest key terms for your termbase.  You still need a human to approve them, but that kind of automation could make building and maintaining a termbase a lot easier.  This fits nicely with the work being done to open up terminology integration in Studio 2024 where allowing third-party providers to plug into the term recognition workflow with a deeper integration supports a real opportunity to bring in more dynamic, AI-driven term suggestions alongside our existing termbases.

That said, we shouldn’t forget where this matters most.  If you’re translating for a pharmaceutical company, “acetylsalicylic acid” has to stay just that.  You can’t have AI deciding “aspirin” is close enough.  Or “liable” vs. “responsible” in a legal context where it might seem like a small thing, but it can completely change the meaning.  AI might be helpful, but it still needs the discipline and rules provided by a managed termbase.  Post-editing in these situations is still going to rely heavily on those resources.

One of the interesting things AI does bring to the table is the ability to understand context really well.  So instead of relying on fuzzy match rules like token order or proximity, it can just know that “language model training” and “training the language model” mean the same thing.  Of course this can be helpful, but doesn’t always provide consistency.  A termbase might require “cloud computing,” but AI might generate “cloud-based computation” instead because it flows better.  That tension between AI’s flexibility and the non-negotiable nature of managed terminology is probably going to become more of a thing for translators to manage.
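
As a small illustration of that contextual understanding, sentence embeddings are one common way to approximate it.  This sketch assumes the sentence-transformers library and a popular general-purpose model… nothing to do with what Trados or any specific AI feature actually uses:

```python
# A minimal sketch of meaning-based similarity via sentence embeddings.
# The library and model are common examples, assumed for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(["language model training",
                    "training the language model"])
print(util.cos_sim(emb[0], emb[1]))  # close to 1.0: same meaning,
                                     # despite the different word order
```

Note that a high cosine score here only says the two phrases mean the same thing… it says nothing about whether either is the term your client mandated, which is precisely the gap a managed termbase fills.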

I think we’re already seeing AI blurring the lines between concordance, TM matches, and terminology.  The role of termbases isn’t disappearing, but perhaps shifting so they’re less of a front-and-centre tool and more of a background calibrator, keeping AI suggestions in check.  In the near to far future… answers on a postcard!

Bottom Line

AI won’t erase the need for managed terminology… it’ll probably transform it.  While AI can infer terms dynamically and reduce reliance on static termbases, it’s probably not ready yet to fully replace the precision and control that tools like MultiTerm provide, especially in domains where terminology is non-negotiable.  Instead, AI will most likely augment terminology management, automating term extraction and enhancing integration, shifting termbases from rigid enforcers to adaptive calibrators.  As AI pushes translations toward ‘good enough,’ managed terminology will remain the guardrail ensuring that, as the saying goes, ‘When close enough… isn’t!’… precision still prevails where it counts.

… and the winner is?

Every now and then I see an application and I think… this one is going to be a game changer for Studio users.  There have been a few, but the top two for me have been the “SDLXLIFF to Legacy Converter” which really helped users working with mixed workflows between the old Trados tools and the new Studio 2009, and the “Glossary Converter” which has totally changed the way translators view working with terminology and in my opinion has also been responsible for some of the improvements we see in the Studio/MultiTerm products today.  There are many more, and AnyTM is a contender, but if I were to only pick my top three where I instantly thought WOW!, then the first two would feature.  So what about the third?  You could say I have the benefit of hindsight with the first two although I’m not joking about my reaction when I first saw them, but the third is brand new and I’m already predicting success!

Continue reading “… and the winner is?”

square pegs in round holes…

An image showing the letters XDT which represents the MultiTerm definition file.

It’s all about the termbase definition when you want to merge termbases, or import data into MultiTerm termbases.  The XDT… otherwise known as the MultiTerm Termbase Definition file… is the key to being able to ensure you are not trying to knock square pegs into round holes!  I’ve written in the past about the flexibility of MultiTerm, and it’s this flexibility that can make it tricky for new users when they try to merge their collections of termbases together, or add to their data by importing a file from a colleague.

So what do we mean by definition?  Let’s think about keys, as I think this is quite a good analogy…

An image showing a number of different keys.

The four keys in the image on the right will all open a lock, but they won’t all open the same lock.  If you want one of these keys to open another lock then you need to change its shape, or its “definition”, to be able to open the lock.  A termbase definition works in a similar way because MultiTerm is flexible enough to support you creating your own lock.  That lock might be the same as someone else’s, but theirs could also have a different number of pins and tumblers, which means your key won’t fit.

Continue reading “square pegs in round holes…”

Feature rich… it’s overflowing!

I first wrote about the Glossary Converter on September 17, 2012… over three years ago.  Not only is that a surprisingly long time ago, but I still meet people at every conference I attend who have never heard of this marvellous little tool, and in some cases never heard of the OpenExchange either.  So when I toyed with the idea of writing an article about Xmas coming early and talking about the OpenExchange and all the goodies inside, part of me couldn’t resist writing about this tool again.  In the three years since it was first released it’s morphed beyond all recognition, and today it’s awash with features that belie its appearance.

I like to take a little credit for the emergence of this tool because back in 2012 I asked around trying to get someone to create one so that it was straightforward for anyone to create a MultiTerm Glossary from a simple two column spreadsheet… the sort of glossary that most translators use for their day to day needs.  I was over the moon when Gerhard (the developer) was interested and created the tool I wrote about back then.  But I can take no credit whatsoever for what the tool has become today and it’s well worth revisiting!

Continue reading “Feature rich… it’s overflowing!”

ATA56 – SDL Trados Studio Advanced

I ran a beginners’ and an advanced workshop at the ATA56 pre-conference day in Miami this year.  A really fun day for me as we start the day with no specific agenda or pre-defined course and then try to shape the session to suit the needs of the attendees.  The beginners’ workshop tends to be a little more prescribed, to start off with at least, and the intention is to try and cover the basics of how Studio and MultiTerm work.

The advanced is a lot different… after all, what is advanced?

Continue reading “ATA56 – SDL Trados Studio Advanced”

It’s not all head in the clouds!!


When the developer of the Word Cloud plugin for SDL Trados Studio first showed me the application he developed I was pretty impressed… mainly because it just looked so cool, but also because I could think of a couple of useful applications for it.

  1. You could see at a glance what the content of the project was and how interesting it might be for you
  2. It looks cool… or did I say that already?

Continue reading “It’s not all head in the clouds!!”

Keep Calm and use your Project Templates…

I think I’ve discussed Project Templates in the past, although perhaps only in passing.  So let’s start off by painting a picture of the situation you find yourself in where templates come in handy.  You maintain your own Translation Memories, in fact you have five you regularly use for every project but keep them separate because they are based on different sublanguages and you have some clients who adhere strictly to the minor linguistic differences.  You have a couple of termbases that you also like to add to every project and you find it easier to manage the terminology for your clients in separate termbases rather than use custom fields that complicate the ability to import/export with your colleagues.  You also have very specific quality assurance rules that you’ve honed over many years of translating and you know these are reliable and help you when you work.

So that’s a nice straightforward scenario that is probably followed by many of your colleagues… but then a new Project Manager with an agency you regularly work with starts to send you Project Packages for the first time, and another direct Corporate client of yours purchased GroupShare and you started to receive links to online projects.  The Project Managers in question are not as experienced as you and they create their projects with default settings and their own less relevant resources, and they send them out to you.  No problem you say, and you just add your own Translation Memories one at a time, your termbases one at a time, and you import your own quality assurance rules.  This is all fine as Studio lets you take advantage of your own resources and your client is quite happy because you’re still turning in quality translations as you always have.  But then you have to do this again… and again… and again… and it all starts to get a little tiresome.  Surely there’s a way to add more resources at a time and apply them to an existing project?

Continue reading “Keep Calm and use your Project Templates…”

What a whopper!

I love this cartoon with the husband and wife fishing on a calm weekend off.
“Honey, I got a big one on!”
She’s hooked a whopper and he casually responds in the way he always does when she occasionally catches a fish on Sunday morning.
“Yes dear, uh huh…”
The equipment they’ve got, from the boat to the fishing rods, is all perfectly suitable for their usual weekend activities but hopelessly inadequate for handling something like this!  Little do they know that the whopper under the surface is going to give them a little more trouble when they try to bring him on board!
Continue reading “What a whopper!”

Multitudinous terminology!

Why is MultiTerm a separate program when I can do exactly the same thing with another CAT tool?  This is a fairly common question, and it has a very good answer too.  It’s because MultiTerm is multitudinous!  That is, it can be extended by you to provide a variety of termbases, so many in fact that you could probably create a structure to match anything you liked and you won’t be shoehorned into a fixed structure.  As I thought about this the Penrose steps came into my mind.  They don’t necessarily have anything to do with terminology solutions for translators, but these steps don’t behave in a known manner either and my mind enjoyed the nonsensical link!  I also liked this word multitudinous; partly because of the obvious use of the prefix multi- but also because the use of a word like this suggests complexity to me, and in many ways this is what users think when the subject of MultiTerm comes up.

Continue reading “Multitudinous terminology!”