I first wrote about the Glossary Converter on September 17, 2012… over three years ago. Not only is that a surprisingly long time ago, but I still meet people at every conference I attend who have never heard of this marvelous little tool, and in some cases have never heard of the OpenExchange either. So when I toyed with the idea of writing an article about Xmas coming early and talking about the OpenExchange and all the goodies inside, part of me couldn’t resist writing about this tool again. In the three years since it was first released it’s morphed beyond all recognition and today it’s awash with features that belie its appearance.
I like to take a little credit for the emergence of this tool because back in 2012 I asked around trying to get someone to create one so that it was straightforward for anyone to create a MultiTerm Glossary from a simple two column spreadsheet… the sort of glossary that most translators use for their day to day needs. I was over the moon when Gerhard (the developer) was interested and created the tool I wrote about back then. But I can take no credit whatsoever for what the tool has become today and it’s well worth revisiting!
The Glossary Converter
By way of introduction to the features I’m going to start with the menu that appears when you click on “Settings” from the simply disguised user interface:
Five tabs that lead you into a wealth of capability that go well beyond the original simple glossary solution. Let’s start with the General tab.
General
The features in this tab allow you to do two things:
- Convert any of the filetypes listed to any of the filetypes listed!
- Use a MultiTerm template when creating a termbase
The first one is obvious and is explained by the table below: you can convert SDLTB (MultiTerm term base) to XDT (MultiTerm definition file and the exported MultiTerm XML), Excel (XLS, XLSX, CSV, TXT), TBX, UTX or TMX! So here it’s possible to not only convert spreadsheets to MultiTerm termbases, but also to terminology exchange files, or translation memory exchange files… and even back the other way. You can even use it to convert a spreadsheet to a TMX translation memory or the other way around… fantastic! If the cross section between products is blue you can convert between them:
The second thing is a very handy way to transform data based on predefined definitions or existing termbases. So you can have different definitions (xdt files) saved for different use cases, maybe separate extracts for different translators depending on language pairs, domain info, who the client is etc. Then when you need to create an export of your larger MultiTerm termbase based on these definitions you just drag and drop, select the definition you want, and set the format you want the termbase to be converted to. Very smart and a real timesaver.
Spreadsheet
This tab is what you’d expect… a variety of settings related to handling terminology in spreadsheets. I recall with fondness the discussions with our product management team many years ago about how MultiTerm should not really be exported to a spreadsheet and why the internal csv export could never be better than it is. The reality is everything they told me is true, and for terminology management as a whole there are very few alternatives to MultiTerm in terms of its overall capabilities. However, they didn’t really consider the use case for the vast majority of the SDL Trados user base, and this is not only far simpler, but it requires a different approach. The Glossary Converter has not only provided this alternative approach but it has exceeded expectations and makes it possible to handle all but the most complex tasks in Excel. This has proven to be of immense benefit to not only the translation community, but to terminologists as well and has almost made “MultiTerm Convert” redundant!
The spreadsheet settings support:
- “fast mode” for reducing the conversion time when using large and complex spreadsheets,
- “tags” in the column headers for supporting the ability to use fields with the same name on different levels (Entry, Language, Index and Term),
- “unprocessed XML” to support the use of cross references (hyperlinks) between a MultiTerm Termbase and a spreadsheet. This was something you previously needed to use MultiTerm Convert for, but no longer!
- “synonyms” used in your termbase. The safer way to handle spreadsheet conversion is to work on a single line per entry and use a separator to define the use of synonyms. But if you really want to you can also use a multi-line format… I’d recommend the help provided to see how that works, and why it’s hard to support.
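To make the single-line-per-entry layout with a synonym separator a little more concrete, here is a minimal Python sketch of how such a spreadsheet could be read. It is only an illustration and not how the Glossary Converter itself works: the function name `read_glossary`, the `|` separator and the sample terms are all assumptions made for the example, so check the help for the separator your own settings use.

```python
import csv
import io


def read_glossary(text, separator="|"):
    """Read a glossary with one entry per row.

    The first row names the languages; every later row is one entry.
    Synonyms inside a cell are split on `separator`, which is an
    assumed character chosen for this sketch.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # e.g. ["English", "German"]
    entries = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank rows
        entry = {}
        for language, cell in zip(header, row):
            # One cell may hold several synonyms for the same concept.
            entry[language] = [t.strip() for t in cell.split(separator) if t.strip()]
        entries.append(entry)
    return entries


# Hypothetical sample data: two entries, each with English synonyms.
SAMPLE = "English,German\ntrunk|boot,Kofferraum\nbonnet|hood,Motorhaube\n"

entries = read_glossary(SAMPLE)
for entry in entries:
    print(entry)
```

Because each row is exactly one entry, there is never any doubt about which synonyms belong together, which is what makes this layout the safer of the two.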
It’s no surprise that the Glossary Converter is the most heavily downloaded application on the OpenExchange!
Fields
It’s these little bu##*!s that have been responsible for many of the difficulties faced by translators after they underwent training, learned how to create a simple bilingual termbase using the predefined templates in MultiTerm and ended up with a simple structure like this:
It’s great for the first few days when you’re really enthusiastic and think you’ll use all this metadata, but pretty soon it wears thin and then you try to import and export… major headache! This capability is fantastic for a terminologist who can configure this the way they wish, and then manage the terms as part of a complete lifecycle, but it’s not a simple concept to get your head around when you only want a glossary and have no interest in learning about all the capabilities MultiTerm has even if you do own it!
The Glossary Converter will create the structure on the left without batting an eyelid and it won’t confuse you by offering you the enticing possibility to structure your termbase with a little more detail. However, it can do it if you want to!! That’s what this “Fields” tab is all about, and it supports your creation on the fly in an easy to use interface that is virtually on a single screen. It supports the full capability of MultiTerm in this regard, even allowing you to create picklists during a spreadsheet conversion, or use references to images. The best part is it then remembers what you did so the next time it’s just a drag and drop convert with no additional work… you can also save your creations as templates you can load and use again to suit your needs at the time.
Brilliant!
User Interface
This tab holds two features that allow you to tailor the product a little, the UI language and the UI themes. The UI supports eight languages plus English (DE, ES, FR, IT, NL, PL, RO, RU… mostly translated by happy users all mentioned in the help) and if you’d like to translate this into another language Gerhard is happy to support this and will send you a resx or Excel file which you can translate and send back for inclusion in a future release. See the help for details!
I’m not sure where all the inspiration for the UI themes comes from (there is a seasonal flavour!), but there is probably something for all tastes… I’ll admit to being a little conservative in this regard and I prefer the default green at the bottom of the image below.
Merging
Merging is the latest addition to the Glossary Converter and it allows you to do what it says… merge files. So you could merge three spreadsheets into one MultiTerm term base by dragging and dropping all three into the interface, or you could merge a spreadsheet and an existing term base by dropping them in the interface too… so simplicity itself you’d think! The available settings provide some control over what you are merging and can help reduce loss of data, but may result in the need for careful quality control after the merge is complete. By way of example, and because I don’t want you to think the application is at fault, I have copied a couple of problems associated with merging that Gerhard provides in the help.
Synonyms (different word, same meaning)
When they are in the source, and you chose the source language to merge on, then you may suffer from “not enough” merging because the converter can only merge identical words, it has no concept of meaning. So here “trunk” and “boot” should have been merged as they are synonyms of “Kofferraum”:
Homonyms (same word, different meaning)
Here you may suffer from “too much” merging because if the converter sees the same word twice it merges. So here “lock” in a door is not the same as “lock” in a canal but they get merged anyway:
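Both problems fall out of the same rule: merging compares the text of the terms and knows nothing about what they mean. The short Python sketch below imitates that rule on a list of (source, target) pairs. It is an illustration only and not the Glossary Converter’s actual code; the function name `merge_on_source` is hypothetical, and the German words I’ve used for the two kinds of “lock” (“Schloss” and “Schleuse”) are my own choice for the example.

```python
def merge_on_source(rows):
    """Merge (source, target) pairs whose source terms are identical strings.

    Only the spelling is compared, never the meaning, so synonyms stay
    apart and homonyms end up in the same entry.
    """
    merged = {}
    for source, target in rows:
        targets = merged.setdefault(source, [])
        if target not in targets:
            targets.append(target)
    return merged


# Synonyms in the source: both mean "Kofferraum", but the strings differ.
synonym_rows = [("trunk", "Kofferraum"), ("boot", "Kofferraum")]

# Homonyms in the source: the same string covers two different concepts.
homonym_rows = [("lock", "Schloss"), ("lock", "Schleuse")]

synonyms_merged = merge_on_source(synonym_rows)
homonyms_merged = merge_on_source(homonym_rows)

print(synonyms_merged)  # two separate entries: "not enough" merging
print(homonyms_merged)  # one combined entry: "too much" merging
```

In the first case you are left with two entries where a terminologist would want one, and in the second you get one entry where there should be two, which is why the merge needs checking afterwards.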
Gerhard has provided a lot more detail in the help about merging because it is fraught with complications due to different definitions, homonyms, synonyms, field names/types, retaining a Master termbase (helps with homonym problem), merging on entry number etc. So I’m not going to repeat it all here… these are all the things that MultiTerm handles reasonably well and the Glossary Converter does an excellent job of supporting it, but if you are going to try it make sure you understand exactly what you are merging and make sure you read the help section Gerhard provided… it’s a lesson on terminology management!! Sometimes it’s easier, especially if you have simple glossaries, to merge them in Excel and remove duplicates via the sorting mechanisms provided there. Then simply recreate the glossary.
The one final point I will make is that the interface in the latest version (version 4) introduces a traffic light system to make sure you are aware of any merging settings you have introduced. It would be very easy to forget and then run a conversion that merged when you were not expecting it!
The Glossary Plugin
Now, I can’t leave this article without giving a mention to the Glossary Plugin. This is another tool developed by Gerhard but using the Glossary Converter for most of its functions. The Glossary Converter is a standalone application, but the Plugin is integrated into Studio and looks like this:
It sits in the “Projects View” in the “Home” ribbon and has six functions as follows:
- Add : this feature can create a new empty termbase and add it to your Project in two clicks!
- Add (MT) : this calls up the MultiTerm wizard and allows you to create your term base the old fashioned way, but adds it to the active Project at the end.
- Import : this allows you to select any of the supported files (SDLTB, MultiTerm XML, Excel (XLS, XLSX, CSV, TXT), TBX, UTX, TMX) and they will be converted to a MultiTerm termbase and added to the active Project
- Export : this will export all the termbases that are enabled in your Project to the format specified in the settings of the Glossary Converter
- Clean : this will delete the selected termbases (a window appears showing the termbases) from your Project.
- Run GC : this runs the standalone Glossary Converter application
This is a great application because it simplifies the creation and adding of termbases to a Studio Project from pretty much any format you receive, and more importantly supports you in creating client specific term bases or glossaries which can then be provided in a variety of formats to your client when the job is complete.
Conclusion
In my opinion the introduction of the SDL Language Platform, and in this case the Glossary Converter, has been responsible for making the management of terminology for use in Studio more efficient and easier to handle than in any other tool. There is absolutely no reason why you should not be using MultiTerm for every project you work on, whether it’s for managing your own termbase or creating and managing one for your clients. You can keep it simple, or make it more complex, both with minimal effort. This kind of flexibility that can handle such a range of requirements doesn’t exist in any other translation tool!
Here’s the links again if you haven’t downloaded these applications yet:
And here’s a few resources which might be useful as they show the use of these tools in various scenarios:
Blog articles:
- Creating a TM from a Termbase, or Glossary, in SDL Trados Studio
- Glossaries made easy…
- Great news for terminology exchange…
- If I knew then what I know now!
- Is MultiTerm really that hard to learn?
- Glossary to TM… been there, done that…
- Yanks versus Brits… linguistically speaking!
- What a whopper!
- Export for External Review – a detour
- FIT XXth World Congress – Berlin
- The ATA55 in Chicago and the SDL OpenExchange (now RWS AppStore)… which apps?
- Converting Wordfast resources… out with the old!
YouTube videos:
- Great news for terminology exchange…
- The Glossary Converter in practice
- Yanks versus Brits… linguistically speaking!
- Speedy Project Glossaries
- Glossary Plugin – importing a termbase from a spreadsheet
- Converting the Microsoft Terminology Collections to a MultiTerm Termbase
- More complex Glossary Converter
- Wordfast Glossary Conversion for SDL MultiTerm using the Glossary Converter
- Merging Glossaries or termbases with the Glossary Converter
- Merging multiple spreadsheets into one termbase…
Good Morning Paul, Thanks ever soooooo much! I discovered this tool 2 days ago, indeed, thanks to your blog. It helped me a great deal! Please go on, you’re ever so helpful. Best Regards / Mit freundlichen Grüßen / Meilleures salutations Caroline Charlier
re “This kind of flexibility that can handle such a range of requirements doesn’t exist in any other translation tool!”: except for CafeTran, of course 😉
Nothing beats a tab-delimited text file for flexibility combined with simplicity, if you ask me.
It doesn’t exist in CafeTran either! The tab-delimited is great, works here too, but CafeTran can’t handle the complex stuff that MultiTerm can handle at all.
Really, like what (“complex stuff”) exactly?
Like anything a terminology solution should be able to support… this document might be helpful.
Nice try, Paul, but pointing me in the direction of a bunch of slides isn’t what I asked. I wanted to know what MultiTerm can do that CafeTran cannot (apart from crash)? We can add as many fields (columns) as we like to our txt glossaries, and clickable URLs and images, and regexes, and all kinds of other magical “stuff” (there’s that word again). I keep my own master TB in tlTerm (http://tshwanedje.com/terminology/), so I do know what terminology management software looks like.
ok, I just thought comparing MultiTerm to CafeTran is too big a subject. We could trade blows but I don’t know it well enough… so based on what I’ve read perhaps not handling MultiTerm (XDT and MultiTerm XML is almost a de facto standard for many users), not handling multilingual termbases to support multilingual projects (not even sure CafeTran supports multilingual projects?), not being able to import/export TBX files, not being able to work with terminology outside of a translation job etc. But why waste time on this since I know you’ll argue till the cows come home! I believe CafeTran has two ways to handle its language pairs (according to the help… obviously I don’t know the product well), the first is tab-delimited files and the second is TMX files. Both of these are flat files, which tells me the solution is not designed to be able to work the same way as MultiTerm no matter what arguments you produce. It is not a terminology tool and I don’t think it even pretends to be, so attempting to compare feature for feature is a futile exercise.
I have installed CafeTran and will take a better look at its Glossaries as the help looks interesting… but my experience so far in testing it for its ability to work with XML was disappointing as it has no support for anything but the most basic of XML files. I also find it very difficult to get started with… but probably once I work through the help I’ll figure it out.
• not handling MultiTerm (XDT and MultiTerm XML is almost a de facto standard for many users) > hmm, not sure I care. I’m a translator, not an academic or lexicographer
• not handling multilingual termbases to support multilingual projects (not even sure Cafe Tran supports multilingual projects?) > multilingual projects are for project managers, not translators
• not being able to import/export TBX files > no one uses (or likes) TBX. it’s a terrible format.
• not being able to work with terminology outside of a translation job > that’s the beauty of tab-del txt: you can open it in a wide variety of great tools, and immediately start using/editing them (paired with e.g. Ron’s Editor, tab-dels are very easy to work with). try that with a MultiTerm file 😉
What is and what isn’t a “terminology tool”, will depend on who is using it. I’d say that for the majority of normal translators, a simple tab-delimited text file (or even a TMX – the two terminology containers in CafeTran) are much more useful than a full-blown MultiTerm monster file, which they can’t even use without resorting to one of your various OpenExchange workarounds. It’s great that you have all these people willing to create these little tools, but they are often patching up holes in your own software.
However, having said all of this, my personal experience is that 95% of translators out there in the wild don’t use *any* terminology tool/solution, whether in a CAT tool or not, or in MS Word, so the whole discussion is probably kind of academic 😉
You asked this: “I wanted to know what MultiTerm can do that CafeTran cannot”. Well, MultiTerm is sold as a standalone tool for Terminologists so what did you expect!! CafeTran is a translator’s tool, great for some things I’m sure, not so great for others. Your personal choice as we see in the forums on a regular basis. But perhaps it’s just not the answer to every localisation task we see in the industry today!
Kind of a shame that the terminology component of SDL Studio (which is a CAT tool, for translators) is “a standalone tool for Terminologists”.
We’re talking about MultiTerm, not Studio. Studio is able to use MultiTerm components, but MultiTerm is also a standalone product in its own right and has nothing to do with Studio in that regard. The truth is Trados developed MultiTerm long before they ever created a translation tool and it has capabilities that a translator is probably not going to need, and what use would Studio be to a terminologist?
I think it’s clear you’re just going to argue about everything and it’s unlikely I’m going to rush out and become a CafeTran user so I think I’ll draw a line under this conversation now.
Great post, btw!
PS: so, in a sense, CAT tool developers should probably be focusing on one thing above all others: making their tool’s terminology handling as easy & intuitive as possible (if they want anyone to actually use it).
Thank you for the updated post Paul. I, like many other users, find the Glossary Converter extremely useful as a standalone tool, but the thing that absolutely delights me is the Glossary Plugin, I just love how easy it is to create and add a new termbase, plus the added flexibility to use either Multiterm or Glossary Converter.
Hi Paul, Michael, Caroline, Nora,
I understand Michael’s point that a tab-delimited text is suitable for many situations. But I also find the term recognition/verification in Trados very useful.
I believe UTX is perfect here because it is a “standardized tab-delimited” glossary format. The data structure is standardized so that you can easily reuse and share your glossary data.
Glossary Converter can already handle the conversion to/from UTX 1.11 as Paul mentioned above.
Our team at AAMT has just released the new UTX 1.20 beta specification (available for free).
http://www.aamt.info/english/utx/
Your feedback and comments are much appreciated.
@YAMAMOTO: UTX sounds very interesting, but I noticed that there doesn’t seem to be any support for synonyms! In CafeTran tab-delimited txt glossaries, synonyms are separated by semicolons, thus: kat;poes;kater [TAB] cat
A standard for txt terminology (that is actually used) is greatly needed, but it really needs to support synonyms.
Hi Michael,
Thank you for your reply.
UTX supports synonyms by means of concept IDs (see the specification for more details).
http://www.aamt.info/english/utx/download.htm#spec
In UTX, similar terms share the same concept IDs.
When you use concept IDs, one term is expected to have “approved” term status. In other words, it is the “best” (or “default”) translation. Unlike TBX, concept IDs are only assigned to terms with “synonyms.”
In UTX, your example looks like this.
| term:nl | term:en | term status | concept ID |
|---------|---------|-------------|------------|
| kat     | cat     | approved    | 1          |
| poes    | cat     |             | 1          |
| kater   | cat     |             | 1          |
But I guess it’s a good idea to explicitly mention “synonyms” in the UTX specification. Thank you.
Hello
Is there a way to modify the “created by” and “modified by” fields when importing from Excel using GC? Every time I convert an Excel file, the resulting termbase contains the fields “Created by” and “Modified by” populated with “glossaryconverter” and I would really like to change that. I appreciate any help!
Hi, I guess you also posted the question we answered here in the SDL Community.