Strong words… “Committing the Cardinal Sin“!
I can remember from my early days with SDL many interesting, and often frustrating conversations with the then Product Manager for MultiTerm. The almost religious use of phrases like “You can’t use spreadsheets for terminology”… “It only takes a few steps to be able to create a simple glossary with MultiTerm”… “You can’t properly export a MultiTerm termbase to Excel”… and many more discussions along these lines. Well, over the last year or so mainly thanks to the SDL OpenExchange (now RWS AppStore) which removes the shackles of being tied to “the way it’s always done” we have seen one tool in particular that has proven this traditional way of thinking wrong. But not because our friendly product manager was wrong… he was mostly right. When you think about Terminology Management in the traditional sense then Excel is not really suited to managing concept oriented databases that are designed for the terminology professional. It has its place, but is definitely prone to error and difficult to manage as the database grows. But what if you only want a glossary?
The Glossary Converter on the OpenExchange changed all of this and today the conversion between fairly complex databases to Excel for routine maintenance is commonplace, the ability to convert Excel glossaries to MultiTerm is childs play. There is another tool that provides even more flexibility, again on the OpenExchange, called Excelling MultiTerm which I mean to write about one of these days. But none of these has the audacity to commit the “Cardinal Sin“… which is of course using the spreadsheet as your termbase with no conversion required at all!
How is this possible? Well, in case you missed it I wrote briefly about this in my last article. There are a couple of new APIs that have been made available for developers in the new Studio 2015 SR2 release and one of these is the Terminology API. In a nutshell this allows a developer to use another source for your terms, in preference to, or in addition to MultiTerm. So to help developers see how to use this we created the possibility to simply use a spreadsheet… I mean the “royal we” (it was Romulus Crisan really!)
Format of the Spreadsheet
We do need a few rules with our example. The good thing is a developer could take our code and extend this to more complex scenarios if required, but I think MultiTerm, or another database approach will still be better as the structures gets more complicated. For our sins however we have set the scope as follows:
- One column for source terms
- One column for target terms
- One column for Approval, Forbidden terms
- Ability to separate synonyms with whatever you like (default is a pipe symbol)
The reason for this is twofold. First it’s intended to be an example of how to use the Terminology Provider API for developers to use and learn from, and secondly it’s aimed at the glossary user rather than the complex terminology user. However, I reckon this simple format will satisfy the needs of a very large number of translators indeed! I would like to add though that our original simple format and usecase quickly became a lot more sophisticated thanks to our Beta testers, and Nora Diaz in particular… so thank you Nora for the great feedback to shape how this is used in practice!
How do you use it?
The “Add a termbase” feature in Studio has been changed to look and feel a lot more like the “Add a Translation Memory” feature. So it now looks like this:
By default you get the “File-based” and “Server-based” MultiTerm termbases and then after that you can have whatever terminology plugin you can find! Unsurprisingly there is only one available on the OpenExchange today, because this API has only just been made available, but I expect this to change over the course of the year as more glossary and terminology providers see the opportunity to add their products into Studio for use in interactive translation. The one that is available today is what we’re talking about, the Excel-based Terminology provider.
So you select this one and will be presented with a few basic options like this:
The Has header checkbox allows you to ignore any titles in the first row of the spreadsheet. If you don’t check this then they will treated as terms in your glossary. The Is Read Only checkbox is something we put in there just in case you do try to use this in a shared environment, like dropbox for example. We haven’t tested this at all, but I’m almost certain someone will try it! So if you do then the idea is you would make the shared Excel glossary a Read Only one using this option on every machine linking to it. This way we hope it will minimise the danger of corrupting the file. So you write to your own Excel file and just read the shared one, safe in the knowledge you can’t write to it even if you try!
So it’s pretty straightforward, you just set the source and target columns to match what’s in your spreadsheet, add an approved column, or notes column (it says approved but you could use this for whatever you like), set your languages and if you are using synonyms you specify whatever text separator you used. So to elaborate on this a little, your file could look like this:
An Excel distraction!
If you maintained a spreadsheet in the past with synonyms in different columns then just concatenate them into a new column. If you’re not familiar with this very neat feature in Excel you could do something like this:
This would allow you to concatenate words in cells B2, C2 and D2 with a pipe symbol between them, like this:
So now you would have the correct string containing all synonyms in one cell, E2. You can then copy the cell down your spreadsheet and do them all at once. Make sure you then select the column, copy it and paste it in as values so you have the text as opposed to the formula, then finally delete the original three columns so you only have the one with the concatenated synoynms:
IMPORTANT: Make a copy of your original spreadsheet before you start to mess around with it!!
If you don’t use synonyms then don’t worry, you only have to select your columns in Studio exactly as they are.
Back to the app!
Once you have set your settings just select your spreadsheet and click on the submit button:
Now you should have added the spreadsheet as if it was a MultiTerm termbase, only less clicks 😉
Some interesting things to note. If the filename is long then it doesn’t fit into the window, just like MultiTerm, and you need to make sure your languages match those you need, just like MultiTerm. Now you’re ready to go!
I moved these windows next to each other just to show you how they work. So here you have the new custom termbase Viewer next to the Termbase Recognition window. It’s also worth noting that we tested with a spreadsheet containing over a million lines in it, something this large the filebased MultiTerm solution would not be able to handle at all… this one has some 90k lines so still pretty sizeable and I’m not noticing any performance degradation at all:
The recognition window you’ll be familiar with because this is just how MultiTerm works. You see the terms that were found in the spreadsheet and you can also view the term details from there in the same way you would using MultiTerm to see the terms in the termbase viewer. In this case there is little value in using the viewer for more detail because it’s exactly the same as the term recognition window. However, if you want to edit anything this is where it comes into its own. There are a number of features around this to be aware of, and all associated with the buttons you can see in the screenshot above:
- Add: this feature allows you to add new terms whether they are in your translation or not. So if you just wanted to add something you found while looking up another term on a website perhaps, then you click on Add and just type the new source term into the cell, then press the Tab key (or use the mouse) and you’ll be taken to the target where you can add that too. Add all the synonyms you wish, and notes or approval status, and then you must click on the Save Entry button.
So works the same as using the “Quick Add New term” and “Add New term” icons in Studio itself. Using these will add the terms and save them to the spreadsheet automatically.
If you keep adding new entries using the Add button in the Viewer and forget to press Save Entry after each one then you will need to use the Sync feature before you finish (see below) or you’ll lose them when you close Studio.
- Delete: clicking this button will delete the entry you have highlighted in the Viewer.
- Save Entry: this will save the entire entry you are currently editing, and only that one… you can edit an entry just by double clicking it, or selecting and pressing F2
- Sync: this will synchronise all the changes you have made in Studio with the spreadsheet. If you have made changes into the spreadsheet while working (be careful with this, it’s not a great idea – see limitations below) then these will also be synchronised and made available in the Project. Restarting Studio will also synchronise changes made in the spreadsheet with the Studio Project.
In the translation editor itself you of course get all the usual benefits of being able to see the highlighted terms, and place them using AutoSuggest:
So pretty cool addition, and virtually hassle free if all you want is a simple glossary!
Limitations and Important Precautions
Verification: at the moment the API doesn’t extend to being able to make use of the term verification features that MultiTerm offers. So you cannot use your Excel spreadsheet of terms for verification purposes.
Settings: the settings for the spreadsheets you add can only be accessed when you add a spreadsheet. This is also a limitation of the current version of the API and available functionality in Studio. So if you wish to change something you have to remove the spreadsheet and add it again. During testing I occasionally corrupted the settings in Studio, probably as a result of messing around so much, but if this should happen it’s worth noting that the solution was simple in every case:
- remove the spreadsheet from the list and close Studio
- delete this file : c:Users[USERNAME]AppDataRoamingSDL CommunityExcelTerminologyexcelTerminology.json
- restart Studio and add your spreadsheet back
The Excel file itself: Excel is not a database! There are no controls over it at all. So here’s a few precautions worth noting:
- take a copy of your Excel Glossary and make sure you always have a backup
- don’t open the Excel glossary while working on it in Studio. We have built in some warnings to remind you if it’;s open during operations more likely to put the file at risk, but better to get into the right habit from the start
- don’t share it on a shared drive, or dropbox for example, unless you maintain it in read only mode and don’t be surprised if it gets broken!
I don’t mean these warnings to put you off using this app! Several of our Beta testers loved it, especially the simplicity of working with it in Studio and for maintenance in Excel. But it’s important to note that there are limitations with using a simple file like this instead of a database… it’s clever and going to be useful for any developers who want to know how to build a custom terminology provider, but it’s no substitute for a properly managed database with appropriate controls designed to manage access for read/write operations in single user and multi-user environments.
It’s also the “Cardinal sin” I’ve wanted to commit for many years!!
So, what’s it called and where can I get it? I really wanted to call it “Cardinal sin” but opted for the less provocative “TermExcelerator” which is available on the SDL OpenExchange (now RWS AppStore) here.