This article is all about out with the old and in with the new in more ways than one! In the last week I have been asked three times about converting Wordfast translation memories and Wordfast glossaries into resources that could be used in Studio and MultiTerm. Normally, for the TXT translation memories I get I would go the traditional route and use a copy of Wordfast to export as TMX. Then it’s simple, but what if you don’t have Wordfast or don’t want to have to try and use it? Wordfast glossaries are new territory for me as I’d never looked at these before. But on a quick check it looked as though they are also TXT files so I decided to take a better look.
Before I get into the detail I’ll just add that I’m not very familiar with Wordfast, so I’m basing my suggestions on the small number of files I have received, or created, and the process I used to convert them to formats more useful for a Studio user. I’ll start with the glossaries as this is where I got the idea from. I’d better explain my opening statement too… after I did an initial conversion using the Glossary Converter from the SDL OpenExchange I was asked to explain how this would work with MultiTerm Convert. This of course made me think about the old versus the new… I wouldn’t compare Wordfast and Studio in this way at all 😉
These glossaries, at least the file-based ones as opposed to server versions (I have no idea if Wordfast even uses these), are simple TXT-based formats. This means you can open them in a text editor and read the data inside. Wordfast also has Translation Memories in TXT format, and as you may know the old Trados had Translation Memories in TXT format too. Why am I mentioning this? Well, we now have three TXT files, but you can be sure the data within is different in each case. So converting them to something usable in any other tool means knowing how to manipulate this data into a format you can work with.
The Wordfast glossary TXT seems to be a fairly simple layout with four fields I think… at least this is all I have seen. None of these fields have headers in the TXT file but they seem to be this, from left to right:
- Source language
- Target language
- Description field
Once you know this the process of converting is pretty straightforward. So I have done it two ways in order to satisfy a couple of requests I recently received.
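If you would rather prepare the file with a script than by hand, the layout described above can be sketched in a few lines of Python. This is only a rough illustration under some assumptions of mine: that the glossary is tab-delimited with no header row, that the first three columns are source, target and description, and that the column names `Source`, `Target`, `Description` and `Extra` (all hypothetical, rename them to your real languages) are what you want in the output. Anything beyond the third column is kept in `Extra` rather than thrown away.

```python
import csv
import io

# Hypothetical header names; rename them to the real language names you need.
FIELD_NAMES = ["Source", "Target", "Description", "Extra"]


def wordfast_glossary_to_rows(text):
    """Parse tab-delimited glossary text (assumed: no header row) into dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("\t")]
        # Pad short lines so source, target and description always exist.
        fields += [""] * (3 - len(fields))
        rows.append({
            "Source": fields[0],
            "Target": fields[1],
            "Description": fields[2],
            # Keep any further columns instead of silently dropping them.
            "Extra": " | ".join(fields[3:]),
        })
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text with a header row on the first line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELD_NAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

To use it you would read the TXT file into a string (check the encoding of your own file first), pass it to `wordfast_glossary_to_rows`, and save the result of `rows_to_csv` as a new file with headers that you can then inspect in Excel before converting.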
First of all, using the Glossary Converter. This is the longer video, but only because I was explaining how to understand the TXT file and how best to manipulate it to get what you need. The conversion was the easy part!

[youtube=http://youtu.be/QnzO4QxTtng]
The second way is the conventional way using SDL MultiTerm Convert which is based on deriving a termbase definition from the file you are converting so that you can then create a MultiTerm termbase and import matching XML data into it. This is an application that is installed with MultiTerm so you’ll find this in the program group for SDL MultiTerm.
SDL MultiTerm Convert
There’s no sound on this one (it was late, I’m staying at my mother’s house, and I didn’t want to wake anyone up, but I still wanted to make the video while I was in the mood!) so I annotated it instead… hopefully well enough to make it clear. This process has always been something many users struggled with, and that is just one reason why the Glossary Converter has been such a successful application.

[youtube=http://youtu.be/g46bRUBxdXY]
Now that I watch it back perhaps this will be useful for anyone trying to learn how to use SDL MultiTerm Convert, not least because there is no sound which might remove the rambling language barrier imposed by me talking too fast!
Before I move onto Translation Memories it’s probably worth mentioning that Wordfast also has the concept of blacklists. These consist of two monolingual columns like this:
These are also saved as a TXT file, this time with just two columns in the tab-delimited file… Forbidden and Suggestion. This is a very different approach to that taken by MultiTerm, where you can set up forbidden terms by adding the forbidden words as synonyms with a forbidden attribute. So if the blacklist contains suggestions that are also in the Wordfast glossary then you could do a lookup in Excel to match the forbidden terms with the suggested ones. However, I doubt very much that Wordfast users operate in this way as it doesn’t really make sense to duplicate the effort. So what you could do is add these to your glossary and also add a field for the blacklist attributes like this:
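For anyone who prefers a script to an Excel lookup, here is a minimal Python sketch of the same idea. It is an illustration only and rests on assumptions: the blacklist has already been read into `(forbidden, suggestion)` pairs, the glossary rows are dictionaries with hypothetical `Source` and `Target` keys, and suggestions are matched against the `Source` column without regard to case. The function name `attach_forbidden_terms` and the `Forbidden` field are mine, not anything Wordfast or MultiTerm defines.

```python
def attach_forbidden_terms(glossary_rows, blacklist_rows):
    """Add a 'Forbidden' list to each glossary row, matched on the suggestion.

    glossary_rows: list of dicts with 'Source' and 'Target' keys (assumed names).
    blacklist_rows: list of (forbidden, suggestion) tuples from the blacklist TXT.
    """
    # Group the forbidden terms by the suggestion they point to.
    forbidden_by_suggestion = {}
    for forbidden, suggestion in blacklist_rows:
        key = suggestion.strip().lower()
        forbidden_by_suggestion.setdefault(key, []).append(forbidden.strip())

    merged = []
    seen = set()
    for row in glossary_rows:
        key = row["Source"].strip().lower()
        seen.add(key)
        merged.append({**row, "Forbidden": forbidden_by_suggestion.get(key, [])})

    # Suggestions that are not in the glossary become new entries
    # with an empty target, ready to be translated later.
    for _forbidden, suggestion in blacklist_rows:
        key = suggestion.strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append({
                "Source": suggestion.strip(),
                "Target": "",
                "Forbidden": forbidden_by_suggestion[key],
            })
    return merged
```

Each merged row then carries the preferred term, its translation if there is one, and the forbidden variants that belong with it, which is roughly the shape you need before giving the forbidden words their own attribute in a termbase.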
Since the Wordfast Blacklist is monolingual you could now translate the terms into the corresponding target, in this case Spanish. This would be quite an interesting way to make use of these resources and turn them into a proper managed termbase. I ran through this quickly using the Glossary Converter so you could see how this would be used in MultiTerm and Studio. The principle of using forbidden terms in Studio is explained in this article, but watch the video below first:

[youtube=http://youtu.be/UFSFvqQUslE]
There is also one other way this Blacklist could be used in Studio but this is less practical. Studio has this concept of a word list where you can enter monolingual terms with a correct and incorrect form. This might be more appropriate. The disadvantage is that Studio bizarrely doesn’t have an import facility for this list so you would have to type them in one by one (unless you know how to do this via an XML file where Studio writes this data). So I think it would make sense to have this and if anyone leaves comments below to tell me that this is exactly what Wordfast users do with this then we may create a small tool on the OpenExchange to make this possible with an import/export directly from the Wordfast Blacklist TXT:
Wordfast Translation Memories
These are also TXT files, with different content to the Glossary TXT, but I could use the Glossary Converter for these too. There is a very neat feature in this tool that allows you to convert files to TMX with the same drag and drop simplicity. However, there are potential drawbacks to this method, so if you don’t have a copy of Wordfast I recommend you use Olifant, which you can download for free from here. There is also reasonable online help if you can’t figure out how to use it. I created a quick video that explains how to do this too, and I also went over the Glossary Converter method so you can see the difference:

[youtube=http://youtu.be/VR7nRRMx6lw]
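To give a feel for what a TXT to TMX conversion involves, here is a rough Python sketch. It is not how Olifant or the Glossary Converter do it, and it depends on an assumption the article does not confirm: that each line of the Wordfast TM is tab-delimited with the source language code, source segment, target language code and target segment in the fourth to seventh columns, and that the header line starts with `%`. Check those column positions against your own file before trusting it. The sketch copies segment text as-is and does nothing about any inline formatting codes.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# ASSUMED zero-based column positions in the Wordfast TM TXT; verify them
# against a real file before relying on this.
SRC_LANG, SRC_TEXT, TGT_LANG, TGT_TEXT = 3, 4, 5, 6


def wordfast_tm_to_tmx(text):
    """Build a TMX 1.4 document (as a string) from Wordfast TM TXT content."""
    units = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip the header line (assumed to start with '%') and short lines.
        if line.startswith("%") or len(fields) <= TGT_TEXT:
            continue
        units.append((fields[SRC_LANG], fields[SRC_TEXT],
                      fields[TGT_LANG], fields[TGT_TEXT]))

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "wordfast-txt-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "Wordfast TXT",
        "adminlang": "en-US",
        "srclang": units[0][0] if units else "*all*",
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_lang, src, tgt_lang, tgt in units:
        tu = ET.SubElement(body, "tu")
        for lang, segment in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = segment
    return ET.tostring(tmx, encoding="unicode")
```

For real work I would still use one of the tools mentioned above, since they have been built to deal with the details a short script like this ignores.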
Wordfast Bilingual Files
These files come in two flavours as far as I’m aware, Bilingual DOC and TXML. Wordfast Classic produces files very similar to the old Trados Bilingual DOC files. I say very similar because they have the same markup concept, but I think there may be minor differences in places around certain features in Microsoft Word. So I think if you work with these files you should be able to handle them in Studio using the same Bilingual DOC filter that is used for Trados Bilingual DOC files. This means you need to also make sure that the files are fully segmented prior to opening them in Studio because anything not segmented will not be extracted for translation at all. You can read more about this process in this article.
Wordfast Pro creates a different bilingual filetype called TXML. You can also handle this file type in Studio by using the free OpenExchange filetype plugin for TXML.
When you install this one pay attention to the note in the OpenExchange:
"If you install the filetype and then don’t see it in your list, or you think the installation was too fast and it didn’t seem to do anything, then please go to Options -> Filetypes and then look on the right. It probably says "additional filetypes exist". Click that, select the missing filetype you installed and you'll be good to go."
Once you’ve installed this filetype you can open TXML files in Studio, translate them, and the target translation will be the completely translated TXML which you can return to whoever asked you to work with this file.
Working with Wordfast resources is quite straightforward, and you should be able to enjoy making use in Studio of the resources most likely to be made available to you. I imagine the same process applies the other way around, as you could use the Glossary Converter to create a glossary in exactly the same format as the one that Wordfast is looking for. It might have to be simplified a bit depending on what sort of terminology structures you are working with, but just making a glossary available from MultiTerm for Wordfast is a trivial task now you know how to do it! Translation Memories are similarly easy. You would just export your SDLTM to TMX and then convert this to a Wordfast TXT using the simple instructions from Dominique Pivard. I used these successfully after I realised the command in that article didn’t apply to my copy of Wordfast Pro… so for me the process is as follows, because there is no Open command to select the TM:
In Wordfast Pro: copy the TMX memory you want to use to any folder, then select Translation Memory > New/Select TM > Add TM (then change the filetype dropdown to TMX to be able to find your TMX); it will be converted on-the-fly to the native Wordfast format, and a file with the same name but the .TXT extension will be created in the same folder.
Have fun converting!