Converting Wordfast resources… out with the old!

This article is all about out with the old and in with the new in more ways than one! In the last week I have been asked three times about converting Wordfast translation memories and Wordfast glossaries into resources that could be used in Studio and MultiTerm. Normally, for the TXT translation memories I get, I would go the traditional route and use a copy of Wordfast to export as TMX. Then it’s simple, but what if you don’t have Wordfast, or don’t want to have to try and use it? Wordfast glossaries are new territory for me as I’d never looked at these before. But on a quick check it looked as though they are also TXT files, so I decided to take a better look.

Before I get into the detail I’ll just add that I’m not very familiar with Wordfast, so I’m basing my suggestions on the small number of files I have received, or created, and the process I used to convert them to formats more useful for a Studio user. I’ll start with the glossaries as this is where I got the idea from. I’d better explain my opening statement too… this is because after I did an initial conversion using the Glossary Converter from the SDL OpenExchange I was asked to explain how this would work with MultiTerm Convert. This of course made me think about the old versus the new… I wouldn’t compare Wordfast and Studio in this way at all 😉

Wordfast Glossaries

These glossaries, at least the file-based ones as opposed to server versions (no idea if they even use these?), are simple TXT-based formats. This means you can open them in a text editor and read the data inside. Wordfast also has Translation Memories in TXT format, and as you may know the old Trados had Translation Memories in TXT format too. Why am I mentioning this? Well, we now have three TXT files, but you can be sure the data within is different in each case. So converting them to something usable in any other tool means knowing how to manipulate this data into a format you can work with.

The Wordfast glossary TXT seems to be a fairly simple layout with four fields I think… at least this is all I have seen.  None of these fields have headers in the TXT file but they seem to be this, from left to right:

  1. Source language
  2. Target language
  3. Description field
  4. Domain

Once you know this the process of converting is pretty straightforward.  So I have done it two ways in order to satisfy a couple of requests I recently received.
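To make the layout concrete, here is a minimal sketch in Python of reading such a glossary and writing it back out with a header row, which tends to make the fields easier to map in a converter. It assumes a tab-delimited file with the four columns in the order listed above; the header names and function names are made up for the example, and real files may have fewer or more columns.

```python
import csv
import io

# Assumed column order, taken from the four fields listed above; the
# header names themselves are made up for this example.
HEADERS = ["Source", "Target", "Description", "Domain"]


def glossary_rows(lines, width=len(HEADERS)):
    """Yield one fixed-width row per non-empty tab-delimited line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cells = [cell.strip() for cell in line.split("\t")]
        # Pad short rows and drop anything past the expected columns.
        yield (cells + [""] * width)[:width]


def glossary_to_csv(lines):
    """Return CSV text with a header row followed by the glossary rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADERS)
    writer.writerows(glossary_rows(lines))
    return out.getvalue()


# Two invented entries; the second one only has source and target.
sample = ["house\tcasa\ta building\tGeneral", "dog\tperro"]
print(glossary_to_csv(sample))
```

With a real file you would pass an open file object instead of the sample list, using whatever encoding the glossary was saved in.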

Glossary Converter

First of all, using the Glossary Converter.  This one is a longer video, but only because in this one I was explaining how to understand the TXT file and how best to manipulate it to get what you need.  The conversion was the easy part!

 

The second way is the conventional way using SDL MultiTerm Convert which is based on deriving a termbase definition from the file you are converting so that you can then create a MultiTerm termbase and import matching XML data into it.  This is an application that is installed with MultiTerm so you’ll find this in the program group for SDL MultiTerm.

SDL MultiTerm Convert

There’s no sound on this one (it was late, I’m staying at my mother’s house, and I didn’t want to wake anyone up. But I still wanted to make the video while I was in the mood!) so I annotated it instead… hopefully well enough to make sure it’s clear. This process has always been something many users struggled with, and that is just one reason why the Glossary Converter has been such a successful application.

 

Now that I watch it back perhaps this will be useful for anyone trying to learn how to use SDL MultiTerm Convert, not least because there is no sound which might remove the rambling language barrier imposed by me talking too fast!

Blacklists

Before I move onto Translation Memories it’s probably worth mentioning that Wordfast also has the concept of blacklists.  These consist of two monolingual columns like this:

[Screenshot: a Wordfast blacklist with two columns, Forbidden and Suggestion]

These are also saved as a TXT file, this time a tab-delimited file with just two columns… Forbidden and Suggestion. This is a very different approach to the one taken by MultiTerm, where you set up forbidden terms by adding the forbidden words as synonyms with a forbidden attribute. So if the blacklist contains suggestions that are also in the Wordfast glossary then you could do a lookup in Excel to match the forbidden terms with the suggested ones. However, I doubt very much that Wordfast users operate in this way as it doesn’t really make sense to duplicate the effort. So what you could do is add these to your glossary and also add a field for the blacklist attributes like this:

[Screenshot: the glossary with an added field for the blacklist attributes]
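The idea of folding the blacklist into the glossary with an extra status field could be sketched roughly as follows. This is only an illustration: it assumes a tab-delimited blacklist with the forbidden term first and an optional suggestion second, and the Entry, Term and Status column names are invented for the example rather than anything Wordfast or MultiTerm prescribes.

```python
def read_blacklist(lines):
    """Parse 'forbidden<TAB>suggestion' lines; the suggestion may be missing."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cells = line.split("\t") + [""]  # pad so a second cell always exists
        pairs.append((cells[0].strip(), cells[1].strip()))
    return pairs


def blacklist_to_rows(pairs):
    """Turn each pair into term rows that share an entry number.

    The suggested term and the forbidden term end up side by side in one
    entry, with a Status column marking the one to avoid.
    """
    rows = []
    for entry_id, (forbidden, suggestion) in enumerate(pairs, start=1):
        if suggestion:
            rows.append({"Entry": entry_id, "Term": suggestion, "Status": ""})
        rows.append({"Entry": entry_id, "Term": forbidden, "Status": "Forbidden"})
    return rows


# Invented sample data: one pair with a suggestion, one without.
for row in blacklist_to_rows(read_blacklist(["utilize\tuse", "irregardless"])):
    print(row)
```

Keeping both terms under one entry number mirrors the synonym approach described above, where the forbidden word sits next to the term you actually want used.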

Since the Wordfast Blacklist is monolingual you could now translate the terms into the corresponding target, in this case Spanish.  This would be quite an interesting way to make use of these resources and turn them into a proper managed termbase.  I ran through this quickly using the Glossary Converter so you could see how this would be used in MultiTerm and Studio.  The principle of using forbidden terms in Studio is explained in this article, but watch the video below first:

 

There is also one other way this Blacklist could be used in Studio, but it is less practical. Studio has the concept of a word list where you can enter monolingual terms with a correct and incorrect form, which might be a more appropriate home for these entries. The disadvantage is that Studio bizarrely doesn’t have an import facility for this list, so you would have to type the entries in one by one (unless you know how to edit the XML file where Studio writes this data). I think an import would make sense, so if anyone leaves a comment below to tell me that this is exactly what Wordfast users do with the blacklist, then we may create a small tool on the OpenExchange to make this possible with an import/export directly from the Wordfast Blacklist TXT:

[Screenshot: the word list in Studio]

Wordfast Translation Memories

These are also TXT files with different content to the Glossary TXT, but I could use the Glossary Converter for this too.  There is a very neat feature in this tool that allows you to convert files to TMX with the same drag and drop simplicity.  However, there are potentially drawbacks to this method so I am going to recommend that if you don’t have a copy of Wordfast that you use Olifant which you can download for free from here.  There is also a reasonable online help if you can’t figure out how to use it.  I created a quick video here that explains how to do this too, and I also went over the Glossary Converter method so you can see the difference:
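For anyone curious what such a conversion involves under the hood, here is a rough Python sketch. It rests on an assumption that this article does not confirm: that each line of the Wordfast TM is tab-separated as date, user, usage count, source language, source text, target language, target text, with a header line starting with "%". Check this against your own file before relying on it. The function name and the tool name written into the TMX header are made up, and any tag or character codes inside the segments are copied through unchanged.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def wordfast_tm_to_tmx(lines):
    """Build a TMX ElementTree from Wordfast-style TM lines.

    Assumed layout per line (tab-separated): date, user, usage count,
    source language, source text, target language, target text.
    Lines starting with '%' are treated as the header and skipped.
    """
    units = []
    for line in lines:
        cells = line.rstrip("\r\n").split("\t")
        if len(cells) < 7 or cells[0].startswith("%"):
            continue  # header line or incomplete row
        units.append((cells[3], cells[4], cells[5], cells[6]))

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "wordfast-txt-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "Wordfast TXT",
        "adminlang": "en-US",
        "srclang": units[0][0] if units else "en-US",
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_lang, src_text, tgt_lang, tgt_text in units:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

The returned tree can be saved with `tree.write("memory.tmx", encoding="utf-8", xml_declaration=True)`. This is no substitute for a dedicated tool such as Olifant, which deals with details a sketch like this ignores.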

 

Wordfast Bilingual Files

These files come in two flavours as far as I’m aware, Bilingual DOC and TXML.  Wordfast Classic produces files very similar to the old Trados Bilingual DOC files.  I say very similar because they have the same markup concept, but I think there may be minor differences in places around certain features in Microsoft Word.  So I think if you work with these files you should be able to handle them in Studio using the same Bilingual DOC filter that is used for Trados Bilingual DOC files.  This means you need to also make sure that the files are fully segmented prior to opening them in Studio because anything not segmented will not be extracted for translation at all.  You can read more about this process in this article.

Wordfast Pro creates a different bilingual filetype called TXML.  You can also handle this file type in Studio by using the free OpenExchange filetype plugin for TXML.


When you install this one pay attention to the note in the OpenExchange:

"If you install the filetype and then don’t see it in your list, or you
think the installation was too fast and it didn’t seem to do anything,
then please go to Options -> Filetypes and then look on the right. It
probably says "additional filetypes exist". Click that, select the 
missing filetype you installed and you'll be good to go."

Once you’ve installed this filetype you can open TXML files in Studio, translate them, and the target translation will be the completely translated TXML which you can return to whoever asked you to work with this file.

Conclusion

Working with Wordfast resources is quite straightforward, and you should be able to enjoy making use of the most likely resources to be made available to you in Studio.  I imagine the same process applies the other way around as you could use the Glossary Converter to create a glossary in exactly the same format as the one that Wordfast is looking for.  It might have to be simplified a bit depending on what sort of terminology structures you are working with, but to just make a glossary available from MultiTerm for Wordfast is a trivial task now you know how to do it!  Translation Memories are similarly easy.  You would just export your SDLTM to TMX and then convert this to a Wordfast TXT using the simple instructions from Dominique Pivard.  I used these successfully after I realised the command in that article didn’t apply to my copy of Wordfast Pro… so it is this process for me because there is no Open command to select the TM:

in Wordfast Pro: copy the TMX memory you want to use in any folder, 
then select Translation Memory > New/Select TM > Add TM; (then change 
the filetype dropdown to TMX to be able to find your TMX) it will be 
converted on-the-fly to the native Wordfast format and a file with 
the same name, but the .TXT extension will be created in the same folder.

Have fun converting!

15 comments
  1. Daniel said:

    I thought I would share a few useful tips as I do Wordfast to Studio conversions quite frequently…

    1) Wordfast glossaries can be opened directly in Excel, though you may have to select the correct import settings: “Delimited” (with “Tab” delimiters) and, most of the time, Unicode (UTF-7 or UTF-8) encoding, at least for English/French combinations. Once imported in Excel, you know what to do…

    2) Wordfast TMs can be converted to TMX using a small utility called Wf2Tmx.exe. It is free to use, just not easy to find on the Web*. It will convert from TXT to TMX and from TMX to TXT. Though the conversion to TMX sometimes fails or doesn’t give perfect results, it is pretty useful and works 99% of the time for me.

    3) Wordfast files can be imported and translated directly in Studio using this great File type plugin, which you already know: http://www.translationzone.com/openexchange/app/filetypedefinitionforwordfasttxml-371.html#31889
    So far, after a few months (years?) of continuous use, I have been able to translate every single TXML file in Studio using this plugin, and the output is absolutely the same as a TXML translated directly with Wordfast. My clients have never been able to notice any difference.
    Now, the only thing I haven’t been able to do is use Wordfast remote TM’s or glossaries. Yes, they do exist, but it is obvious that their protocol, format and other settings are not known, and therefore Studio cannot make use of them. I am still hoping someone will be able to create another app for this use. That day, Wordfast will go to the Recycle bin for good…

    * For those who are interested to get this utility, I tried to retrace where I found it back at the time. I think it was in the Wordfast Yahoo group https://groups.yahoo.com/neo/groups/WF_PRO/info. You have to create an account to get access, and from there, you can download it (hopefully). My own account has been deactivated as I haven’t accessed it for a while, so I can’t check this right now. Back at the time, the account creation needed to be verified by the Yahoo admin, so it was not an instantaneous access…


  2. Hi Paul!

    I am wondering if the email I received below is legit?

    It arouses suspicion since I haven’t purchased anything in a (long) while.

    As you can see below, others are wondering too.

    Please let me know so that I can update the folks on the Colorado Translators Association listserve.

    Thanks!

    Kathy


  3. G’day Paul

    The WF glossary format is nearly fully described in the user manual:
    http://www.wordfast.net/zip/wf_en6.zip

    In the glossary, only the first column is compulsory (i.e. you can have an entry that contains only an SL term, if you don’t know the TL term yet, and when you get a glossary match in WFC, you can simply add the TL term at that time). You can have as many columns as you wish (e.g. added manually), but anything after the third column is ignored by WFC when it comes to displaying glossary information in WFC during translation.

    In the glossary, the first column is the SL term, the second column is the TL term, and the third column is the comment.

    When adding a glossary term in WFC, the dialog contains fields for the SL term, TL term, comment, and three additional fields named F1, F2 and F3. The user can enter anything into the “F” fields, and WFC will remember what the user entered.

    (The #%General%# was probably added by the user — perhaps he entered it once, and then forgot to remove it again, which meant that all subsequent terms that were added got that label).

    The “F” fields can also take field codes. For example, if you put “{Today}” in F1, then the “today’s date” (the date that the term was added) will be saved in column 4. Other field codes include {User}, {TM}, {SrcLang}, etc.

    If you’re importing your own glossaries, then you’d know what fields are important to you, but if you’re importing someone else’s glossaries, then you can either ignore column 4 up to the end of the line (that’s what I would do) or you can treat everything from the third column to the end of the line as a single comment.

    Note that SL terms can also use wild cards. You may, therefore, find an SL term like “colo*r” (matches both color and colour) or “$# fine” (matches both $1 fine and $10000 fine), though I’m not sure how many glossaries will have those.

    Samuel
    WFC user


    • G’day Paul

      The wrong-form-correct-form feature of Trados 2009+ is probably the right place for the blacklist entries. The usefulness of the blacklist is that it warns the user of a notifiable word in the target text regardless of which words are present in the source text. The blacklist can be useful to build up gradually when you become aware of the preferences of the client’s proofreader, particularly for words that may have multiple possible SL terms.

      The 2nd column in the blacklist is optional, so not all blacklist terms will have entries in that field. Can you add terms to the wrong-form-correct-form feature without adding the correct form?

      Samuel
      WFC user


    • Thanks Samuel, useful to know there is a document like this. I think the most important thing is that it’s a doddle to convert whatever columns are there, but certainly useful to understand more about how it’s really set up and supposed to work in Wordfast.


  4. Marion said:

    Hi Paul
    Many thanks for doing these step-by-step videos. Converting glossaries and TMs has just become a bit easier for me thanks to these tutorials.
    Are you aware of the “Recent” bar in Word and Excel when opening files? In your first video e.g. you could save a couple of clicks by selecting the second file (general.txt) from the top shown in the “Recent” bar, the green one on the left. Both, in Word and Excel, I think the list appears again on the right side after you click on Open Other Documents/Open Other Workbooks. Or you can click on “Recent”(the top one) instead of Computer and Browse. And choose the file you want to open.
    Hope that helps 🙂


  5. I recently purchased Trados 2015 and learning how to use it has been a daunting task. Your guide on converting TMs has been very helpful, though I think there’s a question left unanswered: Is it possible to convert from TMX to TXML? A colleague and I have been translating a document and she has asked me for my TM in TXML format. I realize Trados does not create TXML files, but SDLXLIFF. Any help on this would be immensely appreciated.


    • Hi Rodrigo, that seems a strange request and I’m wondering whether your colleague really just means how to get a TMX into Wordfast? I don’t think Wordfast can convert a TMX to TXML directly either. If this is really what’s required I guess you could open the TMX in Studio using the TMX filetype from the appstore and then open the sdlxliff in WordfastPro and the bilingual file WordfastPro creates will be a TXML.


      • rhrs1987 said:

        Hello paulfilkin.
        I think I didn’t explain myself clearly. I work with Trados 2015 and she works with WordfastPro. What I’d like to do is to convert from TMX into TXML. Indeed she did say she needed a TXML file for using the Transcheck feature in WFP, and also that a colleague of hers working with Trados has sent her TMXs converted into TXML on past occasions. Last night I read elsewhere that WFP doesn’t support sdlxliff files. Is that true?


      • rhrs1987 said:

        Update: I was able to convert to txml by uploading the sdlxliff to the WF Anywhere platform and then downloading it as a txml. In theory, WFP should be able to open sdlxliff files and automatically convert them to txml. I’m quite sure this is what her friend working with Trados must have done.


      • That makes a lot more sense… but if you find otherwise I’d be interested to know what they did!
