A few bilingual TBX resources

Since writing my last article on handling large TBX files I have extracted a few TBX files as language pairs only from the very large TBX provided by IATE and thought I would share them here for others to use.  If you want a specific language pair from the 25 languages within the IATE TBX then drop a note into the comments.  I can’t guarantee I’ll do it quickly, but as the process is fairly straightforward I will add them from time to time.
All of the files below are extracted from the following original: Download IATE, European Union, [2014].  Also note that because of the nature of this TBX not all languages are equally well represented.  This means there will be many more English terms in the TBX than anything else.  This gives you two options:

  1. Import the TBX into your favourite tool and complete the missing terms for your own personal use or;
  2. Remove the monolingual entries before import so you have a 100% populated termbase (straightforward if you convert the TBX to Excel first)
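
If you would rather not go through Excel for option 2, the same filtering can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the usual TBX layout of termEntry > langSet (with an xml:lang attribute) > term, it loads the whole file into memory so it only suits files your machine can hold, and the function name and sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def keep_bilingual_entries(root, lang_a, lang_b):
    """Remove every <termEntry> that lacks a term in either language.

    Returns the number of entries removed. Hypothetical helper that
    assumes termEntry > langSet > ... > term, as in TBX-Basic.
    """
    removed = 0
    for parent in list(root.iter()):
        for entry in list(parent.findall("termEntry")):
            langs = set()
            for lang_set in entry.findall("langSet"):
                # Only count a language if it holds a non-empty <term>.
                if any((t.text or "").strip() for t in lang_set.iter("term")):
                    langs.add((lang_set.get(XML_LANG) or "").lower())
            if not {lang_a, lang_b} <= langs:
                parent.remove(entry)
                removed += 1
    return removed

# Tiny invented sample: entry 1 has both languages, entry 2 is English only.
SAMPLE = """<martif type="TBX"><text><body>
<termEntry id="1"><langSet xml:lang="en"><tig><term>agreement</term></tig></langSet>
<langSet xml:lang="de"><tig><term>Abkommen</term></tig></langSet></termEntry>
<termEntry id="2"><langSet xml:lang="en"><tig><term>protocol</term></tig></langSet></termEntry>
</body></text></martif>"""

root = ET.fromstring(SAMPLE)
removed = keep_bilingual_entries(root, "en", "de")
remaining_ids = [e.get("id") for e in root.iter("termEntry")]
print(removed, remaining_ids)
```

For a real file you would parse it with ET.parse, run the same function on the root element and write the tree back out. Check a copy first, because the language codes in your file may be written differently from the ones assumed here.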

You may also find that some entries contain invalid XML that could prevent the import of the TBX into many validating tools.  If this happens you will have to remove the offending entries with a text editor first.  Hopefully you will have an editor that reports the problem and explains where it is.  If you intend to import these files into MultiTerm they will probably still be too large as they are.  Please refer to this article for instructions on how to break them down into bite-sized chunks.
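
If your editor does not report where the problem is, one possible cause you can check for yourself is a character that XML 1.0 does not allow. The sketch below scans text line by line and reports the line, column and code point of each such character. It only covers this one kind of invalid XML, the function name is made up for illustration, and the column it reports may not match the position a particular tool gives you.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_invalid_chars(lines):
    """Yield (line_number, column, code_point) for each forbidden character."""
    for line_number, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            yield line_number, match.start() + 1, f"U+{ord(match.group()):04X}"

# Invented two-line sample; the second line contains two U+FFFF characters.
sample = [
    "<term>ok</term>\n",
    "<term>protocolo de altera\uFFFF\uFFFFo</term>\n",
]
problems = list(find_invalid_chars(sample))
for line_number, column, code_point in problems:
    print(f"line {line_number}, column {column}: {code_point}")
```

To run this over a real file you can pass an open file object instead of the sample list, for example open("my-extract.tbx", encoding="utf-8") where the file name is just a placeholder. Reading line by line keeps memory use low even for very large files.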

Update Date: 150222

Since writing this article the team responsible for IATE have created a tool you can download that allows you to extract single pairs in any combination.  It’s called IATExtract.  So I won’t be extracting any more from this date.  The pairs below were requested in the comments so I have done these, but I won’t be doing any more.  Notwithstanding this the latest download from IATE will contain more terms anyway so you are best to take your information from their website.
 

Available pairs (Date: 150222)

Just click on the pair you want to download and you should get a zip file containing the extracted TBX file.  If you find there is no extension on the file inside the zip just add a TBX extension to it after extracting it to a folder on your computer… I know I forgot to rename at least one so there may be more!
English <> Czech
English <> Danish
English <> Dutch
English <> French
English <> German
English <> Greek
English <> Italian
English <> Polish
English <> Portuguese
English <> Romanian
English <> Spanish
English <> Swedish
French <> Dutch
French <> Polish
French <> Portuguese
German <> Czech
German <> Danish
German <> Dutch
German <> French
German <> Slovak
Italian <> Czech
Italian <> Portuguese
Polish <> Dutch
Spanish <> French
Spanish <> Dutch

FIGS (Date: 140723)

Not a bilingual pair, but as I have it already I have also loaded a TBX containing only FR, IT, DE, ES and EN.  Might be useful for anyone looking for this combination.
FIGS + English

66 thoughts on “A few bilingual TBX resources”

  1. Thanks for doing this, and for your articles. Well, I tried to get the Danish file, without much luck, so if you are able to do it I would certainly appreciate it.

      1. Wow, that was fast. Yes, that is my languages. Just downloaded the file, and will try to import it tonight. Now I know where to look for instructions. Thanks again.

  2. Hi Paul, thanks a lot for this great job and all these files. Could you please add the English-Spanish pair ? Thanks in advance and keep on going. Cheers,

  3. Post-scriptum: the Spanish-French pair will be very useful too, if it is not too much for you… and if you have sufficient spare time ! 🙂

      1. Thanks a bunch, Paul.
        Buona giornata,
        Giles
        2014-07-22 9:38 GMT+02:00 multifarious :
        > paulfilkin commented: “Thanks Giles… both pairs added.” >

  4. Hello Paul,
    Thank you for all the good work.
    The combination German-Dutch and French-Dutch, as ‘Manicle’ suggested above, would also be more than highly appreciated by me… For many translators such (IT) matters are really giving headaches, so muchos gracias herefore, this would really be fantastic. Have a nice day.
    Phil

      1. Hi Paul,
        I tried to import the German-Dutch version into Multiterm.
        Is it OK to import the TBX file straight into MultiTerm?
        I created a new multiterm termbase file and tried to import it into this termbase, but the import wizard finishes by saying that no terms are processed.
        Thank you for your advice
        Phil

        1. Hi Philip, the way to tackle this is to convert the TBX to MultiTerm XML. So you create a new termbase in MultiTerm with the appropriate definition and then import the XML. The problem part, or rather the difficult part, is that you probably won’t be able to convert the whole TBX in one go. So if you look at the previous article I referred to it explains how I managed this for the full TBX with all 24 languages, and also one with just 5 languages. The process will be the same for you. I just loaded the TBX files for users because these are useful for anyone and not just MultiTerm users.

    1. Thank you Paul.
      Is there a way to import the TBX files “simply” into Studio itself as a TM? Like many translators, I’m not strong in software-related issues/conversions, so maybe there’s a trick to import them without technical pains/issues into Studio? You never know 🙂
      Many thanks,
      Phil

      1. Hi Philip, the main problem with converting to a TM is that a TBX is concept based which means that each entry could have multiple terms in each language. Furthermore each entry could have, for example, 10 terms in German and 5 terms in Dutch, or 3 terms in Dutch and no terms in German especially since this TBX is the result of an extract from a larger TBX based on 24 languages. Which ones should be matched for a TM? Technically the process of conversion is quite simple for some tools, so Xbench for example quite easily imports a TBX and then allows you to export it as a TMX. How useful this is, and what the logic is to deal with synonyms and fields I have no idea but the process is simple enough… in fact I just did it!
        You can download the TMX here : de(DE) – nl(BE)

  5. I’d like to suggest the following lang pairs:
    English Portuguese
    French Portuguese
    Italian Portuguese
    Thanks, F.

  6. Thank you Paul. I emailed the IATE development team yesterday and got this answer back today which I think is worth sharing:
    “We are preparing smaller files, organized by individual languages: they
    should be easier to be handled.
    After downloading, it will be possible for the users to create customized
    language pairs (using SDL), to meet their specific needs.
    We expect to have these files available on IATE by the end of August 2014.
    Best regards,
    Coordination IATE Support & Development Team
    TRANSLATION CENTRE FOR THE BODIES OF THE EUROPEAN UNION
    iate@cdt.europa.eu”

    1. That’s good news David. I did expect to see this as they have only just started to share. I think their online facility is better as well and I expect to see more done with this in the near future since it contains a lot more metadata than the TBX and is more useful.
      Thanks for sharing your email.

  7. Thanks Paul! I’m currently experimenting with importing the Dutch-English TBX you posted here into Heartsome Translation Studio (the newly open source version), to see how well it preserves the data structure, and to then possibly export it back out into sth more useful (hopefully a tabbed UTF-8 text file for use as a CafeTran termbase).

    1. Great, let me know how you get on. I read your other comment on FIGS so I loaded this to the article as well just in case anyone is interested in these five languages together in a single TBX.

    1. Probably good to ask someone from ApSIC or maybe another user of Xbench will see this post and respond. I tested a couple just now to check and all seems well so I don’t think the TBX files are the problem. I’m using Xbench 3.0.0 build 1243 if this helps?

  8. Hi Paul,
    I am trying to import your IATE FIGS tbx file into a Multiterm 2014 terminology base, and I run into problems.
    I get the following error message when I try to convert it using SDL MultiTerm Convert.
    The conversion option could not be initialised properly.
    Exception of type ‘System.OutofMemoryException’ was thrown.
    Do you have an idea how I could overcome this difficulty?

      1. Hi Paul, I had this same issue with the complete IATE file, then I tried to process the two language pair files you have posted (EN-PT and FR-PT) and the problem now was corrupted characters… I used EditPad Lite to find and fix the corrupted characters but then the error turned into “not valid tbx format” in all my CAT tools. I used Glossary Converter, Studio 2014, MultiTerm Desktop 2014, MultiTerm Convert, Across, memoQ and Xbench. We are in mid-September and IATE has not kept to its date for publishing the language pair files. Any help would be much appreciated.
        P.S. By the way, I’ve been able to import TBX files from Microsoft language resources from EN-PT and EN-BRpt without any issues.

        1. Hi Paulo, the problem you will face is that even the single pair will probably be too large for most tools. So you still need to break it up into bitesized chunks unless you are using MultiTerm Server. My recommendation to you would be to take advantage of the great work Henk Sanderson has done that I described here: IATE, the last word… maybe!

          1. Seconded.
            Henk’s files are better than the pretty decent ones I managed to compile from the IATE material.
            Henk is also very helpful. He responded instantly to a minor issue I had with the EN-EL tmx file.

  9. Paul,
    I downloaded the PT – EN file, but MultiTerm Convert keeps giving me this error:
    The conversion option could not be initialised properly.
    “hexadecimal value 0xFFFF, is an invalid character. Line 18277776, position 39.”
    I’m also getting an error trying to convert the IATE file (after extracting the language pair) about a missing .dtd file.
    Any help is appreciated, and keep up the good work!

    1. Hi Robin, I went to that line and location in the TBX and found this:
      <term>protocolo de alteraç￿￿ã￿o</term>
      So if you delete or just correct this you’ll probably be ok… I didn’t test it. As I have pointed out in this article the TBX files leave a lot to be desired and you would be better off taking a cleaned up version from Henk. In fact funnily enough the exact problem you found is in that article!!

  10. Thanks for the reply, Paul. I did a workaround using the iate extractor and created around 15 sdltb files by the time I was finished. I plugged them into a project file and I’m testing it right now. Do you think having many separate small tb files in one project has a downside?
    Thanks again for your help!

  11. Hello Paul,
    What a great help to translators! Would it be possible to have English>Russian pair?
    Thank you!
    Anna

    1. Hi Anna,
      I added a note into the article as follows because you have a much better option now:
      Update Date: 150222
      Since writing this article the team responsible for IATE have created a tool you can download that allows you to extract single pairs in any combination. It’s called IATExtract. So I won’t be extracting any more from this date. The pairs below were requested in the comments so I have done these, but I won’t be doing any more. Notwithstanding this the latest download from IATE will contain more terms anyway so you are best to take your information from their website.
      So I think the best approach is to do this yourself using the links in the article.
      Regards
      Paul

      1. Thank you, Paul.
        I just found out that Russian is NOT one of 24 official EU languages. So, I can’t extract En-Ru pair. 🙁
        Best regards!

  12. Hello, I tried to download the English – French file. It downloaded automatically to my Dropbox and after that, I could not open it.
    I am very bad with computers, can you send me the Multiterm database? Or send me something similar that I can convert to Multiterm?
    Thank you for your great work.

    1. Hi Suzanne, you downloaded a zip. Did you unzip it first? Inside it contains a TBX and you can convert that to MultiTerm. However, I’d recommend you review this article if you really want to go through with this: https://multifarious.filkin.com/2014/07/13/what-a-whopper/. If you’re very bad with computers you are taking on more than a bite sized chunk!
      Perhaps you might find this new plugin useful:
      https://appstore.sdl.com/language/app/iate-terminology/950/
      This was released last week and it allows you to use the IATE terminology in Studio without doing any conversions at all… it’s just plug and play for all the supported languages and all the content curated by the IATE Team.

    1. Hi Carlos,
      You can do this yourself really easily these days. Since I wrote this article… a very long time ago… IATE provided an easier way to do this and you can simply choose the languages you want and click download. Go here – https://iate.europa.eu/download-iate and create a free account if needed, then you’ll see how simple it is.
