Bilingual Excel… and stuff!

Copyright Rudall30 | Dreamstime.comI’ve written about how to handle bilingual excel files, csv files and tab delimited files in the past.  In fact one of the most popular articles I have ever written was this one “Creating a TM from a Termbase, or Glossary, in SDL Trados Studio” in July 2012, over three years ago.  Despite writing it I’m still struggling a little with why this would be useful other than if you have been given a glossary to translate or proofread perhaps… but nonetheless it doesn’t really matter what I think because clearly it was useful!

So, why am I bringing this up three years later?  Well, the recent launch of Studio 2015 introduced a new filetype that seems worthy of some discussion.  It’s a Bilingual Excel filetype that allows you to handle excel files with bilingual content in a similar fashion to the way it used to be possible in the previous article.  There are some interesting differences though, and notably the first would be that you won’t lose any formatting in the excel file which is something that happened if you had to handle files like these as CSV or Tab Delimited Text.  That in itself mught be interesting for some users because this was the first thing I’d hear when suggesting the CSV filetype as a solution for handling files of this nature.  Most of the time I don’t think this is really an issue but for those occasions where it is this is a good point.

But this new filetype is more than just an Excel version of the old one.  So let’s just take a look at the options using this excel layout as an example:

06

So I have five columns of text, with the source and target in columns B and C, the name of the character playing the part (it’s a film script) in column A, a maximum character length for the text in column D and some notes in column E.  The text is also partially translated.

Columns

In addition to the usual source and target column I have a couple of other options.

02

I can set a maximum number of characters that are allowed in the target.  This is quite useful because sometimes, particularly with gaming scripts where the text box is a limited size, it’s important for the translator to know how many characters are allowed.  So here, if you use this option the standard QA Checker in Studio can use this and flag something like this if you go over the limit:

07

You can also check the allowable length at any time by clicking on the document structure column on the right hand side.  If you don’t have the context information populated (see below) then the righthand column in Studio will say LN (for Length Restriction ;-)) but if you do, as I do in this example, then it may use a different code with a plus symbol indicating there is more than one code in there.  So in my example it says ACT+:

08

The checkbox “Preserve Target Style” allows you to apply the style of the target cell in Excel to the target translation rather than overwrite with the style of the source cell.  So just giving you another option for handling formatting in the Excel file.

Exclude

In here we have another new option compared to the CSV filetype, and that’s “Translation column content“.  If you check this then any of the cells that have been translated in the Excel file already will be ignored.  So if you do check this then the options in the next part of the settings will not apply:

03

Existing Translations

These options were already available in the CSV filetype and are quite useful because they can save you having to deal with existing translations at all, and more importantly using the locking option allows you to exclude these segments from the analysis:

04

Context and Comments

We had Comments availability in the previous CSV filetype too, but there the comments were added to the document structure window.  Useful but hard to get at as you needed to click on the document structure column to see the available information and you only saw one cell at a time.

05

In this filetype the comments can be displayed as Studio comments like this which allows you to see more at a time and to read them without having to click on anything at all.  In fact if you have a lot of comments and they are needed to provide important translation context then moving them to a window on the side can be very useful and easy to use.  If you don’t know how to move windows take a look at this article:

09

The Context Information column is useful because it provides a good way ot tracking string IDs, or any other information which might be useful to know as you work.  In this example I used the name of the characters in the film.  These are in column A of my spreadsheet and they are displayed in the Document Structure Column as noted above in the section on Columns.

Where is it?

Perhaps one little thing I forgot to mention and that’s where it is.  This is quite important to note because the default settings for Studio are like this with all three types of Excel filetype checked:

10

Studio uses the filetypes on a first come first served basis depending on information in the filetype settings.  So if you want to use the Bilingual Excel filetype you need to either disable the Microsoft Excel 2007-2013 filetype or just move the Bilingual Excel filetype so it sits above the others in the list.  I guess if you do a lot of these and also work with Excel then you could create project templates that allow you to simply select the appropriate one to match the filetype you’re working with and this would save you having to mess around with which one is active and taking priority in the list.

So all in all quite a useful filetype.  There is no preview with this, but in many ways it doesn’t feel as though it needs one as the layout of Studio is very similar to the sort of files you are likely to be handling with this filetype and hopefully there are enough options to include the contextual information from the file to help anyway.  But before I end I thought it might be interesting to share a little translation conundrum that was posted on ProZ a few weeks ago where Excel and this new filetype could be used to solve it; this is the stuff!

Stuff…

Excel is an interesting format for many things, so I thought I’d share an little problem that appeared on ProZ a few weeks ago.  There are many ways to handle this but I thought it might be fun to share a way to tackle it using the Bilingual Excel filetype… and I’m not trying to start a war over whose tools handle it the best… this is just some excel stuff I thought would be fun to share.  Since the original idea and reading what some of the other solutions are I’d probably handle this using regex in EditPadPro to get the text out anyway.  But I like this because it’s just Excel and Studio.

The problem was how to create a TMX translation memory from an SGML file that was formatted something like this (you can see the full text in the ProZ post and the video at the end):

<doc id='N0001'>
 <head>
  <title>What is a Fenqing ?</title>
  <corpus url='http://code.google.com/p/evbcorpus/'>EVBCorpus</corpus>
  <author attributes='stuff in here'>name</author>
  <citation>"Building a Bilingual Corpus for MT"</citation>
 </head>
 <text>
 <spair id='1'>
  <s id='en1'>What is a Fenqing ?</s>
  <s id='vn1'>Fenqing là gì ?</s>
 </spair>
 <spair id='2'>
  <s id='en2'>Fenqing is a Chinese word which literally ...</s>
  <s id='vn2'>Fenqing là một từ tiếng Hoa mà nghĩa đen...</s>
 </spair>
 </text>
</doc>

So here’s one way to do it!

Create an XML filetype for this SGML… pretty simple using just two rules (if you don’t know how to do it this article might help but you can also watch the video as I explain it in there):

//s (always translatable)
//* (Don’t translate)

So these rules extract the translatable content in the s element and nothing else.  There is no distinction between English or Vietnamese at this stage as I have ignored the language attributes altogether.  Next I just open the SGML file in Studio and save it.  Now I have an SDLXLIFF with source/target repeated in the source column only throughout the file.

Now I can use the SDLXLIFF Converter for MSOffice (installed with Studio since 2011) and convert the SDLXLIFF to Excel.  If you didn’t know this was possible take a look at this article.

The result of this operation is that I now have an excel file with an ID column, a source column (populated), a target column (empty) and an empty notes column.

Now comes the fun excel part.  I can use this formula in the target column:

=IF(ISEVEN(A3),B3,””)

The ISEVEN function in excel is a neat formula that lets you check whether numbers are odd or even.  You probably see where I’m going with this now.

This formula will look at the ID column (column A) and check if’s an even number or not.  If it is then it will copy the contents into the active cell.  If it’s an odd number it puts nothing at all.  Once I’ve done this I can copy the formula down the spreadsheet, copy all of column C (target column) and paste it as plain text to remove the formulae.

Now I have a spreadheet with every other row containing source on the left and target on the right. So I can filter on the target column and sort it in alphabetical order. Now I just delete all the rows with nothing in the target.

This leaves me with a simple spreadsheet I can drag into the Glossary Converter and convert to TMX which resolves the question asked by the user.  However, seeing as I am more likely to want to use this Translation Memory in Studio I won’t do that.  Instead, I just open the excel file with the Bilingual Filetype and then update it straight into a Studio Translation Memory.  Piece of cake!!

If you want to see this in realtime, and the video is only 10 minutes long then you can see this in the video below… I hope it’s useful and perhaps gives you a few ideas of how excel can be useful for data manipulation especially since we have the new Bilingual Excel filetype:

Video is 10 minutes 8 seconds

36 comments
  1. alimayfield2015 said:

    Re your intro regarding creating TMs from glossaries, one very useful application is the translation of display lists. Say you have a control unit with a text display that has 100s of terms and phrases, often laboriously abbreviated, to instruct and inform the user. Then you need to translate the control unit manual or that of an appliance it can be used with. Often, a whole series of segments will need to be translated each containing just one display text. A TM that contains all the displays already makes the job of accurate translation easy and quick. You can’t pretranslate with a glossary!

    Liked by 1 person

      • alimayfield2015 said:

        Great article… thank you too!!

        Like

  2. Hi Really like your articles!

    I am struggling with bilingual excel files. I have a client that supplies xlsm files, and they won’t import. When save the xlsm file as a xlsx I get an error mess

    see attached pic

    Any idea how to solve this?

    Really appreciate advise

    best Joost

    Like

    • Hi Joost, you didn’t attach the error message. It’s tricky in here as this isn’t really designed for that kind of support, so why not post your question in the SDL Community at http://xl8.one This is a better place for support as it’s also easier to answer questions in detail, and you reach a wider audience.

      Like

  3. Evzen said:

    Well, the bilingual Excel filetype is a good start, but still too short-sighted and half-baked, so to say…

    Recently I had to deal with client’s Excels with a sort of products catalog – item ID in column A, one-line item description in B and full item description in C… followed by a pairs of columns (corresponding to B and C) for a number of languages.
    And even with the newest Studio I’m screwed… because it allows to define ONLY SINGLE COLUMNS PAIR… :-\
    (I don’t need to mention that the content was also partially translated, do I?)

    I suppose it wouldn’t take too much to think the idea a bit more thoroughly and allow users to define multiple ‘pairs’ of source-target columns, to make the filetype versatile and way more usable.

    Like

    • Why not multiple ‘triplets’ or ‘quadruplets’… where do you stop? This could get very complex and never be fully baked. Your problem is basic to resolve using the normal Excel filetype but it does require a little prep work which I’m sure you did without any problems.
      I do agree that this is just a start though, and I can also think of some useful enhancements to this filetype, but I think your usecase here is probably less common than the one it solves already… I think! I guess you always have the option to create a fully baked version yourself using the API although I really do think the effort in preparing the files for translation might outweigh the effort in creating a filetype that offers endless flexibility.
      I would like to see one though 🙂

      Like

      • Evzen said:

        It actually does not need to “get very complex” at all… Just look at the MemoQ’s Excel filter abilities to define what to include and what to exclude… (no, it can’t handle these bilingual Excels, though… I’m mentioning it just as example of great flexibility of defining the (not)translatable content).

        And where do you stop? Nowhere, actually… why would you?
        The Filetype currently defines SINGLE ‘set of properties’ for processing – source column, target column, length limit column, etc.
        So it should be very easily extensible to MULTIPLE such ‘sets of properties’… just like e.g. the XML/HTML parser configuration – one line in the configuration for one ‘set fo properties’ and I can add as many as I want.
        As simple as that. No triplets, no quadruplets… just multiple pairs, that’s it… it’s not a rocket science.

        Regarding your suggestion to “create fully baked version myself using the API” – sorry, I don’t buy that… for two reasons:
        1) majority of Trados Studio users – including me – are USERS, not developers… they don’t have Visual Studios, nor they do have necessary programming skills
        2) some of these people paid lot of money for the Professional license and expect to get appropriately working product for their money, not some DIY platform (the LSP company I currently work for has recently spent AWFUL lot of money for upgrading their Pro licenses and doesn’t really look for additional expenses for development of something what one would expect to be already included).

        Addition to point 1: Since SDL took away the possibility to use COM for automation etc., dozens of us scripters (i.e. non-developers) are screwed and CAN’T actually do anything because all this .NET stuff is all Greek to us.
        Scripting Trados 2007 in VBScript or JScript was easy and even a non-developer could do that.
        Well, not anymore with this .NET weirdness 😦
        (And no, I’m not going to start learning PowerShell just because SDL decided to stop supporting COM… it’s just as Greek as the other .NET stuff. Besides, I’m the customer, so it should be me who says what he wants, not the other way around :-\)

        (And before someone says “COM is obsolete, limited, blablabla” – if Microsoft can do it with Office, it apparently still CAN be done, if there is a will.)

        Like

      • ok – this is my personal blog and not an SDL one so I can be a little more free with my comments. You are being unreasonable. Just because you have a file that cannot be handled the way you want by any translation software out of the box does not mean it should be in the software. I see people doing all sorts of crazy things in different applications that they then expect a translator or a translation company to handle. Just because they do this, often because they are not schooled in the best way to prepare their data in the first place, does not mean every CAT tool has to include it in their application. I too would love to see filetypes that could handle absolutley anything that was thrown at them, but if you were a developer I think you’d know it’s not that simple. If it was it would be in the software. Nobody creates software deliberately crippled when it would be simple to deliver something better. You have to weigh up the costs against the return and that’s where you draw the line. Spending a fortune to fix a problem that doesn’t happen very much doesn’t feel a sound proposition for anyone… although I have no doubt it would make you and me very happy!
        On the COM versus powershell and .NET No comment from one non-developer to another!

        Like

      • Evzen said:

        I think you got me wrong. I’m not asking for the functionality because of that particular file, I’m asking for the functionality because of the ‘unfinishedness’ of the feature. (Note that I’m not asking e.g. Kilgray to provide that functionality because I see that MemoQ doesn’t support such miltilingual Excels at all… so there is no point doing that.)
        My point was that ‘short-sightedness’ of the implementation of an actually great idea… it’s like if it was implemented with hardcoded columns to A and B, or something similarly stupid ;-).

        When I do my automation scripts, I always make them more generic than is the actual requirement… because the moment I see the requirements, I can already see that sooner or later someone would want e.g. use it on a different project with different folder structure, or will want to use ti with another tool with different way of automating it, etc.
        Adaptation is then just a matter of changing the configuration, not changing the scripts… or sometimes not even that, it simply works because it’s smart enough to detect this or that automatically…

        So it just drives me crazy when I see such unfinished thing, especially when the ‘right way’ is so obvious and so easy to see.
        It would be okay if the filetype is created this way to solve a particular project-specific problem (we all do quick’n’dirty solutions), but for generic use I can’t really accept that.

        In other words, as my dad says – if you start to do something, do it PROPERLY and FINISH it… otherwise better don’t start it at all.

        Like

      • How can I disagree with your Dad… he’s right of course!

        Like

    • In fact you might find the Excelling Studio application on the OpenExchange could resolve your problems now. This supports virtually limitless configurations to make your task easier.

      Like

      • Evzen said:

        Ummm, paying additional money for something what should be built-in? No, sorry, not an option for me.

        Like

    • walkqisky said:

      Evzen, I also encounter your case at times, and the solution is always memoQ, not Studio, as follows,
      Create a multilingual project first, and then import your excel files with options, change the file filter from Excel to Multilingual delimited text, play with the filter settings, and you get what you want.
      Next, you could either translate all the languages in memoQ, or export bilingual mqxliff files to Studio which could handle through a exchange app, link below,
      http://www.translationzone.com/openexchange/app/filetypedefinitionformemoqxliff-457.html#44208
      Cheers!

      Like

      • Indeed, the multilingual filter in memoQ is very nice. But worth mentioning a couple of things on this to keep it an even discussion. You need the Project Manager Edition of memoQ if you want to make proper use of the multilingual capability, and you have to accept that you’ll lose any formatting in the file.
        Also, the memoQ filetype is not an openexchange app in Studio 2015 which this article is about. It’s built into the product… but for Studio 2014 the app on the openexchange is good, and for previous versions just add *.mqxliff to the wildcard dialogue and they can be handled (but with less control over statuses etc.) in 2009 and 2011 too.

        Like

  4. alimayfield2015 said:

    Hi Evzen, hi Paul!

    Evzen, the Excelling Studio application on the OpenExchange is free.

    If you’ve never experienced the OpenExchange site and what it has to offer, take a look. It is brilliant! Full of useful apps to enhance Studio’s performance, many of which are free. The idea is that you choose and install only apps that are useful to you when you need them. This is better than than having all of them incorporated as default into the main program, which is already huge.

    Liked by 1 person

    • Of course… there are many ways. Just wanted to share a little excel workaround as the article was about Excel!

      Like

  5. Yes, and it is a brilliant little exercise involving a number of very useful skills and Studio insider knowledge. I think it would be a nice candidate article for the proverbial “there are more ways than one to skin a cat” as there are so many possible solutions to this puzzle.

    Like

  6. maria said:

    Good day,

    This isn´t the right place to ask, but I don´t know what to do any more. I saw a video where Mr Filkin used Microsoft Translator for automated translation. I downloaded it but it doesn´t appear as a choice in the panel of memories and automated translation, and neither can I find it in the Plug-Ins. What am I doing wrong?

    Regards,

    Maria Perry

    Like

    • Good day Maria, you’re right. The best place to ask these things is here : https://community.sdl.com/appsupport which is a part of the SDL Community dedicated to OpenExchange applications.
      I can’t answer your question without a bit more information anyway, and I think it would be more appropriate to have the discussion in the community because it also means the developer can see it too! So perhaps, if you don’t mind you could ask your question in there and when you do provide the following information:
      – the version of Studio you are using
      – the approx. date you downloaded the application (I know it changed recently)
      – confirm whether you looked in File -> Options or Project Settings
      – confirm which plugin folders you looked in
      Then you’ll get a lot more helpful response.
      Regards
      Paul

      Like

      • maria said:

        Thank you!

        Like

  7. Kirsty said:

    Hi Paul,

    I am trying to create a TM from a bilingual Excel file, but for a different reason to that mentioned here. Excel is the format that has been used for one of our software product translations. We’ve had internal staff working on these Excel files by themselves, and not using Trados Studio or any other CAT tool. So I’m trying to capture 35k of software phrases from this bilingual file.

    Perhaps a different use case to that mentioned so far ..
    Cheers!

    Like

    • A good usecase… thanks for sharing that Kirsty. Just in case you didn’t know it there is a great solution for you that is probably much easier, if you’re happy with the quality of these translations already? You can use the Glossary Converter from the openexchange to convert your excel file to a TMX with a drag and drop… simple as that.

      Like

      • Kirsty said:

        Thanks, Paul! I’ll give the Glossary Converter a try, as I’m getting crummy segmentation with the alignment tool, and it’s taking ages to even align 10k lines. Seems the Glossary Converter might be better and faster for what I’m trying to do. Cheers!

        Like

  8. Dana said:

    Hi Paul. First of all, thank you for you detailed explanations. I now have your blog as a constant reference to work with Studio.

    Now, I don’t know if I’m doing something wrong here, but I can’t seem to figure out how to properly configure the Bilingual Excel and/or Excel 2007-2013 filetypes. I work with Excel files that contain several columns (similar to the one you show here as an example of a game script), and I need to be able to a) choose the source and target columns for translation (which I do with the Bilingual Excel filetype) b) have the rest of the columns show in Studio as context information (can also be done with the bilingual excel filetype) c) convert all embedded xml or html content to tags or markers in order to clearly see the actual text to be translated in the Editor View (this can be done via the Excel 2007-2013 filetype as you’ve explained in your blog entry “Handling taggy Excel files in Studio”). My problem is that I have to choose either one filetype or the other, as Studio won’t let me use both and have them active with some sort of priority configuration. Am I missing an obvious solution to do what I need? Please, say yes 🙂

    Thank for reading!

    Like

    • Hi Dana, good question and you’re not missing anything. You could create a couple of project templates, one that always uses the Bilingual Excel and one that doesn’t. This would resolve the problem of having to switch the order or deselect one and select the other. But… it will not resolve the problem of having to process the file twice in order to handle the tag in a smart ways. Basically we need the embedded content processor in the bilingual excel filetype.

      Like

  9. Oh, darn! Well, I guess we’ll have to keep waiting.
    Thank you, Paul!

    Like

  10. Yes, embedded content processing is sorely missing from the bilingual excel file type. I created a regex in the Regex Autosuggest plugin for inserting “plain text” tags as the next best thing.

    Like

  11. Kolja Fuchs said:

    Is it just me, or can you not just open a file in SDL Studio 2015? I used your csv method to update TMs from translated Excel files (saved in csv format) in Studio 2011, but now can’t figure out how to do it in 2015. What am I missing?

    Like

    • You do it exactly the same way. maybe it’s the naming throwing you off? Used to be called “Open Document” now it’s called “Translate single document” or something like that. If you ask your question in the SDL Community you’ll have more space and resources to explain and receive a sensible response.

      Like

  12. latrotteuse said:

    I’m using Studio 2017 and bilingual excel files, I understand when there is only one column per language, but Is it possible to translate an excel file containing multiple columns in the source language and in target?

    Like

    • Yes. You specify the source and target languages in the file.

      Like

      • latrotteuse said:

        Thank you for replying, but I tried to add three column letters for the source language (three was the max) and three for target, it still only gave me the first one in the sdlxliff file. Can you tell me exactly how to enter more than one column in the source language and in the target language?

        Like

      • Hi.. many apologies. I think I misunderstood your question. I thought you meant multiple languages in the file but only select a bilingual pair, still one column per language. I think your best option is to create several files for translation from the first one and then merge the contents back into one at the end; or just work with the same file one column at a time. So translate the first one, save target and then run it for translation again on the second column.

        Like

      • latrotteuse said:

        Thanks Paul, I was afraid of that, just had to make sure.

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: