Recording Translation Memory metadata

Back in July 2013 I wrote an article called “Fields and Attributes in Studio” which was all about adding different types of metadata to your Translation Units every time you confirmed a segment to make it easier, or more complex depending on what you’ve done, to manage your Translation Memories.  If you’re not sure what I mean by this take a look at the article as I won’t repeat a lot of that here… at least I’ll try not to!  This capability in Studio is probably quite familiar to most users of the old SDL Trados 2007 and earlier, and was even essential to some extent because you could only use a single Translation Memory at a time.

The same features were also available in SDLX, the lesser-known but in many ways more functional product that SDL provided before bringing Trados into its portfolio.  In fact over the years it’s proven much harder to replace SDLX in some environments, but today I think it’s safe to say Studio is working in ways to satisfy even the most hardened SDLX and Trados user… yes, I know you’ll still find die-hards out there but I do believe the capabilities of Studio far outweigh those of these legacy applications.  Not all of the features are available out of the box, but a big advantage Studio has is the OpenExchange, which provides a platform for developers to add their own features into Studio, and this often fills a few gaps in addition to introducing many innovative and exciting capabilities.

A good example of this is related to TM metadata.  So in the screenshot below you see one segment in the SDLX editor, with a 100% match from the TM underneath it, and then metadata that is recorded by the system every time you save a segment to the Translation Memory (this is also configurable, I just put these views together for the screenshot):

002

If I handle the very same file in Studio then the system metadata is as follows:

003

Very similar, aren’t they?  But there are a few noticeable differences:

  • Context : SDLX recognises the filetype as TEXT, as the source was a simple txt file, and it stamps the TU with TEXT.  Not terribly useful in this context, but I believe it can be useful when handling HTML/XML files where the context is taken automatically from the elements.
  • TM : SDLX provides the full path to the TM used as opposed to the name of the TM only which is used by Studio.  I can see how this could be very useful, especially when trying to see why results are missing from your TMs as you can tell immediately if you have duplicate names in different locations.
  • Source File : SDLX stamps the TU with the full path and filename of the file you are translating.  This could be very useful if you wanted to remove some of the TUs that were stored in your TM when just one or two of the translated files were wrong for example, or even if you just wanted to be able to analyse exactly where your translations had been used in the past.
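To make the "remove the TUs that came from one wrong file" idea concrete, here is a minimal sketch that works on a TMX export rather than on the TM itself. It assumes the source file was recorded as a `<prop>` element on each `<tu>`; the property name `x-Source File`, the sample paths and the function name are all hypothetical, and the exact `type` string in a real export may differ depending on the tool and the field type.

```python
import xml.etree.ElementTree as ET

# Hypothetical TMX fragment: two TUs, each stamped with the file it came from.
# The prop type "x-Source File" is an assumption made for this illustration.
TMX = r"""<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <prop type="x-Source File">C:\Projects\Demo\good.txt</prop>
      <tuv xml:lang="en-US"><seg>Hello</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Hallo</seg></tuv>
    </tu>
    <tu>
      <prop type="x-Source File">C:\Projects\Demo\bad.txt</prop>
      <tuv xml:lang="en-US"><seg>Goodbye</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Falsch</seg></tuv>
    </tu>
  </body>
</tmx>"""


def drop_tus_from_file(tmx_text, field_name, unwanted_path):
    """Return (root, removed_count) after removing TUs stamped with unwanted_path."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    removed = 0
    # Iterate over a copy of the list because TUs are removed while looping.
    for tu in list(body.findall("tu")):
        values = [p.text for p in tu.findall("prop") if p.get("type") == field_name]
        if unwanted_path in values:
            body.remove(tu)
            removed += 1
    return root, removed


root, removed = drop_tus_from_file(TMX, "x-Source File", r"C:\Projects\Demo\bad.txt")
remaining = [tu.find("tuv/seg").text for tu in root.find("body").findall("tu")]
print(removed, remaining)
```

The same filter could of course be applied inside the TM maintenance view instead; the point is only that a full path on every TU makes this kind of clean-up a simple exact-match test.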

Of these it’s the last one “Source File” that I have seen people asking for since adopting Studio.  Now if you read “Fields and Attributes in Studio” you’d know that you could easily create a Field for “Source File” and you could manually add the path to the file when you translate it like this:

004

Now you can see the full path and filename stamped onto the TU as a custom field whenever you confirm the segment.  This is great if you only have a few files and if you always remember to change the value of the field whenever you open a different file for translation.  But if you have a project with hundreds of small files that you open in a virtual merge for example, then the practical aspects of having to copy the path of the source file to your clipboard, and then manually go into the Project Settings and paste to update the “Update” value every time you translate a different file start to kick in.  If it was me I’m sure I’d forget to do this within the same project never mind over a period of time working with multiple projects.  I imagine it would be even worse if you were reviewing the files.  So pretty soon my great idea of using custom fields to replace the SDLX system field approach doesn’t look so good.  Yes you can do it… but it’s not so practical for this application.

Fortunately we do have the OpenExchange and help is at hand by using the RecordSourceTU application which is a free plugin developed by the SDL Community Developers to resolve this problem.  Even more fortunately the OpenExchange developers rarely do things by halves so this is what we get:

005

You can have any one, two, or all three of the following options automatically supply information to every segment you send to your translation memory:

  1. Record source filename
  2. Record source file complete path
  3. Record source project name

The second option would mirror SDLX.  The other two are enhancements, and quite useful because they probably cover the most likely needs based on the files and projects you work on.  If I use these by allowing the app to create new fields for each of these items then my segment would record the data as follows:

006

But it will do this automatically for every Project and every file I work on from now on… if I’m using this TM.  Pretty cool… and I think this is cool not just because of what it’s doing, but because of what the developer was able to do without having to wait for this to be added into the product.  Even cooler is that the source code is open source, so if you liked this idea but wanted to have additional, or just different, dynamic metadata stamped to your TUs without having to control it manually, then most of the work is already done for you.  That’s pretty cool!
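As a rough illustration of what "dynamic metadata" means here, the following sketch derives the three values described above from a file path and a project name. It is not the plugin's actual code (the real plugin is a Studio TM provider); the function name, the field names and the sample path are hypothetical and chosen only for this example.

```python
from pathlib import PureWindowsPath


def source_metadata(file_path, project_name,
                    record_filename=True,
                    record_full_path=True,
                    record_project=True):
    """Build the field values to stamp on a TU, one entry per enabled option.

    The field names used as dictionary keys are illustrative assumptions,
    not the names the plugin creates.
    """
    path = PureWindowsPath(file_path)
    fields = {}
    if record_filename:
        fields["Source File Name"] = path.name      # e.g. readme.txt
    if record_full_path:
        fields["Source File Path"] = str(path)      # full path, as SDLX recorded
    if record_project:
        fields["Source Project"] = project_name
    return fields


# All three options enabled (the default in this sketch).
all_fields = source_metadata(r"C:\Projects\Demo\en-US\readme.txt", "Demo")

# Only the full path, which mirrors the SDLX behaviour described earlier.
path_only = source_metadata(r"C:\Projects\Demo\en-US\readme.txt", "Demo",
                            record_filename=False, record_project=False)
print(all_fields)
print(path_only)
```

Because the values are computed from whatever file is currently open, nothing has to be typed into the Project Settings when you move on to the next file, which is exactly the step that is so easy to forget with the manual approach.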

But that’s not all of course.  If you’d read the original article and decided to create a field for these things yourself and are now happily using them with the manual update route, then this is also nicely catered for.  In my earlier example I created a field and called it “Source File”, which is different to the automatically created one.  So if I wanted to use this application with my existing TM I just select that field like this:

007

Now the existing field would be automatically updated and I won’t have to add the new filename into the update field every time I move onto a new file.  Very helpful feature.

Now the one thing I didn’t mention is how do you invoke this solution in the first place?  Well, this is also fairly simple as the app is basically a new TM provider.  So first you download the RecordSourceTU application from the OpenExchange and install it.  Then restart Studio and when you add your TM to your Project you do it with this option:

008

It works the same way as AnyTM: you select RecordSourceTU and then pick the TM you want.  This will immediately present you with the window of options so you can confirm which metadata you would like to see added to your TUs as you work.  Simple.  If you want to change the options at any time you just go back to your list of TMs, where you will see that the RecordSourceTU TM is conveniently annotated so you can tell which one it is, and click on Settings and then OK:

009

This will invoke the options again and you can change them as required.  Please note that the rules and restrictions that apply to fields created or maintained with this application are the same as those described in the original article.

But still a great feature and timesaver if you wish to be able to store this additional useful information without even having to think about it as you work!

0 thoughts on “Recording Translation Memory metadata”

  1. Hi Paul. I’m a big fan of fields in TMs, which really allow you to filter properly when correcting segments, importing new TMX files, etc. This RecordSourceTU solution would be nice for avoiding manual field stamping during updates, but I have a major concern about it: duplicates and inconsistencies. Will recording the source file cause a duplicate in the TM if the same TU is used across different projects (different source files), even though the translation is the same? To avoid multiple translations for the same sentence, we overwrite previous translations, and try to leave unconfirmed any short segments whose translation is purely contextual, so those translations don’t get uploaded. We use a text field “Project”, which also allows multiple values and is currently added manually. I am happy to automate TM updates as much as possible, but don’t want to add new duplicates which can affect leverage. Are two identical TUs with different source files still the same TU in the TM, and will the value added through the app get merged?

    And also a question: could Groupshare TMs benefit from this app too? That would be also really nice 🙂

    Thanks in advance!

    1. Hi Almudena, I did a few tests and if I use the TM Update batch task it is not behaving the same way. In fact I think it needs looking at in a couple of areas. Thanks for making me look at this, I’ll report this to the developer and ensure it works the same way as the manually added fields.
      It’s fine when working interactively, but not when using the batch tasks. Quite interesting that this has not been reported as this app has been out for a while and is heavily used by many users.
      On your GroupShare question… the source code is all available so if anyone would like to add this capability it would be interesting!
      Thank you

  2. Good morning Paul,

    I wrote to you on March 15 as shown below

    I have since discussed my issue with Vlad and Adrian, and posted it on https://community.sdl.com/, no replies so far.

    I have clients who use memoQ and Wordbee. So I either already get the mqxliff files or convert them to xliff (Wordbee) to be able to work in Studio 2015.

    However, xliff files do not activate the Formatting and QuickInsert under File>Review. This results in translation errors as foreign words used in the target language must be in Italics, and other formatting requirements as to bold, quotes, etc.

    I was told by support this is done on purpose. I sent a support request to memoQ, and see that when working with their files as mqxlz, in memoQ, all the formatting options are available. However, if the file is an mqxliff, they cease to be available. So I tried to work with an mqxliff directly in Studio, as it is one of the File Types available now in Studio 2015. But the same thing happens: all formatting options are lost.

    Trados/Studio has been my tool of preference from the very beginning. I work with a team of translators who use Studio. But unfortunately not all my clients do. And my major client is now using memoQ only.

    Please do give this issue special consideration. And do not hesitate to call or email me if you need further information.

    Sincerely,

    Ines

    1. The community is the best place to ask this… certainly this has nothing to do with the topic of this article. I’ve been on leave for a couple of weeks so will get to your post shortly I think.
      In general however, the reason this is deliberate is not to make your life difficult and perhaps “deliberate” is a poor choice of words. You can transfer tags from the source to the target, but in order to introduce tagging that is not in the source, such as bold or italic then the filetype has to know how memoQ would like to see these tags. If we don’t know this then the quick inserts will not be available, and I guess this is the case here.

  3. What I miss in TMs and TMX files is a field for comments. In my preferred tool DVX, I can add comments for both source and target segments, which is so helpful during translation. But when the project is sent to the TM, they are not stored. All that valuable information unfortunately has no room in TMs.

    1. Hi Selcuk, I guess this would be possible in Studio with a custom TM provider… but as you say, it would only be useful for Studio because even if we add it to the TMX nobody else will know what to do with it, or be able to display it. WorldServer has this capability already, but I do wonder how many people use a feature like this?
      Perhaps another way in Studio would be to have a button (or keyboard shortcut) that directed you to a text entry field used for normal metadata, but allowed you to change the value quickly and type in what you wanted for that TU.

  4. Hi Paul, it’s really cool feature which I have so much missed in Studio. But now while using this app I have noticed that not every TU is stamped with source file name. And I cannot figure out by which rule it is stamped – every 3rd, then every 15th TU, then 20 TU are again not stamped and so on. One TU was stamped with source project name, where this wasn’t even checked in the settings. Do you have any idea what’s wrong here?

    1. Hi Jack, it’s probably good to post this question in here: http://community.sdl.com/appsupport
      You can share more information including images to help and it’s easier to answer your question. You should also clarify whether this is the same project where this is happening or different projects as my immediate reaction would be you have different settings for each project.

  5. It all happens within one and the same project. But which setting may prevent the app from stamping each TU?

    1. I’ve no idea! I asked the question to rule out project settings. I think we’d need to investigate when it stamps and when it doesn’t by looking at how you’re working. This would be best done by you to see if you could establish a pattern perhaps or notice anything odd about the status of segments being confirmed, what was already in the TM etc? A little trial and error to be able to put a finger on it.
