Tag Archives: SDL appstore

I’m back on the topic of PDF support!  I have written about this a few times in the past with “I thought Studio could handle a PDF?” and “Handling PDFs… is there a best way?”, and this could give people the impression I’m a fan of translating PDF files.  But I’m not!  If I was asked to handle PDF files for translation I’d do everything I could to get hold of the original source file that was used to create the PDF because this is always going to be a better solution.  But the reality of life for many translators is that getting the original source file is not always an option.

I was fortunate enough to be able to attend the FIT Conference in Brisbane a few weeks ago and I was surprised at how many of the freelance translators and agencies I met dealt with large volumes of PDF files from all over the world, often coming from hospitals where the content was a mixture of typed and handwritten material, and almost always on a 24-hr turnaround.  The process of dealing with these files is really tricky and normally involves using Optical Character Recognition (OCR) software such as ABBYY FineReader to get the content into Microsoft Word, followed by a tidy-up exercise in Word.  All of this takes so long it’s sometimes easier to just recreate the files in Word and translate them as you go!  Translate in Word… sacrilege to my ears!  But this is reality, and looking at some of the examples of files I was given there are times when I think I’d even recommend working that way!

But there were files I saw that looked as though they should be possible to handle in a proper translation environment.  We tested a few and the results were more often than not pretty poor.  So even though we could open them up it was still better to take the DOCX that Studio creates when you open a PDF and then tidy up the Word file for translation.  At least this is some progress… now we’re able to handle the content in a translation environment and not have to recreate the entire file.  But it would be even better if the OCR software could make a better job of it.  And this is where I want to get to… better OCR!

SDL Trados Studio 2017 continued to provide the same PDF filetype as earlier versions of Studio, which uses technology from Solid Documents, and this does a fairly good job of extracting the translatable text with OCR for many files.  But it could use improvement.  SDL Trados Studio 2017 SR1 has introduced another option for OCR using software called Readiris from IRIS, which is part of the Canon Group.

Out of the box, according to the documentation, Iris supports 134 languages for OCR which is pretty impressive.  These don’t quite match the languages supported by Studio, but a rough count and compare suggests there are some 95 shared languages… and they even support Haitian Creole which Studio does not as we know 😉  Still impressive, however, and it easily beats the 14 languages supported by Solid Documents in Studio 2017 prior to the introduction of Iris.  Additionally this opens up the possibility of handling scanned PDF files in Asian languages, Arabic, Hebrew and many others that were previously difficult, if not impossible, to handle.
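If you fancy doing your own "count and compare" then a few lines of Python will do it.  This is only a rough sketch: the two language lists below are tiny made-up samples, not the real lists from the Iris or Studio documentation, and the function name `compare_languages` is just something I've invented for the illustration.

```python
# Illustrative samples only: these are NOT the real supported-language lists.
# Paste in the full lists from the relevant documentation to do a real comparison.
ocr_languages = {
    "Arabic", "Chinese (Simplified)", "English",
    "French", "Haitian Creole", "Hebrew",
}
studio_languages = {
    "Arabic", "Chinese (Simplified)", "English",
    "French", "German", "Hebrew",
}


def compare_languages(ocr, studio):
    """Return (shared, ocr_only) as alphabetically sorted lists."""
    shared = sorted(ocr & studio)    # languages in both sets
    ocr_only = sorted(ocr - studio)  # languages the OCR engine has but Studio lacks
    return shared, ocr_only


shared, ocr_only = compare_languages(ocr_languages, studio_languages)
print(f"{len(shared)} shared languages")
print("OCR only:", ", ".join(ocr_only))
```

In practice the fiddly part is making sure both lists use the same names for the same language before you compare them, which is why I'd only ever call the result a rough count.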

Using the new options

So let’s take a look at where you can find this new option and how you use it.  First of all you need to go to your options:

File -> Options -> File Types -> PDF

Then navigate down to “Converter”.  Down near the bottom you’ll see the “Recognize PDF text” group as shown below and the option to activate this new feature is at the end:

Check the box and you’ll be presented with this screen:

It’s an App!  You may be wondering why you need to do this and why it was not just integrated into Studio?  The reason is simple… not everyone will want this option and the underlying software requires a 150 MB download which would have increased the size of the Studio installer to over half a gigabyte.  So it was made optional.  If you want it you click on the “Visit AppStore” link in the message above, or the one I just wrote, and download and install the plugin just as you would any plugin from the AppStore.  If you don’t do this then Studio won’t be using the software.  There are no warnings, and the option remains checked, but you won’t be using it.  So when I open the Chinese PDF I just created by copying some text as an image and saving it to a PDF all I’ll get is this:

None of the text is extracted for translation at all.  But if I install the plugin and try again I see this:

Now we’re cooking!  It would be useful to get rid of the tags though as these seem to be aesthetic only, just colours and font changes where the OCR picked up a few minor differences and then introduced tags to control them.  As these are formatting tags only I could just ignore them, or press Ctrl+Shift+H to hide them in the editor.  But if I want to remove them altogether I can do this with another app called Cleanup Tasks that I have written about before.  These three options do the job for this file:

Now I have this and can translate without any tags at all:

Nice… and if all of that sounds complicated it wasn’t really.  I created a short…ish video below putting this all together so you have an idea of how it works.

Approx. length : 16.26 mins

After all of that I don’t want you to get the impression I’m a converted believer in the possibilities of PDF translation… I’m not.  But we’re unlikely to see the back of PDFs for translation any time soon, so I am happy to see the technology to support this workflow improving all the time.  I also don’t want to give the impression this is going to help with every PDF you ever see.  It won’t!  The problems of PDF quality don’t go away, because they come from the way the files have been created in the first place, so source is always best.  You’re also quite likely to find PDFs you can’t handle even with Iris, and you might even find that the more basic option without Iris does a better job of your PDF conversion.  So it’s horses for courses… you have the tools and can apply the most appropriate one for your job.

If you have any questions after reading this post or watching the video then I’d recommend you visit the SDL Community and ask in there… or just post into the comments below.

SDL Trados Studio is up to Studio 2017 which is the fifth major version since Studio 2009 was first released some eight years ago now.  During these eight years I think it’s fair to say we have seen less and less requirement for the old Trados features, yet despite that we do see some interesting tools making an appearance in the SDL AppStore that mirror some of the old functionality.  In fact some of these apps are quite recent and seem to have been driven by requests from users who miss some of the things you could do in Trados but still cannot do in the out of the box Studio solution.  So I thought it might be fun to take a look at some of these apps and if you are one of those translators who remembers all the good things Trados could do… and can I say forgotten the things it could not… then perhaps you’ll find these apps useful!

Read More

There have been a few ups and downs getting SDL Analyse off the ground, but it’s finally there and it’s worth it!  If you have no idea what I’m referring to then perhaps review this article first for a little history.  This app was actually released as the 200th app on the SDL AppStore in February this year, but in addition to the applause it received for its functionality there have been some less positive aspects for some users that needed to be addressed.

But first, what does it do?  Quite simply it allows you to get an analysis of your files without even having to start Studio, or without having to create a Project in Studio.  If you’re a regular reader of this blog you may recall I wrote an article in 2014, and in 2011 before that, on how to do an analysis in Studio by using a dummy project.  In all that time there has been only one app on the AppStore that supports the analysis of files without having to use Studio and this is goAnalyze from Kaleidoscope.  In fact goAnalyze can do a lot more than SDL Analyse but there is one significant difference between these apps that makes this one pretty interesting… you don’t require the Professional version of Studio to use it.  But it’s also this difference that has been the cause of the ups and downs for some users since SDL Analyse was released.  In order to get around the need for the Project Automation API, which requires the Professional version of Studio, the app had to use a Windows service that was hooked into Studio.  For the technically minded we had a few things to resolve:

Read More

In 2013 I wrote an article called “Solving the Post-Edit Puzzle” which was all about finding a way to measure, and pay for, post-editing translations in a consistent way.  Then in 2015 I wrote another called “Qualitivity… measuring quality and productivity” that covered everything Post-Edit Compare could do but then added many layers of detail and complexity through Qualitivity to support Quality Measurement, including a TAUS DQF integration and incredible metrics that are still not matched by any tool today that I am aware of, and are so good that they are often used to support academic research into translating and post-editing behaviour.

This is all great stuff and I have always been a huge fan of the work that Patrick Hartnett has done on all of the applications he developed over the years.  You don’t often find experienced developers with in-depth domain knowledge like this and his apps have always been really relevant to solving problems in the localisation workplace.  So I wanted to bring up and discuss the app that was actually the predecessor to these great apps I just mentioned.  It was also an app that was no longer supported once its first successor, Post-Edit Compare, was released.  The app was released around 2011 I think and was called SDLXLIFF Compare.

Read More

It’s been a while since I wrote anything about the SDLXLIFF Toolkit… in fact I haven’t done so since it was first released with the 2014 version of Studio.  Now that we have added a few new things such as SDLPLUGINS, so that apps are better integrated and can be more easily distributed with Studio, we have launched a new version of the toolkit for Studio 2017.  What’s new?  To be honest not a lot, but there are a couple of things that I think warrant this visit.

First of all, the app is now a plugin and this means it loads faster, is always available and there are a few tricks to being able to get the most from this.  Secondly, there are a few fixes to the search & replace features that make it possible to complete tasks that Studio will fail with and to do this the API team completely rebuilt the regex engine.  So whilst you won’t see too many changes, there are a few under the hood.

The best way to illustrate this is to show you so I have created a short video below where I have tried to explain how best to use the toolkit now it’s a plugin and not a standalone application, and I used the problems described below to demonstrate how it works.  If you want to know what else it can do I have reproduced part of the original guide below the video as that seems to have been lost over the years.  This might be helpful for a few of the more obscure features you may not have realised were possible.

Read More

Probably you’re all far more educated than me and when you read COTI you probably didn’t think “chuckling on the inside” did you?  I googled it and looked at four acronym websites, none of which found the correct definition… but two of them returned the title of this article so it must be right!!  Oh how I wish it was… just to bring a little levity to the ever so serious tasks of interoperability.  But no, it stands for Common Translation Interface (COTI).  This is a project pioneered by DERCOM which is the “Association Of German Manufacturers Of Authoring And Content Management Systems”… so nothing to be amused about there!

The subject of interoperability is in fact a serious one and many tools like to claim they are more interoperable than others as a unique selling point for anyone prepared to listen.  It’s also a big topic and whilst I am always going to be guilty of a little bias I do believe there isn’t a tool as interoperable as the SDL Language Platform because it’s been built with support for APIs in mind.  This of course means it’s possible for developers outside of SDL to hook their products into the SDL Language Platform without even having to speak to SDL.  Now that’s interoperability!  It’s also probably why I hadn’t heard of COTI until the development was complete and I was asked to sign a plugin for SDL Trados Studio by Kaleidoscope… outside of SDL I think they are the Kings of integration between other systems and the SDL language portfolio.

Read More

“More power to the elbow”… this is all about getting more from the resources you have already got, and in this case I’m talking about your Translation Memories.  In particular I’m talking about enabling them for upLIFT.  upLIFT, in case you have not heard about this yet despite all the marketing activity and forum discussions since August this year, is a technology that is being used in SDL Trados Studio 2017 to enable some pretty neat things.  I’m not going to devote this article to what upLIFT is all about as Emma Goldsmith has written a really useful article today that does a far better job than I could have done.  You can find Emma’s article here, called “SDL Trados studio 2017 : fragment recall and repair“.  But a quick summary to get us started is that upLIFT enables things like this:

  • fragment matching
    • whole Translation Units
    • partial Translation Units
  • fuzzy match repair
    • from fragment matching
    • from your termbase
    • from Machine Translation
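
If the terms "fuzzy match" and "fragment" are new to you, here's a rough sketch of the general idea in Python.  To be clear, this is not how upLIFT works under the hood: it's a toy illustration using the `SequenceMatcher` class from Python's standard `difflib` module, and the function names and example sentences are my own inventions.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(new_segment, tm_segment):
    """Word-level similarity between 0.0 and 1.0 (a stand-in for a fuzzy match score)."""
    return SequenceMatcher(None, new_segment.split(), tm_segment.split()).ratio()


def longest_shared_fragment(new_segment, tm_segment):
    """Longest run of consecutive words that both segments have in common."""
    a_words, b_words = new_segment.split(), tm_segment.split()
    matcher = SequenceMatcher(None, a_words, b_words)
    match = matcher.find_longest_match(0, len(a_words), 0, len(b_words))
    return " ".join(a_words[match.a:match.a + match.size])


# Hypothetical example: a new source segment and a similar one already in the TM.
new_segment = "Press the red button to stop the machine"
tm_segment = "Press the green button to stop the machine"

ratio = fuzzy_ratio(new_segment, tm_segment)
fragment = longest_shared_fragment(new_segment, tm_segment)
print(f"Similarity: {ratio:.1%}")
print(f"Longest shared fragment: {fragment}")
```

The two segments differ by a single word, so the similarity is high, and the shared fragment is the sort of sub-segment text that a translation memory might already hold a translation for.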

Read More
