Iris Optical Character Recognition

August 17, 2017 ~ December 2, 2021 ~ Paul Filkin

I’m back on the topic of PDF support! I have written about this a few times in the past with “I thought Studio could handle a PDF?” and “Handling PDFs… is there a best way?”, and this could give people the impression I’m a fan of translating PDF files. But I’m not! If I were asked to handle PDF files for translation I’d do everything I could to get hold of the original source file that was used to create the PDF, because this is always going to be a better solution. But the reality of life for many translators is that getting the original source file is not always an option.

I was fortunate enough to attend the FIT Conference in Brisbane a few weeks ago, and I was surprised at how many of the freelance translators and agencies I met dealt with large volumes of PDF files from all over the world, often coming from hospitals where the content was a mixture of typed and handwritten material, and almost always on a 24-hour turnaround. The process of dealing with these files is really tricky and normally involves using Optical Character Recognition (OCR) software such as ABBYY FineReader to get the content into Microsoft Word, followed by a tidy-up exercise in Word. All of this takes so long that it’s sometimes easier to just recreate the files in Word and translate them as you go! Translate in Word… sacrilege to my ears! But this is reality, and looking at some of the examples of files I was given, there are times when I think I’d even recommend working that way!

Continue reading “Iris Optical Character Recognition” →

Handling PDFs… is there a best way?

July 28, 2016 ~ December 2, 2021 ~ Paul Filkin ~ 30 Comments

We all know, I think, that translating a PDF should be the last resort. PDF stands for Portable Document Format, and the files have this name because they were intended for sharing with users on any platform, irrespective of whether they owned the software used to create the original file or not. They were shared so they could be read. They were not intended to be editable; in fact the format is also used to make sure that the version you are reading can’t be edited. So how did we go from this original idea to so many translators having to find ways to translate them?

I think there are probably two or three reasons for this. First, the PDF might have been created using a piece of software that is not supported by the available translation tool technology and that has no export/import capability. Second, some clients can be very cautious (that’s the best word I can find for this!) about sharing the original file, especially when it contains confidential information. So perhaps they mistakenly believe the translator will be able to handle the file without compromising the confidentiality, or perhaps they have been told that only the PDF can be shared and they lack the paygrade to make any other decision. Third, the client may not be able to get their hands on the original file used to create the PDF.

Continue reading “Handling PDFs… is there a best way?” →

I thought Studio could handle a PDF?

February 4, 2013 ~ December 3, 2021 ~ Paul Filkin

Update: Studio 2015 does have a built-in OCR facility for PDF, so whilst this article is still useful, keep that in mind! It’s also worth reviewing the solution from Infix using XLIFF.

Studio has a PDF filetype, and it can do a great job of translating PDF files… BUT… not all PDF files!

So what exactly do I mean by this? Surely a PDF is a PDF? Well, this is true, but not all PDF files have been created in the same way, and this is an important point. PDF stands for Portable Document Format and was originally developed by Adobe some 20 years ago. Today it’s even a recognised standard, and for anyone interested you can find the specifications here… at least the ones I could find:
Continue reading “I thought Studio could handle a PDF?” →