Handling PDFs… is there a best way?

We all know, I think, that translating a PDF should be the last resort. PDF stands for Portable Document Format, and it has this name because PDFs were intended for sharing with users on any platform, whether or not they owned the software used to create the original file. They were meant to be shared so they could be read. They were not intended to be editable; in fact, the format is also used to make sure that the version you are reading can’t be edited. So how did we go from this original idea to so many translators having to find ways to translate them?

I think there are probably two or three reasons for this. First, the PDF might have been created using a piece of software that is not supported by the available translation tool technology and has no export/import capability. Second, some clients can be very cautious (that’s the best word I can find for this!) about sharing the original file, especially when it contains confidential information. So perhaps they mistakenly believe the translator will be able to handle the PDF without compromising confidentiality, or perhaps they have been told that only the PDF can be shared and they lack the paygrade to make any other decision. A third reason is that the client may not be able to get their hands on the original file used to create the PDF.

Paying it forward with MS Publisher files

If you’ve never come across Microsoft Publisher before, here’s a neat explanation from Wikipedia:

“Microsoft Publisher is an entry-level desktop publishing application from Microsoft, differing from Microsoft Word in that the emphasis is placed on page layout and design rather than text composition and proofing.”

It’s actually quite a neat application for newbies to desktop publishing like me, but it’s a difficult format to handle if you receive *.pub files (the format used by MS Publisher) and are asked to translate them. I do see requests from translators from time to time asking how they can handle them. The file itself is a binary format, and even in Office 2016 (which includes Publisher if you have the Professional version) the only export formats are PDF, XPS and HTML, none of which can be imported again. So it’s very tricky indeed if you need to provide your client with a translated version in the pub format.

Good bugs… bad bugs!

What the heck is a good bug? I don’t know if there is an official definition for this, so I’m going to invent one.

An unintended positive side effect of computer software not working as designed.

I reckon this is a fairly regular occurrence, and I have definitely seen it before. For example, in an earlier version of Studio you could run a search and replace in the source and actually change the source content. This was before “Edit source” was made available… sadly it was fixed pretty quickly, and you can no longer do this unless you use the SDLXLIFF Toolkit or edit the SDLXLIFF directly in a text editor. In the gaming world it happens all the time, possibly the most famous example being the original Space Invaders game, where the aliens moved faster and faster as you killed more of them. This was apparently not by design; it was the result of limited processor power, so as you killed the aliens there were fewer graphics to draw and the rendering got faster and faster… now speeding up as you progress is a staple of games! Another interesting example from the Linux/Unix world is using a dot at the start of a filename to hide it from view. This was apparently a bug that was so useful it was never “fixed”.
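The dot-file story is usually told this way: the directory listing was only supposed to skip the special `.` (current directory) and `..` (parent directory) entries, but the check was shortened to look at the first character alone, so any name starting with a dot disappeared too. A minimal Python sketch of the two rules, using made-up filenames, shows the difference; it illustrates the idea and is not the original `ls` source.

```python
# Illustration of the often-told origin of Unix "dot files".
# The intended rule skipped only "." and ".."; the shortcut tested just the
# first character, which also hid every other name starting with a dot.

def intended_skip(name: str) -> bool:
    """Skip only the two special directory entries."""
    return name in (".", "..")


def shortcut_skip(name: str) -> bool:
    """The shortcut: skip anything whose first character is a dot."""
    return name.startswith(".")


def listing(names, skip):
    """Return the names a listing would show under the given skip rule."""
    return [n for n in names if not skip(n)]


entries = [".", "..", ".profile", "notes.txt", "report.sdlxliff"]

print(listing(entries, intended_skip))  # ['.profile', 'notes.txt', 'report.sdlxliff']
print(listing(entries, shortcut_skip))  # ['notes.txt', 'report.sdlxliff']
```

The shortcut is one comparison instead of two, which is exactly why it was tempting, and its side effect turned out to be handy enough for configuration files that it stuck.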

Read this and I may have to shoot you!

Chapter One

“Gabriela descended from the train, cautiously looking around for signs that she might have been followed. Earlier in the week she’d left arrangements to meet László at the Hannover end of Platform 7, and after three hours travelling on a crowded train to get there she was in no mood to find he hadn’t got her message. She walked up the platform and, as she got closer, could recognise his silhouette even though he was facing the opposite direction. It looked safe, so she continued to make her way towards him, close enough to slip a document into the open bag by his side. She whispered ‘Read this and I may have to shoot you!’ László left without even a glance in her direction, only a quick look down to make sure there was no BOM.”

A little Learning is a dang’rous Thing;

Drink deep, or taste not the Pierian Spring:
There shallow Draughts intoxicate the Brain,
And drinking largely sobers us again.

I’m quoting Alexander Pope, writing in 1709, because rightly or wrongly I think he hit the nail on the head when it comes to the truly intoxicating mix of language and technology. A little knowledge is indeed a dangerous thing, and it’s something I know I’ve been guilty of all my life… I learn a little something new and now I’m an expert. That is, of course, until I learn a bit more, and then a little more after that, and before I know it I realise I know nothing at all! Translation technology is great for dropping us all into this trap… Trados user since Trados 5, translator for over 20 years… can handle any type of file. Falling into this trap is pretty easy, in fact, especially when the tools available for translation today take a lot of the effort out of the tasks at hand. But not everything is what it seems, and sometimes it takes a mistake or three to sober us up again! There’s a reason why well-organised and successful translation companies, dealing in all kinds of content, have Project Managers, Translators and Localization Engineers in their midst.

ATA56 – SDL Trados Studio Advanced

I ran a beginners’ and an advanced workshop at the ATA56 pre-conference day in Miami this year. It was a really fun day for me, as we start with no specific agenda or predefined course and then try to shape the session to suit the needs of the attendees. The beginners’ workshop tends to be a little more structured, at least to start with, and the intention is to cover the basics of how Studio and MultiTerm work.

The advanced workshop is a lot different… after all, what is advanced?

Bilingual Excel… and stuff!

I’ve written about how to handle bilingual Excel files, CSV files and tab delimited files in the past. In fact, one of the most popular articles I have ever written was “Creating a TM from a Termbase, or Glossary, in SDL Trados Studio” in July 2012, over three years ago. Even though I wrote it, I still struggle a little to see why this would be useful, other than perhaps if you have been given a glossary to translate or proofread… but it doesn’t really matter what I think, because clearly it was useful!

So, why am I bringing this up three years later? Well, the recent launch of Studio 2015 introduced a new filetype that seems worthy of some discussion. It’s a Bilingual Excel filetype that lets you handle Excel files with bilingual content in a similar way to the approach described in the previous article. There are some interesting differences, though, and the first is that you won’t lose any formatting in the Excel file, which is what happened if you had to handle files like these as CSV or Tab Delimited Text. That in itself might be interesting for some users, because this was the first objection I’d hear when suggesting the CSV filetype as a solution for files of this nature. Most of the time I don’t think it’s really an issue, but for those occasions where it is, this is a good point.
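To see why formatting was the sticking point with the CSV route, it helps to remember that a CSV or tab delimited file carries nothing but the cell text. Here is a minimal Python sketch, assuming a hypothetical two-column export with the source in the first column and the target in the second; it only illustrates what survives such a round trip and is not how Studio’s filetypes work internally.

```python
import csv
import io

# Hypothetical tab delimited export of a bilingual spreadsheet:
# column 1 = source (English), column 2 = target (German).
# Only the cell text survives; bold, colours, column widths etc. are gone.
data = "Source\tTarget\nPrint\tDrucken\nSave as\tSpeichern unter\nCancel\t\n"


def read_bilingual(text: str, has_header: bool = True):
    """Return (source, target) pairs; an empty target means still to translate."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    if has_header:
        next(rows, None)  # skip the header row
    return [(row[0], row[1] if len(row) > 1 else "") for row in rows if row]


pairs = read_bilingual(data)
untranslated = [src for src, tgt in pairs if not tgt]
print(pairs)
print(untranslated)
```

Everything the sketch can see is plain strings, which is the whole point: a filetype that reads the Excel file itself can keep the formatting that this route throws away.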

Comments… chapter and verse!

The ability to work with comments in a managed localisation process is an important part of communication between translator, reviewer, project manager and the end client… and not necessarily in that order! Comments are used to clarify misunderstandings in the source text, question completed translations you’ve been told to ignore that just don’t look right, suggest improved terminology, explain why you translated something in a particular way, clarify why you changed a translation in review, provide additional context from the client, or add notes to the target file for an in-country review. They could even be comments that are just there to be translated, or ignored… the list of reasons could be pretty long, and so could the comments. So it’s very important to keep them linked to their context, so the referenced text is easy to deal with, and to be able to get to the comments quickly when they might only relate to a couple of segments in three files that are part of a five-hundred-file project nested within a complex folder structure.

So this post is going to deal with two things… first, the places where comments can be used in Studio out of the box, and second, a very neat OpenExchange plugin that I reckon many project managers, and translators, have wished for without knowing it was already there!

The JSON files…

Update Sept 2016: You can find an excellent filetype plugin for JSON files on the SDL AppStore if you don’t want to tackle this yourself.

The JSON files… not really related to Jason Voorhees of course, but for some users who have received these file types for translation, the problem of how to handle them and extract the appropriate text may well seem like an episode of Friday the 13th! I’ve seen a few threads in the last couple of weeks sharing various methods for handling these files, ranging from opening them in MS Word and applying a hidden style to the parts you don’t want, to asking vendors to create variations on JavaScript filetypes. But I think Studio offers a much simpler mechanism for handling them out of the box.

So what are these file types, and how can you handle them with Studio 2014, or even 2009/2011? In this article I’m going to look at the regex filetype, as it is very well suited to files like this, but before we get into that detail let’s take a look at what they are.
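As a taste of why a regex-based approach suits these files, the Python sketch below picks out string values that follow a key, which is roughly the text you would want to translate. The sample file and the pattern are assumptions for illustration only; Studio’s regex filetype has its own settings for defining translatable text, and this is not a copy of them.

```python
import json
import re

# A small, hypothetical JSON resource file: keys are identifiers, values are UI text.
SAMPLE = '''{
  "app.title": "Welcome back",
  "app.greeting": "Hello, \\"friend\\"!",
  "app.count": 3,
  "menu": { "file.open": "Open file" }
}'''

# Match "key" : "value" pairs. Both groups allow escaped characters (\" and so on),
# so an escaped quote inside the text does not end the match early.
# Non-string values (numbers, booleans, nested objects) are never matched.
PAIR = re.compile(r'"(?P<key>(?:[^"\\]|\\.)*)"\s*:\s*"(?P<value>(?:[^"\\]|\\.)*)"')


def extract_translatable(text: str):
    """Return (key, raw_value) pairs for every string-valued key the regex finds."""
    return [(m.group("key"), m.group("value")) for m in PAIR.finditer(text)]


for key, value in extract_translatable(SAMPLE):
    print(key, "->", value)

# Sanity check: the sample is still valid JSON, so nothing was broken to make it fit.
assert json.loads(SAMPLE)["menu"]["file.open"] == "Open file"
```

The values come back with their JSON escapes intact (`\"` stays as `\"`), which is what you want if the file has to be written back unchanged. Strings inside arrays, or keys that hold text that should not be translated, would need extra rules.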

Why do we need custom XML filetypes?

My son asked me how my day had gone, and before I could answer he said in a slightly mocking tone “blah blah blah… XML… blah… XML… blah blah”. Clearly I spend too much time outside of work talking about work, and clearly his perception of what I do is skewed towards the more technical aspects I like the most! Aside from the note to self, “stop talking about this stuff after I leave the office!”, it got me thinking about why I apparently think about XML as much as I do, and how I could help others avoid the very same compulsion! I’ve written articles in the past about how to use regular expressions in Studio, and an article on using XPath, and I’ve probably touched on handling XML files from time to time in various articles. But I don’t think I’ve ever explained how to create an XML filetype in the first place, or why you would want to… after all, Studio has default filetypes for XML, and this is just another filetype that the CAT tool should be able to handle… right?
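Much of the reason for a custom XML filetype comes down to which parts of the file are treated as translatable. Here is a minimal Python sketch using the standard library’s ElementTree, which understands a limited subset of XPath. The resource file, its `translate` attribute and both rules are hypothetical, and Studio’s own filetype settings are configured differently; the sketch only contrasts a catch-all rule with a targeted one.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource file: only some elements hold text that should be translated.
XML = """<resources>
  <string id="title" translate="yes">Welcome</string>
  <string id="build" translate="no">v2.3.1</string>
  <note>Internal comment for developers</note>
  <string id="exit" translate="yes">Exit</string>
</resources>"""

root = ET.fromstring(XML)


def all_text(elem):
    """A catch-all rule: every non-blank text node, wanted or not."""
    return [e.text.strip() for e in elem.iter() if e.text and e.text.strip()]


def custom_rule(elem):
    """A targeted rule: only <string> elements marked translate="yes" (XPath subset)."""
    return [e.text for e in elem.findall(".//string[@translate='yes']")]


print(all_text(root))
print(custom_rule(root))
```

The catch-all rule happily offers the build number and the developer note for translation, while the targeted rule picks out just the two strings that matter.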

