If you’ve never come across Microsoft Publisher before, here’s a neat explanation from Wikipedia:
“Microsoft Publisher is an entry-level desktop publishing application from Microsoft, differing from Microsoft Word in that the emphasis is placed on page layout and design rather than text composition and proofing.”
It’s actually quite a neat application for newbies to desktop publishing like me, but it’s a difficult tool to handle if you receive *.pub files (the format used by MS Publisher) and are asked to translate them. I do see requests from translators from time to time asking how they can handle them. The file itself is a binary format, and even with Office 2016 (which includes Publisher if you have the Professional version) the only export formats are PDF, XPS and HTML, none of which can be imported back in. So it’s very tricky indeed if you need to provide your client with a translated version in the pub format.
In the past I would have suggested T-Windows for Clipboard which is installed with Studio 2015 and this would allow you to translate the text (if you have a copy of MS Publisher) using your Studio Translation Memories. There has also been an application on the OpenExchange for around a year that can create an XML file from MS Pub (for the 2010 version only) and this does the job quite nicely and again requires a copy of MS Publisher to be installed. But now there is a new application available, pub2xml, which also supports the latest versions of MS Publisher and it also provides some nice touches making it far easier to use. It’s also free, but still requires MS Publisher to be on your machine.
But there is a catch! The developer created the bones of this application and it seems to work really well for most files, but it’s not 100% complete. The things that are missing relate to the handling of internal formatting, like tags for font changes midway through a sentence, or hyperlinked text. It is a good catch though, because the developer has created a solution to a problem faced by many users and then made the code available as open source on his GitHub site. This means that any developer could fork or clone the code, make changes, enhancements, bug fixes etc. and then submit a pull request to share these back into the source for others to use. This is something we also do with every application we have created through the SDL OpenExchange (now RWS AppStore) development team and you can find the source code for these here. In fact the developer of pub2xml, Patrick Hartnett, has shared the source code for several apps and a few other things on his GitHub site too, so it’s great to see other developers following suit and helping to grow the developer community with shared resources… I guess it’s a sort of “paying it forward” approach and I like it. Another Patrick, Patrick Porter, has done a similar thing with his code for machine translation plugins to Google and Microsoft, and you can find them here as well as a few other things.
I’m really hoping that as the development community becomes more established in 2016 we’ll start to see more of these community initiatives with more developers “paying it forward” by investing in sharing a little of their knowledge to benefit everyone.
But I digress… back to MS Publisher!
The basic idea is that this is a standalone application which makes the content available (text, formatting and images) for localization. The application itself is not complicated and has two screens, one for export and one for import. A simple overview of the entire application is that it’s a drag and drop interface like this:
So you would drag and drop all the files you wish to convert into the interface of the Export tab, set your options and click “Export”… it’s as simple as that. The export options here are very interesting:
- Export translatable content : pulls the translatable text out of the file and inserts it into a simple XML file
- Export pictures : pulls the images from the pub file and stores them in a separate folder where they can be localized if necessary
- Create PDF file : creates a PDF rendition of the pub file making it easy to see the format in context without MS Publisher available
- Create Pseudo file : creates a new pub file during export with the extension .pseudoTranslation.pub, in which all the vowels in the translatable content are replaced with % or $ characters. This allows the Project Manager to quickly confirm that all the translatable content was in fact exported… so a quick sanity check
- Markup internal font information : this relates to part of what’s incomplete with the app. If you select this option then bold, italic, strikethrough, superscript and subscript formatting will be honoured in the XML with appropriate tagging. Any other type of formatting is currently ignored.
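To illustrate the idea behind the pseudo file, here is a minimal Python sketch of vowel replacement and the sanity check it enables. This is not the pub2xml code: the function names are hypothetical, and the rule for choosing between % and $ (lowercase vowels become %, uppercase vowels become $) is purely my assumption, since the app’s actual rule isn’t described here.

```python
# Illustrative sketch only, not the pub2xml implementation.
# Assumption: lowercase vowels -> '%', uppercase vowels -> '$'.
LOWER_VOWELS = "aeiou"
UPPER_VOWELS = "AEIOU"

# Build one translation table covering both cases.
_PSEUDO_TABLE = str.maketrans(
    {**{v: "%" for v in LOWER_VOWELS}, **{v: "$" for v in UPPER_VOWELS}}
)


def pseudo_translate(text):
    """Replace every vowel so that exported text is visibly 'translated'."""
    return text.translate(_PSEUDO_TABLE)


def has_untouched_vowels(text):
    """Return True if any vowel survived, i.e. some text was not processed."""
    return any(ch in LOWER_VOWELS + UPPER_VOWELS for ch in text)


if __name__ == "__main__":
    sample = "Annual Report"
    print(pseudo_translate(sample))
    # Text that still contains vowels in the pseudo file was not exported.
    print(has_untouched_vowels(pseudo_translate(sample)))
```

Any text in the pseudo file that still contains ordinary vowels was missed by the export, which is exactly what the Project Manager would be looking for.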
The import options are very similar and of course make sense as they are aligned to the export:
- Import translatable content : pushes the translated text in the XML target file back into the MS Publisher file
- Import pictures : pushes the exported images (which could now be different) back into the MS Publisher file
- Create backup file : renames the original source pub file as .BAK so it can easily be recovered if needed
- Create PDF file : creates a PDF rendition of the localized pub file making it easy to see the format in context without MS Publisher available
That’s essentially it… a very nice application and easy to use!
The XML file
Well… ok, that’s nearly it. How about the XML file and how do you handle that in Studio? The format is very simple, so simple that the default AnyXML filetype would be sufficient to translate the file. However, there are a few internal tags here and there that are extracted as translatable text, so it makes sense to create a custom XML filetype for this to ensure that the tags are properly protected. The example files I have tested so far all seem to make it very simple, and I created a filetype which is available in the zip you can download from the OpenExchange, as well as a sample pub file in case you need one to get started. The XML looks like this, with the translatable text that was extracted highlighted. It all sits between the <text> elements as you can see, so very straightforward:
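To give a rough idea of how simple this kind of structure is to work with, here is a minimal Python sketch that pulls the content of every <text> element out of an XML string. Only the <text> element name comes from the export described above; the surrounding element names (document, page, shape), the inline b tag and the sample content are assumptions I made up for illustration, not the real pub2xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <text> element name is taken from the article.
SAMPLE_XML = """<document>
  <page id="1">
    <shape id="1"><text>Welcome to our newsletter</text></shape>
    <shape id="2"><text>This is <b>important</b> news</text></shape>
  </page>
</document>"""


def extract_text_segments(xml_string):
    """Return the plain text of every <text> element, in document order."""
    root = ET.fromstring(xml_string)
    segments = []
    for elem in root.iter("text"):
        # itertext() also collects text inside and after inline child tags.
        segments.append("".join(elem.itertext()))
    return segments


segments = extract_text_segments(SAMPLE_XML)

if __name__ == "__main__":
    for segment in segments:
        print(segment)
```

Note that this sketch flattens inline tags into plain text, which is the opposite of what you want in Studio, where a custom filetype keeps them as protected tags.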
In Studio, using a bit of text with all the currently supported internal tags catered for, it looks like this:
But to avoid this being an unnecessarily long post I created a short video showing the process from end to end so you can see what it looks like in practice.
Approx. playing time: 15 mins
Developers “paying it forward”… an excellent concept for 2016! I’m looking forward to seeing more of these, so perhaps we can review this in the last article I write at the end of this year. I hope it’s going to be a full one! Now just one more thing I forgot to mention… you can download the current version free from the SDL OpenExchange (now RWS AppStore) via this link.