Paying it forward with MS Publisher files

If you’ve never come across Microsoft Publisher before, here’s a neat explanation from Wikipedia:

“Microsoft Publisher is an entry-level desktop publishing application from Microsoft, differing from Microsoft Word in that the emphasis is placed on page layout and design rather than text composition and proofing.”

It’s actually quite a neat application for newbies to desktop publishing like me, but it’s a difficult one to deal with if you receive *.pub files (the format used by MS Publisher) and are asked to translate them.  I do see requests from translators from time to time asking how they can handle them.  The file itself is a binary format, and even with Office 2016 (which includes Publisher if you have the Professional version) the only export formats are PDF, XPS and HTML, none of which can be imported back into Publisher.  So it’s very tricky indeed if you need to provide your client with a translated version in the pub format.

In the past I would have suggested T-Windows for Clipboard which is installed with Studio 2015 and this would allow you to translate the text (if you have a copy of MS Publisher) using your Studio Translation Memories.  There has also been an application on the OpenExchange for around a year that can create an XML file from MS Pub (for the 2010 version only) and this does the job quite nicely and again requires a copy of MS Publisher to be installed.  But now there is a new application available, pub2xml, which also supports the latest versions of MS Publisher and it also provides some nice touches making it far easier to use.  It’s also free, but still requires MS Publisher to be on your machine.

But there is a catch!  The developer created the bones of this application and it seems to work really well for most files.  But it’s not 100% complete.  The things that are missing relate to the handling of internal formatting, like tags for font changes midway through a sentence, or hyperlinked text.  It is a good catch though, because the developer has created a solution to a problem faced by many users and then made the code available as open source on his GitHub site.  This means that any developer could fork the code, make changes, enhancements, fix bugs etc. and then make a pull request to share this back into the source for others to use.  This is something we also do with every application we have created through the SDL OpenExchange (now RWS AppStore) development team and you can find the source code for these here.  In fact the developer of pub2xml, Patrick Hartnett, has shared the source code for several apps and a few other things on his GitHub site too, so it’s great to see other developers following suit and helping to grow the developer community with shared resources… I guess it’s a sort of “paying it forward” approach and I like it.  Another Patrick, Patrick Porter, has done a similar thing with his code for machine translation plugins to Google and Microsoft and you can find them here as well as a few other things.

I’m really hoping that as the development community becomes more established in 2016 we’ll start to see more of these community initiatives with more developers “paying it forward” by investing in sharing a little of their knowledge to benefit everyone.

But I digress… back to MS Publisher!

Overview

The basic idea is that this is a standalone application which makes the content available (text, formatting and images) for localization.  The application itself is not complicated and has two screens, one for export and one for import.  A simple overview of the entire application is that it’s a drag and drop interface like this:

[Screenshot: the pub2xml drag and drop interface]

So you would drag and drop all the files you wish to convert into the interface of the Export tab, set your options and click “Export”… it’s as simple as that.  The export options here are very interesting:

  • Export translatable content : pulls the translatable text out of the file and inserts it into a simple XML file
  • Export pictures : pulls the images from the pub file and stores them in a separate folder where they can be localized if necessary
  • Create PDF file : creates a PDF rendition of the pub file making it easy to see the format in context without MS Publisher available
  • Create Pseudo file : creates a new pub file during export with the extension .pseudoTranslation.pub and it replaces all the vowels from the translatable content with % or $ characters.  This allows the Project Manager to quickly confirm that all the translatable content was in fact exported… so a quick sanity check
  • Markup internal font information : this relates to part of what’s incomplete with the app.  If you select this option then bold, italic, strikethrough, superscript and subscript formatting will be honoured in the XML with appropriate tagging.  Any other type of formatting is currently ignored.
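To give a feel for what the pseudo file does, here is a minimal Python sketch of vowel replacement.  This is not the pub2xml code, and the rule for choosing between % and $ is my own assumption (lowercase vowels become %, uppercase vowels become $); the function name is made up for illustration too.

```python
def pseudo_translate(text: str) -> str:
    """Replace vowels so that exported text is easy to spot on the page.

    Assumption: lowercase vowels -> '%', uppercase vowels -> '$'.
    The real application may choose between the two differently.
    """
    out = []
    for ch in text:
        if ch in "aeiou":
            out.append("%")
        elif ch in "AEIOU":
            out.append("$")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(pseudo_translate("Open Exchange"))
```

Any text in the pseudo pub file that still has its vowels intact was not picked up by the export, which is what makes this such a handy check before sending anything out for translation.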

The import options are very similar and of course make sense as they are aligned to the export:

  • Import translatable content : pushes the translated text in the XML target file back into the MS Publisher file
  • Import pictures : pushes the exported images (which could now be different) back into the MS Publisher file
  • Create backup file : renames the original source pub file as .BAK so it can easily be recovered if needed
  • Create PDF file : creates a PDF rendition of the localized pub file making it easy to see the format in context without MS Publisher available

That’s essentially it… a very nice application and easy to use!

The XML file

Well… ok, that’s nearly it.  How about the XML file and how do you handle that in Studio?  The format is very simple, so simple that the default AnyXML filetype would be sufficient to translate the file.  However, there are a few internal tags here and there that are extracted as translatable text, so it makes sense to create a custom XML filetype for this to ensure that the tags are properly protected.  The example files I have tested so far all seem to make it very simple, and I created a custom filetype which is available in the zip you can download from the OpenExchange, as well as a sample pub file in case you need one to get started.  The XML looks like this, and I highlighted the translatable text that was extracted.  It’s all between the <text> elements as you can see, so very straightforward:

[Screenshot: the exported XML file with the translatable text highlighted]
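If you want to check quickly what was extracted without opening Studio, a few lines of Python will list everything inside the <text> elements.  This is only a sketch: the sample XML below is made up, and the root element, the shape element, its id attribute and the inline b tag are my assumptions rather than the exact pub2xml schema.  Only the <text> element name comes from the exported file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <text> element name is taken from the
# real export; everything else here is assumed for illustration.
SAMPLE = """<document>
  <shape id="1"><text>Welcome to our <b>summer</b> newsletter</text></shape>
  <shape id="2"><text>Contact us</text></shape>
</document>"""


def translatable_segments(xml_string: str) -> list[str]:
    """Return the plain text of every <text> element, ignoring inline tags."""
    root = ET.fromstring(xml_string)
    # itertext() walks the element and its children, so text wrapped in
    # inline formatting tags is included in the result.
    return ["".join(el.itertext()) for el in root.iter("text")]


if __name__ == "__main__":
    for segment in translatable_segments(SAMPLE):
        print(segment)
```

For a real export you would read the file from disk with ET.parse instead of using a string, and a list like this is a simple way to compare what was exported against what you can see in the PDF rendition.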

In Studio, using a bit of text with all the currently supported internal tags catered for, it looks like this:

[Screenshot: the XML file open in Studio with the internal tags shown]

But to avoid this being an unnecessarily long post I created a short video showing the process from end to end so you can see what it looks like in practice.

Approx. playing time: 15 mins

Developers “paying it forward”… an excellent concept for 2016!  I’m looking forward to seeing more of these, so perhaps we can review this with the last article I write at the end of this year.  I hope it’s going to be a full one!  Now just one more thing I forgot to mention… you can download the current version free from the SDL OpenExchange (now RWS AppStore) via this link.

Enjoy!

 

18 thoughts on “Paying it forward with MS Publisher files”

  1. Thank you so much for your post. I installed version 2.1.0 of Publisher Converter from the open exchange but when I try to export a .pub file, I get the following error message “you cannot complete this operation because a modal dialog is active.” Any idea what could be causing it? I have no applications open, other than Trados Studio.

    1. It does sound as though something is still open in the background. Did you check your task manager to see if there is a process running even though there is no visible window? If there is, just kill the task.

      1. Thank you for your response. Trados is installed on a different computer and I have to remote connect to this server, so maybe that is why it’s not working. I installed the converter on my computer and it’s working fine.

  2. Hello!

    I extracted the xml file from the pub source without any problems but when I tried to import back the translation (pseudo-translate testing scenario) I get the following message:

    The paragraph counts are different for shape id: 123

    So the import wasn’t successful. Any ideas?

      1. Now that I saved the PUB as DOC and retranslated the text with the existing TM, I noticed that some chunks of text have not been exported into the XML. This is probably what the app missed when importing.

      2. I know this is an old subject, but I thought this may be helpful to someone researching this problem like I was. What was causing the problem in my case was that there was a .pub file from a previous import remaining in the same folder as the xml file being imported.

        So if you have done previous imports to the same file, make sure you reinstate the original .pub file used to export the xml before importing the target xml back to .pub.

    1. I have the same issue. When digging into it deeper it turned out that text box had more paragraphs that were not exported. In my case, box 341 had only 2 out of 3 exported and failed on the import. Pseudo translate shows the 3rd paragraph as having been picked up; but when searching for the text in the .xml it never shows up.

      I’ve tried changing font, changing the formatting, saving down to 98, made sure all text boxes were text boxes (not image shapes) with the same settings – all failed.

      The only way that I can get it to pick up this 3rd paragraph is if I use a soft return or combine the 3rd with the 2nd. What’s extra confusing is that in this Pub file I have many text boxes with multiple paragraphs and all of those worked fine. This is the only text box with 3 paragraphs that the tool skips, on page 3 out of 4…

      I can find no reason that pub2xml would skip this one paragraph. I am using Pub 2016, I don’t have 2013 to test with, and I’m beginning to wonder if that is the issue? In all other testing this did not come up as an issue and once merged, the target *.xml will import back into the Pub file. What worries me is the fact that pub2xml can see the missing paragraph, but doesn’t pull it. The other difference I’ve seen between my files and this wonderful guide, is that on the Pseudo translate file the images show as enlarged (like blown up to 150%). Doesn’t happen on the real import just the Pseudo translate pub file.

      Does anyone have any ideas on what I can do to avoid this issue in the future? I don’t want to find out after everything is done that there is an issue…and have to translate more…

  3. Hi! This was very useful 🙂
    However, even though it worked perfectly with the sample file, when trying to repeat the process with my own file, I got the message “hexadecimal error value 0x1F is an invalid character”. I tried to reset Trados and repeat the conversion steps but the issue still appears. Any idea on how to fix it?
    Thanks!!

  4. Hi! Thanks so much for this video! 🙂
    I’m not at all familiar with Publisher and my SDL knowledge is limited.
    I’ve had a few issues and was wondering if you would be able to assist in any way?
    Everything was working fine until I try to import the xml file back to Publisher format. When I import it, I get the message “Unable to locate Publisher file” in the Processing Message column, so I’m not able to save it back into the desired format. Any idea as to why this is happening?
    Thanks so much!
