Upgrading your legacy resources – filetypes

When you upgrade from Trados to SDL Trados Studio there are a number of things you can take with you: Translation Memories, Termbases, AutoText lists, custom variable lists and customised segmentation rules, for example. These are all discussed quite a lot in the public forums and in blog articles, but there is much less information on how to upgrade your filetypes. As a result, I think many users convert files to TTX unnecessarily just so they can keep using the old *.INI files they've had for years.
So, this article is just a quick explanation of how to do this, starting with what we did in TagEditor to create these *.INI files in the first place. In TagEditor, creating a custom filetype always followed the same process: you ran the wizard and, at the start, specified whether the settings would be for SGML/HTML or XML based files:

This is important to note because in TagEditor there was only one filetype for custom files and its use was determined through settings.  In SDL Trados Studio we have two.  There is a proper XML filetype that has the potential to do a lot more than you could ever achieve with the old TagEditor settings, and there is a custom HTML capability.  You can see these in the list when you first try to create a new filetype in Studio:

There are of course more than 60 filetypes that Studio supports out of the box, but I'm talking specifically about the filetypes you create yourself with their own parser rules, that is, filetypes that extract specific text from the file based on rules you define. For the purposes of this article I'm referring specifically to HTML and XML, as these are the filetypes you may have an *.INI for that was created in Trados.
Upgrading the *.INI file is not always the best approach, because Studio can do things in a better way, and because many *.INI files have not been maintained very well: they contain a lot of unnecessary information that can cause performance issues, or they may simply be unclear when you try to understand what the rules are doing. However, not everyone has the necessary skills, time or inclination to rewrite these rules, so Studio provides a mechanism to upgrade them. The mechanism is similar in both cases, but you first need to know what type of file the *.INI was intended for. Your client should be able to tell you this, but if they can't, and you have the *.INI and the files for translation, then look at the files for translation and check what they are before you create your new filetype.
Generally if it’s XML then it will start with a declaration similar to this:
<?xml version="1.0" encoding="ISO-8859-1"?>
It may also refer to a DTD or an XML schema that defines the structure of the file, like one of these (in Studio you can validate the files against the relevant DTD or schema):
<!DOCTYPE note SYSTEM "multifarious.dtd">
or
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
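If you have a lot of files to check, these tell-tale signs can be scripted. Here's a minimal Python sketch (my own illustration, not part of Studio or TagEditor; `describe_xml` is a hypothetical helper) that looks at the start of a file's text and reports whether it has an XML declaration, the declared encoding, and any DTD or schema reference. It's a heuristic only: the XML declaration is optional, so a file without one may still be XML.

```python
import re

# Heuristic patterns for the markers described above (an illustration, not a full XML parser).
XML_DECL = re.compile(r"^\s*<\?xml\b[^>]*\?>", re.IGNORECASE)
ENCODING = re.compile(r"""encoding\s*=\s*["']([^"']+)["']""")
DOCTYPE_SYSTEM = re.compile(r"""<!DOCTYPE\s+(\S+)\s+SYSTEM\s+["']([^"']+)["']""", re.IGNORECASE)
SCHEMA_HINT = re.compile(
    r"""xsi:(?:noNamespaceSchemaLocation|schemaLocation)"""
    r"""|xmlns:xs\s*=\s*["']http://www\.w3\.org/2001/XMLSchema["']"""
)


def describe_xml(text: str) -> dict:
    """Report what the start of a file suggests about it being XML (hypothetical helper)."""
    text = text.lstrip("\ufeff")  # ignore a byte order mark if one was read in
    decl = XML_DECL.match(text)
    info = {"is_xml": decl is not None, "encoding": None, "dtd": None, "schema": False}
    if decl:
        enc = ENCODING.search(decl.group(0))
        if enc:
            info["encoding"] = enc.group(1)
    doctype = DOCTYPE_SYSTEM.search(text)
    if doctype:
        info["dtd"] = doctype.group(2)  # the system identifier, e.g. a .dtd file name
    info["schema"] = SCHEMA_HINT.search(text) is not None
    return info


sample = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<!DOCTYPE note SYSTEM "multifarious.dtd">\n'
    "<note>Hello</note>\n"
)
print(describe_xml(sample))
# {'is_xml': True, 'encoding': 'ISO-8859-1', 'dtd': 'multifarious.dtd', 'schema': False}
```

Treat the result as a hint to confirm by eye (or with your client), not as a verdict.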
In this case, when you upgrade your *.INI file you select XML in the Select Type dialogue box above, and you are then presented with the filetype properties, all of which are optional:

As a general rule of thumb I like to complete the four options shown here, because they ensure you always know what the filetype was for, and they also give you a double check when translating that the correct filetype was chosen:

  1. This is the name of the Filetype that will appear in the File Types list in Studio
  2. This identifier is used to ensure that Studio picks the correct filetype when saving the target or previewing with a custom stylesheet (so if you share an SDLXLIFF with another translator, they will also need this filetype on their computer to do these things, in addition to translating). It can also be used to check that the correct filetype was used, by selecting the TagID display mode:

    When you select this mode, the orange tab at the top of your translation shows the ID of the filetype that was used instead of the filename:
  3. The file dialog expression can be set to the extension of the files you are working with. In this case it's just XML, so I left it as it was.
  4. Finally, the description ensures you know exactly what the filetype is for, and can contain client details, the date you received it… whatever is helpful.
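To get a feel for what the file dialog expression does, here's a small Python sketch that checks whether a file name matches it. It assumes, purely for illustration, that the expression is a list of wildcard patterns separated by semicolons (for example `*.htm;*.html`); `matches_dialog_expression` is a hypothetical helper, not a Studio API.

```python
from fnmatch import fnmatch


def matches_dialog_expression(filename: str, expression: str) -> bool:
    """Return True if filename matches any wildcard pattern in the expression.

    Assumption for illustration: patterns are separated by semicolons, e.g. "*.htm;*.html".
    """
    patterns = [p.strip() for p in expression.split(";") if p.strip()]
    # Lower-case both sides so the check behaves like Windows' case-insensitive file names.
    return any(fnmatch(filename.lower(), pattern.lower()) for pattern in patterns)


print(matches_dialog_expression("Manual.SGM", "*.sgm"))         # True
print(matches_dialog_expression("index.html", "*.sgm"))         # False
print(matches_dialog_expression("index.html", "*.htm;*.html"))  # True
```

In other words, the expression only narrows which files the filetype offers to open; it doesn't change the parser rules themselves.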

The next dialogue in the wizard is the important one here:

I can base this new XML filetype on various things, one of them being the *.INI file I use for completing translations for a particular client. So I select the *.INI, which brings in all the parser rules. I'm happy with them (or don't know any different), so I click Next until I see Finish, and that's it… just a few clicks.
If the files I had were HTML, I'd know this because on opening a file it would probably start with an HTML DOCTYPE declaration followed by code enclosed in the HTML element:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
… some stuff in here…
</HTML>
In that case I would repeat the process above and create the new filetype after selecting HTML instead of XML. In this example of an HTML file I have also referred to a DTD, because HTML 4.01 was based on SGML. This is why the options for creating a filetype in TagEditor show SGML and HTML as a single option, and why both are handled by the HTML filetype. However, an SGML file probably won't have the HTML element in it… just a DOCTYPE reference and, if you're lucky, an SGML declaration that says something like:
<!SGML "ISO 8879:1986 (WWW)" … etc
But more often than not it won't, so the absence of the HTML element, together with file extensions such as *.SGM on the files you are given for translation, may be a giveaway. Hopefully you won't need to worry, as your client should tell you what the file is for anyway… but if not, maybe this small amount of information will be helpful.
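The same checks can be rolled into a rough Python sketch that suggests which template to start from. This is a heuristic under my own assumptions (`guess_kind` and `TEMPLATE_FOR` are hypothetical names, not part of Studio): it looks for an XML declaration first, then the HTML DOCTYPE or the HTML element, and finally an SGML declaration or a *.sgm / *.sgml extension. It only suggests a starting point; what your client tells you always wins.

```python
import re
from pathlib import Path

# Heuristic markers taken from the description above; an illustration, not a parser.
XML_DECL = re.compile(r"^\s*<\?xml\b", re.IGNORECASE)
HTML_DOCTYPE = re.compile(r"<!DOCTYPE\s+HTML\b", re.IGNORECASE)
HTML_ELEMENT = re.compile(r"<HTML[\s>]", re.IGNORECASE)
SGML_DECL = re.compile(r"<!SGML\b", re.IGNORECASE)
SGML_EXTENSIONS = {".sgm", ".sgml"}

# Both HTML and SGML start from the HTML template, as described above.
TEMPLATE_FOR = {"XML": "XML", "HTML": "HTML", "SGML": "HTML"}


def guess_kind(filename: str, text: str) -> str:
    """Return 'XML', 'HTML', 'SGML' or 'unknown' (hypothetical helper)."""
    head = text.lstrip("\ufeff")
    if XML_DECL.match(head):
        return "XML"
    if HTML_DOCTYPE.search(head) or HTML_ELEMENT.search(head):
        return "HTML"
    # No HTML element: an SGML declaration or a *.sgm extension is the giveaway.
    if SGML_DECL.search(head) or Path(filename).suffix.lower() in SGML_EXTENSIONS:
        return "SGML"
    return "unknown"


html = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n<HTML>\n</HTML>\n'
)
sgml = '<!DOCTYPE manual SYSTEM "manual.dtd">\n<manual>Hello</manual>\n'

for name, content in [("page.htm", html), ("manual.SGM", sgml)]:
    kind = guess_kind(name, content)
    print(name, kind, "-> template:", TEMPLATE_FOR.get(kind, "ask the client"))
# page.htm HTML -> template: HTML
# manual.SGM SGML -> template: HTML
```

Checking for the XML declaration first matters, because an XHTML file can contain an `<html>` element but is still XML.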
So, I complete the first screen like this:

Just the same as before, except that this time the template is based on HTML rather than XML, and I changed the file dialog expression to *.sgm to match an SGML file (for an HTML file I'd have left the default file dialog expressions). Once that's complete, the next step is to add the *.INI, and again I'm presented with the option for this on the next screen:

That's it… pretty simple. The result is that I have new filetypes, upgraded from old Trados *.INI files, for XML, HTML and SGML files:

So there's no reason to convert these files to TTX first: you can upgrade your old filetypes to Studio as well, and enjoy a better experience when translating the files, while also removing the extra steps of converting to TTX first and then converting the target files back from TTX at the end.
