All I want is a simple analysis!

01If this title sounds familiar to you it’s probably because I wrote an article three years ago on the SDL blog with the very same title.  It’s such a good title (in my opinion ;-)) I decided to keep it and write the same article again, but refreshed and enhanced a little for SDL Trados Studio 2014.

Something I only occasionally hear these days is “When I used Workbench or SDLX it was simple to create a quick analysis of my files. Now I have to create a Project in Studio and it takes so long to do the same thing.”  I do think this is something you’re more likely to hear from experienced users of the older products because they initially find that getting a quick report out of Studio is a far more onerus process than it used to be.  What they might not think of is how you can use the Projects concept to make this easy for you once you become just as experienced with the new tools.

The basic concept behind getting a quick and simple analysis report before you take on a job is that you create a Project that you only use for analysis and then you leave this permanently in the Projects View:

02
Once you’ve done this the process becomes quite simple.  All you have to do is activate the Project by double clicking it whenever you want to analyse a file, or files, then drag and drop them onto the Files View.  If you prefer you can also use the Add Files button from the File Actions group in the ribbon or Right-Click from the Files View window:

03

The one difference between these methods is that if you select Add Files in either of the latter methods they will always be treated as translatable files as long as they are recognised filetypes.  If you drag and drop without changing to the source language first then you will be presented with an option:

04

If you change to the source language then you won’t get this option and they will be added as translatable files.  If you decide you want to change the purpose of a file after adding it then you can change this by selecting the files and then clicking on this icon in the ribbon:

05

In this example I’m just adding three files, and once they are in the list you can right click on a file (or folder), or select several files together using the ctrl key and choose the Analyze Only batch task, or select it from the Batch Task icon on the ribbon:

06

Once the task has completed you can select the Reports View and the reports at the top of the list will be the reports for this analysis. Note that the default Analyze Only task also produces a Translation Count report; if you have the Professional Edition you can create a custom task that only runs the analysis report:

07

You can set the Project up to run without a Translation memory at all, or you can set as many Analysis projects up as you need all preconfigured with your usual Translation Memories and then the time it takes to run the analysis is as fast, if not faster than it was using Workbench or SDLX.  A bigger advantage of Studio is that if you work with one source language and multiple target languages then you can analyse all the languages at the same time which is something you couldn’t do easily before and of course you can also use as many Translation Memories as you like… something else you could not do in Trados 2007.

A quick note on carrying out the analysis without a Translation Memory. “Why would I do this at all?” you may ask.  Many SDLX users made use of a feature that reported the internal repetitions.  This feature allowed you to report on what benefits you would receive from translating based on the contents of the document you are working on rather than the Translation Memory.  This allows a more realistic report on the effort required to translate and you can achieve this without a Translation Memory at all.  In Studio we have an option called Report internal fuzzy match leverage which does the same thing and this can be activated under Batch Processing -> Analyze Files.  The help guide in Studio covers this feature quite nicely, and of course you can still run with this option when using one or more Translation Memories as you see fit.  In addition you can also lock segments that you don’t want to see included in the report and these can be reported as a separate item.  This is found in the same location as the internal fuzzy match leverage:

08

I thought it would be handy to see the complete analysis process in one go in a video because it often takes longer to read an explanation of how to use things than it does to watch it, especially when I try to explain the various things you might come across as you go.  Certainly I think this will show the process is pretty simple and certainly doesn’t take as long as you might think!

Finally, I just want to mention the OpenExchange. There are a number of applications available now that can use the analysis in Studio to make it easier for you to prepare reports for your customer, and there are a few that can be used to create an analysis without using Studio at all… and I’m sure there will be more as time goes by.  Here’s a few examples that have been written to use the analysis in some way.  I have linked to the OpenExchange pages and used the descriptions in these pages below:

SDL Studio InQuote : SDL Studio InQuote takes an analysis from any SDL Trados Studio Project, and creates a quote.

  • The quote can be created in MS Word, MS Excel or copied to your clipboard.
  • Rates can be added as simple word analysis, standard lines or grouped analysis.
  • Client details can be saved in the app and assigned specific rates.
  • Customize your own MS Word or MS Excel template.
  • Localize the app yourself in your chosen language.

Post-Edit Compare : Post-Edit Compare is a tool designed to report translation modifications during the post-edit phases of a translation workflow.

The tool works by comparing two versions of the same SDLXLIFF file (before and after changes were applied); the files with changes are highlighted and all modifications are included in a comparison report.

The comparison report is formatted in a way that simplifies the understanding of what changes were applied from one version of the file to the other, along with a full break down of the modifications and related cost analysis.

SDL Trados Studio – Export Analysis Reports: SDL Trados Studio – Export Analysis Reports allows the user to export the SDL Trados Studio Analysis report into the CSV format.

goAnalyze: goAnalyze is a Trados 2009/2011 add-on which allows you to perform an analysis of several or even zipped files without having to open them in Trados Studio.
Before running the first analysis, you create profiles in the user interface of goAnalyze. These profiles consist of the language combination, the translation memories to be used plus optional analysis settings and even costing figures.

Free Online CAT Weighting Tool : Free Online CAT Weighting Tool used to upload the SDL Trados Studio analysis file from your computer to calculate the cost to quote.

So keep an eye on the OpenExchange because you never know who will create a tool to simplify the analysis further or use it in some clever way for added value.  I don’t think it’s difficult to use Studio for this, and I hope this article clarifies it, but there’s always room for more innovation!

21 comments
  1. Anamary said:

    Thank you for helpful articles and guides!
    I’m FL french-albanian translator and occasionally from clients I receive only SDL Trados analyze file as xml file (for info about upcoming project and accordingly I prepare bid and planing my time). When I import xml analyze report into Excel, result is confusing tabele. I’m wondering if is possible to import analyze xml report into excel with some more clear results and useful for preparing a bid. I tried to ask client to export analyze report to xls or html, but apparently it’s bothering for them :( (Online CAT Weighting Tool is not best option for me, because I’m not online all the time)

    KR,
    Anamarija

    • Hi Anamarija, do you receive the full project package too? If so then you just open that and you will be able to read the analysis in Studio. I’ve never tried to use the XML in Excel outside of Studio but I imagine it would be possible with a little work… I’m just struggling with why this should even be necessary? Seems a little odd, especially when it’s not hard to save the analysis as excel in the first place.

      • Anamarija said:

        Hi Paul,

        Thank you for a quick replay.
        It is odd – I agree with you :), but scenariom is: client sends to me a xml file with analysis, and after I confirm thw job (WC and due date), he sends me a package :)

  2. So basically, if all you want is a simple analysis, the answer is: no dice. It’s a real pain to go through all this rigmarole for single documents if all the customer wants is a quote. I wish there was just some simple option to analyse a document without have to set up a project every time. Your solution is a pretty good workaround, I guess. But I’m going to try this GoAnalyze tool and see if it’s any good.

  3. Marty Nush said:

    Hi,
    Why do I get a set of figures when I run the analysis of a file against an empty TM (with internal analysis unchecked in Settings), and a different one when I check internal analysis and run the Analysis? Shouldn’t they be the same?
    Thanks

    • Hi Marty, if the difference is the internal analysis then no they should not be the same. The standard analysis is based on the contents of your TM. So if the TM is empty they should probably be all no match. The internal analysis attempts to forecast the leverage based on you translating the file, so you get some idea of the benefits of using a TM as you work.

      • Marty Nush said:

        Hello Paul, these are the steps I followed:
        -Created a new project with a pdf to translate an empty TM,. Then, batch anlaysis (unchecked show internal uanalysis). I do get matches, as follows (word counts):

        678 repetitions,
        72:100%,
        2:95-99%,
        New 369,
        Total 1121

        As I say, the TM is newly created and empty. My first issue is, aren’t repetitions and 100% the same thing here?.

        I then go to Settings, check internal analysis, batch analyze the file, and I get these:
        651 Repetitions
        72:100%
        2: 95-99%

        Internal:
        95-99: 28
        85-94: 39
        75-84: 45
        50-74; 22
        New 262
        Total: 1121

        I can’t make sense of ithis situation, can you help me?

  4. Marty said:

    Sorry about my poor explanations. I was in a rush! I found that if the TM you create for the project has ‘lookup’ enabled, the top fuzzy bands (with matches against the TM) will have some segment matches. If you disable lookup, nothing will appear there. Why is that? Also, checking or unchecking the ‘lookup’ option and checking or unchecking the ‘report internal fuzzy match leverage’ results in different repetition counts. Thanks for your help

  5. Marty said:

    There aren’t any tickets in the KB related to the issue I described above and I don’t have the Premium Software Maintenance Agreement. The analysis tool is not so simple after all! Why does SDL Trados Studio 2014 returns fuzzy matches when you analyze a text against an empty TM? I’m not talking about internal analysis. Quoting Paul: “So if the TM is empty they should probably be all no match.” Eppur si muove…

  6. Miroslav said:

    Hi,
    I have a question. Some projects can be very big, including dozens of files. It could turn out that the files can actually be seperated into sets. For example, the group analysis of 50 files show 30% match, nice. But in reality there are 5 sets of 10 files within which the files are 80% similar (figures are artificial). Is there a way a figure that out without having to read them manually? And the second, what if not all of the files are words but there are for example excels as well? Thanks!

    • Hi, the only way to figure that out is to run the analysis and look at the report that is generated. It makes no difference what the files are, they can all be analysed together.

      • So the report will contain the match values between all document pairs?

      • The report is not comparing documents. It compares the segments in each document to what’s in your TM. If you run an “internal” analysis which is not turned on by default then this will compare the segments to what will eventually be in your TM if you translated the files; so in effect showing you the leverage you could get as you translated. This is pretty unscientific as it also depends on the order in which you take the files, so the internal analysis is a guide only… or should I say more of a guide than the standard analysis is.

      • I understand. Then aside Studio, do you perhaps know of some app that can e.g. analyze multiple word documents?

      • All CAT tools can analyse multiple Word documents. What specifically do you want to see? I must be misunderstanding your requirement.

  7. Say for example that there are 9 word documents. I would like to see a table of similiarities between all of the pairs like this:
    1 2 3 4 5 6 7 8 9
    2 100 n n n n n n n
    3 n 100
    4 100
    5 10 80 75 100 15 6 85 12
    6
    7
    8
    9

    So, the numbers 1-9 are word documents. The numbers 100 are obviously 100%matches between identical documents (1-1, 2-2…). n is the match between all other pairs and it is these n’s that I’m interested in. For example, look at the row 5. It is immediately visible that words 5, 3, 4 and 8 are very similar, while the other words are not similiar to 5. Similiar meaning that the numbers represent the similiarity of content. It isn’t important is it characters, segments, what is important to get the general idea how the words relate to oneanother.

    • So you would basically need 36 different sets of analysis to cover each possible pair. You could do this with a CAT but you would have to do this manually. So like this:
      1. Open doc #1, copy source to target and update into TM #1
      2. Open doc #2, copy source to target and update into TM #2
      3. etc… to doc #9 and TM #9
      4. Open doc #2 – doc #9 and analyse all 8 docs against TM #1
      5. Open doc #1, doc #3 – doc #9 and analyse all 8 docs against TM #2
      6. etc…

      You would have some repetition of analysis here as this would give you 72 reports, but could ignore the duplicates. So 9 sets of analysis would get you what you wanted and then you’d have to do a manual interpretation of the changes.

      Maybe there is a cleverer way to handle this and it’ll be interesting to see if anyone chips in, but it’s not really what a CAT is designed for. You have asked for an analysis to compare files with each other as opposed to what a CAT is designed to do which is tell you how much of a document you have translated before so you can establish the effort.

      What is the purpose of an exercise like this?

      • Oftentimes there are subsets of similar documents within the whole set due for translation. If you share the work with someone, it would make sense to give to one person all of the files within a single subset. That way he/she can use virtual merge for auto-propagation or, if virtual merge is not used, simply build up on his/her TM and, in the end, translate faster. The purpose is to avoid the situation where the same person translates documents that are not from the same subset. Obviously you could use shared memories etc., but it is far more convinient to split the documents properly. I am curios is there a way to do this without the need to manually examine the documents (which can e.g. be very large).

      • I’m not aware of one, but it seems to me it would an excellent tool/feature that recommended the most appropriate translation order before you started.
        One possible alternative way to handle this is to use the “Export frequently occurring segments” feature. This can create a separate sdlxliff that contains only segments that repeat themselves more than a defined number of times in the project. You then translate this first, and then use the TM created from this to pre-translate the full Project. This way it won’t matter what the order is because all the repetitions are handled before you start. You will lose context this way, but for the right type of source material this might be the answer.

      • I know of that option, but losing the context is too big of a thing most of the time. Oh well. Thanks a lot for your help, if you stumble upon on a solution, please let me know.

  8. The formatting is messed up but hopefully you understand what I mean :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,056 other followers

%d bloggers like this: