Two questions came up on ProZ today that Studio can handle very nicely. Despite this, I often see some very clever workarounds that are probably not necessary at all. So I thought I’d write this quick post for two reasons: first, to share these great and easy-to-use features in Studio; and second, because I used FastStoneCapture to record a video explaining the process when I answered both questions on ProZ this afternoon, and I blogged about this brilliant little tool last week.
Using the Regex Filetype in Studio
So, the first question was about how to extract the text after the equals sign when translating LNG files. The sample text provided was this:
OptimizingPars=Optimizing paragraph structure
GettingParData=Getting paragraph data
AnalyzingPars=Analyzing paragraph structure
Studio has a great regex filetype. It has been in the software since the early Trados days, but it’s even better now. I recorded a video showing how to use it for this kind of file.
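For anyone who wants to test the idea outside Studio first, here is a minimal Python sketch of the kind of pattern involved: everything up to the first equals sign is treated as the key, and everything after it as the translatable text. The pattern and the function name split_lng_line are my own illustration and are not the settings used in the Studio regex filetype.

```python
import re

# Hypothetical pattern: the key is everything before the first "=",
# the translatable text is everything after it.
LINE_PATTERN = re.compile(r"^(?P<key>[^=\r\n]+)=(?P<text>.*)$")


def split_lng_line(line):
    """Return (key, text) for a key=value line, or None if it does not match."""
    match = LINE_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group("key"), match.group("text")


sample = [
    "OptimizingPars=Optimizing paragraph structure",
    "GettingParData=Getting paragraph data",
    "AnalyzingPars=Analyzing paragraph structure",
]

# Keep only the part a translator should see.
translatable = [split_lng_line(line)[1] for line in sample]
print(translatable)
```

Because the key part excludes equals signs, only the first one on a line acts as the separator, so text that itself contains an equals sign stays intact.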
Translating columns of text prepared in Microsoft Excel using Studio
Update 21 Dec 2015
Studio 2015 introduced a bilingual Excel filetype which could be used in preference to the CSV filetype.
Update 1 Sept 2017
Studio 2017 added the ability to apply embedded content rules using regular expressions in the bilingual Excel filetype.
The second question that came up was about translating a column of text in Excel but placing the translated text into a different column. Then, just to add to the problem, some of the text had already been translated and was already in the second column.
Studio can handle this really nicely in a couple of ways, so I recorded another quick video to address it.
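To make the file layout concrete, here is a small Python sketch of a two-column CSV in which some target cells are already filled. The sample rows and the function name read_bilingual are invented for illustration; this only shows how the rows divide into translated and untranslated, and is not how Studio processes the file internally.

```python
import csv
import io

# Made-up sample: source in column 1, target in column 2 (sometimes empty).
raw = "Hello,Hallo\nGoodbye,\nThank you,Danke\nPlease,\n"


def read_bilingual(text, source_col=0, target_col=1):
    """Read source/target pairs and flag rows that already have a translation."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        source = record[source_col]
        target = record[target_col] if len(record) > target_col else ""
        rows.append(
            {"source": source, "target": target, "translated": bool(target.strip())}
        )
    return rows


rows = read_bilingual(raw)
for row in rows:
    status = "already translated" if row["translated"] else "needs translation"
    print(f"{row['source']!r}: {status}")
```

Rows flagged as translated correspond to the segments you would expect to see pre-populated, and optionally locked, when the file is opened as a bilingual CSV.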
0 thoughts on “A couple of little known gems in SDL Trados Studio”
Hi Paul, thanks for this nice tip about partly translated Excel files. I wanted to try this out, so I created a short and simple two-column Excel file with some empty target cells and two already translated target cells. I created the file in Excel 2007 and saved it as a CSV file. Now, when I try to open it in Studio 2011 (SP1), I get the error message “The file type is not supported.” In Tools – Options – File Types the file type has a check mark, and I changed the Format tab according to your video above. What could be the problem here?
Hi Marita, my best guesses would be to make sure you saved the CSV as the correct type (you want comma delimited CSV and not MS-DOS or Macintosh CSV), and also that you have the correct parameters set in the properties of the CSV filetype in Studio. If you still can’t resolve it after that, drop me an email and we can take a look next week when I’m back in the office – email@example.com
Hi Paul, thanks for posting this. I just tried this in Studio 2014 (Comma delimited CSV option, Commas checked) and did not find the “Minimum number of columns” option or the “Text is enclosed in double-quotes” option. I clicked the lock translated segments box, but the translated segments were not locked, and they did not populate with the already translated text. Files were saved as CSV (delimited, not Mac or DOS). I had, however, copied two columns from a sheet that contained macros, and when saving I clicked yes to disable any incompatible features. Could this be the problem?
Hi Ann, the 2014 CSV filetype has changed slightly and been improved. It no longer needs these options, which is why you can’t see them. I’m not sure why your file didn’t work… perhaps you selected the wrong columns? If you can share the file with me I’ll gladly take a look and see if I can explain the problem.
Dead easy regex to prepare in Word (it will save you all the trouble), using wildcards:
Replace: tw4winExternal style
Ready to import in Studio.
Sometimes low tech solutions are faster -:)
The CSV bi-lingual file type trick is brilliant and has the potential to save us SO MUCH work with at least one of our clients.
However, the source text in our Excel file (now saved as a CSV file) contains commas, so Studio splits the text at these “comma separators” and I end up with the source language on both the source and target sides (i.e. half the sentence in the source column and the other half in the target column).
Our source text also contains some embedded content that we’d dearly love Studio to convert to tags. Here a sample source cell with both these issues:
Lato destro codone in fibra di carbonio “Strada”. Realizzato in autoclave, perfettamente intercambiabile con il pezzo di serie. Verniciato a mano.
Any ideas how to overcome this?
I’ve fixed the main problem myself, by changing the delimiter to a semi-colon. Pretty obvious really!
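For anyone hitting the same thing, this short Python sketch shows why the delimiter matters. It uses one sentence from the sample above; the variable names are just for illustration, and Python's csv module stands in for whatever parser is actually reading the file.

```python
import csv
import io

source = "Realizzato in autoclave, perfettamente intercambiabile con il pezzo di serie."

# Unquoted, comma-delimited line: source text followed by an empty target cell.
naive_line = source + ","
naive_fields = next(csv.reader(io.StringIO(naive_line)))
print(len(naive_fields))  # the comma inside the sentence creates an extra field

# Same row written and read back with a semi-colon delimiter.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerow([source, ""])
semicolon_fields = next(csv.reader(io.StringIO(buffer.getvalue()), delimiter=";"))
print(semicolon_fields)
```

The semi-colon only helps as long as the text itself contains no semi-colons; otherwise the same problem comes back, and enclosing the text in quotes is the safer route.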
Now it’s just the formatting code such as , “ and ° (that weirdly didn’t appear in the example above when I copied it in!) that I’d very much like to appear as what it represents in Studio.
The only way to handle the formatting is to copy the file, remove the source from one copy and the target from the other, and then align them as Excel files. That should be pretty fast, although not as convenient as this. The Glossary Converter is even faster, in case you have not tried it. Just drag the Excel file into the Glossary Converter interface and you can have it converted directly to TMX when you release it… it’s very fast too! But it still won’t handle formatting. You need to use the align method for that.
Thanks for your reply, Paul. In the end I was able to strip out all the HTML text as well as convert all encoded characters to their normal characters by using this genius Excel add-in: http://www.asap-utilities.com/index.php
I now have a clean csv file that opens perfectly in Studio with everything where it should be and no nasty HTML code!
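The same kind of clean-up can be sketched in a few lines of Python, in case the add-in is not an option. The sample string and the function name clean_cell are made up for illustration, and the tag pattern is deliberately simple, so treat it as a rough sketch rather than a robust HTML parser.

```python
import html
import re

# Simple pattern for anything that looks like an HTML tag.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_cell(text):
    """Strip HTML tags first, then decode entities such as &quot; and &deg;."""
    without_tags = TAG_PATTERN.sub("", text)
    return html.unescape(without_tags)


# Made-up sample cell containing tags and encoded characters.
sample = "<b>Verniciato a mano</b> &quot;Strada&quot; 45&deg;"
cleaned = clean_cell(sample)
print(cleaned)
```

Tags are removed before the entities are decoded so that an encoded angle bracket in the text is not mistaken for a tag and deleted.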
I have a weird JSON file (exported from Transifex) that looks like this:
I thought of applying the delimited schema, but got an error. What would you advise for a file structure like this, taking into account that many “TARGET” strings actually contain a translation?
Hi Thomas, probably best to use Passolo for this.
I have a file like this with the extension .etx:
// ┌─┤ Lang ├─────────────────────────────────────────────────────────────────┐
// │ └──────┘ │
// │ │
// │ Multilanguage │
// │ │
An unknown language was selected (‘%1’).
Can you please advise how to extract the text for translation? From this I need only “An unknown language was selected (‘%1’).”
Hi Rytis, looks simple enough. Use the regex filetype and just create inline rules to ignore the stuff you don’t want, leaving behind what you want. So for your example three rules would probably be exactly what you need:
With that last one I have a suspicion they should be straight quotes and not what you put in here so be careful with that.
Hope that helps.
You could also, just for fun, use one structure rule instead:
But you’d probably still want the placeholder inline for this:
But of course these are all based on your sample only.
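As a rough illustration of the approach, and not the actual rules from my reply, here is a Python sketch that ignores the comment lines starting with // and spots the %1 style placeholder. Both patterns are my assumptions based on the sample only, and the sample lines use straight quotes.

```python
import re

# Assumed patterns, based on the sample only.
COMMENT = re.compile(r"^\s*//")   # decorative lines that start with //
PLACEHOLDER = re.compile(r"%\d+")  # placeholders such as %1


def extract_translatable(lines):
    """Keep non-empty lines that are not // comment lines."""
    return [
        line.strip()
        for line in lines
        if line.strip() and not COMMENT.match(line)
    ]


sample = [
    "// │ Multilanguage │",
    "// │ │",
    "An unknown language was selected ('%1').",
]

texts = extract_translatable(sample)
print(texts)
print(PLACEHOLDER.findall(texts[0]))
```

In Studio the placeholder would normally be protected as a tag, which is the reason for matching it separately instead of leaving it as plain text.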
Do you have any ideas on how to translate a file like this? The extension is lspkg:
The source is the first Val after Str. The target is the Val after Tgt. The Val in Prev is the previous source of the current translation. I need to replace the Val in Tgt, but filter according to the values in Custom1 or Custom2, which are different each time. In many cases there is no Tgt and no Val for it, which means there is no previous translation, like below:
Hi Ryciokas, can you share a sample file with me? I don’t think I really understand what you mean. Feel free to email me at firstname.lastname@example.org
Thank you very much.