Wow… how time flies! Over three years ago I wrote an article called AutoCorrect… for everything! which explained how to use AutoHotkey so you had a similar functionality to Microsoft Word for autocorrect, except it worked in all your windows applications. This was, and still is, pretty cool I think and I still use autohotkey today for many things, and not just autocorrect. Since writing that article we released Studio 2015, and in fact Studio 2017 is just around the corner, so it was a while back and some things have moved on. For example, Studio 2015 introduced an autocorrect feature into Studio which meant things should be easier for all Studio users, especially if they had not come across autohotkey before.
At least I thought it would be easier! I was reminded yesterday with a comment to my earlier article that if you wanted to edit and maintain the autocorrect list somewhere else, somewhere it might be more accessible then the process is not so simple. Yes there is a import/export feature for autocorrect which you can find here:
But this list is not a simple text file… rather it’s XML. This of course is not insurmountable, but it’s not immediately obvious as the comment to my previous article made clear. So here’s one way to tackle it using regular expressions and excel. I used this combination because after playing around with excel for a little while I realised that getting the word lists from the XML was far easier with regular expressions, whilst reconstructing the XML from Excel was a fairly simple task. So one more reason to learn a little more about how to use regular expressions!
I have created a 14 minute video below that covers the whole process from start to finish. But for those of you who just want a hint, or just want to check whether this is new to you or not, here’s what I did in a nutshell:
What we needed to get from the XML?
When you export the autocorrect file from Studio you will have a number of entries like this which you’d really rather have as text without the XML syntax:
All the autocorrect entries sit between the <AutoCorrectEntries> node. The text you’re interested in is inside the attribute entries for the incorrect entry, src=“abbout”, and the correct entry, trg=“about”. The easiest way I found to get these as simple text entries was to use regular expressions which you can apply in Notepad++, or EditPad which I use for example. So I searched for this (slightly different in the video as I used s here in case you wanted to copy paste and didn’t realise you needed the spaces at the start, which you do):
ssssss<Entry src="(.*?)" trg="(.*?)" />
Then replaced with:
$1
This gave me the “incorrect entries” as a list. I then undid the results of the search replace operation and immediately searched again with:
$2
That gave me the “corrected” entries. I pasted them both into excel so now I have something like the image on the left with a separate column for each entry in the autocorrect list.
That’s it for the regular expression part… pretty straightforward I think but if you want to learn a little more there are plenty of resources available on the web in addition to a few articles I’ve written in the past specifically on the use of regular expressions for Studio. It might be helpful to see where you can use these sorts of techniques in Studio if you need more of an incentive to learn.
So now you have your Excel spreadsheet which you can use to add/edit/amend the entries as you see fit.
What we have to do to get the Excel back into XML?
The task now, once you’re ready to re-import your Excel into the autocorrect settings in Studio, is to get your XML syntax back. So to do this I just laid the Excel sheet out like this:
This is all text. Note that my autocorrect entries are now in column B and D. The other columns contain the syntax that makes up this line in XML:
<Entry src=“abbout“ trg=“about“ />
To put this back together I just concatenated the cells with this expression in cell G1:
=$A$1&B1&$C$1&D1&$E$1
Looks complicated but it’s actually very simple as you’ll see in the video. I can copy this formula down the spreadsheet and I then have all the text items expressed in the correct XML syntax so they can be copied back into the XML file. It’s a good idea to keep a copy of the original XML export and don’t mess around with that just in case you break something and can’t import your entries back in, or corrupt the autocorrect file itself.
That’s it, not too tricky, and it will allow you to work on your autocorrect entries in a more productive environment than one at a time in Studio. The full end to end process is in the video below… hope this is useful!!
Excellent as usual Paul. It gives ideas for other things to do, using that same technique.
Excellent… do share your ideas!
Hi Paul,
As you know I am not a Studio user, but I think there is a quicker and safer way to get the data in and out of Excel. Would you mind sending me a copy of the xml file so that I can run a couple of test to check if I am right. Thank you.
Hello Hacène, absolutely. Always interested to learn about an easier way to do something without corrupting the files. I put a copy of the file here.
Hello Paul,
Thank you for the file.
As I suspected, you can import directly the xml file into Excel.
Export is a bit trickier as you might need to write a small filescripting macro (I can see if I can write a quick one for you)
To import into Excel:
Open Excel and create a new workbook. Then go to Data > From Text (in Get External Data).
Change the type of file in the dialog to *.*
Select your file.
In the dialog Page 1 select “Delimited”
Page 2 Select Other and put ”
Page 3 do nothing
Click on FInish
This will import the integrity of the file. As such, the first segments will contain the XML header
Sdl.TranslationStudio.AutoCorrect
1033
en-US
<![CDATA[
1033
en-US
true
true
true
false
But then you will have the autocorrect entries (without quotes) spread other 5 columns:
You will also see after the autocorrect entries that there are other useful data
a.
abbr.
abs.
and after that you will find the end of the xml
true
]]>
So, if you want to separate/split the cells containing ”
EndString = “</"
For i = 1 To LastRow
If InStr(1, Ws.Cells(i, 1).Value, " “) 0 Then
If InStr(1, Ws.Cells(i, 1).Value, StartString) 0 And InStr(1, Ws.Cells(i, 1).Value, EndString) 0 Then
MyLen = InStr(1, Ws.Cells(i, 1).Value, EndString) – InStr(1, Ws.Cells(i, 1).Value, StartString) – 1
Ws.Cells(i, 3).Value = Right(Ws.Cells(i, 1).Value, Len(Ws.Cells(i, 1).Value) – InStr(1, Ws.Cells(i, 1).Value, EndString))
Ws.Cells(i, 2).Value = Mid(Ws.Cells(i, 1).Value, InStr(1, Ws.Cells(i, 1).Value, StartString) + 1, MyLen)
Ws.Cells(i, 1).Value = Left(Ws.Cells(i, 1).Value, InStr(1, Ws.Cells(i, 1).Value, StartString))
End If
End If
Next
End Sub
The next two codes that can be copied and pasted in a module in the VBA are made to
1) concatenate the split content taking care of adding back the quote mark where necessary:
Public Sub ConcatenateEntries()
Dim Wb As Workbook
Dim Ws As Worksheet
Dim i As Long
Dim LastRow As Long
Set Wb = ThisWorkbook
Set Ws = Wb.Sheets(1)
LastRow = Ws.Cells(Rows.Count, 1).End(xlUp).Row
For i = 1 To LastRow
If InStr(1, Ws.Cells(i, 1).Value, “<string") = 0 Then
With Ws
For j = 1 To 5
If .Cells(i, j).Value vbNullString And j 1 Then
.Cells(i, 1).Value = .Cells(i, 1).Value & Chr(34) & .Cells(i, j).Value
.Cells(i, j).Value = vbNullString
Else
.Cells(i, 1).Value = .Cells(i, 1).Value
End If
Next
End With
Else
With Ws
For j = 1 To 5
If .Cells(i, j).Value vbNullString And j 1 Then
.Cells(i, 1).Value = .Cells(i, 1).Value & .Cells(i, j).Value
.Cells(i, j).Value = vbNullString
Else
.Cells(i, 1).Value = .Cells(i, 1).Value
End If
Next
End With
End If
Next
End Sub
2) Write the file to a text file. For this code, you will need to add a specific path. And post export, you will need to do 2 operations:
a) remove all “” and replace them with ”
b) remove the initial ” (first character in the file) and the last 2 (last character in the file)
c) save the file as xml
Public Sub FileScripting()
Dim Wb As Workbook
Dim Ws As Worksheet
Dim FilePath As String
Dim CellData As String
Dim LastCol As Long
Dim LastRow As Long
Set Wb = ThisWorkbook
Set Ws = Wb.Sheets(1)
LastRow = Ws.Cells(Rows.Count, 1).End(xlUp).Row
‘CHANGE THE PATH TO WHATEVER YOU NEED. THE FILE DOES NOT HAVE TO EXIST
FilePath = “C:UsersMYNameDesktopXML testmultifariousautocorrectTest.txt”
Open FilePath For Output As #1
For i = 1 To LastRow
If i 1 Then
CellData = CellData & vbNewLine & Ws.Cells(i, 1).Value
Else
CellData = Ws.Cells(i, 1).Value
End If
Next
Write #1, CellData
End Sub
Another solution is to create an XML schema and import the file directly as an XML into Excel.
HTH,
Hacène
Thanks Hacène, that is very interesting… but to be honest I think I like my method better. It’s more focused to the task at hand and I can see exactly what it’s doing. Looks clever though and perhaps this will be useful for others. Thank you for sharing this!!
HI Paul,
My pleasure. It is also possible to write a macro to pick only the data you need from the xml file and to relocate the data into the original file (still being cautious with the double quote issue from VBA scripting).
The method I suggested can be used in a special Excel file used solely for the purpose of handling AutoCorrect entries.
As I pointed out, the XML file exported from Studio contains much more than only AutoCorrect entries.
You can also post import write a macro to export to a new worksheet only autocorrect entries something like:
Public Sub CopyAutoCorrectEntriesToNewSheet()
Dim Wb As Workbook
Dim Ws1 As Worksheet, Ws2 As Worksheet
Dim i As Long, j As Long
Dim LastRow As Long
Set Wb = ThisWorkbook
Set Ws1 = Wb.Sheets(1)
If Wb.Worksheets.Count > 1 Then
Set Ws2 = Wb.Worksheets(2)
Else
Set Ws2 = Wb.Worksheets.Add
End If
LastRow = Ws1.Cells(Rows.Count, 1).End(xlUp).Row
j = 1
For i = 1 To LastRow
If InStr(1, Ws1.Cells(i, 1).Value, “<Entry src=") 0 Then
Ws2.Cells(j, 1).Value = i
Ws2.Cells(j, 2).Value = Ws1.Cells(i, 2).Value
Ws2.Cells(j, 3).Value = Ws1.Cells(i, 4).Value
j = j + 1
End If
Next
End Sub
And one when you have finished working on your AutoCorrect Entries like:
Public Sub RelocateAutoCorrectEntriesIntoSheetOne()
Dim Wb As Workbook
Dim Ws1 As Worksheet, Ws2 As Worksheet
Dim i As Long
Dim LastRow As Long
Set Wb = ThisWorkbook
Set Ws1 = Wb.Sheets(1)
If Wb.Worksheets.Count < 2 Then
Exit Sub
Else
Set Ws2 = Wb.Worksheets(2)
End If
LastRow = Ws2.Cells(Rows.Count, 1).End(xlUp).Row
For i = 1 To LastRow
Ws1.Cells(Ws2.Cells(i, 1).Value, 2).Value = Ws2.Cells(j, 2).Value
Ws1.Cells(Ws2.Cells(i, 1).Value, 4).Value = Ws2.Cells(j, 3).Value
Next
End Sub
There is ALWAYS more than one way to skin a cat-(tool)
BR,
Hacène
Hi Paul, thanks for this interesting article! It’s good to be able to see what’s going on in the background.
I’ve noticed that the entire Autocorrect feature in Studio appears not to support hyphens. You can add a hyphenated word, but it won’t actually make the correction in the Editor when you type. What’s more, in the “Exceptions” list for correcting incorrect capitalisation the system won’t even let you add a hyphenated word in the first place (the “Add” button is greyed out).
Will SDL be improving this function in the future?
Best regards,
Luke