correcting autocorrect!

001Wow… how time flies!  Over three years ago I wrote an article called AutoCorrect… for everything! which explained how to use AutoHotkey so you had a similar functionality to Microsoft Word for autocorrect, except it worked in all your windows applications.  This was, and still is, pretty cool I think and I still use autohotkey today for many things, and not just autocorrect.  Since writing that article we released Studio 2015, and in fact Studio 2017 is just around the corner, so it was a while back and some things have moved on.  For example, Studio 2015 introduced an autocorrect feature into Studio which meant things should be easier for all Studio users, especially if they had not come across autohotkey before.

At least I thought it would be easier!  I was reminded yesterday with a comment to my earlier article that if you wanted to edit and maintain the autocorrect  list somewhere else, somewhere it might be more accessible then the process is not so simple.  Yes there is a import/export feature for autocorrect which you can find here:

002

But this list is not a simple text file… rather it’s XML.  This of course is not insurmountable, but it’s not immediately obvious as the comment to my previous article made clear.  So here’s one way to tackle it using regular expressions and excel.  I used this combination because after playing around with excel for a little while I realised that getting the word lists from the XML was far easier with regular expressions, whilst reconstructing the XML from Excel was a fairly simple task.  So one more reason to learn a little more about how to use regular expressions!

I have created a 14 minute video below that covers the whole process from start to finish.  But for those of you who just want a hint, or just want to check whether this is new to you or not, here’s what I did in a nutshell:

What we needed to get from the XML?

When you export the autocorrect file from Studio you will have a number of entries like this which you’d really rather have as text without the XML syntax:

003

All the autocorrect entries sit between the <AutoCorrectEntries> node.  The text you’re interested in is inside the attribute entries for the incorrect entry, src=“abbout”, and the correct entry, trg=“about”.  The easiest way I found to get these as simple text entries was to use regular expressions which you can apply in Notepad++, or EditPad which I use for example.  So I searched for this (slightly different in the video as I used s here in case you wanted to copy paste and didn’t realise you needed the spaces at the start, which you do):

ssssss<Entry src="(.*?)" trg="(.*?)" />

Then replaced with:

$1

This gave me the “incorrect entries” as a list.  I then undid the results of the search replace operation and immediately searched again with:

$2

004That gave me the “corrected” entries.  I pasted them both into excel so now I have something like the image on the left with a separate column for each entry in the autocorrect list.

That’s it for the regular expression part… pretty straightforward I think but if you want to learn a little more there are plenty of resources available on the web in addition to a few articles I’ve written in the past specifically on the use of regular expressions for Studio.  It might be helpful to see where you can use these sorts of techniques in Studio if you need more of an incentive to learn.

So now you have your Excel spreadsheet which you can use to add/edit/amend the entries as you see fit.

What we have to do to get the Excel back into XML?

The task now, once you’re ready to re-import your Excel into the autocorrect settings in Studio, is to get your XML syntax back.  So to do this I just laid the Excel sheet out like this:

005

This is all text.  Note that my autocorrect entries are now in column B and D.  The other columns contain the syntax that makes up this line in XML:

<Entry src=abbout trg=about />

To put this back together I just concatenated the cells with this expression in cell G1:

=$A$1&B1&$C$1&D1&$E$1

Looks complicated but it’s actually very simple as you’ll see in the video.  I can copy this formula down the spreadsheet and I then have all the text items expressed in the correct XML syntax so they can be copied back into the XML file.  It’s a good idea to keep a copy of the original XML export and don’t mess around with that just in case you break something and can’t import your entries back in, or corrupt the autocorrect file itself.

That’s it, not too tricky, and it will allow you to work on your autocorrect entries in a more productive environment than one at a time in Studio.  The full end to end process is in the video below… hope this is useful!!

0 thoughts on “correcting autocorrect!

  1. Hi Paul,
    As you know I am not a Studio user, but I think there is a quicker and safer way to get the data in and out of Excel. Would you mind sending me a copy of the xml file so that I can run a couple of test to check if I am right. Thank you.

  2. Hello Paul,

    Thank you for the file.
    As I suspected, you can import directly the xml file into Excel.
    Export is a bit trickier as you might need to write a small filescripting macro (I can see if I can write a quick one for you)

    To import into Excel:
    Open Excel and create a new workbook. Then go to Data > From Text (in Get External Data).
    Change the type of file in the dialog to *.*
    Select your file.
    In the dialog Page 1 select “Delimited”
    Page 2 Select Other and put ”
    Page 3 do nothing
    Click on FInish

    This will import the integrity of the file. As such, the first segments will contain the XML header

    Sdl.TranslationStudio.AutoCorrect
    1033
    en-US
    <![CDATA[

    1033
    en-US

    true
    true
    true
    false

    But then you will have the autocorrect entries (without quotes) spread other 5 columns:

    You will also see after the autocorrect entries that there are other useful data

    a.
    abbr.
    abs.

    and after that you will find the end of the xml

    true

    ]]>

    So, if you want to separate/split the cells containing ”
    EndString = “</"
    For i = 1 To LastRow
    If InStr(1, Ws.Cells(i, 1).Value, " “) 0 Then
    If InStr(1, Ws.Cells(i, 1).Value, StartString) 0 And InStr(1, Ws.Cells(i, 1).Value, EndString) 0 Then
    MyLen = InStr(1, Ws.Cells(i, 1).Value, EndString) – InStr(1, Ws.Cells(i, 1).Value, StartString) – 1
    Ws.Cells(i, 3).Value = Right(Ws.Cells(i, 1).Value, Len(Ws.Cells(i, 1).Value) – InStr(1, Ws.Cells(i, 1).Value, EndString))
    Ws.Cells(i, 2).Value = Mid(Ws.Cells(i, 1).Value, InStr(1, Ws.Cells(i, 1).Value, StartString) + 1, MyLen)
    Ws.Cells(i, 1).Value = Left(Ws.Cells(i, 1).Value, InStr(1, Ws.Cells(i, 1).Value, StartString))
    End If
    End If
    Next
    End Sub

    The next two codes that can be copied and pasted in a module in the VBA are made to
    1) concatenate the split content taking care of adding back the quote mark where necessary:

    Public Sub ConcatenateEntries()

    Dim Wb As Workbook
    Dim Ws As Worksheet
    Dim i As Long
    Dim LastRow As Long

    Set Wb = ThisWorkbook
    Set Ws = Wb.Sheets(1)

    LastRow = Ws.Cells(Rows.Count, 1).End(xlUp).Row

    For i = 1 To LastRow
    If InStr(1, Ws.Cells(i, 1).Value, “<string") = 0 Then
    With Ws
    For j = 1 To 5
    If .Cells(i, j).Value vbNullString And j 1 Then
    .Cells(i, 1).Value = .Cells(i, 1).Value & Chr(34) & .Cells(i, j).Value
    .Cells(i, j).Value = vbNullString
    Else
    .Cells(i, 1).Value = .Cells(i, 1).Value
    End If
    Next
    End With
    Else
    With Ws
    For j = 1 To 5
    If .Cells(i, j).Value vbNullString And j 1 Then
    .Cells(i, 1).Value = .Cells(i, 1).Value & .Cells(i, j).Value
    .Cells(i, j).Value = vbNullString
    Else
    .Cells(i, 1).Value = .Cells(i, 1).Value
    End If
    Next
    End With
    End If
    Next

    End Sub

    2) Write the file to a text file. For this code, you will need to add a specific path. And post export, you will need to do 2 operations:
    a) remove all “” and replace them with ”
    b) remove the initial ” (first character in the file) and the last 2 (last character in the file)
    c) save the file as xml

    Public Sub FileScripting()

    Dim Wb As Workbook
    Dim Ws As Worksheet
    Dim FilePath As String
    Dim CellData As String
    Dim LastCol As Long
    Dim LastRow As Long

    Set Wb = ThisWorkbook
    Set Ws = Wb.Sheets(1)

    LastRow = Ws.Cells(Rows.Count, 1).End(xlUp).Row

    ‘CHANGE THE PATH TO WHATEVER YOU NEED. THE FILE DOES NOT HAVE TO EXIST
    FilePath = “C:UsersMYNameDesktopXML testmultifariousautocorrectTest.txt”

    Open FilePath For Output As #1

    For i = 1 To LastRow
    If i 1 Then
    CellData = CellData & vbNewLine & Ws.Cells(i, 1).Value
    Else
    CellData = Ws.Cells(i, 1).Value
    End If
    Next
    Write #1, CellData

    End Sub

    Another solution is to create an XML schema and import the file directly as an XML into Excel.

    HTH,

    Hacène

    1. Thanks Hacène, that is very interesting… but to be honest I think I like my method better. It’s more focused to the task at hand and I can see exactly what it’s doing. Looks clever though and perhaps this will be useful for others. Thank you for sharing this!!

      1. HI Paul,
        My pleasure. It is also possible to write a macro to pick only the data you need from the xml file and to relocate the data into the original file (still being cautious with the double quote issue from VBA scripting).
        The method I suggested can be used in a special Excel file used solely for the purpose of handling AutoCorrect entries.
        As I pointed out, the XML file exported from Studio contains much more than only AutoCorrect entries.
        You can also post import write a macro to export to a new worksheet only autocorrect entries something like:

        Public Sub CopyAutoCorrectEntriesToNewSheet()

        Dim Wb As Workbook
        Dim Ws1 As Worksheet, Ws2 As Worksheet
        Dim i As Long, j As Long
        Dim LastRow As Long

        Set Wb = ThisWorkbook
        Set Ws1 = Wb.Sheets(1)

        If Wb.Worksheets.Count > 1 Then
        Set Ws2 = Wb.Worksheets(2)
        Else
        Set Ws2 = Wb.Worksheets.Add
        End If

        LastRow = Ws1.Cells(Rows.Count, 1).End(xlUp).Row

        j = 1
        For i = 1 To LastRow
        If InStr(1, Ws1.Cells(i, 1).Value, “<Entry src=") 0 Then
        Ws2.Cells(j, 1).Value = i
        Ws2.Cells(j, 2).Value = Ws1.Cells(i, 2).Value
        Ws2.Cells(j, 3).Value = Ws1.Cells(i, 4).Value
        j = j + 1
        End If
        Next

        End Sub

        And one when you have finished working on your AutoCorrect Entries like:

        Public Sub RelocateAutoCorrectEntriesIntoSheetOne()

        Dim Wb As Workbook
        Dim Ws1 As Worksheet, Ws2 As Worksheet
        Dim i As Long
        Dim LastRow As Long

        Set Wb = ThisWorkbook
        Set Ws1 = Wb.Sheets(1)

        If Wb.Worksheets.Count < 2 Then
        Exit Sub
        Else
        Set Ws2 = Wb.Worksheets(2)
        End If

        LastRow = Ws2.Cells(Rows.Count, 1).End(xlUp).Row

        For i = 1 To LastRow
        Ws1.Cells(Ws2.Cells(i, 1).Value, 2).Value = Ws2.Cells(j, 2).Value
        Ws1.Cells(Ws2.Cells(i, 1).Value, 4).Value = Ws2.Cells(j, 3).Value
        Next

        End Sub

        There is ALWAYS more than one way to skin a cat-(tool)

        BR,

        Hacène

  3. Hi Paul, thanks for this interesting article! It’s good to be able to see what’s going on in the background.
    I’ve noticed that the entire Autocorrect feature in Studio appears not to support hyphens. You can add a hyphenated word, but it won’t actually make the correction in the Editor when you type. What’s more, in the “Exceptions” list for correcting incorrect capitalisation the system won’t even let you add a hyphenated word in the first place (the “Add” button is greyed out).
    Will SDL be improving this function in the future?
    Best regards,
    Luke

Leave a Reply