As I’m getting lost in my own thoughts around just what to talk about next with regard to AI technologies and in particular ChatGPT… and as I’m pondering about the effect this is going to have on our industry I recalled a couple of questions around the use of XPath in the community. One of these questions was yesterday and it related to how to use XPath to extract one of the languages in a TMX file using the XML filetype in Trados Studio. Not a particularly tricky thing to do, and I imagined the user was just editing the content or maybe changing the language pair by translating one of the languages into something else, or something like that. But what struck me was the XPath expression he used.
It was quite a sophisticated use of XPath, and definitely used syntax I wouldn’t have used for this simple task:
//seg[contains(ancestor-or-self::*/@xml:lang,’en-US’)]
I would have just used this:
//tuv[@xml:lang=’en-US’]/seg
I tend to think simple first… but really liked the expression he came up with. So today this got me thinking about how well ChatGPT could be used to help with stuff like this, especially since there are always multiple ways to tackle things like this. So I asked this question in the ChatGPT interface:
Create five different XPath expressions to select the content of the seg element when the xml:lang attribute in the seg element is equal to en-US. An example XML file is this:
<?xml version=”1.0″ encoding=”utf-8″?>
<tmx version=”1.4″>
<header creationtool=”SDL Language Platform” creationtoolversion=”8.1″ o-tmf=”SDL TM8 Format” datatype=”xml” segtype=”sentence” adminlang=”en-US” srclang=”en-US” creationdate=”20230203T142912Z” creationid=”GLOBAL\pfilkin”>
<prop type=”x-Recognizers”>RecognizeAll</prop>
<prop type=”x-IncludesContextContent”>True</prop>
<prop type=”x-TMName”>en-ro (tmx project)</prop>
<prop type=”x-TokenizerFlags”>DefaultFlags</prop>
<prop type=”x-WordCountFlags”>DefaultFlags</prop>
</header>
<body>
<tu creationdate=”20230203T141938Z” creationid=”GLOBAL\pfilkin” changedate=”20230203T142257Z” changeid=”GLOBAL\pfilkin” lastusagedate=”20230203T142257Z”>
<prop type=”x-LastUsedBy”>GLOBAL\pfilkin</prop>
<prop type=”x-Context”>0, 0</prop>
<prop type=”x-Origin”>TM</prop>
<prop type=”x-ConfirmationLevel”>Translated</prop>
<tuv xml:lang=”en-US”>
<seg>Official Journal L 333/2022</seg>
</tuv>
</tu>
I actually gave it two TUs and not just the one above, I just shortened it here for brevity. The answers a couple of seconds later were these:
-
Using a predicate:
//seg[@xml:lang=’en-US’]/text()
This expression selects all seg elements that have an xml:lang attribute equal to ‘en-US’, and then returns the text content of the selected seg elements. -
Using the descendant:: axis:
//tuv[@xml:lang=’en-US’]/descendant::seg/text()
This expression selects all tuv elements that have an xml:lang attribute equal to ‘en-US’, and then selects all descendant seg elements of the selected tuv elements. Finally, it returns the text content of the selected seg elements. -
Using the ancestor:: axis:
//seg[ancestor::tuv[@xml:lang=’en-US’]]/text()
This expression selects all seg elements that have an ancestor tuv element with an xml:lang attribute equal to ‘en-US’, and then returns the text content of the selected seg elements. -
Using the self:: axis:
//tuv[@xml:lang=’en-US’]/self::*/seg/text()
This expression selects all tuv elements that have an xml:lang attribute equal to ‘en-US’, and then selects the seg elements that are children of the selected tuv elements. Finally, it returns the text content of the selected seg elements. -
Using the following-sibling:: axis:
//tuv[@xml:lang=’en-US’]/following-sibling::tuv[1]/seg/text()
This expression selects all tuv elements that have an xml:lang attribute equal to ‘en-US’, and then selects the seg element that is the first following sibling of each selected tuv element. Finally, it returns the text content of the selected seg elements. Note that this expression assumes that there is only one tuv element with the xml:lang attribute equal to ‘en-US’ in each tu element. If there are multiple such tuv elements, only the first one will be selected.
Pretty smart huh? Well almost… the first one is actually wrong as it doesn’t return anything because it’s expecting the xml:lang attribute to be in the seg element when it’s not. But the other four are all correct and provide a very nice lesson in how to use XPath to create a parser rule for this task.
But this brings me onto a very cool feature of this technology and why it’s called ChatGPT. You’ll only really see this properly if I share the “conversation”:
It took two more goes to correct itself, which is quite interesting given this was the easiest of all the expressions it attempted. I’m not sure why it struggles with this, but being able to go backwards and forwards in this way is very helpful. I even find myself wanting to be pleasant towards it as if I was talking to someone I know who is helping me out.
So of course I can also reciprocate…
Which of course is the same as the one our friend in the community forum came up with… so interesting that it never found the simpler solution. I think it’s probably because of my prompt… or original question since I didn’t mention the tuv element at all. The answers it provides are absolutely dependent on how useful the question is in providing the right context.
This is something we’ve been finding as we have been working on using AI to help answer questions accurately and quickly in the RWS Community by training the engine with relevant content. This has many challenges that I’ll write about once we’ve released our first solution, but one of them is users not providing enough information in the question to be able to help, or worse cause the provision of an answer that’s completely wrong! Users posting things like “Help, I have an error.”, or “My software doesn’t work” are extreme examples that we do actually see and engaging in a bit of detective work is all part of the task… but I hope you can see how this presents a challenge to a computer. ChatGPT does actually provide a really helpful answer without judgement at all… but in terms of actually helping with the real problem it’s not helpful at all. Try it!
That TMX was a basic challenge!
Of course I have to stretch things a little… so I recalled the most challenging question on XPath I have had, and it was actually one I was unable to answer myself and I had to seek help from stackoverflow. If you’re interested the community question was this one, and the stackoverflow solution was here.
I hope you can also see the difference in how to ask the question… I couldn’t ask for help over there without making sure I provided enough information the first time around. Stackoverflow experts are brilliant… but not always known for their patience if the question isn’t understandable!
So, I took my question and tested it against ChatGPT.
Create an XPath expression to satisfy the following criteria:
1. Extract text from <P> where countries=”AR”, other values are always ignored
2. Extract text from <P> where it’s parent element (in this example but it’s not always the case) contains AR in the countries attribute (countries=”AR,GB,US” for example)
3. Extract text from current element (<P> in this example, not always) when there is no countries attribute present in the current element or it’s ancestorsA sample XML to extract the text from is this:
<?xml version=”1.0″ encoding=”UTF-8″?>
<rootelement>
<Description>
<Note countries=”AR,GB,US” >
<P countries=”AR” >We want this one as it’s AR.</P>
<P countries=”US” >We don’t want this one as it’s not AR.</P>
<P countries=”GB” >We don’t want this either as it’s not AR.</P>
</Note>
</Description>
<Description>
<Note countries=”AR,GB,US” >
<P>Everyone in AR, GB and US gets to buy.</P>
<P>No restrictions for this product in these countries.</P>
</Note>
</Description>
<Description>
<Note>
<P>No country, that’s because it will be treated as AR.</P>
</Note>
</Description>
</rootelement>The result should be to extract the text from these elements:
<P countries=”AR” >We want this one as it’s AR.</P>
<P>Everyone in AR, GB and US gets to buy.</P>
<P>No restrictions for this product in these countries.</P>
<P>No country, that’s because it will be treated as AR.</P>
I won’t bore you with all the details, but suffice it to say it didn’t get it right the first time. For example, attempt #1:
//P[@countries=’AR’] | //P[contains(ancestor-or-self::*[@countries], ‘AR’) and not(@countries)]
This didn’t satisfy the 3rd criteria. Attempt #2:
//P[@countries=’AR’ or contains(ancestor-or-self::/@countries, ‘AR’) or not(ancestor-or-self::/@countries)]/text()
This didn’t satisfy the 1st criteria. Attempt #3:
//P[(@countries=”AR”) or (ancestor::[@countries and contains(concat(‘,’, @countries, ‘,’), ‘,AR,’)]) or (not(@countries) and not(ancestor::[@countries]))]/text()
This didn’t satisfy the 1st or the 3rd criteria. Attempt #4:
//P[@countries=’AR’ or ancestor::*[@countries][contains(@countries, ‘AR’)]=’AR’ or not(@countries)]
This one was correct!! Now it’s quite likely my question was still lacking detail, or formatted appropriately to help ChatGPT. This of course mirrors my earlier comment about providing the appropriate information to be able to receive good help. So I asked how I should have done this:
Sure! When asking a question like this, it’s important to be as specific and clear as possible about what you want the XPath expression to do, and what the desired output should look like. Here are a few tips that could help:
- State the problem you’re trying to solve: Give a clear explanation of what you’re trying to accomplish with the XPath expression. This will help the person answering your question to better understand your needs and provide a more tailored response.
- Provide a sample XML document: Include an example XML document that represents the type of data you’re working with. Make sure the example document covers all the different scenarios that you want the XPath expression to handle.
- Specify the criteria for selecting nodes: Be clear about the criteria you want to use to select nodes from the XML document. Include details such as which attributes should be considered, whether the search should include descendants or only immediate children, and any other relevant information.
- Give clear examples of desired output: Include examples of the type of output you expect from the XPath expression. This will help the person answering your question to ensure that their solution meets your needs.
By following these tips, you’ll be able to provide clear and specific information that will help ensure that you get the correct answer the first time.
Great advice… and the egotostical parts of me think I already did that… but I’m starting to have enough confidence in this technology for these sort of tasks that I’m pretty sure I didn’t do it well enough. Either way the advice is excellent and is something I think everyone can take on board when asking for help! You have to help others to help yourself!
Summary
Getting back to the point of this article I think this demonstrates another fantastic way to help every user get more from a sophisticated tool like Trados Studio. Most of the time the use cases are relatively simple for an experienced user to solve on their own, but with the help of AI technology like this even a relative novice is able to tackle some pretty sophisticated problems in their localization process.
I was thinking as I wrote this article how we are right now at a point where we are no longer just talking about the dangers of AI, we are all actually able to witness and experience its potential to change our lives in many areas. Yesterday, late afternoon, I was trying to help get an agenda completed for an event we’re planning and needed to write several interesting titles and synopsis for presentations that hadn’t been prepared yet. This same technology helped me do this in minutes, and I have to say far better than I could have done it myself!
It’s very difficult not to get excited about the capabilities of this technology but I think a balanced view is really important. We can’t ignore it and we have to think about the implications from all sides. Does it mean the technology is replacing me, or does it mean it’s simply making me more productive than I could possibly be without it? Could it do everything I do today? I think we’re still a long way off the sort of intelligence that could replace me completely… and in some ways I hope I don’t live to see it… and I’m not ready to go any time soon! At least not until I see more evidence of us being more responsible about peoples lives in the future and how we ensure everyone can live a fulfilling life when we have technology doing our work for us.
A few years ago I wrote an article called “Information 4.0… we’re all doomed!“. I think the ending paragraphs are still relevant today.
All in all I like to focus on the opportunity, and even though we read all the time about how these new technologies will steal our jobs I think it’s important to think about the new ones that will come around as a result of these advancements. There’s one thing for sure… we can’t stop the progress but we can help to shape it.
It’s always easy to complain about things and be negative… much harder to see the opportunity. But now’s the time to start looking and if we keep a sense of perspective and realism around the real capabilities of a machine I think there will be plenty… embrace the change!
I read this blog and found it interesting – although I have absolutely no idea what an XPath is, so a lot of it was lost on me. However, my experiences of ChatGPT have been similar when asking more complex issues. I also had to push the AI into the right direction a couple of times before getting the right answer. However, what I would really like to ask is if it will soon be possible to integrate ChatGPT into Studio as an MT service? At least working with Finnish, it does a better job than any other MT I have tried. I think Grammarly are working with them as well. I would even be willing to pay for such an add-on.
Maybe take a look at this article if you’re interested to learn what XPath is and how it’s relevant to Trados Studio?
On ChatGPT as an MT service… the answer is yes. We are working on a plugin you can learn about here that will have the ability to also work as an MT provider. But also note you could use it already courtesy of these two apps… Custom.MT and/or Intento.