The elusive regex with GPT-4

Whilst solving regular expressions with ChatGPT seems like a great way to give yourself superpowers, I have stayed away from writing about this use case until now.  Yes, ChatGPT is great for those simple things that anyone with some basic knowledge could probably write themselves in the time it took to explain what was wanted.  But I like regular expressions… I’m definitely not a real expert, but I do like to play around with them and would consider myself an above-average user.  So when I decided to test ChatGPT with a regular expression I asked it to solve something I have never been able to achieve on my own.  In fact I have never seen anyone else do this either… although I’m certain there are many people out there who would be very capable of doing it.  But whenever I’ve asked, I have never had a satisfactory solution that didn’t use code or multiple search & replace operations.

So why am I writing about this now?  Well, this evening ChatGPT was upgraded to GPT-4.  What does this mean?  In a nutshell it means it has become a lot smarter and is now multimodal, which means it can handle more than one type of input; OpenAI describes GPT-4 as accepting images as well as text.  How is it smarter?  The OpenAI website reported that ChatGPT scored in the 10th percentile on the Uniform Bar Exam (a standardized exam designed to test the knowledge and skills that are necessary to practice law in a wide range of jurisdictions in the United States), whilst GPT-4 scored in the 90th percentile.  In what’s known as the Biology Olympiad (a series of tests, practical exams, and theoretical problems designed to challenge students’ knowledge, problem-solving skills, and understanding of biological concepts), the vision-powered GPT-4 scored in the 99th percentile, while ChatGPT finished in the 31st percentile.

There are other improvements of course.  One significant one is that it is now capable of handling over 25,000 words of text, which will support the creation of much longer content, extended conversations, and improved document search and analysis.  There’s more too, but the bit I’m interested in is my simple little test!

Not so simple test

This was my test.

Create a regular expression using .NET syntax suitable for use in EditPad Pro that can add thousand separators, using a comma, to any number that may or may not have a decimal separator, using a period, already.

Example data would be:

Input: 123456789.5678
Expected Output: 123,456,789.5678

Input: 3456789.567843356
Expected Output: 3,456,789.567843356

Input: 6789
Expected Output: 6,789

How hard can that be?  Very hard, I think, given the efforts to date of everyone I have asked and my own failures to solve it.  I often see users in the forums asking for this seemingly simple expression and always getting a response that never really provides the one expression to solve it all.

With previous iterations of ChatGPT I have never been able to solve this either, and it often made syntax errors as well.  So this exercise became my acid test for regex success.  GPT-4 still failed to do this correctly on the first four attempts, but on the fifth it seems to have got it right!  I’ll provide the full response (as it corrected itself) as it is a great learning experience:

Here’s the corrected regex pattern for use in EditPad Pro that won’t add separators after the decimal place:

Find what: (?<=\d)(?=(\d{3})+(?!\d)(?<!\.\d*))

Replace with: ,$0

This regex pattern breaks down as follows:

  • (?<=\d): a positive lookbehind assertion that checks if there’s a digit immediately before the current position
  • (?=...): a positive lookahead assertion that checks if the pattern inside the parentheses is present, but doesn’t consume any characters
  • (\d{3})+: matches a sequence of one or more groups of exactly three digits
  • (?!\d): a negative lookahead assertion that checks that there isn’t another digit after the matched group of three digits
  • (?<!\.\d*): a negative lookbehind assertion that checks that there isn’t a decimal point followed by zero or more digits before the matched group of three digits

Using this regex pattern in the Find and Replace dialog in EditPad Pro will add thousand separators to the given numbers without adding separators after the decimal point.
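For anyone who wants to check the three examples outside EditPad Pro, here is a minimal sketch in Python.  It is not the same expression: the pattern above relies on a variable-length lookbehind, (?<!\.\d*), which .NET supports but Python’s standard re module rejects, so the sketch swaps in a two-step approach (find each integer part, then add the commas inside it).  The names NAIVE, INTEGER_PART, add_separators_naive and add_separators are made up for this illustration.  The first function shows the lookahead-only pattern that also puts commas into the decimals.

```python
import re

# Lookahead-only pattern: a position preceded by a digit and followed by
# one or more complete groups of three digits, then no further digit.
NAIVE = re.compile(r"(?<=\d)(?=(?:\d{3})+(?!\d))")

# A run of digits that is NOT preceded by a digit or a period,
# i.e. the integer part of a number (the decimals are skipped).
INTEGER_PART = re.compile(r"(?<![\d.])\d+")


def add_separators_naive(text: str) -> str:
    """Apply the lookahead-only pattern everywhere (also hits decimals)."""
    return NAIVE.sub(",", text)


def add_separators(text: str) -> str:
    """Add commas to integer parts only, leaving decimals untouched."""
    # Step 1 finds each integer part; step 2 inserts commas inside it.
    return INTEGER_PART.sub(lambda m: NAIVE.sub(",", m.group()), text)


if __name__ == "__main__":
    for sample in ("123456789.5678", "3456789.567843356", "6789"):
        print(sample, "->", add_separators(sample))
    # The naive pattern also separates the decimals:
    print(add_separators_naive("123456789.5678"))  # 123,456,789.5,678
```

This uses code, so it is exactly the kind of answer I was trying to avoid, but it is handy for confirming the expected outputs.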

I used EditPad Pro as the context because it seemed easier to use software it probably knows more about and because it’s my preferred editor.  I could also be sure the same expression should work in Trados Studio as I could test the exact same syntax flavour.

Conclusion

I know this is really a trivial test of a tool like this, but the recent upgrade of ChatGPT to GPT-4 has led to significant improvements in its capabilities, including its performance in solving complex regular expressions.  While previous versions struggled to provide a satisfactory solution for the given regex problem, GPT-4 was ultimately able to generate the correct regex pattern for adding thousand separators without affecting the digits after the decimal point.  The successful solution of this problem may not really showcase the impressive advancements in GPT-4’s capability and its potential for providing valuable assistance in various fields, including text and data manipulation, but it does demonstrate its usefulness in supporting yet another task that we are regularly challenged with in localization projects.  As the AI continues to evolve, I think we can all expect even more refined and powerful results, making it an increasingly useful tool for a wide range of applications.

I think there is no doubt that AI is impacting the industry we work in and we are all going to experience a transformation in the nature of work, with a focus on higher-value tasks, post-editing, and collaboration with AI tools.  So I think that to stay competitive in this evolving landscape we do need to consider embracing AI technology and upskilling to handle more complex, specialized, or context-sensitive assignments.  Quite frankly, I have not witnessed anything so disruptive in my lifetime and the pace of change seems quite incredible in our digital world.

So if you’re not already using these types of AI resources and thinking about how you can use them to help you with your work, and also how you can adapt the way you work so that you are less affected, then there’s no time like the present to get started.  Things will change with you or without you… better with you!
