Information 4.0… we’re all doomed!

All doomed?  What exactly does that mean, and why am I writing about it?  Over the last year I’ve been back at school studying for the TCLoc Masters degree at the University of Strasbourg (an excellent program if you’re wondering!). A module we’re currently working through is Information 4.0, and this… I think I can safely say this… has provoked more discussion and emotion than any of us expected.  That’s partially because Ray Gallon asked us at the start of the course how we felt about artificial intelligence, looking at it in the broader sense and not just within the localization arena.  Interesting as that is, I don’t propose to make this a really wide discussion, although you should feel free to continue it in the comments if you have strong feelings, but I would like to explore a few related things I’ve been thinking about that are perhaps closer to the topics I usually write about.

AI will replace all our jobs soon!

I also attended TEKOM in Stuttgart this week as part of the TCLoc course, and the theme there was very much AI.  The couple of days I spent there helped put a few things into perspective for me… the first being that many of the grand claims made by the companies on display are little more than marketing hype.  After all, AI sells!  I am of course being very dismissive here, but an important takeaway for me is that today AI is actually split into two broad areas:

  • Narrow Artificial Intelligence
  • General Artificial Intelligence

Narrow AI is basically used to describe technology that enables a machine to outperform a human in a very specific task, or set of tasks.  This can be anything from simple automation in a translation workflow to very sophisticated tasks such as Neural Machine Translation, or self-driving cars.  So we certainly do have this already.

General Artificial Intelligence, on the other hand, refers to the ability of a machine to have intelligence, learn for itself, deal with the unexpected, and so on.  The definition of what constitutes intelligence is a discussion on its own… something else for the comments!  The headline here, however, is a comment made by Andy McDonald that we won’t see this within the next hundred years at least.  This is important context, if you agree with it, because it allows us to think more clearly and address progress for what it is: a natural evolution of some of the things we already do today.  Whether it matters if we ever get there is another discussion, because the impact these technologies could have on our lives is already apparent… and we may be sensible enough to never let it become too “intelligent.”

The opportunity

Thinking about this clearly is important because progress, particularly revolutionary progress, puts opportunities in our path that we probably hadn’t even thought about a year ago, and more still lie ahead that we haven’t thought of yet.  With this in mind, where’s the opportunity today that will allay our fears?

Translation is all about content.  We regularly hear about the explosion in content and how there aren’t enough translators in the world to handle it all.  We probably discount some of these claims as marketing hype, and we probably worry that machine translation will swallow it all up anyway!  Well, this is partially true of course, because there is always going to be a huge amount of content that would never get translated at all if it wasn’t for machine translation.  But there is also significant growth in content supporting the new businesses and technologies around AI, where quality translation of the information used to support a multilingual environment cannot be left entirely to machine translation, as this could lead to serious failure in the future.

What am I talking about exactly?  Well, given how far away we are from General AI, we need to feed our automation and narrow AI solutions with a lot of information to help them handle the range of capabilities a human can offer.  At TEKOM I attended an excellent lecture from Alex Masycheff on feeding content to chatbots.  You may be thinking “so what?”… but chatbots today are just the tip of the iceberg.  Thinking about them gives you some indication of just how much content, good quality content, is needed to achieve the futuristic claims of companies delivering AI today.  For a chatbot to work “intelligently”, and not just be some simple, limited, prescribed question-and-answer solution that can just as easily irritate you as provide a solution, the amount of content required is staggering.

There is an initiative called Cyc which has been building a common sense knowledge base for the last 35 years.  This isn’t like your typical knowledge base where you find articles on how to solve a problem.  This one is based on an ontology with 1.5 million concepts, 25 million rules and assertions using those concepts, and then some domain-specific extensions… all in all providing trillions of molecules of usable information.  I read a good quote that gives you some idea of how hard it is to build something like this: entering the sentence “Napoleon died in 1821 – Wellington was saddened” required two months of work to capture in the knowledge base all the information needed to explain the concepts of life and death!  That starts to put 35 years into context as well.
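To make the idea of concepts, assertions, and rules a little more concrete, here is a minimal sketch in Python. This is an illustration only: Cyc uses its own representation language (CycL) and holds millions of concepts, so everything below, from the class name to the predicates, is an assumption for demonstration, not Cyc’s real schema.

```python
# A toy "common sense" knowledge base: concepts, triple-style assertions,
# and one rule. Everything a human knows implicitly (the dead are no longer
# alive) must be stated explicitly for the machine.

class KnowledgeBase:
    def __init__(self):
        self.concepts = set()   # e.g. "Person"
        self.facts = set()      # (subject, predicate, object) triples

    def add_concept(self, name):
        self.concepts.add(name)

    def assert_fact(self, subject, predicate, obj):
        self.facts.add((subject, predicate, obj))

    def query(self, subject, predicate):
        # Return all objects asserted for this subject and predicate.
        return {o for (s, p, o) in self.facts if s == subject and p == predicate}

    def is_alive(self, person):
        # The "rule": anyone with a recorded death date is no longer alive.
        return not self.query(person, "diedIn")

kb = KnowledgeBase()
kb.add_concept("Person")
kb.assert_fact("Napoleon", "isA", "Person")
kb.assert_fact("Napoleon", "diedIn", "1821")

print(kb.is_alive("Napoleon"))  # False -- inferred from the rule, not stated directly
```

Even this trivial example needs the life-and-death rule spelled out by hand; multiply that by every piece of common sense a chatbot user might rely on and the two months per sentence quoted above starts to look plausible.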

Now add to this the amount of information that would be required to ensure a chatbot built on Cyc could understand the language used to ask it questions.  Local variations, poorly written or spoken language, spelling mistakes, omissions in answers that a human would fill in immediately, and so on… the amount of content needed to deal with a problem touching several subjects increases by a magnitude I wouldn’t even try to guess.

Finally make it multilingual!

So this is a lot of content, and it doesn’t even come close to what would be required to match the complexity of a human’s ability to reason… and in multiple languages.  Simply translating these concepts and rules with machine translation isn’t enough.  Which brings me back to my point in this section: opportunity.  We don’t really know what opportunities await us as these new technologies roll in, but we should be thinking about what they might be, and learning as much as we can so we can find the opportunity and run with the technology rather than fight it.

At TEKOM this year I saw a lot on ontology-based terminology.  This seems an interesting opportunity, and it’s clear why ontologies are needed when you think about the development of AI and how it’s being used in every industry.  So far I don’t see any translation tool integrated with a solution like this… possibly Coreon, although the plugin they have built for SDL Trados Studio has not been released yet… and they might never release it, given they think the TM is dead 😉  It’s quite exciting to think about the opportunities involved here alone.  Certainly when I think about MultiTerm… perhaps it tries to do too much (for translators), and yet not enough (for ontologists)!  What does it do for machine translation… not enough yet!  There’s already a growing demand for terminology to ensure the correct context in an automated way, and several of the large machine learning vendors already support this.  Funny how this part of language technology has evolved.

I also wonder where this places the future of the standards that people work so hard to agree upon.  We still hear a lot around TBX, and yet I rarely hear discussion of a more semantics-based exchange such as T-Mint, for example.  Probably because of the lack of tools to handle it… but surely it’ll come, and when it does we’ll need human linguists and terminologists to manage it.  There are many very extensive terminology databases around, and given the time it’s taken Cyc to produce their ontology I doubt people will start from scratch.  So it was all the more surprising for me at TEKOM that the ontology vendors I spoke to, with the exception of Coreon obviously, didn’t seem to be focused on the multilingual aspect of the solutions they were selling… or maybe it was just the sales guys I got to speak with.  It’s always surprising how long the language message takes to get through!
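To illustrate what a concept-oriented, multilingual entry of the kind TBX exchanges looks like, here is a small Python sketch. The field names and identifiers are purely hypothetical, not actual TBX element names; the point is the shape of the data: one concept, ontology-style links to other concepts, and terms attached per language.

```python
# A hypothetical concept-oriented term entry: the concept comes first,
# terms in each language hang off it, and "relations" carries the
# ontology links a semantics-aware exchange format would also need.

entry = {
    "concept_id": "C0042",                       # illustrative identifier
    "definition": "A vehicle that navigates without human input.",
    "relations": {"broader": ["C0010"], "related": ["C0077"]},
    "terms": {
        "en": [{"term": "self-driving car", "status": "preferred"}],
        "de": [{"term": "selbstfahrendes Auto", "status": "preferred"}],
        "fr": [{"term": "voiture autonome", "status": "preferred"}],
    },
}

def preferred_term(entry, lang):
    """Return the preferred term for a language, or None if not available."""
    for t in entry["terms"].get(lang, []):
        if t["status"] == "preferred":
            return t["term"]
    return None

print(preferred_term(entry, "de"))  # selbstfahrendes Auto
```

The multilingual gap I saw at TEKOM is visible even here: the relations between concepts are language-neutral, but every language added multiplies the term data to be created, reviewed, and kept consistent… which is exactly where human terminologists come in.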

Coming back to machine translation: post-editing machine translation output is, for many languages, a different ball game than it was only a few years ago.  Today the fluency can be so good that it’s harder to spot the mistakes, and where there are any, they may be complete mistranslations rather than poorly worded sentences.  Relying on machine translation completely unchecked could be a mistake… so an opportunity might be finding ways to get through more content in review.  Today’s translation environments are pretty much the same as when they were first introduced… just more bells and whistles.  The fundamental design, and process, is unchanged.  I thought about this after speaking to Samah Ragab and Iryna Lebedyeva at the UTIC conference earlier this year… Samah has been pushing me for foot pedal integration into Studio for a couple of years, and Iryna showed me how she uses Microsoft Read Aloud to review material more efficiently.  All this gave rise to another opportunity, and SDL TTS was born!  We combined the two into one application that supports many use cases, but for me it was the opportunity to review more content in less time and with less stress.  In fact UTIC was a breeding ground for ideas and we’re working on another at the moment!

The retired

While we’re talking about speech… all of these innovations have made me rethink the use of speech-to-text technology in this industry.  Typically the main use at the moment is dictating translations. I think the number of translators doing this today is still small, although perhaps significant, and the most vocal proponents like to talk about how many more words a day they can translate using these methods.  Well… I think this particular use has no future at all!  Speech to text is a technology with a great future in general, but perhaps not as a productivity aid for translating.

The advancements in machine translation, well adopted these days by translators who wouldn’t have dreamed of using it a few years ago, mean you can start your work with a fully translated set of documents.  Post-editing isn’t easy using voice, and it’s faster with a keyboard.  There’s also limited language support for voice; this is slowly improving, but nowhere close to the speed at which machine translation vendors add new languages to their repertoire.

I understand there may be medical reasons for preferring to use voice, and this won’t go away… I understand there is content that may not (yet) translate well with machine translation… and I understand there are translators who are set in their ways.  But with the advancements we are seeing in machine learning I don’t see a future for speech to text as a productivity boost.  Only the reverse, as with SDL TTS.  What’s your opinion on this?

To sum up…

All in all I like to focus on the opportunity, and even though we read all the time about how these new technologies will steal our jobs I think it’s important to think about the new ones that will come around as a result of these advancements. There’s one thing for sure… we can’t stop the progress but we can help to shape it.

It’s always easy to complain about things and be negative… much harder to see the opportunity.  But now’s the time to start looking and if we keep a sense of perspective and realism around the real capabilities of a machine I think there will be plenty… embrace the change!

Comments

    1. I think embracing the change means understanding both the good and the bad potential that AI brings to the table, and trying to realise the good. It’s important to think about them both. There’s a good article in the Guardian, published earlier this year, that addresses the potential for bad, how important it is to deal with this aspect, and some of the things we need to remember, such as the fact that AI is what we make it. I also think it’s hugely important to be thinking about this bigger picture. But in the meantime we all have very short-term (comparatively speaking) objectives, and in order to carry on working and making a living we have to think about the opportunity around us and benefit from it.

  1. Hi Paul,
    I would like to comment on the position of Coreon that the TM is dead – I don’t buy that. A large part of my translation work consists of updating technical service documents where large parts of the document (sometimes more than 90%) remain unchanged, and others are made up of combinations of parts of prior documents. I would be at a big loss if the translation of all that unchanged text were not immediately available from a TM, but had to be retranslated and reviewed again and again – not to mention the consistency requirements posed by the technical industry.

    1. Me neither, Henk… I added that tongue in cheek, and I’m pretty sure Jochen Hummel didn’t mean it in the way so many take it either. Some of what he said makes sense, and I’d agree that the tools we provide today need addressing to support post-editing more effectively. But we’re nowhere close to being able to rely on NMT entirely for all language combinations, and of course the use case you provide is solid… at least until MT learns well enough from what you have done that it doesn’t need the traditional TM anymore, because it sort of is one. But we don’t have that yet either. He certainly made the headlines though… probably his main aim.

  2. Hi Paul,
    nice article, with lots of different aspects and truths!

    Regarding your conclusions, you would have met a lot of like-minded people at the BDÜ conference last weekend in Bonn. I was really positively surprised to hear and see how many translators have stopped demonising MT and started looking for, and seeing, alternative opportunities, like SEO translation (which combines creativity and the use of web technologies).
    MultiTerm was mentioned as an important tool in SEO translation, and as SEO is also related to ontologies I think it could do more here too – in addition to some bugs and shortcomings being fixed ;-)

    In his presentation at Asling 2018, Arle Lommel from CSA stated “Terminology management remains stuck in the 1990s” – this somehow complements your view on the lacking multilinguality of ontology systems (“It’s always surprising how the language message takes so long to get through!”). Marrying the two is not easy, this had already been realized in the 1990s as I remember from my studies in computational linguistics back then.

    And regarding machine translation and the fact that language technology has evolved regarding “terminology to ensure the correct context in an automated way”: I would not say that all this is really a new evolution given the fact that MT vendors like Systran and Lucy LT (former METAL) had developed and maintained sophisticated rule-based MT dictionaries which they are now re-using for such purposes (introducing linguistic knowledge) in their NMT preprocessing pipelines.

    Kind regards

    1. Thanks Christine… I think your experience at the BDÜ mirrors what I see every day. I rarely come across the sort of response to MT we used to see a few years ago. On the NMT and dictionaries… I think the difference is that these features are being asked for and provided to everyone. The use of dictionaries has been around and mainly aimed at Enterprise use, but this is changing as we not only see people being more comfortable with MT but also more comfortable working in the cloud. I think the latter has a way to go and offline working needs to be available as much as possible, but the introduction of powerful new technologies which are only accessible online is slowly dragging the older ways of working into the future.
