Information 4.0… we’re all doomed!

All doomed?  What exactly does that mean and why am I writing about it?  Over the last year I’ve been back at school studying for the TCLoc Masters degree at the University of Strasbourg (an excellent program if you’re wondering!).  A module we’re currently working through is Information 4.0 and this… I think I can safely say this… has provoked more discussion and emotion than any of us expected.  This is partially because Ray Gallon asked us at the start of the course how we felt about artificial intelligence, looking at it in the broader sense and not just within the localization arena.  Now, as interesting as that is, I don’t propose to make this a really wide discussion, although you should feel free to continue it in the comments if you have strong feelings about it.  I would, however, like to explore a few related things I’ve been thinking about that are perhaps closer to the topics I usually write about.

AI will replace all our jobs soon!

I also attended TEKOM in Stuttgart this week as part of the TCLoc course, and the theme there was very much AI.  The couple of days I spent there helped put a few things into perspective for me… the first being that many of the grand claims made by the companies on display are little more than marketing hype.  After all, AI sells!  I am of course being very dismissive here, but an important take-home for me is that today AI is actually split into two broad areas:

  • Narrow Artificial Intelligence
  • General Artificial Intelligence

Narrow AI is basically used to describe technology that enables a machine to outperform a human in a very specific task, or set of tasks.  This can be anything from simple automation in a translation workflow to very sophisticated tasks such as Neural Machine Translation or self-driving cars.  So we certainly do have this already.

General Artificial Intelligence, on the other hand, refers to the ability of a machine to have intelligence of its own, learn for itself, deal with the unexpected and so on.  The definition of what constitutes intelligence is also a discussion on its own… something else for the comments!  The headline here, however, is a comment made by Andy McDonald that we won’t see this within the next hundred years at least.  This is important context, if you agree with it, because it allows us to think more clearly and address progress for what it is: a natural evolution of some of the things we already do today.  Whether it matters if we ever get there or not is another discussion, because the impact these technologies could have on our lives is already apparent… and we may be sensible enough never to let it become too “intelligent.”

The opportunity

Thinking about this clearly is important because progress, particularly revolutionary progress, puts opportunities in our path that we probably didn’t even think about a year ago, and it will put even more there in the future that we haven’t thought about to date.  With this in mind, where’s the opportunity today that will allay our fears?

Translation is all about content.  We regularly hear about the explosion in content and how there aren’t enough translators in the world to handle it all.  We probably discount some of these claims as marketing hype, and we probably worry that machine translation will swallow it all up anyway!  Well, I think this is partially true of course, because there are always going to be huge amounts of content that would never get translated at all if it wasn’t for machine translation.  But there is also significant growth in content supporting the new businesses and technologies around AI, where the quality translation of the information used to support a multilingual environment cannot be left entirely to machine translation; doing so could lead to serious failures in the future.

What am I talking about exactly?  Well, given how far away we are from General AI, we need to feed our automation and narrow AI solutions with a lot of information to help them handle the range of capabilities a human can offer.  At TEKOM I attended an excellent lecture from Alex Masycheff on feeding content to chatbots.  You may be thinking “so what?”… but chatbots today are just the tip of the iceberg.  Thinking about them gives you some indication of just how much content, good quality content, is needed to achieve the futuristic claims of companies delivering AI today.  For a chatbot to work “intelligently”, and not just be some simple, limited, prescribed question-and-answer solution that can just as easily irritate you as provide a solution, the amount of content required is staggering.

There is an initiative called Cyc which has been building a common-sense knowledge base for the last 35 years.  This isn’t like your typical knowledge base where you find articles on how to solve a problem.  This one is based on an ontology with 1.5 million concepts and 25 million rules and assertions, all using these concepts, plus some domain-specific extensions… all in all providing trillions of molecules of usable information.  I read a good quote that gives you some idea of how hard it is to build something like this: entering the sentence “Napoleon died in 1821 – Wellington was saddened” required two months of work to add to the knowledge base all the information needed to explain the concepts of life and death!  This starts to put 35 years into context as well.
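To make the idea of concepts, rules and assertions a little more concrete, here’s a minimal sketch of how facts like the Napoleon example might be held as simple subject-predicate-object statements.  It’s purely illustrative: Cyc’s own representation (CycL) is vastly richer, and every name below is invented for the example.

```python
# A toy "common sense" store of subject-predicate-object facts.
# Purely illustrative; not how Cyc actually represents knowledge.

facts = {
    ("Napoleon", "is_a", "Person"),
    ("Napoleon", "died_in", "1821"),
    ("Wellington", "is_a", "Person"),
    ("Person", "subject_to", "Death"),          # background knowledge humans take for granted
    ("Death", "causes_in_others", "Sadness"),   # ...and more of it
}

def is_a(entity, category):
    """Check a simple class-membership fact."""
    return (entity, "is_a", category) in facts

def explains_sadness(observer, deceased):
    """A toy inference chain: people die, and death saddens other people."""
    return (
        is_a(deceased, "Person")
        and is_a(observer, "Person")
        and ("Person", "subject_to", "Death") in facts
        and ("Death", "causes_in_others", "Sadness") in facts
    )

print(explains_sadness("Wellington", "Napoleon"))  # True, but only because every step was spelled out
```

Even this trivial example needs five facts just to connect one death to one emotion… scale that up to everything a human “just knows” and two months per sentence starts to sound reasonable.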

Now add to this the amount of information that would be required to ensure that a chatbot built on Cyc could understand the language being used to ask the questions: local variations, poorly written or spoken language, spelling mistakes, omissions that a human would fill in immediately, and so on.  The amount of content needed to deal with a problem touching several subjects increases by a magnitude I wouldn’t even try to guess.

Finally make it multilingual!

So this is a lot of content, and it doesn’t even come close to what would be required to match the complexity of a human’s ability to reason… and in multiple languages.  Simply translating these concepts and rules with machine translation isn’t enough.  So, to get back to my point in this section about opportunity: we don’t really know what opportunities await us as these new technologies roll in, but we should be thinking about what they might be, and we should be trying to learn as much as we can so that we find the opportunities and run with the technology rather than fight it.

At TEKOM this year I saw a lot on ontology-based terminology.  This seems an interesting opportunity, and it’s clear why such ontologies are needed when you think about the development of AI and how it’s being used in every industry.  So far I don’t see any translation tool integrated with a solution like this… possibly Coreon, although the plugin they have built for SDL Trados Studio has not been released yet… and they might never release it given they think the TM is dead 😉  It’s quite exciting to think about the opportunities involved here alone.  Certainly when I think about MultiTerm… perhaps it tries to do too much (for translators), and yet not enough (for ontologists)!  What does it do for machine translation… not enough yet!  There’s already a growing demand for terminology to ensure the correct context in an automated way, and several of the large machine translation vendors already support this.  Funny how this part of language technology has evolved.
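The shift from a flat termbase entry to a concept sitting in an ontology is easier to see with a small example.  The sketch below is an invented structure, not Coreon’s or MultiTerm’s data model: each concept carries its multilingual labels plus typed relations to other concepts, which is exactly the kind of context an automated process can actually use.

```python
# A toy concept-oriented termbase: multilingual labels plus typed relations.
# Invented structure, purely for illustration; no vendor's data model.

concepts = {
    "C001": {
        "labels": {"en": "brake disc", "de": "Bremsscheibe", "fr": "disque de frein"},
        "relations": [("part_of", "C002")],
    },
    "C002": {
        "labels": {"en": "brake system", "de": "Bremsanlage", "fr": "système de freinage"},
        "relations": [],
    },
}

def term(concept_id, lang):
    """Return the label for a concept in the requested language."""
    return concepts[concept_id]["labels"].get(lang)

def neighbours(concept_id, lang="en"):
    """Follow the typed relations from one concept to its neighbours."""
    return [(rel, term(target, lang)) for rel, target in concepts[concept_id]["relations"]]

print(term("C001", "de"))     # Bremsscheibe
print(neighbours("C001"))     # [('part_of', 'brake system')]
```

A flat glossary only tells you that “brake disc” is “Bremsscheibe”; the concept model also tells you, in any language, what a brake disc is part of… and that’s where the value for automation lies.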

I also wonder where this places the future of the standards that people work so hard to agree upon.  We still hear a lot around TBX, and yet I rarely hear discussion around the use of a more semantically based exchange such as T-Mint, for example.  Probably because of the lack of tools to handle it… but surely it’ll come, and when it does we’ll need human linguists and terminologists to manage it.  There are many very extensive terminology databases around, and given the time it’s taken Cyc to produce the ontology they have, I doubt people will start from scratch.  So it was all the more surprising for me at TEKOM that the ontology vendors I spoke to, with the exception of Coreon obviously, didn’t seem to be focused on the multilingual aspect of the solutions they were selling… or maybe it was just the sales guys I got to speak with.  It’s always surprising how long the language message takes to get through!

Coming back to machine translation.  Post-editing of machine translation for many languages is a different ball game than it was only a few years ago.  Today the fluency can be so good that it’s harder to spot the mistakes, and when they do appear they are more likely to be complete mistranslations than poorly worded sentences.  Relying on machine translation completely unchecked could be a mistake… so an opportunity might be this: how do we get through more content for review?  Today’s translation environments are pretty much the same as they were when they were first introduced… just more bells and whistles.  The fundamental design, and process, is unchanged.  I thought about this after speaking to Samah Ragab and Iryna Lebedyeva at the UTIC conference earlier this year… Samah has been pushing me for foot pedal integration into Studio for a couple of years, and Iryna showed me how she uses Microsoft Read Aloud to review material more efficiently.  All this gave rise to another opportunity and SDL TTS was born!  We combined the two into one application that supports many use cases, but for me it was the opportunity to review more content in less time and with less stress.  In fact UTIC was a breeding ground for ideas and we’re working on another at the moment!
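The read-aloud part of the idea is simple enough to sketch.  The snippet below uses the open source pyttsx3 library rather than SDL TTS itself (so treat it as an illustration of “listen while you review”, not the real thing): it just reads a list of target segments back to you so your ears can catch what your eyes skim over.

```python
# Minimal "listen while you review" loop using the open source pyttsx3 library.
# Illustrative only; this is not the SDL TTS application.
import pyttsx3

segments = [
    "The quick brown fox jumps over the lazy dog.",
    "Der schnelle braune Fuchs springt über den faulen Hund.",
]

engine = pyttsx3.init()
engine.setProperty("rate", 160)   # slow the voice down a little for reviewing

for number, text in enumerate(segments, start=1):
    print(f"Segment {number}: {text}")
    engine.say(text)
    engine.runAndWait()           # block until the segment has been read out
```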

The retired

While we’re talking about speech… all of these innovations have made me rethink the use of speech-to-text technology in this industry.  Typically the main use at the moment is in dictating translations.  I think the number of translators doing this today is still small, although perhaps significant, and the most vocal proponents like to talk about how many more words a day they can translate using these methods.  Well… as a productivity aid for translating I think this has no future at all!  The technology itself has a great future in general, just perhaps not here.

The advancements in machine translation, well adopted these days by translators who wouldn’t have dreamed of using it a few years ago, mean you can start your work with a fully translated set of documents.  Post-editing isn’t easy using voice, and it’s faster with a keyboard.  There’s also limited language support for voice; this is slowly improving, but nowhere close to the speed at which machine translation vendors add new languages to their repertoire.

I understand there may be medical reasons for preferring to use voice and that this won’t go away… I understand there is content that may not translate well (yet) with machine translation… and I understand there are translators who are set in their ways.  But with the advancements we are seeing in machine learning I don’t see a future for speech-to-text as a productivity boost.  Only the reverse, as with SDL TTS.  What’s your opinion on this?

To sum up…

All in all I like to focus on the opportunity, and even though we read all the time about how these new technologies will steal our jobs, I think it’s important to think about the new ones that will come about as a result of these advancements.  One thing is for sure… we can’t stop progress, but we can help to shape it.

It’s always easy to complain about things and be negative… much harder to see the opportunity.  But now’s the time to start looking, and if we keep a sense of perspective and realism about the actual capabilities of a machine I think there will be plenty to find… embrace the change!

AdaptiveMT… what’s the score?

AdaptiveMT was released with Studio 2017, introducing the ability for users to adapt SDL Language Cloud machine translation to their own preferred style on the fly.  Potentially this is a really powerful feature, since it means that over time you should be able to improve the results you see from your SDL Language Cloud machine translation and reduce the amount of post-editing you have to do.  But in order to unlock this potential you need to know a few things about getting started.  Once you get started you may also wonder what the analysis results are referring to when you see values appearing against the AdaptiveMT rows in your Studio analysis report.  So in this article I want to walk through the things you need to know from start to finish… it’s quite a long article, but I tried to cover the things I see people asking about, so I hope it’s useful.
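To give a flavour of the principle only (the real AdaptiveMT engine updates its translation model from your confirmed segments, it does not use a simple lookup like the one below), here’s a toy sketch in which corrections you confirm are remembered and applied to later suggestions:

```python
# Toy illustration of "adapting" MT output towards a preferred style.
# The real AdaptiveMT engine retrains its model from confirmed segments;
# this phrase-replacement dictionary is only a conceptual stand-in.

learned_preferences = {}

def confirm(mt_suggestion, post_edited):
    """Remember a correction from a segment the translator has confirmed."""
    if mt_suggestion != post_edited:
        learned_preferences[mt_suggestion] = post_edited

def adapt(mt_suggestion):
    """Apply any remembered corrections to a new MT suggestion."""
    for old, new in learned_preferences.items():
        mt_suggestion = mt_suggestion.replace(old, new)
    return mt_suggestion

confirm("Click the button Start.", "Click the Start button.")
print(adapt("Click the button Start."))   # the preferred wording comes back next time
```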

Continue reading “AdaptiveMT… what’s the score?”

Spot the difference!

I don’t know if you can recall these games from when you were a kid.  I used to spend hours trying to find all the differences between the image on the left and the one on the right.  I never once thought that it might become a useful skill in later life… although in some cases it’s a skill I’d rather not have to develop!

You may be wondering where I’m going with this, so I’ll explain.  Last weekend the SFÖ held a conference in Umeå, Sweden… I wasn’t there, but I did get an email from one of my colleagues asking how you could see what changes had been made in your bilingual files as a result of post-editing Machine Translation.  The easy answer of course is to do the post-editing with track changes switched on; then it’s easy to spot the difference.  That is useful, but it’s not going to help with measurement, or give you something useful to discuss with your client.  It’s also not going to help if you didn’t work with tracked changes in the first place, because you’d need some serious spot-the-difference skills to evaluate your work!
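If you do end up with the raw machine translation and the post-edited result as two lists of segments, a rough measurement isn’t hard to produce.  Here’s a minimal sketch using Python’s standard difflib, with invented segment data; a proper post-editing distance metric would of course be more sophisticated than a simple similarity ratio.

```python
# Rough "spot the difference" per segment: similarity between the raw MT
# output and the post-edited target.  Invented data, illustration only.
from difflib import SequenceMatcher

mt_output   = ["The cat sat in the mat.", "He go to the market yesterday."]
post_edited = ["The cat sat on the mat.", "He went to the market yesterday."]

for number, (mt, pe) in enumerate(zip(mt_output, post_edited), start=1):
    ratio = SequenceMatcher(None, mt, pe).ratio()   # 1.0 means nothing was changed
    print(f"Segment {number}: {ratio:.2%} similar, edited: {'yes' if ratio < 1.0 else 'no'}")
```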

Continue reading “Spot the difference!”

Using the SDL Community

Last week I spent a few days in Amsterdam talking community with a group of SDL people.  We were there to see how we can shape the community and make it a place where anyone using our products, or just thinking about using them, will be able to find what they need, talk about the products, or just share experiences in a safe, friendly environment.  Actually it’s a lot more than a safe, friendly environment… it’s the only place where you can say what you think and guarantee it’ll be seen by the right people in SDL.  This could be product managers, developers, support engineers, sales guys, marketing teams, the CEO of the company… and even I have a part to play!  It’s also full of real product experts… your peers who have years of experience and know how the products behave.  Things don’t always work the way it says in the book, and the book definitely doesn’t cover everything that’s possible!  But if you have a question, more than likely it’ll be something your fellow community members have come across before, and if they haven’t there’s a good chance they’ll have something interesting to say about it! Continue reading “Using the SDL Community”

MT or not MT?

Machine Translation or not Machine Translation… is this the question?  It’s a good question and one that gets discussed at length in many places, but it’s not the question I want to consider today.  Machine Translation has its place and it’s a well-established part of the translation workflow for many professionals today.  The question I want to consider today is whether or not you should hide the fact that you are using Machine Translation.

This is a question that comes up from time to time and it has consumed my thoughts this evening quite a bit, particularly after contributing this afternoon to a discussion in a ProZ forum that’s still running after three years.  So I decided to take a step back and think about my position on this question and whether I’m being unreasonable or not.  My position at the start of this article is that you should not hide the fact you are using Machine Translation. Continue reading “MT or not MT?”

The ins and outs of AutoSuggest

The AutoSuggest feature in Studio has been around since the launch of Studio 2009, and based on the questions I see from time to time I think it’s a feature that could use a little explanation of what it’s all about.  In simple terms it’s a mechanism for prompting you as you type with suggested target text that is based on the source text of the document you are translating.  So sometimes it might be a translation of some or all of the text in the source segment, and sometimes it might provide an easy way to replicate the source text in the target.  It works like this: you enter a character via the keyboard and Studio suggests suitable text that can be applied with a single keystroke.  In terms of productivity this is a great feature, and given how many other translation tools have copied it in one form or another I think it’s clear it really works too!
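To illustrate the basic idea, and only the idea (Studio’s own AutoSuggest engine draws on several resources and is far more sophisticated), here’s a toy prefix lookup that offers completions from a small hard-coded phrase list as the translator types:

```python
# Toy autosuggest: offer phrase completions that start with what has been typed.
# The hard-coded list below is invented; it is only here to show the prompting
# mechanism, not how Studio actually generates its suggestions.

phrases = [
    "Betriebsanleitung",
    "Betriebsanleitung für das Gerät",
    "Bedienfeld",
    "Bedienungsanleitung",
]

def suggest(typed, limit=3):
    """Return up to `limit` phrases that begin with the characters typed so far."""
    typed = typed.lower()
    return [p for p in phrases if p.lower().startswith(typed)][:limit]

print(suggest("Be"))    # the first three matching candidates
print(suggest("Betr"))  # narrowed down to the 'Betriebsanleitung' entries
```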

AutoSuggest draws on a number of different sources, some available out of the box with every version of the product, and some requiring a specific license.  The ability to create resources for AutoSuggest is also controlled by license for some things, but not for all.  When you purchase Studio, any version at all, you have the ability to use AutoSuggest resources out of the box from three places: Continue reading “The ins and outs of AutoSuggest”

Language Cloud… word-counts… best practice?

Best practice!  This is a phrase I’ve had a love/hate relationship with over the course of my entire career… or maybe it’s just a love to hate!  The phrase is something that should perhaps be called “Best Suggestions” and not “Best Practice”, because all too often I think it’s used to describe the way someone wants you to work, as opposed to anything that represents the views of a majority of users over a long period of time, or anything that takes account of the way different people want to work.  In fact, with new technology how can it be “Best Practice” when it hasn’t been around long enough in the first place?  I think for a clearly defined and well-established process “Best Practice” has its place… but otherwise it’s often the easy answer to a more complex problem, or just a problem that is considered too hard to address.

Continue reading “Language Cloud… word-counts… best practice?”