12 million Haitians can’t be wrong!

According to Wikipedia, some 9.6 to 12 million people speak Haitian Creole worldwide. I had no idea it was such a widely spoken language until I was asked a question this week: why did the Google Translate machine translation provider in Studio return French translations when the project was en(US) – fr(HT) (French-Haiti)?

In fact I had no idea that French-Haiti was most likely intended to be the language that should be used in Studio for Haitian Creole as this isn’t a language I come across very often.

But before I can ask a developer to fix this problem I have to be able to understand it myself, so the first thing I wanted to know was whether French-Haiti was the same as Haitian Creole or not. For anyone who is as interested as I was in reading more on this, I found the three links below, which explain how the language came about, and it does have a very interesting history:

The next thing I did was see if it was really so different from French… using a few basic sentences. I used Google Translate to check this since the request was to make Google’s Haitian Creole available when fr(HT) is the target language in a Studio project. Based on this example alone it certainly looks like a different language and not just a few words here and there, as we have between English US and English UK:

I guess Google would not have provided it if 12 million Haitians were wrong!  I can’t judge whether the translation is good or not, but that doesn’t matter as much as making sure that Haitian Creole is returned when you use fr(HT) in your project.  Perhaps it’s also wrong for the language to be called French-Haiti in the first place and not Haitian Creole?  Maybe we should have ht(HT), ht(BS), ht(CA) etc.  I don’t know, and more to the point I can’t do anything about that either as Studio uses the Windows Language Code Identifiers (LCID) which are all fully qualified with a country code and a language code.  There is no Haiti on its own, only fr(HT).  But based on Google and these basic sentences it certainly looks as though it could be a separate language altogether.

So the task was to stop the machine translation results coming back as French, given that Studio has no Haitian Creole available to it because of the LCIDs. I turned to the SDL AppStore team, and Andrea-Melinda Ghişa in particular, as she had done some work in bringing Google NMT and Microsoft NMT to the MT Enhanced plugin on the SDL AppStore. In the space of an hour or so Andrea had the answer! After a little research she found this site, Google Web Interface and Search Language Codes, which shows that Google is not using the fully qualified LCIDs, only a web interface code, which in this case is ht. When Studio used fr(HT) as the target language the code being sent to Google was fr. That makes sense: there are 47 varieties of French in Studio, most of them very similar to each other, and Google certainly doesn’t have 47 corresponding varieties to apply machine translation to. But fr(HT) is clearly an exception, so Andrea changed the code in the MT Enhanced plugin so that when Studio presented fr(HT) the plugin would send ht to Google:
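As a rough illustration of that kind of change, here is a minimal Python sketch. It is not the plugin's actual code: the names `GOOGLE_CODE_OVERRIDES` and `to_google_code` are made up for this example, the hyphenated `fr-HT` spelling of the culture code is an assumption, and the "drop the region" default simply mirrors the fr(HT) to fr behaviour described above.

```python
# Hypothetical override table: Studio culture code (lower-cased) -> code sent to Google.
# Only the fr(HT) -> ht exception from the article is listed here.
GOOGLE_CODE_OVERRIDES = {"fr-ht": "ht"}


def to_google_code(studio_code):
    """Return the language code to send to Google for a Studio culture code."""
    normalized = studio_code.strip().replace("_", "-").lower()
    # Check the exceptions first, so fr-HT is not collapsed into plain French.
    if normalized in GOOGLE_CODE_OVERRIDES:
        return GOOGLE_CODE_OVERRIDES[normalized]
    # Assumed default: drop the region and send only the language part.
    return normalized.split("-")[0]


if __name__ == "__main__":
    for code in ("fr-HT", "fr-CA", "en-US"):
        print(code, "->", to_google_code(code))
```

The point of keeping the exception in a small table rather than in an `if` statement is that further exceptions can be added later without touching the lookup logic.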

As simple as that, and the result really shows the benefit of having the API approach for these kinds of things because it means anyone who knows how to develop can implement a solution in their own time.  There’s no need to submit a request to the product management team for SDL Trados Studio and then work through the request lifecycle based on its position in the overall priority list and then all the process and controls that take place after that.  You can implement a solution right now!  Under your own control without the need to engage the development team for the core product at all.  This is a very powerful capability offered by the Studio platform… of course you do need to be, or have access to, an Andrea!!

In Studio, if I use the out of the box Google Translate alongside the updated MT Enhanced plugin (now v1.7) for my fr(HT) project you can see the difference:

Longer term the community API team would like to add a mapping feature to this plugin, so that if Google adds more languages in the future, which is quite likely, a user can change the mapping table themselves in a simple UI. Now that would be cool! But all good things have to wait their turn, even in the AppStore team. As the code is open source, if you are a developer and would like to implement this idea and share it with others then please go ahead and do it… the source code is here! In fact you can find more information about the SDL OpenSource Community here:

https://sdl.github.io/Sdl-Community/

Looking forward to any contributions!!
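For anyone tempted to pick up the mapping-table idea, here is one way it might be sketched in Python. Everything in it is an assumption made for illustration: the JSON format for the user's table, and the names `DEFAULT_MAPPINGS`, `load_mappings` and `resolve`, are hypothetical and not taken from the plugin.

```python
import json

# Hypothetical built-in mapping: Studio culture code (lower-cased) -> Google code.
DEFAULT_MAPPINGS = {"fr-ht": "ht"}


def load_mappings(json_text):
    """Merge a user-supplied JSON object of mappings over the built-in defaults."""
    user = json.loads(json_text) if json_text.strip() else {}
    if not isinstance(user, dict):
        raise ValueError("mapping table must be a JSON object")
    merged = dict(DEFAULT_MAPPINGS)
    for studio_code, google_code in user.items():
        # Normalise keys so lookups are case-insensitive.
        merged[studio_code.strip().lower()] = str(google_code).strip()
    return merged


def resolve(studio_code, mappings):
    """Look up the code to send to Google, falling back to the bare language part."""
    key = studio_code.strip().lower()
    return mappings.get(key, key.split("-")[0])


if __name__ == "__main__":
    # "xx-YY" -> "zz" is a placeholder entry, not a real language pair.
    table = load_mappings('{"xx-YY": "zz"}')
    print(resolve("fr-HT", table), resolve("xx-YY", table), resolve("fr-CA", table))
```

A UI would then only need to edit the JSON text; the lookup itself never changes when a new mapping is added.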

11 comments
  1. Evzen said:

    What am I missing here?
    This assumption: “In fact I had no idea that French-Haiti was most likely intended to be the language that should be used in Studio for Haitian Creole” is just plain wrong… simply because “French” is not “Creole”.
    Just like “Swiss French” is not the same thing as “Swiss German” or “Swiss Italian”…

    Just seeing the fundamental difference between the IETF code for Haitian French and Haitian Creole clearly shows that these are two completely distinct languages, NOT just two dialects of the same language.

    • What you are missing, I believe, is that there is no Haiti available in Studio at all. Only fr(HT). So translators into this language use fr(HT) for Haitian Creole. This is fine when you are only using a TM, but when you want to use MT, and in particular Google Translate, as this is the only MT engine I’m aware of for Haitian Creole, then you are stuck, since fr(HT) goes to FR for Google Translate.
      So this article is really about being able to use the API to solve this problem in Studio since you can’t add additional languages at all via the API. So this means that even if you are correct and fr(HT) in practice is not the same as Haitian Creole, then you still have no solution in Studio without the ability to add Haitian Creole in the first place.
      I really have no idea whether you are right or not, although I didn’t say they were two dialects of the same language, but for the purposes of this problem it really doesn’t matter.

  2. Evzen said:

    Well, I know that Haitian Creole is not available in Studio (since it’s not available in the underlying .NET Framework).
    But that doesn’t justify in any way using an INCORRECT language as a replacement and then wondering why Google Translate returns results in a different language.

    Yes, I understand that this article is about showing the possibilities of using the API… all that would be perfectly okay if that hack via hardcoding the locale code conversion were done only on the particular user’s side “for personal use only”. But pushing this apparently incorrect change to the public repository is a very bad move. This is NOT “Added support for Haitian Creole.” as the changeset description says, this is actually “Removed support for Haitian French”.

    • ok Evzen, I think I’m going to look at it this way. Right now there are zero ways to get MT through Google for Haitian Creole in Studio. There are 47 ways to get French. The translators I am aware of who are translating into Haitian Creole are using fr(HT) for their work. I am not aware of any using fr(HT) for the Haitian variant of French, although apparently around 10% of the Haitian population do also speak French. The MT Enhanced plugin is a free optional download from the AppStore, and there are two built-in MT providers that still return French when you use fr(HT) in your project, and multiple others available. I think the 100% who potentially benefit from this optional ability may be happy about this, particularly since we did it to enable the capability for people who wanted it in practice and were not just trying to prove a point.

      • Evzen said:

        As I said, this immediate help for particular people is perfectly okay as long as this hacky and obviously incorrect “fix” is provided ONLY TO THEM in a special build or so… but it should definitely not be part of the official codebase because it’s NOT a fix at all.
        And the optionality of the download is totally irrelevant… it’s simply WRONG to include such functionality-crippling changes (which it really IS, no matter if the people doing it realize that fact or not) in the official codebase.

        Well, all in all, it’s SDL’s source, not mine… i.e. it tells a lot about SDL’s knowledge/experience, not mine 😉

      • ok Evzen, you’re right. But I would rather be able to provide a solution to some users who actually need it right now rather than dig my heels in waiting for the solution to be implemented in the correct way. This whole article was about two things.
        1. Resolving a specific problem for translators using Studio to translate into Haitian Creole and wanting to use Google Translate machine translation for this, and
        2. Showing how people can use the API to resolve things themselves and not have to wait for a software vendor to do it for them
        In time I’m sure the proper solution will get implemented, but until then I think this “hack” is going to solve a problem and not really create one. If there was no alternative and if this wasn’t optional I’d agree with you, but there are, and it is.

  3. Evzen said:

    Oh, BTW, I actually believe that one could be able to add proper Haitian Creole language support to the OS using the Locale Builder:
    https://blogs.msdn.microsoft.com/shawnste/2015/08/27/locale-builder-and-finnish-or-other-locales/
    And then, if Studio just evaluates all available locales in the system (https://docs.microsoft.com/en-us/dotnet/api/system.globalization.culturetypes?view=netframework-4.7) it could be supported automatically…

    But I’m not a developer and don’t really understand all this .NET stuff so deeply, so I might be completely off here…

    • Thanks Evzen, I’m sure this will be useful to the Studio core development team in the unlikely event they weren’t aware of this already, and if the need to address this were considered a sufficiently high priority then they would do it. However, this article was about resolving a problem for users who needed a solution now, and we facilitated this in an hour or so through the APIs using an optional solution. Maybe the solution you propose would make a nice app for users to add a missing locale, and in fact I know at least one customer who developed such a tool because they work with many exotic languages that are not supported by the Windows locales. We could not have done this in an hour. I think the idea of adding a custom table to this app, as I mentioned at the end of the article, would be a nice way to handle this and would simplify mapping anything, even if the project languages used were incorrect, which is something else I see from time to time. Ultimately I like a pragmatic solution that can help quickly.
      But thank you for all your comments and, as I also mentioned in the article, perhaps the real solution is for Microsoft, or SDL, to add the appropriate variants of Haitian to the available locales in the first place.

  4. Hi Paul,
    you mentioned above that SDL is using Windows Language Code Identifiers. Now I just stumbled upon the language code for Croatian in my extraction of language pairs from IATE.
    IATE uses “hr”, which I used in my extractions. Windows has a lot of possibilities: bs, hr, or sr, hr-HR, and hr-BA (depending on Supported Version, whatever that may be), and MultiTerm 2017 knows just 3 other flavours, “SH”, “SH-HR” and “SH-B1”.
    So the confusion is not yet over…
    Best regards,
    Henk

    • Indeed, I believe Studio uses hr-HR for Croatian (Croatia) and hr-BA for Croatian (Latin, Bosnia and Herzegovina), as Studio always requires fully qualified language codes. But MultiTerm uses SH for Croatian, SH-HR for Croatian (Croatia) and SH-B1 for Croatian (Latin, Bosnia and Herzegovina). So confusion certainly reigns! I think MultiTerm uses a lot of legacy codes that are mapped to the newer ones in Studio, and then more recently had an update with all the new languages introduced in Windows 10.
      In fact MultiTerm does support Haitian, where it uses HAT-HT for Haitian (Haiti) and HAT for Haitian… so this is funny since it can’t be used in Studio at all!
