Multilingual language technology that goes beyond ChatGPT


In recent months, the text generator ChatGPT has amazed the world by automatically writing humanlike texts in all kinds of styles. The fundamental techniques behind ChatGPT date from 2017, but computing power and training data have since been scaled up to such an extent that this year’s results have astonished even experts in the field.

‘Scientists could see ChatGPT coming, but I was still surprised at how well it works. It is great to see how much interest there is now in language technology. That shows how close human thinking skills and language are and also how important language is to give the impression of an intelligent system.’

Christof Monz leads the Language and Technology Lab at LAB42, whose research starts where ChatGPT ends. One of ChatGPT’s shortcomings is that it needs enormous amounts of data. Of the more than seven thousand languages spoken worldwide, however, most have so little digital data available that ChatGPT cannot understand, generate or translate them, even though plenty of these ‘smaller’ languages have many millions of speakers.

‘Google Translate works for something like 140 languages,’ says Monz, ‘and the European equivalent DeepL for something like twenty languages. From the point of view of inclusiveness though, you want to offer language technology for those smaller languages as well. There is a lot to be gained there, and that is an important part of what we do in our lab.’