Machine Learning Improves Arabic Transcription Capabilities

Thanks to advances in speech and natural language processing, there is hope that one day you will be able to ask your virtual assistant what the best salad ingredients are. Currently, you can ask your home gadget to play music or to open on voice command, features already available in many devices.

If you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other dialects of Arabic, which vary enormously from region to region and are sometimes mutually unintelligible, it is a completely different story. If your first language is Arabic, Finnish, Mongolian, Navajo, or any other language with a high level of morphological complexity, you may feel left out.

These complex constructions intrigued Ahmed Ali enough to look for a solution. He is the chief engineer of the Arabic Language Technology Group at the Qatar Computing Research Institute (QCRI), part of Hamad Bin Khalifa University in Qatar, and the founder of ArabicSpeech, “a community that exists for the benefit of Arabic speech science and speech technology.”

Qatar Foundation Headquarters

Ali became fascinated by the idea of talking to cars, appliances, and gadgets many years ago, when he worked at IBM. “Can we build a machine that can understand different dialects: an Egyptian pediatrician automating a prescription, a Syrian teacher guiding children through the core of a lesson, or a Moroccan chef describing the best couscous recipe?” he says. However, the algorithms that drive these machines cannot sift through the roughly 30 varieties of Arabic, let alone make sense of them. Today, most speech recognition tools work only in English and a handful of other languages.

The coronavirus pandemic has further accelerated an already growing reliance on voice technology, as natural language processing tools have helped people comply with stay-at-home orders and physical distancing measures. And while we have been using voice commands for e-commerce purchases and household management, the future holds many more applications.

Millions of people around the world use massive open online courses (MOOCs) for their open access and unlimited participation. Speech recognition is one of the main features of MOOCs: students can search within specific parts of a course’s spoken content and turn on translations through subtitles. Speech technology also makes it possible to digitize lectures so that spoken words are displayed as text in university classrooms.

Ahmed Ali, Hamad Bin Khalifa University

According to a recent article in Speech Technology magazine, the voice and speech recognition market is projected to reach $26.8 billion by 2025, as millions of consumers and businesses around the world come to rely on voice bots not only to interact with their devices or cars but also to improve customer service, drive innovation in health care, and increase accessibility and inclusion for people with hearing, speech, or motor impairments.

In a 2019 survey, Capgemini forecast that by 2022 more than two in three consumers would opt for voice assistants rather than visits to stores or bank branches, a share that could plausibly rise further given the home-bound, physically distanced life and commerce that the pandemic has imposed on the world for more than a year and a half.

Nevertheless, these devices fail to serve vast swaths of the globe. For speakers of the roughly 30 varieties of Arabic, millions of people in all, that is a substantial missed opportunity.

Arabic for machines

English- and French-speaking voice bots are far from perfect. Yet teaching machines to understand Arabic is particularly difficult, for several reasons. Three challenges are widely recognized:

  1. Lack of diacritics. Arabic dialects are vernacular, meaning they are primarily spoken rather than written. Most of the available text is undiacritized: it lacks marks, comparable to accents such as the acute (´) or grave (`), that indicate how letters are pronounced. It is therefore difficult to determine where the vowels go.
  2. Lack of resources. There is a shortage of labeled data for the various Arabic dialects. Collectively, the dialects also lack standardized orthographic rules that dictate how to write a language, including norms of spelling, hyphenation, word breaks, and emphasis. These resources are critical for training computer models, and their scarcity has held back the development of Arabic speech recognition.
  3. Morphological complexity. Arabic speakers engage in a lot of code switching. For example, in the parts of North Africa colonized by the French, such as Morocco, Algeria, and Tunisia, the dialects include many borrowed French words. As a result, there are a large number of so-called out-of-vocabulary words, which speech recognition systems cannot handle because the words are not Arabic.
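To make the first and third challenges concrete, here is a minimal Python sketch. It assumes that Arabic diacritics are the combining marks U+064B through U+0652 plus U+0670 (an illustrative set, not a full normalization scheme) and uses a hypothetical toy vocabulary. Stripping the marks shows how one written form collapses several readings, and the out-of-vocabulary (OOV) rate counts tokens, such as a French loanword, that a recognizer's vocabulary does not cover.

```python
import re

# Assumed set of Arabic diacritics (harakat): U+064B fathatan through
# U+0652 sukun, plus U+0670 superscript alef. Illustrative, not exhaustive.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove diacritic marks, mimicking the undiacritized text found in practice."""
    return DIACRITICS.sub("", text)


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Fraction of tokens that are missing from the recognizer's vocabulary."""
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)


# "kataba" (he wrote) and "kutub" (books) differ only in their diacritics...
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ
kutub = "\u0643\u064F\u062A\u064F\u0628"         # كُتُب
# ...so once the marks are stripped, both become the same string كتب.
print(strip_diacritics(kataba) == strip_diacritics(kutub))  # True

# Hypothetical toy vocabulary; the French word "voiture" (car) is out of vocabulary.
vocabulary = {"\u0643\u062A\u0628"}
print(oov_rate(["\u0643\u062A\u0628", "voiture"], vocabulary))  # 0.5
```

Without the marks, a recognizer (or a language model trained on such text) must resolve from context alone whether كتب means "he wrote" or "books".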

“But the field is moving at lightning speed,” Ali says. Many researchers are collaborating to make it move even faster. Ali’s Arabic Language Technology lab is leading the ArabicSpeech project, which brings together Arabic translations with the dialects native to each region. Arabic dialects, for example, can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, because dialects do not follow borders, the division can be as fine-grained as one dialect per city; a native Egyptian speaker, for instance, can tell an Alexandrian’s dialect from that of a fellow citizen from Aswan, some 1,000 km away on the map.

Building a technological future for everyone

At this point, machines are about as accurate as human transcribers, thanks in large part to advances in deep neural networks, a subfield of machine learning in artificial intelligence that relies on algorithms inspired by how the human brain works, biologically and functionally. However, until recently, speech recognition was somewhat hacked together. The technology has historically relied on separate modules for acoustic modeling, building pronunciation lexicons, and language modeling, each of which has to be trained separately. More recently, researchers have been training models that convert acoustic features directly into text transcriptions, potentially optimizing all parts for the end task.
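The article does not describe how QCRI’s end-to-end model produces its text. As a hedged illustration of what converting acoustic features directly into a transcription can look like, the sketch below shows greedy connectionist temporal classification (CTC) decoding, one common output scheme for end-to-end recognizers (an assumption here, not a description of QCRI’s system). The model is assumed to emit one probability distribution per audio frame over a hypothetical alphabet whose index 0 is the CTC blank; decoding picks the best symbol per frame, merges consecutive repeats, and drops blanks.

```python
def ctc_greedy_decode(frame_probs: list[list[float]], alphabet: list[str], blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely symbol index for each audio frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_probs]
    symbols = []
    previous = None
    for index in best:
        # Emit a symbol only when it differs from the previous frame's symbol
        # and is not the blank; a blank between two identical symbols
        # therefore lets a genuine double letter survive.
        if index != previous and index != blank:
            symbols.append(alphabet[index])
        previous = index
    return "".join(symbols)


# Hypothetical 4-symbol alphabet: blank, then the letters of "كتب".
alphabet = ["<blank>", "\u0643", "\u062A", "\u0628"]
frames = [
    [0.1, 0.8, 0.05, 0.05],   # ك
    [0.2, 0.7, 0.05, 0.05],   # ك again (merged with the previous frame)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # ت
    [0.1, 0.1, 0.1, 0.7],     # ب
]
print(ctc_greedy_decode(frames, alphabet))  # كتب
```

In a modular pipeline, the lexicon and language model would sit between the acoustic scores and the text; here a single network's per-frame outputs are read off directly, which is what lets all parts be trained jointly.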

Even with these advances, Ali still cannot give voice commands to most devices in his native Arabic. “It’s 2021, and I still cannot speak to many machines in my dialect,” he says. “I mean, I now have a device that can understand my English, but machine recognition of multi-dialect Arabic speech hasn’t happened yet.”

Making this happen is the focus of Ali’s work, which has culminated in the first transformer for recognizing Arabic speech and its dialects, one that has achieved previously unmatched performance. Dubbed the QCRI Advanced Transcription System, the technology is currently used by the broadcasters Al Jazeera, DW, and the BBC to transcribe online content.

There are several reasons why Ali and his team have succeeded in building these speech engines now. First, he says: “Resources are needed for all dialects. We need to accumulate resources to then train the model.” In addition, advances in computer processing mean that computationally intensive machine learning now runs on graphics processing units (GPUs), which can process and display complex graphics quickly. As Ali says, “We have a great architecture, good modules, and data that reflects reality.”

Researchers at QCRI and Kanari AI have recently built models that achieve human parity on Arabic broadcast news. The system demonstrates the impact of subtitling Al Jazeera’s daily reports. While the human error rate (HER) for English is around 5.6%, the research showed that the HER for Arabic is significantly higher, reaching as much as 10%, because of the language’s morphological complexity and the lack of standard orthographic rules in dialectal Arabic. Thanks to recent advances in deep learning and end-to-end architectures, the Arabic speech recognition engine manages to outperform native speakers on broadcast news.
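The 5.6% and 10% figures are error rates over words. Presumably, as is standard in speech recognition, they are computed like word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into a reference transcript, divided by the number of reference words. A minimal Python sketch, assuming simple whitespace tokenization (real evaluations may normalize text first, for example by removing diacritics):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One wrong word out of ten gives a 10% error rate.
reference = "one two three four five six seven eight nine ten"
hypothesis = "one two three four five six seven eight nine tan"
print(word_error_rate(reference, hypothesis))  # 0.1
```

Because the score is counted per whole word, any mistake inside a word, such as a misheard vowel or prefix, makes the entire word count as an error.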

While speech recognition for Modern Standard Arabic appears to work well, researchers at QCRI and Kanari AI are busy testing the boundaries of dialect processing and achieving excellent results. Since nobody speaks Modern Standard Arabic at home, attention to dialect is what we need if our voice assistants are to understand us.

This content was written by Qatar Computing Research Institute, Hamad Bin Khalifa University, a member of Qatar Foundation. It was not written by MIT Technology Review’s editorial staff.
