“Use of Arabic online has increased proportionally with the increase in Internet users. In comparison English remains essentially flat, 25% in 2013 and 28% in 2017, despite the increase in Internet use.” (Mideast Media)
In the Western world, the idea of giving machines the ability to process human language has existed since the birth of the computer itself. We have watched language and speech technologies develop and strengthen over the past few years. While these continue to improve steadily and gain momentum, Arabic language and speech technologies are still struggling to reach the maturity of their English counterparts, which they need in order to support the evident boom in Arabic language use online.
“Arabic-speaking Internet users are expected to reach 197M in 2017”
Can you imagine how many Arabic online interactions that could mean? “88% of the Middle East population uses social networking daily” (Go-Gulf). Taking 88% of 197 million gives around 170 million people making, on “average 2 interactions a day”, a total of roughly 340 million interactions a day and 124 billion a year. However, the technology available to capture, decipher and analyze this valuable data is not yet mature enough. Given that “89% of businesses are soon expected to compete mainly on customer experience, and that by 2020 85% of customer interactions will be managed without a human” (Gartner), adapting to the growing need for Arabic speech and language technologies and applications has become a necessity. Although many are working on delivering technologies and applications such as machine translation, information retrieval and extraction systems, speech recognition and text to speech, the complexity and structure of the Arabic language is slowing down progress in the Arabic NLP domain.
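The estimate above can be reproduced as a quick back-of-the-envelope calculation. This is a sketch that follows the post's own assumption that the Go-Gulf daily social-networking share for the Middle East applies to all Arabic-speaking Internet users; the constant and function names are illustrative. Note that the post rounds 173 million down to 170 million before multiplying, which is why it arrives at 340 million a day and 124 billion a year rather than the unrounded figures.

```python
# Back-of-the-envelope estimate of Arabic social-media interactions,
# using the figures quoted in the post (assumption: the Middle East
# social-networking share applies to all Arabic-speaking Internet users).
ARABIC_INTERNET_USERS = 197_000_000  # expected Arabic-speaking Internet users, 2017
DAILY_SOCIAL_SHARE = 0.88            # share using social networking daily (Go-Gulf)
INTERACTIONS_PER_DAY = 2             # average interactions per user per day
DAYS_PER_YEAR = 365


def estimate_interactions(users: float, share: float, per_day: float,
                          days: int = DAYS_PER_YEAR):
    """Return (active users, interactions per day, interactions per year)."""
    active = users * share
    daily = active * per_day
    yearly = daily * days
    return active, daily, yearly


active, daily, yearly = estimate_interactions(
    ARABIC_INTERNET_USERS, DAILY_SOCIAL_SHARE, INTERACTIONS_PER_DAY)
print(f"active users:        {active / 1e6:.0f} million")   # unrounded: ~173 million
print(f"interactions / day:  {daily / 1e6:.0f} million")    # ~347 million
print(f"interactions / year: {yearly / 1e9:.0f} billion")   # ~127 billion

# Rounding active users down to 170 million first, as the post does:
_, daily_r, yearly_r = estimate_interactions(170_000_000, 1.0, INTERACTIONS_PER_DAY)
print(f"rounded: {daily_r / 1e6:.0f} million / day, {yearly_r / 1e9:.0f} billion / year")
```

Either way, the order of magnitude is the same: hundreds of millions of interactions every day.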
What distinguishes language processing from other forms of data processing is its reliance on knowledge of language. Arabic has long been considered an interesting language because of its complex linguistic structure. However, when it comes to NLP and speech technology, this complexity poses a major challenge: it makes it very difficult for machines to understand the intent or context of what is written or said. There are many reasons why Arabic is difficult for machines to decipher, but the main ones are:
1. Dissimilar Arabic Dialects
Levantine, Gulf and Egyptian Arabic are the main dialects found in the Middle East and North Africa. Although similar in sound, these dialects are extremely distinct from one another in spelling, vocabulary and grammar. This makes it impractical to build NLP applications for Arabic as a whole: language models have to be built for each dialect, and it can be very difficult to combine them into one language model that offers Arabic language solutions catered for all.
2. Spoken Colloquial vs. Formal Language
Everyday spoken Arabic differs greatly from formal Modern Standard Arabic. Creating language solutions for informal everyday Arabic can be challenging, as it varies not only from one country to the next but also from one generation to the next.
3. Grammar, Punctuation & Slang
In online interactions, whether on social networks, in reviews or even in Google searches, people tend to shorten words or write in slang, making it close to impossible to extract context from these interactions and sometimes even to make sense of them.
Moving forward, go for an integrated approach
With smaller amounts of data to process, you will most likely get the most value from human-driven practices, where human judgment maximizes the accuracy of data processing. Consider machine-powered NLP applications and technology when operating at a larger scale, dealing with large amounts of data, or handling time-sensitive activities, for example when you need to respond to customers in a timely manner.
Not all Arabic natural language processing (ANLP) applications are advanced enough to process Arabic data with high accuracy, although some areas and technologies are more mature than others. To overcome the shortcomings of these technologies, think of a more integrated approach: people and technology must work together. Train people to exploit the benefits of the available technologies so that you gain the maximum value from both.
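One common way to put people and technology to work together is confidence-threshold routing, a human-in-the-loop pattern: the machine handles the messages it is confident about, and everything else goes to a human reviewer. The sketch below is a minimal illustration under stated assumptions, not IST's system. `toy_sentiment` is a hypothetical keyword classifier standing in for a real Arabic NLP model, and the 0.8 threshold is an arbitrary assumption you would tune on your own data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prediction:
    label: str
    confidence: float  # model's confidence score in [0, 1]


def route(
    messages: List[str],
    classify: Callable[[str], Prediction],
    threshold: float = 0.8,  # assumed cut-off; tune on your own data
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Send confident predictions to automation and the rest to human reviewers."""
    automated: List[Tuple[str, str]] = []
    for_review: List[str] = []
    for msg in messages:
        pred = classify(msg)
        if pred.confidence >= threshold:
            automated.append((msg, pred.label))
        else:
            for_review.append(msg)
    return automated, for_review


def toy_sentiment(text: str) -> Prediction:
    """Hypothetical keyword classifier standing in for a real Arabic NLP model."""
    if "ممتاز" in text:  # "excellent"
        return Prediction("positive", 0.95)
    if "سيء" in text:  # "bad"
        return Prediction("negative", 0.90)
    # Unfamiliar dialect or slang: report low confidence.
    return Prediction("unknown", 0.30)


if __name__ == "__main__":
    # The last message is Egyptian colloquial Arabic for "not bad".
    msgs = ["الخدمة ممتازة", "التطبيق سيء", "مش بطال"]
    auto, review = route(msgs, toy_sentiment)
    print("automated:", auto)
    print("human review:", review)
```

Here the two messages the toy model recognizes are handled automatically, while the dialectal phrase it does not know is passed to a person. Raising the threshold shifts more work to humans in exchange for accuracy; lowering it does the opposite, which suits larger or more time-sensitive workloads.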
IST’s Language & Speech Innovation Center (LSIC) is currently working on advanced NLP applications and technologies, such as text to speech and sentiment analysis. For more information on Arabic speech and language technologies, contact us.
Source: IST Blogs