Towards the Automatic Generation of Arabic Lexical Recognition Tests.
Mohammad Khalil Ahmad Nassar
محمد خليل احمد نصار
Lexical Recognition Tests (LRTs) are widely used to measure proficiency in common languages such as German, English, and Spanish. However, comparable research for Arabic is still at an early stage, and existing proposals rely mainly on manually generated test items. In this thesis, we propose a new methodology, based on a newly developed algorithm, for designing and constructing an Arabic LRT. The algorithm generates nonwords dynamically based on characteristics specific to the Arabic language. It considers four main characteristics: orthography (spelling), phonology (pronunciation), n-grams, and the word frequency map, which is an important factor in creating a multi-level test. The algorithm differs from previous approaches in that those approaches used Markov models to create nonwords, whereas the developed algorithm uses characteristics of Arabic letters to create high-quality nonwords. The algorithm was evaluated experimentally using a large processed vocabulary dataset (14,000,849 items). For this purpose, a Web-based application following the proposed methodology was designed and implemented to facilitate setting up the LRT and to manage and analyze learners' responses. The experimental results show that the automatically generated LRT questions confused the learners: the confusion matrix shows that about one third of the generated nonwords were able to distract the learners. Each vocabulary item received 49 responses. For real words, 48% of the responses were correct and 52% were incorrect. For nonwords, about 30% of the responses were incorrect, meaning that the system succeeded in confusing learners into selecting the nonwords as real words, and 70% were correct, meaning that the learners were not confused.
Consequently, recall and precision are relatively low, at 0.28 and 0.54, respectively. The study also analyzed other dimensions that may affect test scores: word length, word type, knowledge of Arabic measured as the number of years of learning, the learner's main language, and gender. The results show that the most influential dimension was the method used to generate the nonwords, especially the orthographical one, and that generation works better when the replacement letter lies in the intersection of the orthographical and phonological similarity groups, since most of the confusing vocabulary items (277) belonged to this category. To validate the accuracy of the developed approach, we developed a version of the Arabic LRT consisting of two sections: real words and nonwords. The nonwords section was divided into two equal parts: items generated automatically by the developed algorithm, and items generated manually by an Arabic language expert who applied the same rules implemented in the algorithm. The comparative study showed that the accuracy of both methods is almost the same.
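The idea of replacing a letter with one drawn from orthographical and phonological similarity groups can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the similarity groups below are illustrative assumptions (the orthographic groups collect letters that share a base shape and differ only in dots; the phonological groups are a rough guess), the function names and the toy lexicon are hypothetical, and the n-gram and word-frequency criteria are omitted.

```python
# Illustrative similarity groups only; the groups used in the thesis are not
# given in this abstract.
ORTHOGRAPHIC_GROUPS = [
    {"ب", "ت", "ث"},
    {"ج", "ح", "خ"},
    {"د", "ذ"},
    {"ر", "ز"},
    {"س", "ش"},
    {"ص", "ض"},
    {"ط", "ظ"},
    {"ع", "غ"},
]
PHONOLOGICAL_GROUPS = [
    {"ت", "ط"},
    {"د", "ذ", "ض"},
    {"س", "ص"},
    {"ح", "خ"},
]


def similar_letters(letter, groups):
    """Return every letter that shares at least one group with `letter`."""
    similar = set()
    for group in groups:
        if letter in group:
            similar |= group
    similar.discard(letter)
    return similar


def generate_nonwords(word, lexicon, intersection_only=False):
    """Build candidates by substituting one letter at a time.

    With intersection_only=True, a replacement must be both orthographically
    and phonologically similar to the original letter. Candidates found in
    `lexicon` are real words and are discarded.
    """
    candidates = set()
    for i, letter in enumerate(word):
        ortho = similar_letters(letter, ORTHOGRAPHIC_GROUPS)
        phono = similar_letters(letter, PHONOLOGICAL_GROUPS)
        replacements = (ortho & phono) if intersection_only else (ortho | phono)
        for replacement in replacements:
            candidate = word[:i] + replacement + word[i + 1:]
            if candidate not in lexicon:
                candidates.add(candidate)
    return sorted(candidates)


# Toy lexicon (hypothetical): the source word and one real word that a
# substitution would otherwise produce.
toy_lexicon = {"درس", "ضرس"}
strict = generate_nonwords("درس", toy_lexicon, intersection_only=True)
relaxed = generate_nonwords("درس", toy_lexicon)
```

In this toy setup, the strict mode keeps only the candidate in which د is replaced by ذ, since that is the only replacement present in both kinds of group, while the relaxed mode also admits replacements that are similar in only one respect. The lexicon check matters because a single substitution can yield a real word, as with ضرس here.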