Towards the Automatic Generation of Arabic Lexical Recognition Tests.

Mohammad Khalil Ahmad Nassar; محمد خليل احمد نصار

creativework.keywords	the Automatic Generation, Arabic Lexical, Recognition Tests.	en
dc.contributor.author	Mohammad Khalil Ahmad Nassar	en
dc.contributor.author	محمد خليل احمد نصار	ar
dc.date.accessioned	2023-01-31T13:02:14Z
dc.date.available	2023-01-31T13:02:14Z
dc.date.issued	2020-08-22
dc.description.abstract	Lexical Recognition Tests (LRTs) are widely used to measure proficiency in common languages such as German, English, and Spanish. Comparable research for Arabic is still at an early stage, and existing proposals rely mainly on manually generated test items. In this thesis, we propose a new methodology, based on a newly developed algorithm, for designing and constructing an Arabic LRT. The algorithm generates nonwords dynamically from characteristics specific to the Arabic language. It considers four main characteristics: orthography (spelling), phonology (pronunciation), n-grams, and the word frequency map, which is an important factor in creating a multi-level test. Whereas previous approaches used Markov models to create nonwords, the developed algorithm uses characteristics of Arabic letters to create high-quality nonwords. The algorithm was tested on a large processed dataset of 14,000,849 vocabulary items. For this purpose, a Web-based application following the proposed methodology was designed and implemented to facilitate setting up the LRT and to manage and analyze learners' responses. The experimental results show that the automatically generated LRT questions confused the learners: the confusion matrix shows that 1/3 of the generated nonwords distracted them. Each vocabulary item received 49 responses. For real words, 48% of the responses were correct and 52% were incorrect. For nonwords, about 30% of the responses were incorrect, meaning that the learner was confused into selecting the nonword as a real word, and 70% were correct, meaning that the nonword did not confuse the learner.
Consequently, recall and precision were low, at 0.28 and 0.54, respectively. The study also analyzed how other dimensions affect test scores: word length, word type, knowledge of Arabic measured as the number of years of learning, the learner's main language, and gender. The results show that the most influential dimension was the method used to generate the nonwords, especially the orthographical one, and that generation works better when the replacement letter lies in the intersection of the orthographical and phonological similarity groups, since most of the confusing vocabulary items (277) belonged to this group. To validate the accuracy of the developed approach, we built a version of the Arabic LRT consisting of two sections: real words and nonwords. The nonwords section was divided into two equal parts: items generated automatically by the developed algorithm, and items generated manually by an Arabic language expert who applied the same rules implemented in the algorithm. The comparative study showed that the accuracy of the two methods is almost the same.	en
dc.identifier.citation	Nassar, Mohammad Khalil. (2020). Towards the Automatic Generation of Arabic Lexical Recognition Tests [A published thesis, Al-Quds University, Palestine]. Al-Quds University digital repository. https://arab-scholars.com/b3a2c7	en
dc.identifier.uri	https://dspace.alquds.edu/handle/20.500.12213/7796
dc.language.iso	en_US
dc.publisher	Al-Quds University	en
dc.title	Towards the Automatic Generation of Arabic Lexical Recognition Tests.	en
dc.title	نهج التوليد التلقائي لإختبارت التعرف اللغوي في اللغة العربية	ar
dc.type	Thesis
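
The abstract describes generating nonwords by replacing a letter with one drawn from orthographic or phonological similarity groups. The following is a minimal sketch of that idea only, not the thesis's actual algorithm: the group contents, the function names `similar_letters` and `candidate_nonwords`, and the toy lexicon are all assumptions made for illustration, and the n-gram and word-frequency steps mentioned in the abstract are omitted.

```python
# Illustrative similarity groups. These are assumptions: the record does not
# list the groups that the thesis actually uses.
ORTHOGRAPHIC_GROUPS = ["بتثني", "جحخ", "دذ", "رز", "سش", "صض", "طظ", "عغ", "فق"]
PHONOLOGICAL_GROUPS = ["تط", "دض", "دذ", "سص", "ذظ", "كق", "حه"]


def similar_letters(letter, groups):
    """Return every letter that shares a group with `letter`, excluding itself."""
    found = set()
    for group in groups:
        if letter in group:
            found |= set(group) - {letter}
    return found


def candidate_nonwords(word, lexicon):
    """Substitute one letter at a time and keep candidates absent from `lexicon`.

    Returns a list of (candidate, kind) pairs, where kind is "orthographic",
    "phonological", or "both" (the replacement letter is in both kinds of group).
    """
    results = []
    for i, letter in enumerate(word):
        ortho = similar_letters(letter, ORTHOGRAPHIC_GROUPS)
        phono = similar_letters(letter, PHONOLOGICAL_GROUPS)
        for replacement in sorted(ortho | phono):
            candidate = word[:i] + replacement + word[i + 1:]
            if candidate in lexicon:
                continue  # a real word cannot serve as a nonword
            if replacement in ortho and replacement in phono:
                kind = "both"
            elif replacement in ortho:
                kind = "orthographic"
            else:
                kind = "phonological"
            results.append((candidate, kind))
    return results


if __name__ == "__main__":
    # Hypothetical two-word lexicon; a real run would use the full vocabulary list.
    toy_lexicon = {"درس", "ضرس"}
    for candidate, kind in candidate_nonwords("درس", toy_lexicon):
        print(candidate, kind)
```

Tagging each candidate with its kind makes it possible to prefer replacements labelled "both", which the abstract reports as the most confusing for learners.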

Files

Original bundle

Name: MohammadNassar_Thesis معدلة.pdf
Size: 2.95 MB
Format: Adobe Portable Document Format

Collections

Computer Science علم الحاسوب