[Online] A Dual-Task Large Language Model for Adding Diacritics and Translating Jordanian Arabic to Modern Standard Arabic

ID: 18, Submission ID: 2

Start Time: 2025-12-29 17:00 (Asia/Amman)

Duration: 15 min

Session: [S5-1] Track 5: Emerging Trends of AI/ML


Abstract
The Arabic language presents unique challenges for natural language processing because of its complex grammar, its diverse dialects, and the frequent omission of diacritics in written text. This paper proposes a unified, token-free model based on ByT5 that performs both spelling correction (including translation from the Jordanian dialect to Modern Standard Arabic (MSA)) and diacritization. Task-specific prefixes (“correct:” for correction and “diacritize:” for combined correction and diacritization) let a single model switch between tasks, enabling flexible multi-task learning. The model was fine-tuned on the JODA dataset (Jordanian dialect/MSA pairs) and on high-quality Tashkeela subsets (Clean-50 and Clean-400), with synthetic error injection to improve robustness. Automatic evaluation yielded an overall score of 78.06% on JODA and 92.45% on the combined JODA and Tashkeela test set. Manual evaluation of 200 JODA samples found a character error rate (CER) of 4.41% and a diacritic error rate (DER) of 1.32%, demonstrating the model's practical effectiveness on dialectal and undiacritized Arabic text.
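The prefix mechanism described in the abstract can be illustrated with a minimal Python sketch. It assumes the byte-to-id layout of the public ByT5 checkpoints (each UTF-8 byte b becomes id b + 3, with 0, 1, and 2 reserved for padding, end-of-sequence, and unknown). The helper names, the trailing space after each prefix, and the example phrase are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: task-prefixed inputs encoded ByT5-style (no subword vocabulary).
# Assumption: byte b maps to id b + 3, with ids 0/1/2 reserved for pad/eos/unk,
# matching the layout used by the public ByT5 checkpoints.

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3

# The two prefixes named in the abstract; the trailing space is an assumption.
TASK_PREFIXES = {
    "correct": "correct: ",        # correction / dialect-to-MSA only
    "diacritize": "diacritize: ",  # correction plus diacritization
}


def make_input(task: str, text: str) -> str:
    """Prepend the prefix that selects which output the shared model should produce."""
    return TASK_PREFIXES[task] + text


def byte_encode(text: str) -> list[int]:
    """Map each UTF-8 byte to a ByT5-style id and append end-of-sequence."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byte_decode(ids: list[int]) -> str:
    """Invert byte_encode, dropping special and out-of-range ids."""
    data = bytes(i - BYTE_OFFSET for i in ids if BYTE_OFFSET <= i < BYTE_OFFSET + 256)
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    # Illustrative Jordanian-dialect phrase ("what do you want"), not taken from JODA.
    src = make_input("correct", "شو بدك")
    ids = byte_encode(src)
    print(len(src), len(ids))      # 15 21
    print(byte_decode(ids) == src)  # True
```

Because each Arabic letter occupies two bytes in UTF-8, byte-level inputs are roughly twice as long as the corresponding character sequences; a token-free design accepts that cost in exchange for not depending on a fixed subword vocabulary that may segment dialectal spellings poorly.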
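The abstract mentions synthetic error injection without specifying the noise model. One plausible scheme, sketched here in Python purely as an assumption, corrupts characters at a fixed rate by deleting them, duplicating them, or swapping them for a commonly confused letter, and then pairs the noisy text with both task prefixes for a Tashkeela-style (diacritized MSA) sentence. The confusion map, the default rate, and the functions `inject_errors` and `make_pairs` are hypothetical.

```python
import random
from typing import Optional

# Arabic harakat from U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}

# Hypothetical confusion map built from well-known Arabic spelling slips:
# taa marbuta/haa, alef maqsura/yaa, and hamza forms of alef.
CONFUSABLE = {
    "ة": "ه", "ه": "ة",
    "ى": "ي", "ي": "ى",
    "أ": "ا", "إ": "ا", "ا": "أ",
}


def strip_diacritics(text: str) -> str:
    """Remove harakat so a diacritized target can also serve as a plain one."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def inject_errors(text: str, rate: float = 0.05, seed: Optional[int] = None) -> str:
    """Return a noisy copy of text; each non-space char is corrupted with probability rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.isspace() or rng.random() >= rate:
            out.append(ch)
            continue
        op = rng.choice(("delete", "duplicate", "confuse"))
        if op == "duplicate":
            out.append(ch + ch)
        elif op == "confuse":
            out.append(CONFUSABLE.get(ch, ch))  # unchanged if no confusable letter exists
        # "delete": append nothing
    return "".join(out)


def make_pairs(diacritized: str, rate: float = 0.05, seed: Optional[int] = None):
    """Build one (input, target) pair per task prefix from a single diacritized sentence."""
    plain = strip_diacritics(diacritized)
    noisy = inject_errors(plain, rate, seed)
    return [
        ("correct: " + noisy, plain),
        ("diacritize: " + noisy, diacritized),
    ]
```

Seeding a private `random.Random` instance keeps the corruption reproducible across runs without touching the global random state.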
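The manual evaluation reports CER and DER, whose exact definitions are not given on this page. A common formulation is assumed in the Python sketch that follows: CER is the Levenshtein edit distance between hypothesis and reference divided by the reference length, and DER is the fraction of base characters (spaces excluded) whose attached diacritics differ. The DER function assumes both strings share the same base letters; the authors' scoring may differ, for example in whether diacritics count toward CER or how punctuation is handled.

```python
# Arabic harakat from U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character error rate: edits needed to turn hyp into ref, per reference character."""
    return levenshtein(hyp, ref) / max(len(ref), 1)


def split_marks(text: str) -> list[tuple[str, str]]:
    """Group text into (base character, attached diacritics) units."""
    units: list[tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS and units:
            base, marks = units[-1]
            units[-1] = (base, marks + ch)
        else:
            units.append((ch, ""))
    return units


def der(hyp: str, ref: str) -> float:
    """Diacritic error rate over non-space base characters (assumes identical base letters)."""
    h, r = split_marks(hyp), split_marks(ref)
    if [b for b, _ in h] != [b for b, _ in r]:
        raise ValueError("this DER sketch assumes identical base letters")
    scored = [(hm, rm) for (b, hm), (_, rm) in zip(h, r) if not b.isspace()]
    errors = sum(hm != rm for hm, rm in scored)
    return errors / max(len(scored), 1)
```

Separating base letters from their marks lets DER isolate diacritization mistakes, while CER on the full string also captures spelling and dialect-conversion errors.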
Keywords
Arabic NLP, Dialect Translation, Jordanian Dialect, Diacritization, Spelling Correction, ByT5, Transformer Models, Multi-Task Learning
Speaker
Rabie Otoum
RAN Optimization and University of Jordan

Submission Authors
Rabie Otoum, University of Jordan
Gheith Abandah, University of Jordan
Mohammad Abdel-Majeed, University of Jordan

CONTACT US

Email: asiancomnet@usssociety.org

Website & IT Support: hi@aconf.org 
