Classifying AI-Generated Text in Low-Resource Languages like Arabic

Document Type: Research Article

Authors

1 Computer Engineering Department, College of Alborz, University of Tehran, Tehran, Iran

2 Computer Engineering Department, College of Alborz, University of Tehran, Tehran, Iran; School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

3 School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Abstract

AI-Generated Texts (AIGTs) are written content produced by artificial intelligence systems using technologies such as natural language processing and machine learning. The rise of AIGT has introduced new challenges to content authenticity, trustworthiness, and information integrity across digital platforms. In low-resource languages such as Arabic, AIGT detection is especially challenging because of their complex structural features. Accurately distinguishing AI-generated from human-written text is essential to combat misinformation, preserve credibility in communication, and strengthen content moderation systems. In this study, we propose a novel framework for AIGT detection on the AutoTweet Dataset, an annotated corpus of Arabic tweets. To the best of our knowledge, this is the first work to leverage Large Language Models (LLMs) for AIGT detection in Arabic, addressing a critical gap in low-resource natural language processing. We introduce a dynamic few-shot prompting technique, powered by a retrieval-based Judge Prompter module, which selects semantically and stylistically relevant support examples to enhance the contextual understanding of LLMs. We conduct a comprehensive evaluation across multiple LLMs, including Mistral-7B, LLaMA-3.1-8B, and ALLaM-7B-Instruct-preview, under zero-shot, few-shot, and fine-tuning scenarios. Our best results were achieved using Mistral-7B with QLoRA fine-tuning and dynamic few-shot prompting, reaching an accuracy of 88.69% and an F1-score of 88.35%. These findings demonstrate the feasibility of adapting LLMs for AIGT detection in Arabic and highlight the effectiveness of context-aware prompting in low-resource settings, paving the way for future progress in text classification.
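The dynamic few-shot idea described in the abstract — retrieving the labelled examples most similar to the query tweet and placing them in the prompt — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Judge Prompter presumably uses a neural sentence encoder over Arabic tweets, whereas this toy uses bag-of-words cosine similarity, and the function names (`select_support`, `build_prompt`), labels, and example tweets are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; any real system would substitute a
    # neural sentence encoder, but cosine ranking works the same way.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_support(query, pool, k=2):
    """Pick the k labelled tweets most similar to the query tweet."""
    q = embed(query)
    return sorted(pool, key=lambda ex: cosine(q, embed(ex["text"])),
                  reverse=True)[:k]

def build_prompt(query, support):
    # Assemble a few-shot classification prompt from the retrieved examples.
    lines = ["Decide whether each tweet is human-written or AI-generated.\n"]
    for ex in support:
        lines.append(f"Tweet: {ex['text']}\nLabel: {ex['label']}\n")
    lines.append(f"Tweet: {query}\nLabel:")
    return "\n".join(lines)

pool = [
    {"text": "breaking news traffic jam downtown again", "label": "human"},
    {"text": "discover the top ten secrets of productivity today", "label": "ai"},
    {"text": "my cat knocked over my coffee this morning", "label": "human"},
]
support = select_support("top five secrets of healthy productivity", pool, k=2)
print(build_prompt("top five secrets of healthy productivity", support))
```

The key design point is that the support set is recomputed per query, so each tweet is classified alongside the in-context examples most relevant to it rather than a fixed shot set.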

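The QLoRA fine-tuning mentioned in the abstract could be set up along the following lines with the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries: the base model is loaded in 4-bit NF4 precision and only small low-rank adapters are trained. This is a configuration sketch under assumed settings — the adapter rank, target modules, and other hyperparameters shown here are illustrative, not the values reported by the authors.

```python
# Configuration sketch: 4-bit QLoRA fine-tuning of Mistral-7B.
# Requires transformers, peft, bitsandbytes, and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                   # rank of the trainable adapters
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only adapter weights are trainable
```

With this setup, standard supervised fine-tuning (e.g. with a `transformers` `Trainer`) updates only the LoRA adapters, which is what makes fine-tuning a 7B model feasible on a single GPU.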

