Automating Semantic Annotation in Low-Resource Languages: Evaluating GPT-4 for Urdu NLP

Gohar Rahman

DOI

https://doi.org/10.62408/ai-ling.v3i1.40

Keywords

GPT-4, semantic annotation, Urdu NLP, low-resource languages, prompt engineering, named entity recognition, sentiment analysis, semantic similarity

Published

May 17, 2026

Journal

Published in Vol. 3 No. 1 (2026) of AI-Linguistica. Linguistic Studies on AI-Generated Texts and Discourses.

AI-Linguistica. Linguistic Studies on AI-Generated Texts and Discourses is a new scholarly journal that aims to provide a publishing platform for researchers from all areas of Linguistics (interfacing with neighboring fields: Communication Science, Media and Journalism Studies, Computational Linguistics) to reflect on generated texts from a variety of perspectives: theoretical, descriptive, and applied.

We understand ‘generated texts’ in a broad sense, including formats as diverse as texts generated by Large Language Models, AI-powered smart agents (e.g. chatbots, voice assistants, social bots), writing assistance tools, template-based software, and neural machine translation services.

Abstract

Semantic annotation is a fundamental yet labor-intensive process essential for building effective Natural Language Processing (NLP) systems, particularly for low-resource languages such as Urdu. The limited availability of large, manually annotated datasets has constrained advancements in Urdu NLP. This study explores the potential of automating semantic annotation using GPT-4, a state-of-the-art large language model (LLM), through structured prompt engineering without task-specific fine-tuning. A corpus of 50,000 Urdu sentences spanning news articles, social media posts, and literary texts was used to evaluate three core tasks: Named Entity Recognition (NER), semantic similarity, and sentiment analysis. GPT-4 demonstrated strong performance, achieving an F1-score of 92% for NER, a Pearson correlation of 0.87 for semantic similarity, and an accuracy of 88% with a macro-F1 of 87% for sentiment classification. These results indicate that LLMs guided by instruction-based prompts can reliably perform complex NLP tasks in low-resource contexts. Nonetheless, challenges with idiomatic expressions, sarcasm, and rare entities highlight the need for carefully designed prompts and potential human-AI collaboration.
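The abstract reports accuracy and macro-F1 for sentiment classification and a Pearson correlation for semantic similarity. As a minimal sketch of how such scores are conventionally computed, the following Python uses only the standard library. The function names and the toy labels and scores are illustrative assumptions, not the study's data or evaluation scripts, and the entity-level matching needed for NER F1 is not covered here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the labels present in gold."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up toy data, for illustration only.
gold_labels = ["pos", "neg", "neu", "pos"]
pred_labels = ["pos", "neg", "pos", "pos"]
human_similarity = [1.0, 2.0, 3.0]
model_similarity = [2.0, 4.0, 6.0]

print(accuracy(gold_labels, pred_labels))
print(macro_f1(gold_labels, pred_labels))
print(pearson(human_similarity, model_similarity))
```

Macro-F1 weights every class equally, so a class the model never predicts correctly pulls the score down even when overall accuracy looks high. That is one reason the two figures are usually reported side by side.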

Details

Issue

Vol. 3 No. 1 (2026): Natural Language and AI. New Perspectives for Linguistic Studies

Section

Short-Length Article

How to Cite

Rahman, G. (2026). Automating Semantic Annotation in Low-Resource Languages: Evaluating GPT-4 for Urdu NLP. AI-Linguistica. Linguistic Studies on AI-Generated Texts and Discourses, 3(1). https://doi.org/10.62408/ai-ling.v3i1.40

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
