
Automating Semantic Annotation in Low-Resource Languages: Evaluating GPT-4 for Urdu NLP

Authors

Files

Rahman_2026_AI-Linguistica

Abstract

Semantic annotation is a fundamental but labor-intensive step in building effective Natural Language Processing (NLP) systems, particularly for low-resource languages such as Urdu, where the scarcity of large, manually annotated datasets has constrained progress. This study examines whether semantic annotation can be automated with GPT-4, a state-of-the-art large language model (LLM), using structured prompt engineering without task-specific fine-tuning. A corpus of 50,000 Urdu sentences drawn from news articles, social media posts, and literary texts was used to evaluate three core tasks: Named Entity Recognition (NER), semantic similarity, and sentiment analysis. GPT-4 performed strongly, achieving an F1-score of 92% for NER, a Pearson correlation of 0.87 for semantic similarity, and an accuracy of 88% with a macro-F1 of 87% for sentiment classification. These results suggest that LLMs guided by instruction-based prompts can reliably perform complex NLP tasks in low-resource settings. Nonetheless, difficulties with idiomatic expressions, sarcasm, and rare entities highlight the need for carefully designed prompts and, potentially, human-AI collaboration in the annotation workflow.
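The abstract reports three kinds of evaluation measure: F1 for NER, Pearson correlation for semantic similarity, and accuracy plus macro-F1 for sentiment. The plain-Python sketch below shows how such scores are typically computed. It is illustrative only and does not reproduce the paper's pipeline. Two choices are assumptions, because the abstract states neither: NER is scored with exact-match entity spans and micro-averaging, and sentiment labels are arbitrary strings. The function names (`entity_f1`, `pearson`, `accuracy_and_macro_f1`) are hypothetical.

```python
from math import sqrt


def entity_f1(gold, pred):
    """Micro-averaged entity F1 with exact span matching (assumed scheme).

    gold, pred: lists with one set per sentence of (start, end, label) tuples.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans predicted with correct boundaries and label
        fp += len(p - g)   # predicted spans absent from the gold annotation
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pearson(xs, ys):
    """Pearson correlation between model and human similarity scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def accuracy_and_macro_f1(gold, pred):
    """Accuracy and unweighted mean of per-class F1 for sentiment labels."""
    labels = sorted(set(gold) | set(pred))
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1s = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Toy data, not from the study.
    gold_ents = [{(0, 2, "PER"), (5, 6, "LOC")}, {(1, 3, "ORG")}]
    pred_ents = [{(0, 2, "PER")}, {(1, 3, "ORG"), (4, 5, "LOC")}]
    print(entity_f1(gold_ents, pred_ents))  # 2 TP, 1 FP, 1 FN -> F1 = 2/3
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear -> ~1.0
    print(accuracy_and_macro_f1(["pos", "neg", "neu", "pos"],
                                ["pos", "neg", "pos", "pos"]))  # (0.75, ~0.6)
```

Macro-F1 weights every sentiment class equally. A minority class such as "neutral" that the model never predicts correctly therefore lowers the score even when overall accuracy stays high. In practice, `statistics.correlation` (Python 3.10+) or `scipy.stats.pearsonr` compute the same Pearson coefficient as the hand-written version.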


