Prompt Engineering for evaluators: optimizing LLMs to judge linguistic proficiency

Authors

L. Gregori

DOI:

https://doi.org/10.62408/ai-ling.v2i2.22

Keywords:

Large Language Models, Prompt Engineering, LLM-as-a-judge, evaluation

Abstract

Prompt Engineering, the practice of optimizing the input submitted to a Large Language Model, is closely linked to evaluation procedures. Depending on the type of task performed with LLMs, the available evaluation metrics can be more or less reliable, making Prompt Engineering more or less effective. LLM-as-a-judge is a possible way to perform Prompt Engineering on tasks that are hard to evaluate, although the reliability of this practice is not guaranteed and depends on both the task and the language model. This paper presents an evaluation of general-purpose LLMs on an essay-scoring task using state-of-the-art small models. In particular, the ability of the language models to assign proficiency levels to short essays written by Italian L2 learners is evaluated. Test data with expert annotations of CEFR scores are extracted from the Kolipsi-II corpus. Several prompting techniques were used to analyze the impact of Prompt Engineering on this task. Results show wide differences in accuracy among the three LLMs and that choosing the right prompt radically changes their rating abilities.
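To illustrate the kind of LLM-as-a-judge setup described in the abstract, the sketch below shows a minimal zero-shot CEFR-scoring prompt wrapped around a generic model call. The prompt wording, the `call_model` stub, and the label-extraction step are illustrative assumptions for the sake of the example, not the prompts, models, or evaluation pipeline actually used in the study.

```python
import re
from typing import Callable, Optional

# Assumption: the standard six-level CEFR scale; the paper's exact label set may differ.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def build_prompt(essay: str) -> str:
    """Build a zero-shot LLM-as-a-judge prompt for CEFR essay scoring.

    The wording is a hypothetical example of the prompting techniques
    mentioned in the abstract, not the prompt used in the paper.
    """
    return (
        "You are an expert examiner of Italian as a second language.\n"
        "Rate the CEFR proficiency level of the following learner essay.\n"
        f"Answer with exactly one label among: {', '.join(CEFR_LEVELS)}.\n\n"
        f"Essay:\n{essay}\n\nLevel:"
    )


def score_essay(essay: str, call_model: Callable[[str], str]) -> Optional[str]:
    """Send the prompt to a model and extract the first CEFR label it returns.

    `call_model` is any function mapping a prompt string to the model's raw
    text output (e.g. a wrapper around a local small LLM or a hosted API).
    """
    response = call_model(build_prompt(essay))
    match = re.search(r"\b(A1|A2|B1|B2|C1|C2)\b", response)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Usage example with a dummy model that always answers "B1".
    print(score_essay("Ieri sono andato al mercato con mia sorella...", lambda p: "B1"))
```

Comparing labels produced this way against the expert CEFR annotations of a test set (here, essays from the Kolipsi-II corpus) is what allows the accuracy of different prompts and models to be measured.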

Published

2025-07-25

How to Cite

Gregori, L. (2025). Prompt Engineering for evaluators: optimizing LLMs to judge linguistic proficiency. AI-Linguistica. Linguistic Studies on AI-Generated Texts and Discourses, 2(2). https://doi.org/10.62408/ai-ling.v2i2.22