Benchmarking AI acceptability and grammaticality in German: A study of ChatGPT and human judgments
Authors
Nicholas Catasso
Abstract
The rapid development of large language models has opened new avenues for linguistic research, including areas traditionally reliant on native-speaker intuitions. One such domain is grammaticality and acceptability judgment, where speakers assess whether sentences are structurally well-formed and contextually appropriate. This study investigates the extent to which ChatGPT-4 can approximate human judgments in German, focusing on a diverse range of grammatical and usage-related phenomena. A carefully designed set of test items was presented to both the model and native speakers, allowing for a direct comparison. The results show a high degree of alignment in many cases, but also reveal systematic divergences, particularly in contexts involving gradience, sociolinguistic markedness or context-dependent acceptability. These findings demonstrate both the analytical potential and the current limitations of large language models in linguistic research, and contribute to ongoing discussions about their ability to approximate native-speaker competence.
