How do DeepL and ChatGPT process information structure and pragmatics?

An exploratory case study on topicalized infinitives in Spanish (and Portuguese)

Authors

Katharina Gerhalter

DOI:

https://doi.org/10.62408/ai-ling.v1i1.8

Keywords:

processing of pragmatic implicatures, automated translation, information structure, topicalized infinitive, Spanish, Portuguese, LLM

Abstract

This case study focuses on a construction that exists in both Spanish and Portuguese but not in English: the topicalized infinitive (TI), e.g., Sp. comer no come ‘as for eating, s/he does not eat’. We present three pilot experiments. The first is a translation task in which sentences containing a TI are translated from Spanish into Portuguese and vice versa. DeepL failed in most cases because of contamination by English as a pivot language. The second is a continuation task in which ChatGPT-3.5 was asked to complete sentences beginning with a TI. In most cases, it generated natural and adequate continuations starting with pero ‘but’. This result is unsurprising: the task amounts to predicting the most likely continuation, which is exactly how the model works. By contrast, ChatGPT-3.5 showed a clear inability to perform the third task, which consisted of drawing pragmatic inferences from exactly the same examples, in which the TI encodes an adversative implicature.

Published

2024-07-05

How to Cite

Gerhalter, K. (2024). How do DeepL and ChatGPT process information structure and pragmatics? An exploratory case study on topicalized infinitives in Spanish (and Portuguese). AI-Linguistica. Linguistic Studies on AI-Generated Texts and Discourses, 1(1). https://doi.org/10.62408/ai-ling.v1i1.8

Issue

Vol. 1 No. 1

Section

Full-Length Article