‘Nous fêterons’ or ‘On va fêter’?
Mimicking Age-Sensitive Variation with ChatGPT
DOI:
https://doi.org/10.62408/ai-ling.v1i1.11
Keywords:
sociolinguistics, AI, LLM, gpt4, age, generation, language change, apparent-time, age-grading, variation, French, first-person plural, clitics, future tenses
Abstract
This study explores ChatGPT’s ability to mimic age-sensitive linguistic variation in contemporary French, focusing on the speech of older adults. We assess whether ChatGPT can (1) align its naive responses with age-related language use, (2) demonstrate explicit knowledge of age-related linguistic variation, and (3) modify its responses on the basis of that knowledge. Using contexts from the LangAge corpus, we prompted ChatGPT to answer questions from the perspective of speakers of different ages (30–90) in different interview years (1980–2020), with a specific focus on first-person plural subject clitics (nous/on) and future tenses (futur simple/futur proche). ChatGPT’s responses predominantly favored formal linguistic variants across all ages. While expert-knowledge injection significantly increased the use of formal variants, age, birth year, and interview year had no systematic influence on variant selection. A partial exception is speakers aged 70, for whom ChatGPT displayed heightened linguistic uncertainty in the naive answer. By contrast, the variant distribution in (3) is mainly motivated by the expert knowledge that ChatGPT generated in (2). These findings highlight both the potential and the limitations of current LLMs in capturing age-specific variation, and they encourage further integration of sociolinguistic methods into LLM research.
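The design summarized in the abstract (speaker age crossed with interview year, with responses coded for nous versus on) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the 10-year step sizes, the function names `build_conditions` and `count_first_person_plural`, and the regular expressions are all assumptions made for illustration, and the patterns are deliberately crude heuristics rather than a validated coding scheme.

```python
import re
from itertools import product

# Assumption: 10-year steps. The abstract gives only the ranges
# (ages 30-90, interview years 1980-2020), not the step size.
AGES = range(30, 91, 10)
INTERVIEW_YEARS = range(1980, 2021, 10)


def build_conditions():
    """Return one dict per (age, interview year) cell with the implied birth year."""
    return [
        {"age": age, "interview_year": year, "birth_year": year - age}
        for age, year in product(AGES, INTERVIEW_YEARS)
    ]


# Hypothetical heuristics: 'nous' followed by a verb form ending in -ons,
# and 'on' followed by any word. Real coding would need manual checking.
NOUS_SUBJECT = re.compile(r"\bnous\s+\w+ons\b", re.IGNORECASE)
ON_SUBJECT = re.compile(r"\bon\s+\w+", re.IGNORECASE)


def count_first_person_plural(text):
    """Count candidate subject-clitic tokens of 'nous' and 'on' in a response."""
    return {
        "nous": len(NOUS_SUBJECT.findall(text)),
        "on": len(ON_SUBJECT.findall(text)),
    }


conditions = build_conditions()
counts = count_first_person_plural("Nous fêterons Noël et on va fêter le nouvel an.")
```

Deriving the birth year from age and interview year keeps the three predictors named in the abstract available for each simulated speaker; any statistical modelling of the resulting counts is outside the scope of this sketch.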
License
Copyright (c) 2024 Valerie Hekkel, Friederike Schulz, Marta Lupica Spagnolo

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.