
Simulating the Evolution of Grammatical Gender from Latin to Old Occitan: A Computational Approach Using LSTM with Attention

Files

Wiedner_Schöffel_AI-Linguistica_2026

Abstract

In this article, we present a first approach to simulating the change in grammatical gender from Latin to Old Occitan. The reduction of the Latin three-gender system in the transition to the Romance languages is one of the most important developments in the nominal system. To simulate this change, we drew on data from several sources: the most comprehensive dictionary of Old Occitan (DOM), an edition of a legal text, and the transcriptions of two manuscripts from COMETA (Wiedner 2025). The nouns, annotated with their etymon and with their gender in Latin and in Old Occitan, were then used as input for the simulation. Building on previous simulation-based approaches (Polinsky and van Everbroeck 2003), our model offers a form-based, data-driven perspective on how gender marking might evolve under the interaction of morphological cues. Despite its inherent simplifications, the model achieved good predictive accuracy and showed interpretable tendencies in the distribution of attention. This suggests that systematic regularities relevant to gender can be captured computationally even at the level of orthographic form.
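The abstract names the architecture only as "LSTM with attention". To illustrate where an "attention distribution" over a noun's form comes from, here is a minimal sketch of one common form of attention pooling. Each character position's hidden state is scored against a query vector. The scores are normalized with a softmax, and the weighted sum of the hidden states feeds a gender classifier.

This is an assumption-laden illustration, not the authors' model:

* It assumes dot-product scoring with a single learned query; the article may use a different attention variant.
* The hidden states, query and output weights are hand-set stand-ins rather than trained LSTM outputs.
* The names `attention_pool` and `predict_gender` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Dot-product attention over per-character hidden states (hypothetical helper).

    Returns the context vector (attention-weighted sum of the hidden states)
    and the attention weights, one per character position.
    """
    weights = softmax([dot(h, query) for h in hidden])
    dim = len(hidden[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden)) for i in range(dim)]
    return context, weights


def predict_gender(context: Vector, output_weights: Dict[str, Vector]) -> Dict[str, float]:
    """Linear layer plus softmax over gender labels (hypothetical, untrained parameters)."""
    labels = list(output_weights)
    probs = softmax([dot(output_weights[label], context) for label in labels])
    return dict(zip(labels, probs))


if __name__ == "__main__":
    word = "rosa"  # illustrative noun; one hidden state per character
    # Hand-set stand-ins for LSTM outputs; a trained encoder would produce these.
    hidden = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.3], [2.0, 0.5]]
    query = [1.0, 0.0]  # stand-in for a learned attention query vector
    context, weights = attention_pool(hidden, query)
    for ch, w in zip(word, weights):
        print(f"{ch}: {w:.3f}")
    print(predict_gender(context, {"masc": [-1.0, 0.0], "fem": [1.0, 0.0]}))
```

With these toy values, the final "a" receives the largest attention weight and the feminine label receives the higher probability. That outcome follows from how the numbers were chosen, not from anything a model learned. In a trained system, the per-character weights are what one would inspect for the kind of attention tendencies the abstract mentions. Whether such weights count as explanations is itself debated (Jain and Wallace 2019; Wiegreffe and Pinter 2019).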

References

Allassonnière-Tang, Marc & Basirat, Ali. 2020. Word embedding and neural network on grammatical gender – A case study of Swedish. arXiv:2007.14222. https://doi.org/10.48550/arXiv.2007.14222

Allassonnière-Tang, Marc & Brown, Dunstan & Fedden, Sebastian. 2021. Testing Semantic Dominance in Mian Gender: Three Machine Learning Models. Oceanic Linguistics 60 (2), 302–334. https://hal.science/hal-03509042v1/document DOI: https://doi.org/10.1353/ol.2021.0018

Blom, Alderik H. 2023. Gaulish in the Late Empire (c. 200–600 CE). In Mullen, Alex & Woudhuysen, George (eds.): Languages and Communities in the Late-Roman and Post-Imperial Western Provinces. Oxford: Oxford University Press, 129–154. DOI: https://doi.org/10.1093/oso/9780198888956.003.0005

Burns, Patrick J. 2023. LatinCy: Synthetic Trained Pipelines for Latin NLP. arXiv:2305.04365. https://doi.org/10.48550/arXiv.2305.04365

Chambon, Jean-Pierre. 2003. La déclinaison en ancien occitan, ou: comment s’en débarrasser ? : Une réanalyse descriptive non orthodoxe de la flexion substantivale. Revue de Linguistique Romane 67 (267–268): 343–364.

Chircu-Buftea, Adrian. 2011. Précis de morphologie romane. Cluj-Napoca: Casa Cartii de Stiinta.

Coker, Amy. 2009. Analogical change and grammatical gender in ancient Greek. Journal of Greek Linguistics 9 (1), 34–55. https://www.sciencedirect.com/org/science/article/pii/S1566584409000026 DOI: https://doi.org/10.1163/156658409X12529372103263

COMETA = Wiedner, Marinus (ed.). 2025. Corpus de l’occitan médiéval comparative et annoté: Provence et Languedoc. https://zenodo.org/records/15300719 (last accessed 31.10.2025).

Corbett, Greville G. 2006. Agreement. Cambridge: Cambridge University Press.

Cotterell, Ryan & Kirov, Christo & Hulden, Mans & Eisner, Jason. 2018. On the Diachronic Stability of Irregularity in Inflectional Morphology. arXiv:1804.08262. https://doi.org/10.48550/arXiv.1804.08262

Cucerzan, Silviu & Yarowsky, David. 2003. Minimally Supervised Induction of Grammatical Gender. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 40–47. https://aclanthology.org/N03-1006/ DOI: https://doi.org/10.3115/1073445.1073451

dDOM = Stimm, Helmut & Stempel, Wolf-Dieter & Selig, Maria (eds.). 1960–. Digitaler Zettelkasten des Dictionnaire de l’occitan médiéval. München: Bayerische Akademie der Wissenschaften. https://dienste.badw.de:9999/dom/db (last accessed 31.10.2025).

Derrer, Felix. 1974. Lo codi : eine Summa codicis in provenzalischer Sprache aus dem XII. Jahrhundert; die provenzalische Fassung der Handschrift A; (Sorbonne 632); Vorarbeiten zu einer kritischen Textausgabe. Zürich: Juris.

DOMél = Stimm, Helmut & Stempel, Wolf-Dieter & Selig, Maria (eds.). 1960–. Dictionnaire de l’occitan médiéval. Tübingen: Niemeyer. http://www.dom-en-ligne.de/ (last accessed 31.10.2025).

Donaldson, Bryan & Sibille, Jean. 2024. 8 Histoire interne de la langue. In Esher, Louise & Sibille, Jean (eds.): Manuel de linguistique occitane, 193–230. Berlin/Boston: De Gruyter. DOI: https://doi.org/10.1515/9783110733433-009

FEW = Wartburg, Walther von. 1922–1987. Französisches Etymologisches Wörterbuch. Eine Darstellung des galloromanischen Sprachschatzes. 25 vols. Leipzig/Bonn/Bâle: Teubner/Klopp/Zbinden. https://lecteur-few.atilf.fr/

Hare, Mary & Elman, Jeffrey L. 1995. Learning and morphological change. Cognition 56, 61–98. https://www.sciencedirect.com/science/article/pii/0010027794006555 DOI: https://doi.org/10.1016/0010-0277(94)00655-5

Hochreiter, Sepp & Schmidhuber, Jürgen. 1997. Long short-term memory. Neural Computation 9 (8), 1735–1780. https://www.bioinf.jku.at/publications/older/2604.pdf DOI: https://doi.org/10.1162/neco.1997.9.8.1735

Hockett, Charles Francis. 1962. A course in modern linguistics (4th edition). New York: Macmillan.

Jain, Sarthak & Wallace, Byron C. 2019. Attention is not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 3543–3556. https://doi.org/10.48550/arXiv.1902.10186

Jensen, Frede. 1973. Désaccord entre genre et flexion: Les substantifs masculins à désinence féminine en provençal. Revue des langues romanes 80: 393–404.

Jensen, Frede. 1976. The Old Provençal noun and adjective declension. Odense: Odense University Press.

Loporcaro, Michele. 2015. Vowel length from Latin to Romance. Oxford: Oxford University Press. DOI: https://doi.org/10.1093/acprof:oso/9780199656554.001.0001

Loporcaro, Michele. 2018. Gender from Latin to Romance: History, Geography, Typology. Oxford: Oxford University Press. DOI: https://doi.org/10.1093/oso/9780199656547.003.0007

Marzo, Daniela & Wiedner, Marinus. 2025. Remarks on Grammatical Gender in Romance. In Linzmeier, Laura & Teixeira Kalkhoff, Alexander M. & Wiesinger, Evelyn (eds.): Parla, e sia breve e arguto. Festschrift für Maria Selig / Studies in honor of Maria Selig (ScriptOralia 147). Tübingen: Narr, 201–207.

Panhuis, Dirk. 2015. Lateinische Grammatik. Berlin/Boston: De Gruyter. DOI: https://doi.org/10.1515/9783110405705

Pinkster, Harm. 2015. The Oxford Latin Syntax Volume I. Oxford: Oxford University Press. DOI: https://doi.org/10.1093/acprof:oso/9780199283613.001.0001

Polinsky, Maria & van Everbroeck, Ezra. 2003. Development of Gender Classifications: Modeling the Historical Change from Latin to French. Language 79 (2), 356–390. https://www.jstor.org/stable/4489422 DOI: https://doi.org/10.1353/lan.2003.0131

Sahai, Saumya & Sharma, Dravyansh. 2021. Predicting and Explaining French Grammatical Gender. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, 90–96. https://aclanthology.org/2021.sigtyp-1.9/ DOI: https://doi.org/10.18653/v1/2021.sigtyp-1.9

Schöffel, Matthias & Wiedner, Marinus & Garcés Arias, Esteban & Ruppert, Paula & Heumann, Christian & Aßenmacher, Matthias. 2025. Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan. In Hämäläinen, Mika & Öhman, Emily & Bizzoni, Yuri & Miyagawa, So & Alnajjar, Khalid (eds.), Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, 334–349. https://aclanthology.org/2025.nlp4dh-1.30/ DOI: https://doi.org/10.18653/v1/2025.nlp4dh-1.30

Schöffel, Matthias & Garcés Arias, Esteban & Wiedner, Marinus & Ruppert, Paula & Li, Meimingwei & Heumann, Christian & Aßenmacher, Matthias. 2025. Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages. arXiv:2506.17715. https://doi.org/10.48550/arXiv.2506.17715

Serrano, Sofia & Smith, Noah A. 2019. Is Attention Interpretable? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2931–2951. https://aclanthology.org/P19-1282/ DOI: https://doi.org/10.18653/v1/P19-1282

ThLL = Thesaurus linguae latinae. Berlin/Boston: De Gruyter. https://publikationen.badw.de/de/thesaurus/lemmata (last accessed 31.10.2025).

Vaswani, Ashish & Shazeer, Noam & Parmar, Niki & Uszkoreit, Jakob & Jones, Llion & Gomez, Aidan N. & Kaiser, Łukasz & Polosukhin, Illia. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems 30, 5998–6008. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

Veeman, Hartger & Basirat, Ali. 2020. An exploration of the encoding of grammatical gender in word embeddings. arXiv:2008.01946. https://doi.org/10.48550/arXiv.2008.01946

Wang, Ziwen. 2023. Pérdida del género neutro del latín al hispanorromance altomedieval. Una reconstrucción panrománica. PhD thesis. Barcelona: Universitat Autònoma de Barcelona. https://ddd.uab.cat/pub/tesis/2024/hdl_10803_691297/ziwa1de1.pdf

Wichers Schreur, Jesse & Allassonnière-Tang, Marc & Bellamy, Kate & Rochant, Neige. 2022. Predicting Grammatical Gender in Nakh Languages: Three Methods Compared. Linguistic Typology at the Crossroads 2 (2), 93–126. https://doi.org/10.6092/issn.2785-0943/14545

Wiedner, Marinus. 2023. Old Occitan handwriting (model no. 52822, CER = 3.51%). PyLaia model for handwritten Occitan from the 13th and 14th centuries. [Computer software] https://readcoop.eu/model/oldoccitan-handwriting/ (last accessed 31.10.2025).

Wiedner, Marinus. Forthcoming. Doublons de genre en occitan médiéval : huit études de cas sur corpus. In Dufter, Andreas & Wissner, Inka (eds.): La variation en diachronie : regards sur la Galloromania (Beihefte zur Zeitschrift für romanische Philologie). Berlin/Boston: De Gruyter.

Wiedner, Marinus. Submitted. A Corpus Study on Grammatical Gender in Old Occitan. PhD dissertation. Freiburg im Breisgau: Universität Freiburg.

Wiegreffe, Sarah & Pinter, Yuval. 2019. Attention is not not Explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 11–20. https://aclanthology.org/D19-1002/ DOI: https://doi.org/10.18653/v1/D19-1002

Williams, Adina & Pimentel, Tiago & Blix, Hagen & McCarthy, Arya D. & Chodroff, Eleanor & Cotterell, Ryan. 2020. Predicting Declension Class from Form and Meaning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6682–6695. https://aclanthology.org/2020.acl-main.597/ DOI: https://doi.org/10.18653/v1/2020.acl-main.597
