Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics

This project was funded by UKRI-AHRC and the Irish Research Council under the ‘UK-Ireland Collaboration in the Digital Humanities Research Grants Call’ (grant numbers AH/W001934/1 and IRC/W001934/1).

August 2021–December 2024

This project fused deep qualitative analysis with cutting-edge computational methodologies to decode, interpret and curate the hidden heritages of Gaelic traditional narrative. Leveraging advances in Natural Language Processing, the consortium digitised, converted and disseminated a vast corpus of folklore manuscripts in Irish and Scottish Gaelic.

The project team created, analysed and disseminated a large text corpus of folktales from the Tale Archive of the School of Scottish Studies Archives and from the Main Manuscript & Schools’ Collections of the Irish National Folklore Collection. Creating this corpus involved scanning c. 80,000 manuscript pages (including pages scanned by the Dúchas digitisation project), recognising handwritten text on these pages, and normalising some non-standard text. The corpus was then annotated with document-level metadata and, for select Aarne–Thompson (AT) tale-types, with some motif-level annotation.

Analysis of the corpus is being carried out using data mining and phylogenetic techniques. Both the data mining and phylogenetic workstreams encompass the entire corpus; however, the phylogenetic workstream also focuses on three folktale types as case studies: AT 400 ‘The Search for the Lost Wife’, AT 425 ‘The Search for the Lost Husband’, and AT 503 ‘The Gifts of the Little People’. The results of these analyses will be presented in the book Decoding the Oral Traditions of Scotland and Ireland: From Manuscripts to Models, to be published by Edinburgh University Press in 2026.