With more than 5,000 witnesses preserved worldwide, books of hours form a crucial body of evidence for understanding the medieval mental universe. However, their textual content has scarcely been studied, even though the production of so many manuscripts is a major cultural and industrial phenomenon that reveals profound changes in the religious world of the late Middle Ages.

Within the framework of the HORAE research project, supported by the ANR, TEKLIA has developed, in collaboration with Dominique Stutzmann of the IRHT, an automatic document recognition system adapted to books of hours. Using the IIIF protocol, the IRHT built up a corpus of 1,158 handwritten and printed books of hours (300,000 pages), digitized and held by various institutions. After an analysis of the structure of the pages (miniatures, initials, text lines, headings, decoration), an automatic transcription of the handwritten text was produced.
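To illustrate how a corpus can be assembled over IIIF, here is a minimal Python sketch that lists the page images declared in a IIIF Presentation 2.x manifest. It is not the project's actual harvesting code: the function name `list_page_images`, the sample manifest, and the `example.org` URLs are hypothetical, and the sketch assumes the manifest has already been downloaded and parsed into a dictionary.

```python
from typing import Dict, List


def list_page_images(manifest: Dict, size: str = "full") -> List[Dict[str, str]]:
    """Return one record per page image of a IIIF Presentation 2.x manifest.

    When a canvas image declares an Image API service, the URL is built with
    the Image API 2.x pattern {base}/{region}/{size}/{rotation}/{quality}.{format}.
    Otherwise the static resource URL is used as a fallback.
    """
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                service = resource.get("service", {})
                # Some manifests give the service as a list; take the first entry.
                if isinstance(service, list):
                    service = service[0] if service else {}
                base = service.get("@id")
                if base:
                    url = f"{base.rstrip('/')}/full/{size}/0/default.jpg"
                else:
                    url = resource.get("@id", "")
                pages.append({"label": str(canvas.get("label", "")), "url": url})
    return pages


# Hypothetical manifest for illustration only (not a real institution's data).
example_manifest = {
    "@id": "https://example.org/iiif/book1/manifest",
    "sequences": [{"canvases": [
        {"label": "f. 1r", "images": [{"resource": {
            "@id": "https://example.org/iiif/book1-f1r/full/full/0/default.jpg",
            "service": {"@id": "https://example.org/iiif/book1-f1r"},
        }}]},
        {"label": "f. 1v", "images": [{"resource": {
            "@id": "https://example.org/static/book1-f1v.jpg",
        }}]},
    ]}],
}

# Request images 1000 pixels wide where an Image API service is available.
pages = list_page_images(example_manifest, size="1000,")
for page in pages:
    print(page["label"], page["url"])
```

Because every holding institution exposes its digitized pages through the same manifest structure, one routine of this kind can in principle collect images from many libraries without copying the files into a central repository.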

On the basis of this transcription, automatic identification of texts (prayers, psalms, etc.) is underway. The long-term objective is to use this corpus to study the diffusion and circulation of the devotional and liturgical texts transmitted by books of hours in the Middle Ages, in order to better understand the culture and faith of the 13th to 16th centuries.
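As an illustration of what text identification on noisy transcriptions can involve, the following Python sketch matches a transcribed passage against a small set of reference incipits by approximate string similarity. This is an assumption made for the sake of example, not the method used in the project: the reference list `REFERENCE_TEXTS`, the `normalize` and `identify` helpers, and the 0.6 threshold are all hypothetical.

```python
import difflib
import re
import unicodedata

# Hypothetical reference incipits; a real system would need a far larger list.
REFERENCE_TEXTS = {
    "Pater noster": "Pater noster qui es in caelis sanctificetur nomen tuum",
    "Ave Maria": "Ave Maria gratia plena dominus tecum",
    "Psalm 50 (Miserere)": "Miserere mei Deus secundum magnam misericordiam tuam",
}


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, and merge j/i and v/u."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z ]+", " ", text)
    text = text.replace("j", "i").replace("v", "u")
    return " ".join(text.split())


def identify(transcription: str, threshold: float = 0.6):
    """Return (label, score) for the closest reference text.

    The label is None when no reference reaches the similarity threshold.
    """
    query = normalize(transcription)
    best_label, best_score = None, 0.0
    for label, reference in REFERENCE_TEXTS.items():
        score = difflib.SequenceMatcher(None, query, normalize(reference)).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# A transcription containing typical recognition errors and a spelling variant.
label, score = identify("pater nostcr qui es in celis sanctificetur nomen tuu")
print(label, round(score, 2))
```

Approximate matching is used here because automatic transcriptions contain recognition errors and medieval spelling varies, so exact string comparison would miss most occurrences.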

Document layout analysis (DLA), handwritten text recognition (HTR), text identification, indexing
Book of hours, 15th century. Bibliothèque nationale de France, NAL 3110.