Medieval manuscripts, and charters and cartularies in particular, are among the most important witnesses of European cultural heritage. Thanks to major digitization campaigns, they can now be easily browsed on the websites of archives and libraries. Exploring and understanding them, however, requires new tools: their wealth of textual information remains largely inaccessible, while users expect to search manuscript collections the same way they search printed texts, with full-text search engines.

In the European HIMANIS project, a close collaboration between heritage institutions, researchers in the humanities and social sciences, and researchers in artificial intelligence made it possible, for the first time, to automatically transcribe and index all 80,000 pages of the medieval registers of the French royal chancellery, known as the “Registres du Trésor des Chartes”.

The European HOME project, the successor to HIMANIS, will extend automatic transcription to documents in other medieval European languages (Latin, German, and Czech in addition to French), develop methods for extracting named entities (dates, names of persons and places), and process a very large number of charters and cartularies (more than 700,000 images so far).

{{ tag_list(tags=["Document layout analysis (DLA)", "Handwritten text recognition (HTR)", "Named entity recognition (NER)", "Indexing"]) }}

Page from a chancery register of the HIMANIS project, 13th century. Archives nationales, JJ 007.
Medieval charter from the HOME project. National Archives of the Czech Republic, ACK 279.