For more than 40 years, Québec's parish registers, held by the Bibliothèque et Archives nationales du Québec (BAnQ), have been manually transcribed and processed by researchers of the Balsac project, currently headed by Hélène Vézina, professor at the Université du Québec à Chicoutimi (UQAC). This work began with the oldest registers, but as Québec's population grew over time, manual processing became a very long and tedious task.

To speed up the processing of baptism, marriage and death records for the period 1850-1920, TEKLIA was tasked with developing a complete handwriting recognition and information extraction system. Pages were first classified by type (act or non-act), and text lines were detected automatically. A custom handwriting recognition system then transcribed the acts in full, and a named entity recognition system identified dates, persons (subjects, parents, relatives, witnesses) and places in the transcriptions. Finally, the extracted information was grouped by act, and each act was typed (birth, marriage, death). TEKLIA delivered all of this information to the Balsac researchers, who then matched the individuals against their database.

Over the course of this project, TEKLIA processed 2.6M pages and extracted 5.5M individual records.

See this blog post for a full description of the automatic process.

{{ tag_list(tags=["Document layout analysis (DLA)", "Page classification", "Handwritten text recognition (HTR)", "Indexing", "Named entity recognition (NER)"]) }}

Parish register page from the Balsac project. BAnQ, CE102S08-1900.