When Automatic Document Processing meets Egyptian History

Exploring the remains of a village in Ancient Egypt

A century ago, in 1922, a crucial campaign of archeological research began in Deir el-Medina (Egypt), a village whose residents were all involved in the construction and decoration of the tombs and funerary temples of the New Kingdom's pharaohs. The campaign ran until 1952. The leader of the excavations, the French archeologist Bernard Bruyère, meticulously kept detailed records of the discoveries, which document the everyday life of this community from social, professional, and religious points of view.

The ancient artisans’ village of Deir el-Medina, close to Luxor in Upper Egypt

Four handwritten notebooks had been filled by the end of the campaign. All of them have been digitized and are now available for consultation on the website of the French Institute of Oriental Archeology (IFAO), based in Cairo. The IFAO selected Teklia to train Deep Learning models for automatic processing of the documents, including transcription of the notebooks, and to provide a platform where the recognized content can be classified and indexed.

Training Deep Learning models for Handwritten Text Recognition with Arkindex

Teklia's document processing platform, Arkindex, is the main tool used on this project. The thousand pages of notes that Mr Bruyère took so thoroughly had to be processed accurately and efficiently.

This project required training specific models to detect the lines on the scanned pages and to recognize the type of element written or drawn in the notebooks. Models could then be trained to transcribe and index the writings, creating a digital library of data that IFAO members are free to consult on the platform. To generate ground truth and increase the accuracy of the models, archeologists and other professionals from the IFAO can access the platform and annotate the documents. These annotations are used to train the models for line detection and handwritten text recognition (HTR).

The models first needed to be trained to detect lines of text.

Dealing with the diversity of elements to detect and recognize

Throughout this important archeological campaign, Mr Bruyère was committed to providing future generations of Egyptologists with as much information as possible. His complete yet thoroughly organized sets of notes truly bear witness to Egyptian history. They include text, sketches, and other illustrations in the lead archeologist's own hand, which cannot be separated from the rest of the notes. The automatic handwritten text recognition process therefore needed to be complemented by the knowledge of IFAO members for the annotation of illustrations.

Examples of pages that include both text lines and illustrations.

Combining HTR with Illustration annotations

Once the models were trained, the HTR process on the notebooks proved quite fast and achieved a character error rate of 4.5%. Arkindex specializes in training Deep Learning models for handwritten text recognition and delivers an accurate transcription of a scanned page in a few seconds. However, the project required more than automatic processing of the documents, especially for recognizing illustrations embedded in the middle of lines of text.
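
An error rate measured on characters is commonly computed as the edit distance between the automatic transcription and a ground-truth transcription, divided by the length of the ground truth. The Python sketch below illustrates that standard definition. It is not necessarily the exact evaluation procedure used on this project, and the function names are chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the ground-truth reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a single wrong character in a 20-character line gives an error rate of 5%, so a rate of 4.5% corresponds to a little under one character error for every 20 characters of ground truth.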

Bounding boxes surrounding both text lines and illustrations.

Archeologists from the IFAO were therefore given access to the library of data gathered in Arkindex for this project, so that they could manually annotate all the illustrations using their own metadata.

Example of a manual annotation contributed by the IFAO

This winning combination of Artificial Intelligence and human knowledge allows Teklia to produce a highly accurate index of the notebooks.

Example of a search both on full text and metadata in the collection

What comes next?

As of today, handwritten text recognition has been run on the entire collection of pages and still needs to be complemented by annotations of the various sketches. The next step for Teklia is to build a website and populate it with all the data gathered in Arkindex for this project, creating a virtual exhibition that introduces Mr Bruyère's campaign and celebrates its centenary.

Image credits:

  • Djehouty, CC BY-SA 4.0, via Wikimedia Commons
  • IFAO - Institut français d'archéologie orientale