When Automatic Document Processing meets Egyptian History
Exploring the remains of a village in Ancient Egypt
Beginning one hundred years ago and running from 1922 to 1952, a crucial campaign of archeological research was conducted in Deir el-Medina (Egypt), a village whose residents were all involved in the construction and decoration of the tombs and funerary temples of the New Kingdom's Pharaohs. The leader of the excavations, the French archeologist Bernard Bruyère, kept meticulous records of the discoveries that were made about the everyday life of this community, from the social, professional, and religious points of view.
Four handwritten notebooks had been filled by the end of the campaign. These have all been digitized and are now available for consultation on the website of the French Institute of Oriental Archeology (IFAO), based in Cairo. The IFAO selected Teklia to train Deep Learning models for the automatic processing of the documents, including the transcription of the notebooks, and to provide a platform where the recognized content could be classified and indexed.
Training Deep Learning models for Handwritten Text Recognition with Arkindex
Teklia's document processing platform, Arkindex, is the main tool used on this project. The thousand pages of notes carefully taken by Mr Bruyère had to be processed with a high level of accuracy and efficiency.
This project required training specific models to detect the lines on the scanned pages and to recognize the type of element written or drawn in the notebooks. A model could then be trained to transcribe and index the writings, building a digital library of data that IFAO members are free to consult on the platform. To generate ground truth and increase the accuracy of the models, archeologists and other professionals from the IFAO can access the platform and annotate the documents, which helps train the models to perform line detection and handwritten text recognition (HTR).
Dealing with the diversity of elements to detect and recognize
Throughout this important archeological campaign, Mr Bruyère was committed to providing future generations of Egyptologists with as much information as possible. The complete yet thoroughly organized sets of notes truly bear witness to Egyptian history. They include text, sketches, and other illustrations by the hand of the lead archeologist, which cannot be separated from the rest of the notes. The automatic handwritten text recognition process therefore needed to be complemented by the knowledge of IFAO members when it came to annotating the illustrations.
Combining HTR with Illustration annotations
Once the models were trained, the HTR process on the notebooks proved fast and reached a character error rate of 4.5%. Arkindex is specialized in training Deep Learning models for handwritten text recognition and delivers an accurate transcription of a scanned page in a few seconds. However, the project required more than automatic processing of the documents, especially for recognizing the illustrations set in the middle of lines of text.
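For readers unfamiliar with the metric, a character-level error rate is commonly computed as the edit distance between the automatic transcription and the ground truth, divided by the number of ground-truth characters. The sketch below illustrates that common definition only; the article does not describe how the 4.5% figure was measured, and the function names and sample lines are invented for illustration.

```python
from typing import Sequence


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions turning `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total edit distance divided by total ground-truth characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("ground truth is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_errors / total_chars


# Invented sample lines (not taken from the notebooks).
ground_truth = ["tombe no 290", "fragment de stele"]
recognized = ["tombe no 296", "fragment de stelle"]

cer = character_error_rate(ground_truth, recognized)
print(f"CER: {cer:.1%}")
```

Under this definition, a rate of 4.5% corresponds to roughly one character in every twenty-two needing correction.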
Archeologists from the IFAO were therefore given access to the library of data gathered in Arkindex for this project, so that they could manually annotate all the illustrations using their own metadata.
This winning combination of artificial intelligence and human knowledge allows Teklia to produce the most accurate indexing of the notebooks.
What comes next?
As of today, handwritten text recognition has been run on the entire collection of pages and still needs to be complemented by annotations of the various sketches. The next step for Teklia is to build a website and supply it with all the data gathered in Arkindex for this project, creating a form of virtual exhibition that introduces Mr Bruyère's campaign and celebrates its centenary.
- Djehouty, CC BY-SA 4.0, via Wikimedia Commons
- IFAO - Institut français d'archéologie orientale