Teklia has developed automatic document understanding systems for its clients, based on Machine Learning and Deep Learning, across a wide range of application domains.


For the French National Archives, we have developed a transcription platform for processing handwritten indexes, assisted by automatic handwriting recognition and entity extraction. The project covers the automatic processing of 800,000 records and 100,000 register pages.

Handwriting Entities

Within the framework of the collaborative research project Hugin-Munin funded by the Research Council of Norway, TEKLIA is developing adaptive techniques for the recognition of handwritten documents in Norwegian.


For the Balsac project, TEKLIA has performed document structure analysis, handwritten text recognition and personal information extraction on 2.7 million pages of parish records from Quebec dated between 1880 and 1920.

Handwriting Entities Acts

Automatic structure analysis of tables, printed and handwritten text recognition, validation with crowdsourcing using Callico.

OCR Meta-data Table analysis

OCR and extraction of meta-data from 500,000 printed index cards.

OCR Meta-data

OCR improvement for the Parliamentary Archives of the French Revolution project.


Data extraction, classification and summarization of case law decisions.


For the National Archives of the Netherlands, TEKLIA has developed page classification models for processing documents from the archives of the Ministry of Colonies from 1814 to 1849.

Classification Handwriting

TEKLIA has been collaborating with IRHT for several years to develop solutions for processing medieval handwritten documents, within the framework of the HORAE and HOME projects.

Handwriting Entities Acts

TEKLIA contributes to the development of new features for the eScriptorium project: development of a search engine, setting up user quotas, tracking Machine Learning tasks, etc. TEKLIA also provides system administration for several major eScriptorium instances.


Development of the Fuzzing platform (for automated vulnerability detection) and of a tool for automatic classification of patch sets for the Firefox software, to reduce CI costs.


TEKLIA collaborates with Hôpital Necker-Enfants Malades in the Macadamia project to develop a platform for named entity recognition and numerical information extraction from medical records.


Evaluation of document search engines and technologies for automatic summarisation and classification of documents based on contextual embeddings. Performance analysis of search engines (ElasticSearch, OpenSearch).


Normalization and organization of internal control procedures. Text distance, concept extraction, word embeddings.


Extraction of unstructured textual information from product test reports using machine learning and OCR. Training of named-entity extraction models, incremental and active learning.


Automatic prediction of priority levels from complaint mail through semantic analysis. Text recognition, topic detection, document classification.


Clustering of IT tickets by topic, with automatic ticket classification and triage through density-based spatial clustering, keyword extraction and classification.


Automatic extraction of financial information from scanned invoices.


Automatic redaction of confidential information in fiscal forms. Automatic processing (document clustering, document classification) of 2.5 million archive documents.


Automatic extraction of construction rules from local urban planning documents (plans locaux d'urbanisme).