TEKLIA's open-source contributions

Arkindex

Arkindex is TEKLIA's platform for managing and processing large collections of digitised documents. We have been actively developing Arkindex since 2019 and use it intensively in all our projects.

Callico

Callico is the annotation and validation platform for digitised documents developed by TEKLIA. We use it in all our projects to generate training data for our Deep Learning models. It is available as open source.

Callico on GitLab
Callico Documentation
Callico paper

Deep Learning libraries and tools

We publish and maintain our code as open source on GitLab.

Doc-UFCN, a library for detecting objects in scanned documents. See it on PyPI and GitLab.
PyLaia, a handwriting recognition library. See it on PyPI and GitLab.
Nerval, a library for evaluating named entity extraction. See it on GitLab.
DISS, a document image segmentation scoring library. See it on GitLab.

Open deep learning models

We make our models freely available on Hugging Face.

Handwriting recognition models for PyLaia
Document Layout Analysis models for Doc-UFCN
Named entity recognition models for spaCy

Data tools

Transkribus client and PAGE XML parser
Virtual keyboard as a web extension for eScriptorium

Arkindex tools

Open-source tools to interact with Arkindex, the document processing platform

Arkindex command-line client: a command-line interface to an Arkindex instance. See it on PyPI and GitLab.
Arkindex API client: a Python library for communicating with the Arkindex API. See it on PyPI and GitLab.
Arkindex Export: a library for exploring and using Arkindex exports in SQLite format. See it on PyPI and GitLab.
Arkindex base worker: a base class for integrating processing algorithms into Arkindex. See it on PyPI and GitLab.
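
Since Arkindex exports are SQLite files, a first look at one can be taken with nothing more than Python's standard library. The sketch below is an illustration only: it does not use the Arkindex Export library, it assumes nothing about the export's schema, and the function name list_tables and the file name in the usage note are hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(export_path: str) -> list[str]:
    """Return the names of the tables in a SQLite file, sorted by name."""
    # Build a file: URI and open it read-only, so that exploring an export
    # can never modify it (and a missing file raises instead of being created).
    uri = Path(export_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling list_tables("export.sqlite") on a downloaded export (the file name is a placeholder) would return whatever tables that export contains. For anything beyond this kind of inspection, the Arkindex Export library listed above is the intended tool.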

Public Databases

We publish ready-to-use datasets on Hugging Face.
The RIMES database: handwritten documents in French.
NorHand: a dataset for handwritten text recognition in Norwegian. See our paper.
SIMARA: a dataset of handwritten index cards.