Arkindex
Arkindex is TEKLIA's platform for managing and processing large collections of digitised documents. We have been actively developing Arkindex since 2019 and use it intensively in all our projects.
- Arkindex source code on GitLab
- Arkindex description
- Arkindex documentation
- Contribute code to Arkindex
- Self-host Arkindex
- Arkindex tutorials on YouTube
Callico
Callico is the annotation and validation platform for digitised documents developed by TEKLIA. We use it in all our projects to generate training data for our Deep Learning models. It is available as open source.
- Callico on GitLab
- Callico documentation
- Callico paper
Deep Learning libraries and tools
We publish and maintain our code as open source on GitLab.
- Doc-UFCN, a library for detecting objects in scanned documents. See it on PyPI and GitLab
- PyLaia, a handwriting recognition library. See it on PyPI and GitLab
- Nerval, a library for evaluating named entity extraction. See it on GitLab
- DISS, a document image segmentation scoring library. See it on GitLab
Open deep learning models
We publish our models with free access on Hugging Face.
- Handwriting recognition models for PyLaia
- Document Layout Analysis models for Doc-UFCN
- Named entity recognition models for spaCy
Data tools
- Transkribus client and PAGE XML parser
- Virtual keyboard as a web extension for eScriptorium
Arkindex tools
Open-source tools to interact with Arkindex, the document processing platform
- Arkindex command-line client: a command-line interface to an Arkindex instance. See it on PyPI and GitLab
- Arkindex API client: a Python library to communicate with the Arkindex API. See it on PyPI and GitLab
- Arkindex Export: a library for exploring and using Arkindex exports in SQLite format. See it on PyPI and GitLab
- Arkindex base worker: a base class for integrating processing algorithms into Arkindex. See it on PyPI and GitLab
Public databases
- We publish ready-to-use datasets on Hugging Face
- The RIMES database: handwritten documents in French
- NorHand: a dataset for handwritten text recognition in Norwegian. See our paper.
- SIMARA: a dataset of handwritten index cards.