Callico: a Versatile Open-Source Document Image Annotation Platform

This paper presents Callico, a pioneering web-based open-source platform
designed to simplify the annotation process in document recognition projects.
The shift towards data-centric AI in machine learning and deep learning
underscores the importance of high-quality data and the need for specialised
tools that make generating such data more efficient and effective.
For document image annotation, Callico offers a dual display for digitised
documents, enabling simultaneous visualisation and annotation of the scanned
image and its text. This capability is critical for training optical character
recognition (OCR) and handwritten text recognition (HTR) models, as well as for
document layout analysis, named entity recognition, form-based key-value
annotation, and hierarchical structure annotation with element grouping.
The platform supports collaborative annotation through a versatile feature set,
backed by a commitment to open-source development, high code-quality standards
and easy deployment via Docker. Illustrative use cases, including the
transcription of the Belfort municipal registers, the indexing of French World
War II prisoners for the International Committee of the Red Cross (ICRC), and
the extraction of personal information from the Socface project's census lists,
demonstrate Callico's applicability and utility.