Once you've collected a number of documents into your Arkindex project, exporting that data into another format for sharing or publishing can be very helpful. However, while the SQLite database generated by Arkindex's Export button is flexible, allowing you to generate views on your data beyond Arkindex's API, it's still not the most convenient option for common use cases.

That's why, in version 0.2.0 of the open-source Arkindex command-line tool, arkindex-cli, we've included a new arkindex export command. This command converts an Arkindex export into a series of PDF or ALTO files.

Using the arkindex export CLI tool

Assuming that pip is installed on your system and that the defaults fit your project, using the new Arkindex export feature involves just a few steps:

  1. Export a project from Arkindex to an SQLite database, and download it.
  2. Install the CLI, and connect to your Arkindex instance:
     pip install arkindex-cli[export]
     arkindex login --host arkindex.example.com
  3. Finally, run the exports:
     arkindex export /path/to/export.sqlite pdf
     arkindex export /path/to/export.sqlite alto
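
If you run these exports regularly, the last step can be scripted. The following is a minimal sketch, not part of arkindex-cli: the helper names (`list_tables`, `export_commands`, `run_exports`) are hypothetical, and the only commands it uses are the two `arkindex export` invocations shown above. Before launching the CLI, it checks that the downloaded file opens as an SQLite database and contains at least one table, without assuming anything about the export's schema.

```python
import sqlite3
import subprocess
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of the tables in an SQLite file, sorted alphabetically."""
    # Open read-only so that a mistyped path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def export_commands(db_path, formats=("pdf", "alto")):
    """Build one `arkindex export <database> <format>` command per format."""
    return [["arkindex", "export", str(db_path), fmt] for fmt in formats]


def run_exports(db_path, formats=("pdf", "alto")):
    """Sanity-check the database, then run each export in turn."""
    if not list_tables(db_path):
        raise ValueError(f"{db_path} contains no tables; is it an Arkindex export?")
    for command in export_commands(db_path, formats):
        # check=True stops at the first export that exits with an error.
        subprocess.run(command, check=True)
```

Calling `run_exports("/path/to/export.sqlite")` then runs the PDF and ALTO exports one after the other with the default settings, assuming the `arkindex` command is on your PATH and you have already logged in.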

You can use arkindex export --help to check out the various settings you can use to fine-tune an export. For more details, see the export command documentation.

PDF Export

The PDF export will generate one PDF file for each folder element, containing all of its pages and the recognized text lines. The text lines are selectable and searchable on each page using standard PDF tools.

Searching on an exported PDF

With this enhanced PDF export feature, Arkindex users will find the common PDF import-OCR-export workflow (importing a PDF, performing text recognition on it, and exporting to a PDF with selectable text) extremely simple.

For greater visibility of the identified lines of text in the exported PDF, run the export with the --debug argument. The resulting PDF will highlight the text lines:

An exported PDF in debug mode

ALTO Export

The ALTO export includes as many ALTO XML tags as are compatible with the Arkindex structure, including which Workers were used to transcribe each line and the confidence scores they reported.
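
To give an idea of how such files can be consumed downstream, here is a minimal sketch that reads text lines and confidence scores from an ALTO document using only the Python standard library. The sample XML is hand-written for illustration and is not actual Arkindex output. It assumes that confidence is stored in the standard ALTO `WC` attribute of `String` elements, and the helper names (`local_name`, `read_lines`) are hypothetical. Check a real exported file to see exactly which tags and attributes Arkindex fills in.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the ALTO v4 namespace (illustrative only).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="page_1" WIDTH="1000" HEIGHT="1500">
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1" HPOS="100" VPOS="120" WIDTH="800" HEIGHT="40">
            <String CONTENT="Hello world" WC="0.97"/>
          </TextLine>
          <TextLine ID="line_2" HPOS="100" VPOS="180" WIDTH="800" HEIGHT="40">
            <String CONTENT="Second line" WC="0.82"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def read_lines(alto_xml):
    """Return a list of (line ID, text, confidence) tuples, one per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        strings = [child for child in element if local_name(child.tag) == "String"]
        text = " ".join(s.get("CONTENT", "") for s in strings)
        scores = [float(s.get("WC")) for s in strings if s.get("WC") is not None]
        # Use the lowest score on the line; None when no score is present.
        confidence = min(scores) if scores else None
        lines.append((element.get("ID"), text, confidence))
    return lines


for line_id, text, confidence in read_lines(SAMPLE_ALTO):
    print(line_id, text, confidence)
```

Matching on local tag names rather than a fixed namespace keeps the sketch usable whichever ALTO schema version the file declares.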

Teklia's commitment to open source

Teklia is committed to contributing back to the open-source and academic communities that have supported and collaborated with us in our work. To that end, Arkindex CLI is open-source. You can browse the code, report bugs or submit patches on GitLab.