We are pleased to announce the release of Arkindex version 0.14.1, now deployed on our main instance.

PostGIS

Arkindex now relies on the PostGIS extension for PostgreSQL. This will let us build new features that take advantage of its spatial computing capabilities.

We already use these capabilities to improve performance and data validation for image zones.

Features

GitLab repositories

Arkindex now checks that you have Maintainer or Owner access to a GitLab repository when you use the Add repository feature. This access level is required to set up the webhook on the project.

If you do not have these rights, you can always fork the repository to gain ownership.

Processes

We have put a lot of work into improving the new worker system based on Git repositories.

It's now possible to see exactly which elements are used when starting a new Process. You can pick an element, load all its children, then filter them by type (for example, to select all the Page elements nested under a Volume).

The ML workers use this selection directly, which simplifies our previous implementation and helps avoid bugs!

You can once again split the workload into chunks (up to 10) so that several workers run in parallel. This feature existed in the previous workflow system but was missing from 0.14.0.
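The release notes do not say how elements are assigned to chunks. As an illustration only, the sketch below distributes a list of elements round-robin across at most 10 chunks; `split_into_chunks` and `MAX_CHUNKS` are hypothetical names, not part of the Arkindex API, and the real scheduler may split the selection differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Upper bound taken from the release notes; the constant name is hypothetical
MAX_CHUNKS = 10


def split_into_chunks(items: Sequence[T], chunks: int) -> List[List[T]]:
    """Distribute items across `chunks` lists of near-equal size (hypothetical helper)."""
    if not 1 <= chunks <= MAX_CHUNKS:
        raise ValueError(f"chunks must be between 1 and {MAX_CHUNKS}")
    result: List[List[T]] = [[] for _ in range(chunks)]
    # Round-robin assignment: the item at position i goes to chunk i % chunks
    for index, item in enumerate(items):
        result[index % chunks].append(item)
    return result
```

For instance, `split_into_chunks(["p1", "p2", "p3", "p4", "p5"], 2)` returns `[["p1", "p3", "p5"], ["p2", "p4"]]`. Round-robin keeps chunk sizes within one element of each other, so no single worker receives a disproportionate share.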

The Docker image build has also been enhanced:

  • build a Dockerfile only once when it is shared across several worker versions
  • use the Docker image ID instead of tags to reference images
  • delete Docker images from the Ponos hosts once they are safely stored on our distribution platform
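As a rough sketch of the "build only once" idea, assuming for illustration that two worker versions can share an image when their Dockerfile contents are identical (a real build would also have to account for the rest of the build context), versions can be grouped by a hash of their Dockerfile and one build scheduled per group. The function names and version identifiers below are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_by_dockerfile(versions: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each Dockerfile content hash to the worker versions that share it.

    `versions` maps a hypothetical worker version identifier to the raw bytes
    of the Dockerfile used to build it.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for version_id, dockerfile in versions.items():
        # Identical Dockerfile bytes produce the same digest, hence one group
        digest = hashlib.sha256(dockerfile).hexdigest()
        groups[digest].append(version_id)
    return dict(groups)


def plan_builds(versions: Dict[str, bytes]) -> List[List[str]]:
    """Return one build job per distinct Dockerfile, listing the versions it serves."""
    return sorted(group_by_dockerfile(versions).values())
```

With three versions where two share the same Dockerfile, `plan_builds` schedules two builds instead of three.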

Finally, you can see which worker version produced an element, transcription, classification or entity by hovering over the worker's name. A small popup then shows its details.

API endpoints

  • A new CreateMLClass endpoint lets you create an ML class in a corpus
  • Require zones in the CreateElementTranscriptions endpoint
  • When adding a repository, return the full serialized data import instead of only its ID

Corpora

  • A text_zone element type is added by default to all new corpora
  • A new corpus is created for each new IIIF import

Deprecations

  • The endpoint ListMLClasses is deprecated in favor of ListCorpusMLClasses.
  • The endpoint DataImportElements is deprecated in favor of ListProcessElements.
  • The parameter with_transcription_sources_count on the endpoint ListElements has been removed.
  • The frontend no longer exposes the previous workflow system.

Bugfixes

  • Replace assertions with conditions in registration and login APIs
  • Allow a confidence of 0 in CreateClassification
  • Preserve Python assert statements in binary build
  • Avoid stale read while retrieving elements on a new Process
  • Avoid stale read on newly created data import
  • Avoid returning extra rows from ListElementNeighbors
  • Fix Transkribus login process before validating user access
  • Prevent unhandled exception warnings on incorrect Transkribus credentials
  • Prevent updating an element's zone to an invalid image
  • Properly display errors on login
  • Prevent 0×0 image size warnings after a store-wide reset
  • Prevent duplicate repository creation
  • Add the thumbnail generation task to IIIF imports only
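The notes do not give the motivation for replacing assertions with conditions, but one well-known Python behaviour is relevant: `assert` statements are removed entirely when the interpreter runs with optimisations enabled (`python -O` or the `PYTHONOPTIMIZE` environment variable), so validation written as an assertion can silently disappear. The minimal sketch below contrasts the two styles; `ValidationError`, `check_with_assert` and `check_with_condition` are hypothetical names, not Arkindex code.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for an API validation error."""


def check_with_assert(email: str) -> None:
    # Fragile: this whole statement is stripped when Python runs with -O
    assert email, "An email is required"


def check_with_condition(email: str) -> None:
    # Robust: an explicit condition is evaluated regardless of optimisation flags
    if not email:
        raise ValidationError("An email is required")
```

Running `python -O -c "assert False"` exits without any error, which shows how an assertion-based check can vanish from an optimised build.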

Performance

  • Top-level loading time has decreased by 30-40% thanks to simplified internal queries.

Arkindex Base worker

  • The base Arkindex API client has been updated to version 1.0.2
  • Automatic creation of missing ML classes
  • Fix the score and confidence type checks for falsy values
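This fix and the "Allow a confidence of 0 in CreateClassification" bugfix both concern zero values. The notes do not detail the exact checks involved, but a common Python pitfall is testing a score by truthiness (`if not confidence:`), which treats a valid `0` or `0.0` as missing. The sketch below contrasts that pattern with an explicit check; the function names are hypothetical, and the [0, 1] range is an assumption made for illustration.

```python
from typing import Any


def rejects_by_truthiness(value: Any) -> bool:
    # Buggy pattern: 0 and 0.0 are falsy, so a valid zero score looks "missing"
    return not value


def validate_confidence(confidence: Any) -> float:
    """Hypothetical validator accepting any number in [0, 1], including 0."""
    # Test for a missing value explicitly instead of relying on truthiness
    if confidence is None:
        raise ValueError("confidence is required")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise TypeError("confidence must be a number")
    # Assumed range for illustration only
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return float(confidence)
```

With this approach, `validate_confidence(0)` returns `0.0` instead of rejecting the value, while `None`, booleans and out-of-range numbers are still refused.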