We are pleased to announce the release of Arkindex version 0.14.1, now enabled on our demo instance.
PostGIS
Arkindex now relies on the PostGIS database extension for PostgreSQL. This will allow us to develop new features that take advantage of its geographical computing capabilities.
We already use it to improve performance and data checks on image zones.
Features
GitLab repositories
Arkindex now checks and requires that you have Maintainer or Owner access to a GitLab repository when using the Add repository feature. This is required to enable the webhook on the project.
You can always fork a repository to gain ownership.
Processes
We have worked a lot to improve the new workers system based on Git repositories.
It's now possible to see precisely which elements are used when starting a new Process. You can pick an element, then load all its children, and finally filter by a specific type (to pick all the Page elements nested under a Volume, for example).
<embed alt="elements" embedtype="image" format="fullwidth" id="91"/>
This selection is directly used by the ML workers (simplifying our previous implementation, and thus avoiding bugs!).
You can once again split the workload into chunks (up to 10), so that different workers run in parallel. This feature was missing in 0.14.0 compared with the previous workflow system.
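To illustrate the idea of splitting a workload into chunks, here is a minimal sketch in Python. It is not the Arkindex implementation: the function name `split_into_chunks` and the constant `MAX_CHUNKS` are hypothetical, and the near-equal distribution strategy is an assumption made for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Upper bound taken from the release notes; the constant name is hypothetical.
MAX_CHUNKS = 10


def split_into_chunks(elements: Sequence[T], chunks: int) -> List[List[T]]:
    """Distribute elements across `chunks` lists of near-equal size."""
    if not 1 <= chunks <= MAX_CHUNKS:
        raise ValueError(f"chunks must be between 1 and {MAX_CHUNKS}")
    # Every chunk gets `base` elements; the first `extra` chunks get one more.
    base, extra = divmod(len(elements), chunks)
    result = []
    start = 0
    for index in range(chunks):
        size = base + (1 if index < extra else 0)
        result.append(list(elements[start:start + size]))
        start += size
    return result


# Seven elements over three chunks: sizes 3, 2 and 2.
print(split_into_chunks(list(range(7)), 3))
```

Each resulting list could then be handed to a separate worker so that the chunks are processed in parallel.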
The Docker image build has also been enhanced:
- build a Dockerfile only once if it's shared across worker versions
- use Docker image ID to reference images instead of tags
- delete Docker images on the ponos hosts once they are stored safely on our distribution platform
Finally, you can view which worker version has produced elements, transcriptions, classifications or entities, by hovering over the name of the worker. A small popup will appear with details.
API endpoints
- A new endpoint CreateMLClass allows you to create a new ML class on a corpus
- Require zones for the endpoint CreateElementTranscriptions
- Serialize the dataimport created when adding a repository, instead of returning only its ID
Corpora
- A text_zone element type is added by default to every new corpus
- A new corpus is created for all new IIIF imports
Deprecations
- The endpoint ListMLClasses is deprecated in favor of ListCorpusMLClasses.
- The endpoint DataImportElements is deprecated in favor of ListProcessElements.
- The parameter with_transcription_sources_count on the endpoint ListElements has been removed.
- The frontend no longer exposes the previous workflow system
Bugfixes
- Replace assertions with conditions in registration and login APIs
- Allow a confidence of 0 in CreateClassification
- Preserve Python assert statements in binary build
- Avoid stale read while retrieving elements on a new Process
- Avoid stale read on newly created data import
- Avoid returning extra rows from ListElementNeighbors
- Fix Transkribus login process before validating user access
- Prevent unhandled exception warnings on incorrect Transkribus credentials
- Prevent updating an element's zone to an invalid image
- Properly display errors on login
- Prevent 0×0 image size warnings after a store-wide reset
- Prevent duplicate repository creation
- Add a thumbnails generation task to IIIF imports only
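Two of the bugfixes above concern Python assert statements. As background, Python removes assert statements entirely when run in optimized mode (the -O flag), so a validation written as an assertion can silently disappear. The sketch below contrasts the two styles. The function names and the validation rule are hypothetical and are not taken from the Arkindex code base.

```python
def check_credentials_with_assert(email: str, password: str) -> bool:
    # Fragile: this check is skipped entirely under `python -O`.
    assert email and password, "Missing credentials"
    return True


def check_credentials(email: str, password: str) -> bool:
    # Robust: an explicit condition always runs, whatever the interpreter flags.
    if not email or not password:
        raise ValueError("Missing credentials")
    return True


try:
    check_credentials("user@example.com", "")
except ValueError as error:
    print(f"Rejected: {error}")
```

An explicit condition also lets the API return a proper validation error to the client rather than an unhandled AssertionError.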
Performance
- The top-level loading time has decreased by 30-40% after simplifying some internal queries.
Arkindex Base worker
- The base Arkindex API client has been updated to 1.0.2
- Automatic creation of missing ML classes
- Fix the score and confidence type checks, which mishandled falsy values
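The falsy-value problem is a common Python pitfall: a truthiness test treats 0 and 0.0 like a missing value, so a legitimate confidence of 0 gets rejected. The following sketch shows the pattern under the assumption that a confidence is a number between 0 and 1. The function names are hypothetical and this is not the base worker's actual code.

```python
def is_valid_confidence_buggy(value) -> bool:
    # Buggy: `not value` is True for 0 and 0.0, so a zero confidence is rejected.
    if not value:
        return False
    return isinstance(value, (int, float)) and 0 <= value <= 1


def is_valid_confidence(value) -> bool:
    # Check the type explicitly instead of relying on truthiness.
    # bool is a subclass of int in Python, so it is excluded on purpose.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0 <= value <= 1


for candidate in (0, 0.0, 0.5, None, 1.5):
    print(candidate, is_valid_confidence_buggy(candidate), is_valid_confidence(candidate))
```

Only the zero values differ between the two functions: the truthiness-based version rejects them, while the explicit type check accepts them.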