Arkindex release 1.6.2

We are happy to announce that a new Arkindex release is available. You can explore Arkindex and try out the newest features on our demo instance, demo.arkindex.org.

Technical release notes for developers and instance administrators are available here.

Datasets

A new Populate dataset action on projects and elements lets users populate an empty dataset with a random sample of elements, without having to use the Arkindex CLI. It provides the same options as the CLI's ml-splits command.

In addition, datasets in a Complete or Error state can now be reopened using the Reopen button on the dataset details page. This removes any generated artifacts and lets you edit the dataset before building it again.

Finally, the default dataset set names have been updated to train, dev and test, and are now consistent between the frontend, the API and the CLI.
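To illustrate what populating a dataset with a random sample means, here is a minimal sketch that shuffles element IDs and splits them into the default train, dev and test sets. The split_elements helper and its 80/10/10 ratios are hypothetical choices for illustration. This is not the implementation of the Populate dataset action or of the ml-splits command.

```python
from __future__ import annotations

import random
from collections.abc import Sequence

# Default set names, as listed in these release notes.
DEFAULT_SETS = ("train", "dev", "test")


def split_elements(
    element_ids: Sequence[str],
    ratios: Sequence[float] = (0.8, 0.1, 0.1),  # hypothetical ratios, not Arkindex defaults
    seed: int | None = None,
) -> dict[str, list[str]]:
    """Randomly assign element IDs to the train, dev and test sets (illustrative sketch)."""
    if len(ratios) != len(DEFAULT_SETS):
        raise ValueError("one ratio per set is required")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Shuffle a copy with a seeded generator so the split is reproducible.
    shuffled = list(element_ids)
    random.Random(seed).shuffle(shuffled)

    splits: dict[str, list[str]] = {}
    start = 0
    for i, (name, ratio) in enumerate(zip(DEFAULT_SETS, ratios)):
        # The last set takes whatever remains, so no element is lost to rounding.
        if i == len(DEFAULT_SETS) - 1:
            end = len(shuffled)
        else:
            end = start + round(ratio * len(shuffled))
        splits[name] = shuffled[start:end]
        start = end
    return splits
```

Passing a fixed seed makes the sample reproducible, which is convenient when comparing training runs on the same data.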

Processes

Several improvements have been made to multiple aspects of process execution and management in this release.

GPU management

Whether a GPU is used can now be configured for each worker in a process, rather than for the whole process. Workers that require a GPU always use one, and workers that do not support a GPU never do. For workers that support GPU usage without requiring it, users are now free to choose whether to use one.
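The per-worker rule described above can be summarized as a small decision function. This is a sketch only: the GPUUsage enum, its member names and the uses_gpu helper are hypothetical, and defaulting to no GPU when the user makes no choice is an assumption.

```python
from enum import Enum
from typing import Optional


class GPUUsage(Enum):
    # Hypothetical names for the three cases described in the release notes.
    REQUIRED = "required"
    SUPPORTED = "supported"
    DISABLED = "disabled"


def uses_gpu(usage: GPUUsage, user_choice: Optional[bool] = None) -> bool:
    """Decide whether a single worker run in a process gets a GPU (illustrative sketch)."""
    if usage is GPUUsage.REQUIRED:
        return True  # always runs on a GPU, whatever the user picked
    if usage is GPUUsage.DISABLED:
        return False  # never runs on a GPU
    # Optional GPU support: follow the user's choice.
    # Assumption: no explicit choice means no GPU.
    return bool(user_choice)
```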

Worker configurations

When creating a worker configuration, only the required fields are now displayed by default. This makes workers with a large number of options more user-friendly, letting users focus on what they need to run the worker without having to understand every advanced option.

Additionally, configuration fields used to select a model, mainly found on training workers, now open a modal instead of showing a text field with a list of suggestions. This makes browsing models easier and fixes some user interface bugs, particularly with large configurations.

Task execution

Various issues have been fixed on tasks running in Community Edition:

The Restart task feature, which runs a new task without running the whole process again, now properly runs the new task.
Stopping a task marks it as Stopped and not Failed.
Tasks now only start after any selected model versions are fully downloaded, rather than before.

Additionally, the API endpoints used to manage tasks have been simplified. The RetrieveTaskFromAgent endpoint has been renamed RetrieveTask, and UpdateTaskFromAgent and PartialUpdateTaskFromAgent have been merged into the existing UpdateTask and PartialUpdateTask endpoints.
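Scripts that call these endpoints by operation ID need to switch to the new names. A small lookup table covering only the renames listed above can help with that. The RENAMED_TASK_ENDPOINTS table and the migrate_operation_id helper are hypothetical and are not part of any Arkindex library.

```python
# Old task endpoint operation IDs and their replacements, per these release notes.
RENAMED_TASK_ENDPOINTS = {
    "RetrieveTaskFromAgent": "RetrieveTask",
    "UpdateTaskFromAgent": "UpdateTask",
    "PartialUpdateTaskFromAgent": "PartialUpdateTask",
}


def migrate_operation_id(operation_id: str) -> str:
    """Return the operation ID to use from 1.6.2 on; unaffected names pass through unchanged."""
    return RENAMED_TASK_ENDPOINTS.get(operation_id, operation_id)
```

Because unaffected operation IDs are returned unchanged, the helper can be applied to every API call in a script without special-casing.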

Misc

When a task is restarted with the Restart task button, the original task that was restarted is now displayed.

The Restart task button is now disabled when a task has already been restarted. You will need to restart the newer task instead.
In Enterprise Edition, restarting a task now creates a task without an associated agent and GPU, allowing the task to be assigned to any other agent in the same farm.
Process names can now be up to 250 characters long, and errors are now properly displayed when they occur while renaming a process.
Creating a process from failed worker activities now runs asynchronously, so processes can be created from a much larger number of failures.

Imports

IIIF imports have now been merged into file imports. It is now possible to import images, PDF files, Transkribus collection exports, IIIF manifests, and archives containing any of those, all at once in the same process.

Continuing our work to convert internal Arkindex tasks to workers, S3 imports now run using a separate worker. The upgrade notes include details about this change for instance administrators.

User management

The profile page has been updated to allow editing your display name and changing your password. The API token is also hidden by default to prevent leaks when sharing your screen.

The registration and email verification process has been improved. Users that did not receive the verification email now have the option to send a new one. When clicking the confirmation link in an email, errors are now displayed more clearly.

Finally, users who were registered through the API without a password, and therefore cannot log in normally, now see a warning inviting them to set their password through the new profile page.

Cleanup

We have continued our efforts to improve consistency and remove deprecated features from Arkindex:

The long-deprecated worker version IDs have been removed from the APIs and from SQLite exports. This means that old Machine Learning results created before Arkindex 1.4.0 will now appear as if they were created manually, instead of by a worker version.
Worker versions no longer have Docker image artifacts associated with them. These were only used by Git imports, which had been removed in Arkindex 1.6.0.
Git repositories no longer have access rights associated with them. Any existing access rights have been transferred to every worker linked to those repositories.
Classification confidences are no longer optional. Any classification without a confidence set now has its confidence score set to 1.
The unique identifier for transcription entities is now a UUID rather than an integer, consistent with every other Machine Learning result.

CLI

A new PAGE XML export command is now available.
In the entities export, the confidence score of transcription entities is now also exported.
In the arkindex elements link command, fatal API errors are now more clearly displayed.

Misc

The ListElements, ListElementChildren and ListElementParents API endpoints now provide a with_transcriptions option, allowing all transcriptions on each element to be fetched similarly to with_metadata or with_classes.
The Delete worker results action now also deletes entities when it runs on a project.
Project exports that fail because of a database connection issue should now be properly shown as Failed instead of remaining in the Running state.
The feedback button in the footer has been replaced by a link to our new support forum.
IIIF image checks now use an Arkindex-specific user agent string (Arkindex/1.6.2 (+https://teklia.com/)) rather than a generic Requests one. This can solve issues when importing images from servers that block web scraping.
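Administrators of image servers that filter by user agent may want to check whether the new string is accepted. The sketch below prepares, but does not send, a request carrying the user agent quoted above, using only the Python standard library. The build_image_check_request helper is hypothetical. Using a HEAD request is also an assumption, and this is not a description of how Arkindex itself performs image checks.

```python
import urllib.request

# User agent string quoted in these release notes for IIIF image checks.
ARKINDEX_USER_AGENT = "Arkindex/1.6.2 (+https://teklia.com/)"


def build_image_check_request(url: str) -> urllib.request.Request:
    """Prepare (without sending) a request identified with the Arkindex user agent."""
    return urllib.request.Request(
        url,
        method="HEAD",  # assumption: headers alone are enough to test server filtering
        headers={"User-Agent": ARKINDEX_USER_AGENT},
    )
```

To actually send the request against your own server, pass it to urllib.request.urlopen(request, timeout=10). A 403 response would suggest the server is still blocking this user agent.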