A new release is available for Arkindex instances. You can test it on our demo instance: https://demo.arkindex.org

Element deletion

One of the main focuses of this release was a move toward a unified and robust deletion experience:

  • users should be able to delete a list of elements
  • users should be able to delete a whole project
  • users should be able to delete a specific (and sometimes large) element, with all its sub-elements, transcriptions, ...

You can now do all of the above, using a few new features...

Delete all elements in the list

When browsing your elements, you can now delete the current list of elements by using the Delete All button at the top left.

As with all deletions, a popup window will ask you to confirm your choice.

Delete a single element

You can still use the Delete button in the Actions menu of an element, but we've added a quick-access button on the element list view.

Delete a selection

Finally, you can delete a selection of elements: choose your elements in a project by using the ➕ icon, then, on the selection page, simply use the action to delete them.

Deletion progress

Each of these deletions will happen as a background job, and may take some time to be fully processed (depending on the job size, and the current load on the instance).

You can view the state of your deletion jobs from the top menu: a new icon opens a modal that lists all your jobs and their state.

Transcription type deprecation

This is the last step in our effort to simplify transcription creation and usage in Arkindex: the transcriptions API no longer requires setting a type (previously word, line, ...).

As transcriptions are now directly linked to a specific element, we rely on this element's type to get that information: a transcription set on an element of type Line is considered a line from now on.
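
As a rough illustration of this rule, a client that used to read a type on each transcription could instead look at the element it is attached to. The sketch below is only an assumption about how such data might be shaped: the dictionary keys (`element`, `type`, `text`) and the helper `transcription_kind` are hypothetical and are not taken from the Arkindex API.

```python
def transcription_kind(transcription: dict) -> str:
    """Infer what a transcription represents from the type of its element."""
    # Hypothetical payload shape: the transcription embeds its parent element.
    element = transcription["element"]
    # Normalise the case so that "Line" and "line" are treated the same way.
    return element["type"].lower()


# Hypothetical example data, for illustration only.
transcription = {
    "text": "Lorem ipsum",
    "element": {"id": "abc", "type": "Line"},
}
print(transcription_kind(transcription))
```

With this approach there is no separate type to keep in sync: changing the element's type changes how its transcription is interpreted.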

Users group management

This release adds group management for users, in preparation for a new access rights module that will be available in the next release.

As we've disabled this feature on our own instances, we'll provide more information in the next release doc.

Data import

File imports

You can now import multiple PDFs in two clicks (one to upload them, another to start the import!)

We've also worked on files imported at the top level of a project: now a folder is created for those specific imports, so that they are immediately visible when browsing a project.

Transkribus import

The Transkribus import just got some bugfixes and nice quality-of-life improvements:

  • Keep the original image name as metadata
  • Use worker version in the import (on elements, entities, transcriptions, ...)
  • Support missing transcriptions on elements
  • Reindex the whole Transkribus corpus instead of element by element
  • Fix the endpoint names and add tags in the API docs

API performance

We continue to work on the raw performance of our API. Here is a list of performance fixes for this release:

  • Reduce the number of queries for RetrieveDataImport endpoint
  • Reduce the number of queries for ListSelection endpoint
  • Avoid some queries in ListElementChildren
  • Prevent the frontend from calling ListElementEntities endpoint for folders

Processes

Agent list

You can now view all the Ponos agents available on your Arkindex instance, and their detailed hardware usage (CPU, RAM, and even GPU).

It's accessible from the View agents link at the top right of the Process list.

Bugfixes on the workers and workflows

  • The type attribute on a Worker is now a free-form string: it's purely used for display purposes (you are no longer limited to ner, dla, ...)
  • A new endpoint ListCorpusWorkerVersions is available, to list all the specific workers that produced some elements in a project.
  • The worker_version attribute is now available on all endpoints listing elements…
  • …and you can filter your elements in the frontend by worker version
  • Ensure unique elements are fetched by initialisation tasks (and avoid duplicate processing!)
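
To make the last points more concrete, here is a minimal sketch of filtering a list of elements by worker version while skipping duplicates. It is not the actual Arkindex implementation: the function `unique_elements` and the shape of the element dictionaries (`id` and `worker_version` keys) are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional


def unique_elements(
    elements: Iterable[dict], worker_version: Optional[str] = None
) -> Iterator[dict]:
    """Yield each element once, optionally keeping a single worker version."""
    seen = set()
    for element in elements:
        # Hypothetical filter: keep only elements produced by this worker version.
        if worker_version is not None and element.get("worker_version") != worker_version:
            continue
        # Skip elements that were already yielded, to avoid duplicate processing.
        if element["id"] in seen:
            continue
        seen.add(element["id"])
        yield element


# Hypothetical example data, for illustration only.
elements = [
    {"id": "a", "worker_version": "v1"},
    {"id": "b", "worker_version": "v2"},
    {"id": "a", "worker_version": "v1"},
    {"id": "c", "worker_version": "v1"},
]
print([element["id"] for element in unique_elements(elements, "v1")])
```

Tracking identifiers in a set keeps the original order of the list and avoids processing the same element twice.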

GPU

The workflow system now supports GPUs that may be present on host machines. This feature is in beta, and only available to some specific workers for now.

More information will be available soon.

Bugfixes on the process agent

  • Retry image load to avoid ReadTimeout
  • Report already killed containers as stopped
  • Handle wrong value in the PID file
  • Do not log during exit stage to avoid log issues
  • Remove extra tasks from schedule
  • Support missing container id on task (and avoid crashing the agent!)
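
The first fix in this list, retrying an image load after a timeout, follows a common pattern that can be sketched as below. This is a generic illustration and not the agent's actual code: the `retry` helper and its parameters are hypothetical, and the built-in `TimeoutError` stands in for whichever ReadTimeout exception the agent really handles.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(TimeoutError,)):
    """Call func until it succeeds, or re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            # Give up once the last allowed attempt has failed.
            if attempt == attempts:
                raise
            # Wait a bit longer after each failure before trying again.
            time.sleep(delay * attempt)


if __name__ == "__main__":
    # Hypothetical stand-in for an image load that always succeeds.
    print(retry(lambda: "image loaded", attempts=3, delay=0))
```

Limiting the number of attempts keeps a permanently unreachable image from blocking a task forever, while still absorbing occasional slow responses.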

Frontend changes

  • We've updated the Bulma CSS framework to the latest stable version, which brings some small changes to the display.
  • The ML stats modal is not available anymore, as we are transitioning away from DataSource