We are happy to announce that a new Arkindex release is available. You can explore Arkindex and try out the newest features on our demo instance, demo.arkindex.org.
Technical release notes for developers and instance administrators are available here.
Entity table removal
Until now, Arkindex had two distinct concepts: entities and transcription entities. A transcription entity is the mention of an entity within a transcription. An entity could be mentioned multiple times, and could also be linked to metadata for special cases where an element refers to an entity without doing so through any transcription.
However, we have found that the vast majority of projects cannot group entities together, and thus only have one transcription entity per entity. Entity names were the same as the text found on transcription entities, and other entity attributes were not used. This means a lot of data was duplicated, taking a lot of disk space and slowing down any worker that produces transcription entities. Maintaining the link with both transcription entities and metadata at once was also too costly given how rarely it was used.
In this release, we are removing entities entirely, leaving only transcription entities. The breaking changes are as follows:
- Transcription entities are now linked to entity types instead of entities.
- Metadata cannot be linked to entities anymore, so `entity_id` is no longer available on metadata.
- The entities list and entity details pages are removed from the frontend.
- The `ListCorpusEntities`, `CreateEntity`, `RetrieveEntity`, `UpdateEntity`, `PartialUpdateEntity`, `DestroyEntity` and `ListEntityElements` API endpoints are removed.
- The `ListTranscriptionEntities` API endpoint cannot filter by entity worker run or entity worker version anymore, and returns the entity type as `type` instead of the entity as `entity`.
- The `CreateTranscriptionEntity` API endpoint now expects an entity type ID as `type_id` instead of an entity ID as `entity_id`.
- The `CreateTranscriptionEntities` API endpoint now expects a list of `transcription_entities` instead of `entities`, and returns a list of UUIDs instead of a list of objects with both the entity and transcription entity IDs.
- The `SearchCorpus` API endpoint now works with transcription entities and not entities.
- The `entity` table has been removed from database exports.
- The Arkindex CLI's entity export now exports transcription entities instead of entities.
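For API clients, the switch from `entity_id` to `type_id` on CreateTranscriptionEntity mostly amounts to rewriting request payloads. The sketch below is one possible way to do that, not an official migration tool: the `migrate_payload` helper and the `entity_types` lookup are hypothetical, and every payload field other than `entity_id` and `type_id` (here `offset` and `length`) is an assumption used for illustration only.

```python
def migrate_payload(old_payload: dict, entity_types: dict) -> dict:
    """Rewrite a payload that references an entity into one that
    references an entity type.

    `entity_types` is a hypothetical lookup from former entity IDs to
    the ID of the entity type each of those entities had.
    """
    new_payload = dict(old_payload)  # shallow copy; the input is left untouched
    entity_id = new_payload.pop("entity_id")
    if entity_id not in entity_types:
        raise KeyError(f"No known entity type for entity {entity_id}")
    new_payload["type_id"] = entity_types[entity_id]
    return new_payload


# Placeholder identifiers; real ones would be UUIDs.
entity_types = {"entity-1": "type-person"}
# `offset` and `length` are assumed field names, not taken from the API schema.
old = {"entity_id": "entity-1", "offset": 0, "length": 5}
new = migrate_payload(old, entity_types)
```

Building the `entity_types` lookup has to happen before upgrading, since the removed endpoints can no longer be queried afterwards.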
Of course, you can still produce and store entities in Arkindex as before: only their internal representation is becoming simpler and more efficient.
On Arkindex instances that use a significant number of workers producing transcription entities, we have found that this removal can reduce the database size by up to 40%. This removal has also allowed us to simplify various complex database queries, improving performance on project, element or worker results deletion as well as transcription entity APIs.
This removal may take a long time to execute and may require manual intervention from a system administrator when upgrading to this new Arkindex release. We encourage administrators to review the technical release notes carefully.
Processes
We have made some improvements to processes and task execution to make them easier to understand and avoid common mistakes.
On Workers processes, the Load children filter can now take three different values, to allow selecting no children, only direct children, or all children recursively. This matches the behavior of the element navigation more precisely.
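The three modes can be pictured as three ways of walking an element tree. The following is a minimal sketch under stated assumptions, not Arkindex's implementation: the `LoadChildren` enum, its value names, the `select_elements` function, and the choice to always include the parent element are all hypothetical, and elements are assumed to form a simple tree.

```python
from enum import Enum


class LoadChildren(Enum):
    # Hypothetical names for the three filter values described above.
    NONE = "none"
    DIRECT = "direct"
    RECURSIVE = "recursive"


def select_elements(parent, children, mode):
    """Return the parent followed by the children selected by `mode`.

    `children` maps each element to the list of its direct children.
    """
    selected = [parent]
    if mode is LoadChildren.NONE:
        return selected
    # Depth-first walk; in DIRECT mode we never descend below the first level.
    stack = list(reversed(children.get(parent, [])))
    while stack:
        element = stack.pop()
        selected.append(element)
        if mode is LoadChildren.RECURSIVE:
            stack.extend(reversed(children.get(element, [])))
    return selected


tree = {"folder": ["page1", "page2"], "page1": ["line1", "line2"]}
none = select_elements("folder", tree, LoadChildren.NONE)
direct = select_elements("folder", tree, LoadChildren.DIRECT)
recursive = select_elements("folder", tree, LoadChildren.RECURSIVE)
```

On this small tree, the first mode keeps only the folder, the second adds its two pages, and the third also adds the lines under the first page.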

With this change, we have also reworked the way in which we list elements on processes, which could bring performance improvements and make the initialization tasks faster on most processes.
To make it easier to understand how the TTL (time-to-live) of a task impacts it, the execution time is now shown in red when it exceeds the TTL, and a warning is shown when a task enters a Cancelled state after having exceeded that limit.
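The display rule described above can be summarized in a few lines. This is only an illustrative sketch: the `ttl_display` function, its return keys, and the lowercase `"cancelled"` state string are assumptions, not the frontend's actual code.

```python
from datetime import timedelta


def ttl_display(execution_time: timedelta, ttl: timedelta, state: str) -> dict:
    """Decide how a task's execution time would be presented."""
    over_ttl = execution_time > ttl
    return {
        # The execution time is highlighted as soon as it exceeds the TTL.
        "highlight_red": over_ttl,
        # The warning only applies to tasks cancelled after exceeding the TTL.
        "show_warning": over_ttl and state == "cancelled",
    }


running_late = ttl_display(timedelta(hours=2), timedelta(hours=1), "running")
cancelled_late = ttl_display(timedelta(hours=2), timedelta(hours=1), "cancelled")
cancelled_early = ttl_display(timedelta(minutes=5), timedelta(hours=1), "cancelled")
```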

Additionally, we fixed an issue that caused processes with any retried tasks to be excluded when listing processes filtered by state.
Jobs
Background jobs such as deletions, search indexations or database exports can now be stopped while they are running. This gives more opportunities to interrupt an accidental deletion before it completes.

Multipart uploads
New APIs have been introduced to support uploading files and model versions in multiple chunks. This allows for very large files and models to be uploaded, even on networks with lower reliability. The Arkindex CLI can use those APIs through the new `arkindex upload model_version` and `arkindex upload data_file` commands.
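To illustrate why chunked uploads cope better with unreliable networks, here is a generic sketch of the technique. It does not use the Arkindex API or CLI: `iter_chunks`, `upload_in_parts` and the `send_part` callback are hypothetical names, and numbering parts from 1 is an assumption. The point is that a network failure only costs a retry of one chunk rather than of the whole file.

```python
import io
from typing import BinaryIO, Callable, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs, numbering parts from 1."""
    part_number = 1
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


def upload_in_parts(
    stream: BinaryIO,
    chunk_size: int,
    send_part: Callable[[int, bytes], None],
    max_attempts: int = 3,
) -> int:
    """Send each chunk with `send_part`, retrying a failed chunk on its own."""
    sent = 0
    for part_number, data in iter_chunks(stream, chunk_size):
        for attempt in range(1, max_attempts + 1):
            try:
                send_part(part_number, data)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up after the last allowed attempt
        sent += 1
    return sent


# Simulated transport that fails once on the second part.
received = {}
failures = {"remaining": 1}


def flaky_send(part_number: int, data: bytes) -> None:
    if part_number == 2 and failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise ConnectionError("simulated network error")
    received[part_number] = data


count = upload_in_parts(io.BytesIO(b"abcdefghij"), 4, flaky_send)
```

In this simulation, the second part fails once and is resent without touching the first part, which had already been accepted.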
You can now upload and use large vision models (usually based on LLMs) without storing them on external resources.
Worker results deletion
The worker results deletion was built before transcription entities had their own independent link to WorkerRuns. For this reason, transcription entities were only deleted when their transcription matched the specified WorkerRun. In this release, the deletion will now also target transcription entities that are directly linked to the WorkerRun.
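The difference between the previous and the new selection rule can be expressed as two predicates. This is a simplified sketch, not the actual deletion query: the `TranscriptionEntity` dataclass, its field names, and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscriptionEntity:
    id: str
    # WorkerRun directly linked to the transcription entity, if any.
    worker_run_id: Optional[str]
    # WorkerRun linked to the transcription the entity belongs to, if any.
    transcription_worker_run_id: Optional[str]


def targeted_before(entity: TranscriptionEntity, worker_run_id: str) -> bool:
    """Previous rule: only the transcription's WorkerRun was considered."""
    return entity.transcription_worker_run_id == worker_run_id


def targeted_now(entity: TranscriptionEntity, worker_run_id: str) -> bool:
    """New rule: the entity's own WorkerRun link is considered as well."""
    return (
        targeted_before(entity, worker_run_id)
        or entity.worker_run_id == worker_run_id
    )


entities = [
    TranscriptionEntity("te1", worker_run_id="run-a", transcription_worker_run_id="run-b"),
    TranscriptionEntity("te2", worker_run_id="run-b", transcription_worker_run_id="run-a"),
    TranscriptionEntity("te3", worker_run_id="run-b", transcription_worker_run_id="run-b"),
]
before = [e.id for e in entities if targeted_before(e, "run-a")]
now = [e.id for e in entities if targeted_now(e, "run-a")]
```

Here `te1` was produced by the targeted WorkerRun on a transcription from another run, so it is only selected under the new rule.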
To prevent accidental deletions, the frontend will now also prevent any worker results deletion from happening until all filters are removed in the elements navigation, as those filters cannot be applied to the deletion.

Worker configurations
We have continued to work on the new worker configuration format, particularly in building the new forms that will enable users to fill in each of the configuration fields. As this will come with more precise validation, we will be using a new format for API errors that will help the new form to attribute each error to each field and make it easier to resolve any errors before saving a configuration or starting a process.
In this release, while the form is not yet available, this new API error format will be made available on `CreateWorkerConfiguration`, `UpdateWorkerConfiguration` and `PartialUpdateWorkerConfiguration`. In the long term, we hope to be able to gradually replace all errors on all API endpoints with this new format to make it easier for all API users to handle errors, and make error messages more precise on our frontend.
Misc
- A padlock or globe icon is now shown when listing models and viewing their details, to show whether they are public.
- When an error occurs while adding an element type to a project, an error notification is now shown only once and not twice.
- In the worker results deletion modal, clicking on a configuration's name now shows its details instead of a placeholder.