We are happy to announce that a new Arkindex release is available. You can explore Arkindex and try out the newest features on our demo instance, demo.arkindex.org.

Datasets

Now that the datasets feature is fully operational, and dataset processes can be used to train Machine Learning models, the old training processes have been removed entirely, both from the Arkindex frontend and the API.

The ListElementDatasets API endpoint now allows filtering by dataset set, and the RetrieveDataset endpoint now returns an element count for each dataset set. The way dataset details are displayed in the Arkindex frontend has not changed. See the API documentation for more information.
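To make the two additions concrete, here is a minimal sketch of the logic they expose, written against plain in-memory records rather than the API. The record layout, the field names and the helper functions are hypothetical and only serve the illustration; the actual request parameters and response fields are described in the API documentation.

```python
from collections import Counter

# Hypothetical records: one entry per (element, dataset, set) membership.
RECORDS = [
    {"element": "page-1", "dataset": "census", "set": "train"},
    {"element": "page-1", "dataset": "letters", "set": "test"},
    {"element": "page-2", "dataset": "census", "set": "train"},
    {"element": "page-3", "dataset": "census", "set": "validation"},
]


def datasets_for_element(records, element, set_name=None):
    """List the datasets an element belongs to, optionally keeping one set only."""
    return [
        record["dataset"]
        for record in records
        if record["element"] == element
        and (set_name is None or record["set"] == set_name)
    ]


def set_counts(records, dataset):
    """Count the elements in each set of a dataset."""
    return dict(Counter(r["set"] for r in records if r["dataset"] == dataset))


print(datasets_for_element(RECORDS, "page-1", set_name="test"))
print(set_counts(RECORDS, "census"))
```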

dataset details

It is now possible to navigate between dataset elements. If an element belongs to a dataset, then on its details page, in the Dataset section of the details panel on the right, arrows are displayed which allow you to go to the previous and next elements in the same dataset and set.
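The navigation can be pictured as a lookup in the ordered list of elements of one dataset set. The sketch below assumes such an ordered list of element IDs is available; the function name and the ordering are assumptions made for the example, not a description of how the frontend is implemented.

```python
def neighbours(ordered_ids, current_id):
    """Return the (previous, next) element IDs around current_id.

    None is returned on either side when current_id is the first or last
    element of the set. Raises ValueError if current_id is not in the list.
    """
    index = ordered_ids.index(current_id)
    previous_id = ordered_ids[index - 1] if index > 0 else None
    next_id = ordered_ids[index + 1] if index < len(ordered_ids) - 1 else None
    return previous_id, next_id


# Hypothetical element IDs of the "train" set of one dataset.
train_set = ["page-1", "page-2", "page-3"]
print(neighbours(train_set, "page-2"))
```

In this sketch, the first element has no previous neighbour and the last one has no next neighbour, which would correspond to an arrow being unavailable.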

dataset elements navigation

When creating a dataset process, it is now possible to select precisely which sets from a dataset to use, instead of the process always running on all the selected dataset's elements.

dataset sets in dataset processes

A new API endpoint, CreateDatasetElement, is now available and allows you to add a single element to a dataset.

Processes

The buttons in the top-right corner of the process status page have been replaced by an Actions dropdown menu, from which you can now access the configuration of a started or completed process.

see process configuration

When viewing a started or completed process's elements, the filters that were used to select these elements when the process was created are now displayed.

elements filters

When deleting a process from the processes list page, a confirmation modal now opens, instead of the process being deleted right away.

process deletion confirmation modal

Worker and model archival

Workers and Models can now be archived by users with administrator rights on the target worker or model. An archived worker or model can no longer be used in a Process.

worker archival

In the Arkindex frontend, the Archive worker button is available from the workers list, as well as on a worker's details page. Similarly, the Archive model button is available from the models list, as well as on a model's details page.

Exports

Project exports can now be deleted from the project exports management modal. A new API endpoint, DestroyExport, has been created.

project exports mgmt modal

Worker runs

In order to be able to use worker runs the way we previously used worker versions, as identifiers of how an object was produced on Arkindex, we have started to build a cache of worker runs in a project, much like the existing cache of worker versions. This cache stores the worker runs that are attached to currently existing objects in a project: if all the objects created by a given worker run are deleted, then that worker run will be removed from the cache.

A new API endpoint, ListCorpusWorkerRuns, has been created, and allows you to list, for a given project, the worker runs which have produced existing objects within that project.
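The cache behaviour described above, where a worker run stays listed only while at least one object it produced still exists, can be illustrated with a simple reference count. This is a sketch of the idea only: the class and method names are hypothetical, and it does not reflect how the cache is actually built or stored in Arkindex.

```python
from collections import Counter


class WorkerRunCache:
    """Tracks, for one project, which worker runs still have existing objects."""

    def __init__(self):
        # Number of existing objects produced by each worker run.
        self._object_counts = Counter()

    def object_created(self, worker_run_id):
        self._object_counts[worker_run_id] += 1

    def object_deleted(self, worker_run_id):
        if self._object_counts[worker_run_id] <= 0:
            raise KeyError(f"No objects recorded for worker run {worker_run_id}")
        self._object_counts[worker_run_id] -= 1
        if self._object_counts[worker_run_id] == 0:
            # Last object gone: the worker run leaves the cache.
            del self._object_counts[worker_run_id]

    def worker_runs(self):
        """List the worker runs that produced objects which still exist."""
        return sorted(self._object_counts)


cache = WorkerRunCache()
cache.object_created("run-a")
cache.object_created("run-a")
cache.object_created("run-b")
cache.object_deleted("run-b")
print(cache.worker_runs())
```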

The DestroyWorkerResults API endpoint now accepts the worker_run_id argument. When a worker run ID is specified, any worker version, model version or configuration ID that is also passed to the endpoint is ignored. See the API documentation for details.
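The precedence rule can be summarised in a few lines. In this sketch, only worker_run_id is a name taken from the text above; worker_version_id, model_version_id and configuration_id, as well as the helper itself, are assumed names chosen for illustration, so check the API documentation for the real parameters.

```python
# Assumed parameter names for the filters that a worker run ID overrides.
IGNORED_WITH_WORKER_RUN = ("worker_version_id", "model_version_id", "configuration_id")


def effective_filters(params):
    """Return the filters that would actually be taken into account."""
    # Drop unset parameters first.
    filters = {name: value for name, value in params.items() if value is not None}
    if "worker_run_id" in filters:
        # A worker run ID takes precedence: the other identifiers are ignored.
        for name in IGNORED_WITH_WORKER_RUN:
            filters.pop(name, None)
    return filters


print(effective_filters({"worker_run_id": "run-a", "worker_version_id": "version-1"}))
print(effective_filters({"worker_version_id": "version-1", "configuration_id": None}))
```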

Elements can now be filtered by worker run ID in the Arkindex frontend, through the filter bar.

worker run id filter bar

Miscellaneous frontend improvements

Markdown descriptions for Workers, Models, Model Versions and Datasets are now rendered in the Arkindex frontend.

markdown description

A copyable worker run ID is now displayed at the top of the worker run details modal.

worker run id in wr details modal

Element thumbnails are now correctly generated for elements that are less than 400 pixels high.

A child element's details are now correctly displayed even if that element has no zone.

Command Line Interface

The handling of classifications, metadata and entities in the CLI's CSV export has been updated. See the CLI documentation for information about the CSV export's output.

A new export mode that outputs DocX documents has been added. See the CLI documentation for usage instructions and details.

GitLab integration removal

As we no longer need the GitLab integration, it has been removed from the backend code, and the related views and references have been removed from the frontend as well.

Miscellaneous

  • Errors encountered by the Ponos agent when starting a container are now correctly reported, for easier troubleshooting.
  • When a user tries to upload a ZIP archive bigger than 2 GB, the frontend now displays an explicit error message.
  • The description of the PartialUpdateElement API endpoint has been updated to reflect that it can do a lot more than just rename the target element.