Arkindex Release 0.14.3

We are pleased to announce the release of Arkindex version 0.14.3, which is now enabled on our main instance.

Performance

The main focus of this release was stability and performance, especially making workflows a lot faster.

Listing elements for a process

The ListProcessElements endpoint was first implemented in the last release, and we’ve kept working on it in this one, making it up to 30 times faster than before. This will speed up the first task of all new processes working on elements: the init_elements task uses that endpoint directly.

We have heavily optimized the SQL queries, reduced the amount of data loaded, skipped many intermediary requests and result-parsing steps, and built a much more resilient and faster pagination algorithm.

This whole optimization process will be repeated on different critical endpoints until we have a really stable and efficient platform.

Deletion

We’ve enhanced the Element deletion capability of Arkindex: you can now delete folders and their sub-elements.

⚠️ If you delete a folder, all its children will also be deleted immediately.

Some large folders may fail to delete for now, but we are actively working on an asynchronous solution to allow deletion of any dataset within Arkindex.

List newly created elements

An element’s details page will now automatically reload its sub-element list when new sub-elements are created by a workflow.

For example, when you navigate from a workflow status to an element, if the workflow has created many lines, they will now appear without you having to refresh the page.

Other achievements

API endpoints

New endpoint to create transcriptions

A new endpoint, CreateTranscriptions, is available to create a list of transcriptions in one call.

You can create transcriptions on multiple (sub-)elements at once, making your Machine Learning Worker a lot faster.

⚠️ It can only be used by workers (as it requires a worker_version ID).
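As a rough illustration of the batching idea, here is a minimal Python sketch that groups the transcriptions for several elements into a single request body. The field names (worker_version, transcriptions, element_id, text) and the helper build_transcriptions_payload are assumptions made for this example, not the documented schema; please check the API documentation before relying on them.

```python
from typing import Dict


def build_transcriptions_payload(worker_version_id: str, lines: Dict[str, str]) -> dict:
    """Group one transcription per element into a single request body.

    The field names used here are assumptions for illustration only.
    """
    if not worker_version_id:
        # The endpoint is reserved to workers, so a worker_version ID is mandatory.
        raise ValueError("a worker_version ID is required")
    return {
        "worker_version": worker_version_id,
        "transcriptions": [
            {"element_id": element_id, "text": text}
            for element_id, text in lines.items()
        ],
    }


# Hypothetical IDs: two line elements transcribed in one call instead of two.
payload = build_transcriptions_payload(
    "hypothetical-worker-version-id",
    {"line-1": "Dear Sir,", "line-2": "I write to you"},
)
```

A worker would then send this body once to CreateTranscriptions rather than issuing one request per element.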

Breaking changes

  1. Arkindex no longer supports the source field on ML Results creation endpoints; use the new worker_version instead.
  2. The ListElements endpoint now requires a corpus. Its URL is now /api/v1/corpus/{corpus}/elements/ instead of /api/v1/elements/?corpus={corpus}.
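If you have scripts that still build the old ListElements URL, a small helper along these lines could rewrite them. This is only a sketch: the function name migrate_list_elements_url and the host arkindex.example.com are hypothetical, and keeping the other query parameters unchanged is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def migrate_list_elements_url(old_url: str) -> str:
    """Rewrite /api/v1/elements/?corpus={corpus} to /api/v1/corpus/{corpus}/elements/."""
    parts = urlsplit(old_url)
    if not parts.path.endswith("/api/v1/elements/"):
        # Not a ListElements URL: leave it untouched.
        return old_url
    query = parse_qs(parts.query)
    corpus_values = query.pop("corpus", [])
    if len(corpus_values) != 1:
        raise ValueError("ListElements now requires exactly one corpus")
    # Move the corpus from the query string into the path.
    prefix = parts.path[: -len("elements/")]
    new_path = f"{prefix}corpus/{corpus_values[0]}/elements/"
    # Assumption: any remaining filters are passed through as-is.
    new_query = urlencode(query, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, new_path, new_query, parts.fragment))


new_url = migrate_list_elements_url(
    "https://arkindex.example.com/api/v1/elements/?corpus=abc&type=page"
)
```

URLs without a corpus parameter raise an error, which mirrors the fact that the corpus is now mandatory.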

Fixes

Features

Project creation

We’ve changed the default visibility of newly created projects: only Arkindex instance administrators can now create public projects.

This means that new projects created by users will only be visible to them.

In the next release, we’ll introduce a new feature to share corpora across users on an instance.

File imports

The file import process has been updated:

Transcriptions

All transcriptions are now visible in the details panel. Transcriptions are no longer available from the tree on the left side of the screen: from now on, they are all attached to an Element.

As a lot of new elements are automatically created to support the transcriptions, we’ve increased the number of automatically loaded sub-elements.

Workflows

Secrets

The workflow system now supports storing secret values (such as third-party credentials). These secret values are set by the Arkindex instance administrators and are only usable by the tasks. No user can ever see these confidential payloads.

Machine Learning engineers can easily request secret values through a simple declaration in their worker’s .arkindex.yml configuration:

workers:
  - name: My Worker
    secrets:
      - project/X/google.json

This configuration will automatically retrieve the secret named project/X/google.json and make it available to the worker (so it can connect to a third-party service).

More information on implementing this feature in your workers will be available soon on this website.

Garbage collector

The agent running our tasks now supports an automated garbage collector, which deletes useless Docker payloads (containers and images that are no longer used). This allows us to keep more of our system’s disk space available.

Base worker

A few updates are available in version 0.1.9 of arkindex-base-worker:

Fixes

API client

Our open source Python API client has been updated to version 1.0.4, and now offers resilient pagination support (retries on errors, tolerance of backend downtime, and configurable handling of missing data).

More information is available on the PyPI homepage of the project.