We are pleased to announce the release of Arkindex version 0.14.3, now available on our main instance.
The main focus of this release was stability and performance, especially making workflows much faster.
Listing elements for a process
The ListProcessElements endpoint was introduced in the previous release, and we have continued working on it in this one, making it up to 30 times faster than before. This speeds up the first task of every new process working on elements, as the init_elements task uses this endpoint directly.
We heavily optimized the SQL queries, reduced the amount of data loaded, skipped many intermediate requests and result parsing steps, and built a much more resilient and faster pagination algorithm.
We will repeat this optimization process on other critical endpoints until the platform is truly stable and efficient.
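This release doesn't say which pagination technique the new algorithm uses. Purely as a point of comparison, here is a minimal sketch of keyset (cursor) pagination, one common way to make paging faster and more stable than OFFSET-based paging: each page resumes strictly after the last key already returned. It uses an in-memory SQLite table with a hypothetical `element` schema and integer ids; it is not Arkindex's implementation.

```python
import sqlite3
from typing import Iterator, List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table of elements belonging to two processes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE element (id INTEGER PRIMARY KEY, process_id INTEGER, name TEXT)")
    # A composite index lets the database seek directly to the start of each page.
    conn.execute("CREATE INDEX element_process_id ON element (process_id, id)")
    conn.executemany(
        "INSERT INTO element (id, process_id, name) VALUES (?, ?, ?)",
        [(i, 1 if i % 2 else 2, f"line {i}") for i in range(1, 11)],
    )
    return conn


def iter_pages(conn: sqlite3.Connection, process_id: int, page_size: int) -> Iterator[List[Tuple[int, str]]]:
    """Yield pages of (id, name) rows for one process, using keyset pagination."""
    last_id = 0  # Assumes positive integer ids; any unique, ordered key works the same way.
    while True:
        # Resume strictly after the last key seen instead of skipping rows with OFFSET.
        rows = conn.execute(
            "SELECT id, name FROM element WHERE process_id = ? AND id > ? ORDER BY id LIMIT ?",
            (process_id, last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


if __name__ == "__main__":
    for page in iter_pages(build_demo_db(), process_id=1, page_size=2):
        print(page)
```

With OFFSET, the database still walks past every skipped row, so later pages get slower and rows inserted earlier in the ordering can shift page boundaries; a keyset query costs about the same for every page.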
We've improved element deletion in Arkindex: you can now delete folders along with their sub-elements.
:warning: If you delete a folder, all its children will also be deleted immediately.
Some large folders may fail to delete for now, but we are actively working on an asynchronous solution to allow deletion of any dataset within Arkindex.
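Deleting a folder means deleting every element below it, at any depth. The sketch below shows one way such a subtree can be collected with a recursive SQL query and removed in a single transaction. It assumes a hypothetical `element` table plus an `element_link(parent_id, child_id)` link table in SQLite; Arkindex's actual schema and deletion code are not described in these notes.

```python
import sqlite3
from typing import Set


def build_tree_db() -> sqlite3.Connection:
    """Folder 1 holds pages 2 and 3; lines 4 and 5 sit under the pages; element 6 is unrelated."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE element (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE element_link (parent_id INTEGER, child_id INTEGER)")
    conn.executemany("INSERT INTO element VALUES (?, ?)", [(i, f"element {i}") for i in range(1, 7)])
    # Element 5 is linked to two parents, to show that shared children are handled once.
    conn.executemany("INSERT INTO element_link VALUES (?, ?)", [(1, 2), (1, 3), (2, 4), (2, 5), (3, 5)])
    return conn


def subtree_ids(conn: sqlite3.Connection, root_id: int) -> Set[int]:
    """Return the root and every element below it, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) keeps each element once, even with several parents
            SELECT link.child_id
            FROM element_link AS link
            JOIN subtree ON link.parent_id = subtree.id
        )
        SELECT id FROM subtree
        """,
        (root_id,),
    ).fetchall()
    return {row[0] for row in rows}


def delete_folder(conn: sqlite3.Connection, folder_id: int) -> int:
    """Delete a folder and all its descendants in one transaction; return how many elements were removed."""
    ids = sorted(subtree_ids(conn, folder_id))
    marks = ",".join("?" for _ in ids)
    with conn:  # commits on success, rolls back on error
        # Drop every link touching a deleted element, including links from parents outside the folder.
        conn.execute(
            f"DELETE FROM element_link WHERE parent_id IN ({marks}) OR child_id IN ({marks})",
            ids + ids,
        )
        cursor = conn.execute(f"DELETE FROM element WHERE id IN ({marks})", ids)
    return cursor.rowcount
```

Because the whole subtree is gathered and deleted in one request, the work grows with the size of the folder, which hints at why very large folders can be hard to delete synchronously.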
List newly created elements
An element's details page now automatically reloads its list of sub-elements when new ones are created by a workflow.
For example, when you navigate from a workflow status page to an element, any lines created by the workflow will now appear without refreshing the page.
- Optimize the ListWorkerVersions endpoint (5x faster)
- Replace expensive ListChildrenElements: this greatly speeds up the tree displayed on an element's details page.
New endpoint to create transcriptions
A new endpoint, CreateTranscriptions, lets you create a list of transcriptions in a single call.
You can create transcriptions on multiple (sub-)elements at once, which makes your Machine Learning workers much faster.
:warning: It can only be used by workers (as it requires a
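Instead of sending one request per element, a worker can group its results and send them in a single CreateTranscriptions call. The sketch below only builds such a request body. The field names (`worker_version`, `transcriptions`, `element_id`, `text`, `confidence`), the 0 to 1 confidence range, and the `OcrResult` and `build_bulk_payload` names are assumptions for illustration; check the endpoint's API documentation on your instance for the actual schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class OcrResult:
    """One recognized text for one element (hypothetical worker-side type)."""
    element_id: str
    text: str
    confidence: float


def build_bulk_payload(worker_version_id: str, results: List[OcrResult]) -> Dict[str, Any]:
    """Group per-element results into one request body (hypothetical field names)."""
    for result in results:
        # Assumed range: reject confidences outside [0, 1] before sending anything.
        if not 0.0 <= result.confidence <= 1.0:
            raise ValueError(f"Confidence out of range for element {result.element_id}")
    return {
        # Assumed field: the worker version that produced these results.
        "worker_version": worker_version_id,
        "transcriptions": [
            {"element_id": r.element_id, "text": r.text, "confidence": r.confidence}
            for r in results
        ],
    }
```

The resulting dictionary would be sent as the JSON body of one CreateTranscriptions request, replacing as many single-transcription calls as there are results.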
- Arkindex no longer supports the source field on ML result creation endpoints (in favor of the new
- The ListElements endpoint now requires a corpus. Its URL is now
- We have fixed several stale read issues (caused by PostgreSQL clustering), mainly affecting machine learning result creation.
- Avoid duplicate elements when paginating a process's elements
- Avoid duplicate processes when filtering by the unscheduled state
We've changed the visibility of newly created projects: only Arkindex instance administrators can now create public projects.
This means that projects created by other users will only be visible to them.
In the next release, we'll introduce a new feature to share corpora across users on an instance.
The file import process has been updated:
- If you import a PDF file with multiple pages, a folder named after the file will be created, containing all of its pages;
- If you import images or a single-page PDF, no folder will be created: each file will simply be imported as a Page named after the original file;
- The URL import is now available again;
- You can still select element types, but this option is now in the collapsed Advanced settings section.
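The import rules above can be summarized as a small decision function. This is a hypothetical sketch: `ImportedFile`, `plan_import`, and the plan tuples are illustration names, not part of Arkindex, and child pages are simply numbered because the notes don't say how they are named.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImportedFile:
    """A file selected for import (hypothetical type)."""
    name: str
    is_pdf: bool
    page_count: int = 1


def plan_import(files: List[ImportedFile]) -> List[Tuple[str, str, List[str]]]:
    """Return (element type, element name, child page names) for each imported file."""
    plan = []
    for f in files:
        if f.is_pdf and f.page_count > 1:
            # Multi-page PDF: one folder named after the file, holding one page per PDF page.
            pages = [str(i) for i in range(1, f.page_count + 1)]
            plan.append(("folder", f.name, pages))
        else:
            # Image or single-page PDF: a single Page element named after the file, no folder.
            plan.append(("page", f.name, []))
    return plan
```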
All transcriptions are now visible in the details panel. Transcriptions are no longer available from the tree on the left side of the screen: from now on, they are all attached to an Element.
As a lot of new elements are automatically created to support the transcriptions, we've increased the number of automatically loaded sub-elements.
The workflow system now supports storing secret values (such as third-party credentials). These secret values are set by the Arkindex instance administrators and can only be used by tasks. No user can ever see these confidential payloads.
Machine Learning engineers can easily request secret values through a simple declaration in their workers' configuration:
    workers:
      - name: My Worker
        secrets:
          - project/X/google.json
This configuration will automatically retrieve the secret named project/X/google.json and make it available to the worker (so it can connect to a third-party service).
More information on implementing this feature in your workers will be available on this website soon.
The agent running our tasks now supports automated garbage collection, which deletes obsolete Docker payloads (containers and images that are no longer used). This keeps more of our systems' disk space available.
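The agent's actual cleanup policy isn't detailed here. As an illustrative sketch, assuming the agent can list its containers (with their state and image) and its local images, one simple policy is to remove stopped containers, then remove images that no running container uses. The `Container` type and `select_garbage` function are hypothetical; a real agent would list and delete these objects through the Docker API, for example with the Docker SDK for Python.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Container:
    """Minimal view of a Docker container (hypothetical type)."""
    id: str
    image: str
    running: bool


def select_garbage(containers: Iterable[Container], images: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (container ids to remove, image names to remove)."""
    containers = list(containers)
    # Stopped containers belong to finished tasks and are no longer needed.
    stale_containers = [c.id for c in containers if not c.running]
    # Images used by a running container must be kept.
    in_use: Set[str] = {c.image for c in containers if c.running}
    stale_images = [image for image in images if image not in in_use]
    return stale_containers, stale_images
```

Removing the stopped containers before the images matters: Docker refuses to delete an image that is still referenced by a container unless the removal is forced.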
A few updates are available in version 0.1.9 of arkindex-base-worker:
- Reload known ML classes when an error is received on creation,
- Add helpers to retrieve and cache worker versions,
- Fix a bug for complex graphs (deduplicate parents),
- Prevent UTF-8 decoding errors on task logs,
- Fix a bug to prevent infinitely pending tasks,
- Fix a bug that prevented some Stop actions from being effective, causing unwanted tasks to run.
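Reloading known ML classes when creation fails is a cache-then-reload pattern: the worker keeps a local map of class names to ids, and when creating a class is rejected (for instance because another worker created it in the meantime), it reloads the list from the server instead of failing. A minimal sketch, with hypothetical `list_classes` and `create_class` callables standing in for the API calls and a hypothetical `CreationError`; it is not the arkindex-base-worker code.

```python
from typing import Callable, Dict, List


class CreationError(Exception):
    """Raised by the (hypothetical) create callable when the API rejects a creation."""


class MLClassCache:
    """Local name -> id map of ML classes, refreshed from the server when needed."""

    def __init__(
        self,
        list_classes: Callable[[], List[Dict[str, str]]],
        create_class: Callable[[str], Dict[str, str]],
    ) -> None:
        self._list_classes = list_classes
        self._create_class = create_class
        self._ids: Dict[str, str] = {}
        self.reload()

    def reload(self) -> None:
        # Rebuild the name -> id map from the server's current list.
        self._ids = {c["name"]: c["id"] for c in self._list_classes()}

    def get_id(self, name: str) -> str:
        if name in self._ids:
            return self._ids[name]
        try:
            created = self._create_class(name)
        except CreationError:
            # The class may already exist on the server: reload, then look it up again.
            self.reload()
            if name not in self._ids:
                raise
            return self._ids[name]
        self._ids[name] = created["id"]
        return created["id"]
```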
Our open source Python API client has been updated to version 1.0.4 and now offers resilient pagination support (retries on errors, handling of backend downtime, and configurable handling of missing data).
More information is available on the project's PyPI homepage.
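Resilient pagination can be pictured as a loop that retries a failed page request a few times, with a growing delay, before giving up. This is a generic sketch, not the arkindex client's code: `fetch_page` is a hypothetical callable returning a dict with `results` and `next` keys (similar to Django REST Framework's page-number pagination), and the retry count and delays are arbitrary.

```python
import time
from typing import Any, Callable, Dict, Iterator


def paginate_with_retries(
    fetch_page: Callable[[int], Dict[str, Any]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield every result across pages, retrying each page on connection errors."""
    page = 1
    while True:
        for attempt in range(max_retries + 1):
            try:
                data = fetch_page(page)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # Give up after the last retry.
                # Exponential backoff: 0.5s, 1s, 2s, ... before retrying the same page.
                sleep(base_delay * 2 ** attempt)
        yield from data["results"]
        if not data.get("next"):
            return
        page += 1
```

Injecting `sleep` keeps the backoff testable without real delays; retrying the same page number (rather than restarting from page 1) means results already yielded are not repeated.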