Arkindex in practice: import IIIF images

This article walks you through the steps needed to import images from a IIIF server into Arkindex. You will learn how to communicate with the Arkindex server via the API, and you will get a glimpse of how documents are organized in Arkindex. Throughout the article, we assume you have a list of URLs of images already hosted on a IIIF server that you want to import into Arkindex. As an example, we will use the following list of public domain images, available on Teklia’s IIIF server.

public/GeorgeWashingtonPapers/0552.jpg
public/GeorgeWashingtonPapers/0550.jpg
public/GeorgeWashingtonPapers/0551.jpg
public/GeorgeWashingtonPapers/0549.jpg
public/GeorgeWashingtonPapers/0548.jpg
public/GeorgeWashingtonPapers/0547.jpg
public/GeorgeWashingtonPapers/0546.jpg
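
The list above contains image paths rather than full URLs. As a minimal sketch, the paths could be turned into URLs as shown below. The base URL is a hypothetical placeholder (the address of the IIIF server is not given in this article), and the sketch assumes the server expects the slashes of the image identifier to be percent-encoded, as the IIIF Image API requires for identifiers containing slashes.

```python
from urllib.parse import quote

# ASSUMPTION: hypothetical base URL, replace it with the image API
# prefix of your own IIIF server.
IIIF_BASE_URL = "https://iiif.example.com/iiif/2/"


def build_image_urls(paths, base_url=IIIF_BASE_URL):
    """Turn a list of image paths into IIIF image URLs."""
    urls = []
    for line in paths:
        path = line.strip()
        if not path:
            # Skip blank lines, e.g. a trailing newline in the list file
            continue
        # safe="" makes quote() encode "/" as "%2F"
        urls.append(base_url + quote(path, safe=""))
    return urls
```

The function accepts any iterable of strings, so an open text file containing one path per line can be passed directly.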

Requirements

pip3 install apistar arkindex-client

Corpus id

Authentication

To communicate with Arkindex via the API, you have to authenticate with a token. Once registered, you can find your token at your_arkindex_url/api/v1/user/. You can then store it, along with the Arkindex URL, in environment variables:

export ARKINDEX_API_URL=your_arkindex_url
export ARKINDEX_API_TOKEN=your_token

You can use those variables to authenticate your requests as follows:

from arkindex import ArkindexClient, options_from_env
api_client = ArkindexClient(**options_from_env())
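
Before creating the client, it can help to check that both variables are actually set, so that a missing token is reported clearly instead of surfacing later as an authentication error. The helper below is a small sketch; its name is hypothetical and it is not part of the Arkindex client.

```python
import os

# The two variables exported above
REQUIRED_VARS = ("ARKINDEX_API_URL", "ARKINDEX_API_TOKEN")


def missing_arkindex_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A script could call `missing_arkindex_vars()` at startup and exit with an explicit message if the returned list is not empty.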

There are two main steps when importing images from a IIIF server to Arkindex. First, you have to create an Image object that represents the image in Arkindex. Then, you have to create elements linked to those images that will store related information (transcriptions, classifications, children elements…). Both image and element creation can be done by sending requests to the Arkindex server containing a JSON body describing the image or element to create. The documentation of Arkindex API is available at https://arkindex.gitlab.io/api-client/.

Image creation

Image creation is done by sending a request containing the URL of the image on the IIIF server to the CreateIIIFImage endpoint. The server responds with a JSON object describing the image:

{
  "height": 0,
  "id": "string",
  "path": "string",
  "s3_url": "string",
  "server": {
    "display_name": "string",
    "max_height": 0,
    "max_width": 0,
    "url": "http://example.com"
  },
  "status": "checked",
  "url": "http://example.com",
  "width": 0
}

An attempt to create an image that already exists returns an HTTP 400 error whose body contains the id of the existing image. You can then send a request containing this id to the RetrieveImage endpoint to get the JSON describing the image.

from arkindex import ArkindexClient, options_from_env
from apistar.exceptions import ErrorResponse

api_client = ArkindexClient(**options_from_env())

try:
    # Ask Arkindex to register the image hosted on the IIIF server
    image = api_client.request('CreateIIIFImage', body={'url': 'image_url'})
except ErrorResponse as e:
    # A 400 response carrying an id means the image already exists
    if e.status_code != 400 or 'id' not in e.content:
        raise
    image = api_client.request('RetrieveImage', id=e.content['id'])

Element creation

Documents in Arkindex are organised by corpus. Each document is represented by an Element object. You can create a new Element by sending a request to the CreateElement endpoint with a JSON describing the element to create:

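# corpus_id, element_type, doc_element and page_name must be defined beforehand;
# doc_element is the parent element and image is the image created above
element = api_client.request('CreateElement', body={
    'corpus': str(corpus_id),
    'type': element_type,
    'parent': doc_element['id'],
    'name': page_name,
    'image': image['id']
})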

(https://arkindex.gitlab.io/api-client/#operation/CreateElement)

The Arkindex server will respond to a successful Element creation request with an HTTP 200 containing a JSON payload describing the element.

Putting it together

Before importing our images, we have to check that the corpus accepts at least two types of elements: one corresponding to folders, and one for individual items. By default, a corpus accepts two types of elements, folder and page.
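
This check can be sketched as a small function operating on the JSON description of the corpus. The layout used here is an assumption to verify against the API documentation: the sketch supposes that the corpus description contains a `types` list whose entries carry a `slug` field. The function name is hypothetical.

```python
def missing_element_types(corpus, required=("folder", "page")):
    """Return the required element types that the corpus does not accept."""
    # ASSUMPTION: corpus["types"] is a list of dicts with a "slug" key;
    # check this layout against the Arkindex API documentation.
    available = {element_type["slug"] for element_type in corpus.get("types", [])}
    return [slug for slug in required if slug not in available]
```

An empty list means both types are available and the import can proceed.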

For this example, we will create an element of type page for every image that we want to import to Arkindex. We will group them as children of the same element of type folder.
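
The overall flow can be sketched as follows. This is an illustration under stated assumptions, not the script distributed with this article. The function names and the "Page N" naming scheme are hypothetical, and `client` is expected to be an ArkindexClient such as the `api_client` created earlier. So that the sketch runs without importing apistar, it catches a generic exception and inspects its `status_code` and `content` attributes; real code should catch `apistar.exceptions.ErrorResponse` as in the image creation snippet.

```python
def get_or_create_image(client, image_url):
    """Create the image in Arkindex, or retrieve it if it already exists."""
    try:
        return client.request('CreateIIIFImage', body={'url': image_url})
    except Exception as e:
        # ASSUMPTION: the error exposes status_code and content, like
        # apistar's ErrorResponse; anything else is re-raised untouched.
        status = getattr(e, 'status_code', None)
        content = getattr(e, 'content', None)
        if status != 400 or not isinstance(content, dict) or 'id' not in content:
            raise
        return client.request('RetrieveImage', id=content['id'])


def import_pages(client, corpus_id, folder_name, image_urls):
    """Create one folder element, then one page element per image URL."""
    folder = client.request('CreateElement', body={
        'corpus': str(corpus_id),
        'type': 'folder',
        'name': folder_name,
    })
    pages = []
    for index, image_url in enumerate(image_urls, start=1):
        image = get_or_create_image(client, image_url)
        page = client.request('CreateElement', body={
            'corpus': str(corpus_id),
            'type': 'page',
            'parent': folder['id'],   # group every page under the folder
            'name': f'Page {index}',  # hypothetical naming scheme
            'image': image['id'],
        })
        pages.append(page)
    return folder, pages
```

Taking the client as a parameter keeps the two functions easy to exercise with a stub object before running them against a real Arkindex instance.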

The code is available in this file and you can use the sample list of images available here.

You can run the script using:

python3 simplified_images_import.py --corpus your_corpus_id --input-file washington_list.txt

After running this script, you should be able to visualize your newly imported images at your_arkindex_url/browse?corpus=your_corpus_id.