This article walks through the steps needed to import images from a IIIF server into Arkindex. You will learn how to communicate with the Arkindex server through its API, and you will get a glimpse of how documents are organized in Arkindex. We assume that you have a list of URLs of images already uploaded to a IIIF server and that you want to import them into Arkindex. As an example, we will use the following list of public-domain images, available on Teklia's IIIF server.

public/GeorgeWashingtonPapers/0552.jpg
public/GeorgeWashingtonPapers/0550.jpg
public/GeorgeWashingtonPapers/0551.jpg
public/GeorgeWashingtonPapers/0549.jpg
public/GeorgeWashingtonPapers/0548.jpg
public/GeorgeWashingtonPapers/0547.jpg
public/GeorgeWashingtonPapers/0546.jpg

Requirements

  • Python packages:
pip3 install apistar arkindex-client
  • a list of URLs of images already uploaded to a IIIF server: your_list.txt (for example washington_list.txt)

  • the URL of the Arkindex instance you want to upload the images to: your_arkindex_url (for example https://arkindex.teklia.com/)

  • the URL of the IIIF server: your_iiif_url (for example https://iiif.teklia.com/main/iiif/2/)

  • your Arkindex API token: you can find it on your Arkindex instance at the address your_arkindex_url/api/v1/user/ (for example https://arkindex.teklia.com/api/v1/user/)

  • on the Arkindex instance: a user with "write" access to a corpus, for example "Washington_Test"

  • the id of the corpus: it can be found under the corpus name on the "Corpora" page. In the following example, it is 4ac9979c-e40b-4cf8-80e4-74cde1f059b3.

Corpus id
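The sample list above contains image paths rather than complete URLs, so each path has to be combined with your_iiif_url before it can be sent to Arkindex. The following is a minimal sketch of that step. The function names are hypothetical, and it assumes that the IIIF server expects the whole path as a single percent-encoded identifier (so that "/" becomes "%2F"); check how your own server addresses its images before relying on it.

```python
from urllib.parse import quote


def build_image_url(iiif_base_url, image_path):
    """Join the IIIF base URL and an image path into one image URL."""
    # Assumption: the server takes the path as a single percent-encoded
    # identifier, so every "/" inside the path is encoded as "%2F".
    identifier = quote(image_path.strip(), safe="")
    return iiif_base_url.rstrip("/") + "/" + identifier


def read_image_urls(list_file, iiif_base_url):
    """Read one image path per line and return the matching image URLs."""
    with open(list_file, encoding="utf-8") as handle:
        # Blank lines are skipped so a trailing newline does no harm.
        return [build_image_url(iiif_base_url, line) for line in handle if line.strip()]
```

For example, read_image_urls("washington_list.txt", "https://iiif.teklia.com/main/iiif/2/") would return one URL per line of the sample list.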

Authentication

To communicate with Arkindex through the API, you have to authenticate with a token. Once you are registered, you can find your token at your_arkindex_url/api/v1/user/. You can then store it, along with the Arkindex URL, in environment variables:

export ARKINDEX_API_URL=your_arkindex_url
export ARKINDEX_API_TOKEN=your_token

You can then use those variables to authenticate your requests:

from arkindex import ArkindexClient, options_from_env
api_client = ArkindexClient(**options_from_env())
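Because the client is configured from the environment, a forgotten export is an easy mistake to make. The sketch below is a small pre-flight check that uses only the standard library; the function names are hypothetical, and it makes no claim about how options_from_env itself behaves when a variable is missing.

```python
import os

# The two variables exported in the step above.
REQUIRED_VARIABLES = ("ARKINDEX_API_URL", "ARKINDEX_API_TOKEN")


def missing_variables(environ=os.environ):
    """Return the names of the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


def require_variables(environ=os.environ):
    """Raise a readable error if any required variable is missing."""
    missing = missing_variables(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_variables() before creating the client turns a missing variable into an explicit error message.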

There are two main steps when importing images from a IIIF server into Arkindex. First, you have to create an Image object that represents the image in Arkindex. Then, you have to create elements linked to those images; these elements will store the related information (transcriptions, classifications, child elements...). Both images and elements are created by sending the Arkindex server a request whose JSON body describes the image or element to create. The documentation of the Arkindex API is available at https://arkindex.gitlab.io/api-client/.

Image creation

An image is created by sending a request containing the URL of the image on the IIIF server to the CreateIIIFImage endpoint. The server responds with a JSON object describing the image:

{
  "height": 0,
  "id": "string",
  "path": "string",
  "s3_url": "string",
  "server": {
    "display_name": "string",
    "max_height": 0,
    "max_width": 0,
    "url": "http://example.com"
  },
  "status": "checked",
  "url": "http://example.com",
  "width": 0
}

Attempts at creating an existing image will return an HTTP 400 error code containing the original image id. You can then send a request containing this id to the RetrieveImage endpoint to get the JSON describing the image.

from apistar.exceptions import ErrorResponse
from arkindex import ArkindexClient, options_from_env

api_client = ArkindexClient(**options_from_env())
try:
    # 'image_url' stands for the full URL of the image on the IIIF server
    image = api_client.request('CreateIIIFImage', body={'url': 'image_url'})
except ErrorResponse as e:
    # A 400 response carrying an id means the image already exists
    if e.status_code != 400 or 'id' not in e.content:
        raise
    image = api_client.request('RetrieveImage', id=e.content['id'])

Element creation

Documents in Arkindex are organised by corpus. Each document is represented by an Element object. You can create a new Element by sending a request to the CreateElement endpoint with a JSON describing the element to create:

# corpus_id, element_type, doc_element and page_name are set by the caller
element = api_client.request('CreateElement', body={
    'corpus': str(corpus_id),
    'type': element_type,
    'parent': doc_element['id'],
    'name': page_name,
    'image': image['id']
})
  • 'corpus': id of the corpus you want to import elements to.
  • 'type': type of the element you want to create.
  • 'parent': id of an element in the same corpus to which the new element is linked.
  • 'name': name of the element you want to create.
  • 'image' (optional): id of the image to which the element is linked.
  • 'polygon': array of [x, y] coordinates describing the zone of the image that the element relates to.

(https://arkindex.gitlab.io/api-client/#operation/CreateElement)
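To make the 'polygon' field more concrete, here is a minimal sketch that builds the body of a page element covering its whole image, using the width and height returned when the image was created. The helper names are hypothetical. The order of the corners and the repetition of the first point to close the outline are assumptions, so check the CreateElement documentation for the exact format the server expects.

```python
def full_image_polygon(width, height):
    """Return [x, y] corner points covering the whole image."""
    # Assumption: corners go clockwise from the top-left corner and the
    # first point is repeated at the end to close the outline.
    return [[0, 0], [width, 0], [width, height], [0, height], [0, 0]]


def build_page_body(corpus_id, parent_id, name, image):
    """Assemble the JSON body of a 'page' element linked to an image."""
    # 'image' is the dictionary returned by CreateIIIFImage or RetrieveImage.
    return {
        "corpus": str(corpus_id),
        "type": "page",
        "parent": parent_id,
        "name": name,
        "image": image["id"],
        "polygon": full_image_polygon(image["width"], image["height"]),
    }
```

The resulting dictionary can be passed as the body argument of the CreateElement request.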

The Arkindex server will respond to a successful Element creation request with an HTTP 200 containing a JSON payload describing the element.

Putting it together

Before importing our images, we have to check that the corpus accepts at least two types of elements: one corresponding to folders, and one for individual items. By default, a corpus accepts two types of elements, folder and page.

For this example, we will create an element of type page for every image that we want to import to Arkindex. We will group them as children of the same element of type folder.

The code is available in this file and you can use the sample list of images available here.

You can run the script with the following command:

python3 simplified_images_import.py --corpus your_corpus_id --input-file washington_list.txt
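The command line shows two options, --corpus and --input-file. As an illustration of how such an interface can be declared, here is a minimal argparse sketch; it is not taken from the script itself, and the help texts and the choice to make both options required are assumptions.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the two options shown in the command line above."""
    parser = argparse.ArgumentParser(
        description="Import IIIF images into an Arkindex corpus."
    )
    # Assumption: both options are mandatory.
    parser.add_argument("--corpus", required=True, help="id of the target corpus")
    parser.add_argument("--input-file", required=True, help="text file listing the images")
    # With argv=None, argparse reads the options from sys.argv.
    return parser.parse_args(argv)
```

argparse exposes --input-file as the attribute input_file on the returned namespace.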

After running this script, you should be able to visualize your newly imported images at your_arkindex_url/browse?corpus=your_corpus_id.
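For readers who prefer to see the overall flow at a glance, the sketch below chains the requests described in this article: one folder element, then one image and one page element per URL. It is an illustration under stated assumptions, not the content of the script: the function names are hypothetical, the folder is assumed to be created without a 'parent', page names are derived from the file names, and the HTTP 400 case for already existing images is left out for brevity.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def page_name_from_url(image_url):
    """Use the file name without its extension as the page name."""
    # unquote turns an encoded "%2F" back into "/" before the path is split.
    return PurePosixPath(unquote(image_url)).stem


def import_pages(client, corpus_id, folder_name, image_urls):
    """Create one folder element, then one page element per image URL."""
    # Assumption: a top-level folder can be created without a 'parent'.
    folder = client.request("CreateElement", body={
        "corpus": str(corpus_id),
        "type": "folder",
        "name": folder_name,
    })
    pages = []
    for image_url in image_urls:
        # Existing images (HTTP 400) are not handled in this sketch.
        image = client.request("CreateIIIFImage", body={"url": image_url})
        pages.append(client.request("CreateElement", body={
            "corpus": str(corpus_id),
            "type": "page",
            "parent": folder["id"],
            "name": page_name_from_url(image_url),
            "image": image["id"],
        }))
    return folder, pages
```

Here client is expected to be an authenticated ArkindexClient, created as shown in the Authentication section.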