Over the last decade, many European natural history museums have digitized their specimens on a large scale. However, around 85% of the information related to those specimens is found on the specimen labels or in physical registers, which may be available as images but not as readily accessible or usable text. This information is digitized manually, which is time-consuming and costly.

The Specimen Data Refinery (SDR) project aims to develop a cloud-based platform of tools that process natural history specimen images and their labels en masse to extract essential data efficiently. TEKLIA's technology for line detection, printed and handwritten text recognition, and named-entity extraction will be adapted to specimen data, containerized, and incorporated into the SDR workflow and tool registry. The project is carried out in close collaboration with the Natural History Museum, the Royal Botanic Garden Edinburgh, Meise Botanic Garden, and many other partners of the SYNTHESYS+ project.

For more information, see our paper "Landscape Analysis for the Specimen Data Refinery".

Image: Natural History Museum