Exploring handwritten heritage

tranSkriptorium – making handwritten assets searchable

Colleagues from our Time Machine partner Universitat Politècnica de València (UPV) are part of the team behind tranSkriptorium, a digital tool that aims to help public and private institutions make their large handwritten collections searchable.

Immense collections of historical manuscripts fill thousands of kilometres of shelves in archives and libraries. It is estimated that the total amount of handwritten text still exceeds the amount of printed text. Digital preservation of these works should not be the final goal; the real effort should go into making the valuable information they contain accessible and usable. Digitization is a necessary step, but only the first of many when it comes to exploring these sources further.

Motivation

The team behind tranSkriptorium sets out to address the following questions:

  • Is the current trend of digitizing collections truly delivering easy access to the information?
  • How can anyone search through the thousands of images in a collection for the content they need?
  • Can any user, without the right context and expertise, make sense of the contents?
  • What would it cost in expert hours, and what is the opportunity cost?
  • How much of this invaluable information are we ready to lose forever?
  • Would you be OK with a massive binary dump of all your company's data and no way to search it or understand what it contains?

Solution

Transcribing all these texts would open up their contents to an extraordinary number of users and researchers. Unfortunately, manual transcription is prohibitively expensive, and unassisted automatic transcription lacks the required precision. Through computer-assisted transcription, tranSkriptorium can produce precise transcriptions at affordable prices. Even better, its tools can automatically index documents and support probabilistic searches without transcribing them first. These probabilistic indexes also make it possible to run big-data analyses over the indexed documents, such as classification or automatic summarization.
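To illustrate what searching without transcribing can look like, here is a minimal Python sketch of a keyword lookup over a probabilistic index. It assumes a simplified, hypothetical index format: each `Spot` record holds a page name, a word hypothesis, the probability that this word is written at that spot, and a bounding box. tranSkriptorium's actual index format and search tools are not described here, so every name and value below is illustrative only. Hits above a confidence threshold are returned best-first, and summing the probabilities gives a rough estimate of how often a word occurs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """Hypothetical index entry: one word hypothesis at one location on a page image."""
    page: str
    word: str
    prob: float  # assumed: probability that `word` is written at this spot
    bbox: tuple  # assumed: (x, y, width, height) in image pixels


def search(index, query, threshold=0.5):
    """Return spots matching `query` (case-insensitive) with prob >= threshold, best first."""
    q = query.lower()
    hits = [s for s in index if s.word.lower() == q and s.prob >= threshold]
    return sorted(hits, key=lambda s: s.prob, reverse=True)


def expected_count(index, query):
    """Rough occurrence estimate for `query`: the sum of its probabilities across the index."""
    q = query.lower()
    return sum(s.prob for s in index if s.word.lower() == q)


# Toy index with made-up values: competing readings of the same region keep their own probabilities.
index = [
    Spot("page_001.jpg", "Valencia", 0.92, (120, 340, 210, 48)),
    Spot("page_001.jpg", "Valentia", 0.31, (120, 340, 210, 48)),
    Spot("page_007.jpg", "valencia", 0.58, (400, 95, 190, 45)),
    Spot("page_012.jpg", "Valencia", 0.12, (60, 700, 200, 50)),
]

for spot in search(index, "Valencia"):
    print(spot.page, spot.prob)
print(round(expected_count(index, "Valencia"), 2))
```

Raising the threshold trades recall for precision: fewer, more certain hits. Lowering it surfaces uncertain readings that a human can then check, which is how a user can search a collection that has never been transcribed.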

Get more information about the assets and cooperation possibilities here: