Photographs and other images of architecture serve many historical disciplines as sources and as a basis for subject- and theory-specific investigations. For example, historical photographs are used to reconstruct the condition of a building or to identify the formal language of an era. In these scenarios from architecture, art history and cultural studies, the starting point is source research and source criticism, supported by the tools of the respective discipline; further evaluation and scholarly use build on it.
Although AI-based computer vision methods have developed considerably in recent years, they can support source research and criticism only to a limited extent, e.g. in exploring image repositories or retrieving images. One reason is that, although the elementary procedures are well documented, scholars proceed very individually (as investigated in three dissertation projects under the supervision of the coordinator). Another is that AI image processing has so far not been designed to contextualize pictorial content multimodally, i.e. to combine different source genres such as images and texts. Existing computer vision methods extract and classify purely visual features, whereas texts and metadata, and the knowledge they contain, such as references to temporal contexts or individual motifs, cannot be linked to the analysis.
The proposed project HistKI aims to investigate how multimodal AI-based methods can support and model image source research and criticism as a complex and fundamental historiographical working technique. Related sub-questions are: How do historians and other specialists find and evaluate image sources? Which generic approaches and sub-problems can be identified in this process? How can AI-based approaches support it? How do AI techniques affect the research process in the humanities?
These questions will be examined in selected scenarios in which images, texts and 3D models describing architectural objects and urban ensembles are combined in a single analysis process. Using machine learning methods, HistKI will link object sources and text sources (e.g., captions) so that photographs can in future be contextualized and localized in detail, going a significant step beyond previous methods of distant viewing.
- Ludwig-Maximilians-Universität München (LMU), Institut für Kunstgeschichte