A Large Visual Question Answering Dataset for Cultural Heritage

Ludovica Marinucci (Italian CNR)
10 June 2021, online seminar

Bridging Computer Vision and Natural Language Processing, Visual Question Answering (VQA) has recently attracted interest as a thriving research area that has achieved considerable results in the field of artificial intelligence. Within this framework, the proposed work aims to create a large resource for VQA in the Cultural Heritage (CH) domain.

To this end, a template-based approach was pursued to create a large VQA dataset, using data and models from ArCo (Architecture of Knowledge), the largest Knowledge Graph (KG) of the Italian cultural heritage. The approach combines (i) the perspective of domain experts, represented by the competency questions elicited to model the ArCo ontology network, with (ii) a user-centered perspective, given by questions collected from mostly non-expert users through questionnaires on a set of images of various kinds of cultural assets belonging to the ArCo KG. Together, these perspectives allowed the generation of a large dataset of question-answer pairs in natural language (in both Italian and English), obtained by extracting data from the ArCo KG through SPARQL queries and then suitably cleaning and transforming that data.
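As a rough illustration of the template-based idea, the following minimal Python sketch pairs fixed question templates with values shaped like SPARQL result rows and applies a simple cleaning step. The template wording, the property names (`author`, `material`), the sample rows, and the function names are all hypothetical and are not taken from ArCo or from the actual pipeline.

```python
import re

# Hypothetical question templates keyed by property name (illustrative only).
TEMPLATES = {
    "author": "Who is the author of this cultural asset?",
    "material": "What material is this cultural asset made of?",
}


def clean_value(value):
    """Collapse runs of whitespace and strip trailing punctuation."""
    value = re.sub(r"\s+", " ", value).strip()
    return value.rstrip(" .;,")


def generate_qa_pairs(rows):
    """Turn rows shaped like SPARQL result bindings into question-answer pairs."""
    pairs = []
    for row in rows:
        for prop, question in TEMPLATES.items():
            raw = row.get(prop)
            if not raw:
                continue  # no value for this property, so no question is generated
            answer = clean_value(raw)
            if answer:
                pairs.append(
                    {"image": row["image"], "question": question, "answer": answer}
                )
    return pairs


# Made-up rows standing in for the output of a SPARQL query.
rows = [
    {"image": "asset-001.jpg", "author": "  Example   Artist. ", "material": "oil on canvas"},
    {"image": "asset-002.jpg", "author": "", "material": "marble;"},
]

qa_pairs = generate_qa_pairs(rows)
for pair in qa_pairs:
    print(pair["image"], "|", pair["question"], "|", pair["answer"])
```

In a real setting the rows would come from queries against the knowledge graph, and the cleaning step would be considerably richer (grammar checking, clustering, translation); the sketch only shows how templates and extracted values combine into pairs.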

During the talk, Ludovica Marinucci will describe the results of, and the lessons learned from, this semi-automatic dataset generation process, and discuss the tools employed for data extraction and transformation (cleaning, grammar checking, semantic clustering, automatic translation, etc.).

About Ludovica Marinucci

Ludovica Marinucci is a Post-Doctoral Researcher at the Semantic Technology Laboratory (STLab) of the National Research Council (CNR) in Rome, Italy, working on projects involving the analysis of the social and cognitive aspects of the use of semantic technologies. Since 2014, she has been an adjunct professor in Philosophy of Science at Tor Vergata University of Rome, Faculty of Medicine. In 2017 she received her PhD in Philosophy, Epistemology and History of Culture from the University of Cagliari (Italy), where she began to address the theoretical possibilities and challenges offered by the computational analysis of historical and philosophical texts and, more generally, by the interaction of computer science and the humanities.


  • 10 June 2021, 12-13h
  • Location: online (Teams)
  • Contact: sebastien.de.valeriola@ulb.be & andrea.penso@vub.be
  • Language: English


  • Registration period: before 1 June 2021

Ready to get started?

All practical information can be found on the training page of VUB Digi Group.