
DH Seminar series (VUB-ULB): "A Large Visual Question Answering Dataset for Cultural Heritage"

Location: Microsoft Teams

Save the date! On June 10, 2021, the last event of the Brussels DH seminar series, co-organized by the VUB and the ULB, will take place with the following seminar.


Ludovica Marinucci (Italian CNR): “A Large Visual Question Answering Dataset for Cultural Heritage”




Bridging Computer Vision and Natural Language Processing, Visual Question Answering (VQA) has recently attracted interest as a thriving research area that has achieved considerable results in the field of artificial intelligence. Within this framework, the proposed work aims to create a large resource for VQA related to the Cultural Heritage (CH) domain.


To this end, using data and models from ArCo (Architecture of Knowledge), the largest Knowledge Graph (KG) of Italian cultural heritage, a template-based approach was pursued to create a large dataset for VQA. The approach combines (i) the perspective of domain experts, represented by the competency questions elicited to model the ArCo ontology network, with (ii) a user-centered perspective, given by the questions that mostly non-expert users submitted through questionnaires about a set of images of various kinds of cultural assets belonging to the ArCo KG. Together, these perspectives allowed the generation of a large dataset of question-answer pairs in natural language (in both Italian and English), obtained by extracting data from the ArCo KG through SPARQL queries and then suitably cleaning and transforming that data.
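As a rough illustration of what a template-based generation step can look like, the following Python sketch turns query results into bilingual question-answer pairs. It is not the pipeline presented in the talk: the rows are hard-coded stand-ins for SPARQL bindings, and the field names (`asset`, `predicate`, `value`), the question templates, and the helper functions are all hypothetical.

```python
# Hypothetical rows standing in for bindings returned by a SPARQL query
# against a knowledge graph; field names and values are invented examples.
ROWS = [
    {"asset": "asset-001", "predicate": "author", "value": "  Example   Painter "},
    {"asset": "asset-001", "predicate": "date", "value": "1500"},
    {"asset": "asset-002", "predicate": "owner", "value": "Example Museum"},
]

# Hypothetical question templates, one per predicate and language.
TEMPLATES = {
    "author": {
        "en": "Who is the author of this cultural asset?",
        "it": "Chi è l'autore di questo bene culturale?",
    },
    "date": {
        "en": "When was this cultural asset made?",
        "it": "Quando è stato realizzato questo bene culturale?",
    },
}


def clean(value):
    """Collapse repeated whitespace and trim the ends of a raw value."""
    return " ".join(value.split())


def generate_pairs(rows, templates):
    """Build one question-answer pair per row and language.

    Rows whose predicate has no template, or whose cleaned value is
    empty, are skipped. Answers are reused verbatim in every language;
    a real pipeline would presumably translate them where needed.
    """
    pairs = []
    for row in rows:
        answer = clean(row["value"])
        by_lang = templates.get(row["predicate"])
        if by_lang is None or not answer:
            continue
        for lang, question in by_lang.items():
            pairs.append(
                {
                    "asset": row["asset"],
                    "lang": lang,
                    "question": question,
                    "answer": answer,
                }
            )
    return pairs


if __name__ == "__main__":
    for pair in generate_pairs(ROWS, TEMPLATES):
        print(pair["asset"], pair["lang"], pair["question"], "->", pair["answer"])
```

In this sketch the third row produces no pairs because no template exists for its predicate, which mirrors the idea that only questions covered by the expert or user perspectives end up in the dataset.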


During the talk, I will describe the results of this semi-automatic dataset generation process and the lessons learned from it, and discuss the tools employed for data extraction and transformation (cleaning, grammar checking, semantic clustering, automatic translation, etc.).


The speaker

Ludovica Marinucci is a Post-Doctoral Researcher at the Semantic Technology Laboratory (STLab) of the National Research Council (CNR) in Rome, Italy, working on projects involving the analysis of the social and cognitive aspects of the use of semantic technologies. Since 2014, she has been an adjunct professor in Philosophy of Science at Tor Vergata University of Rome, Faculty of Medicine. In 2017 she received her PhD in Philosophy, Epistemology and History of Culture from the University of Cagliari (Italy), during which she began to address the theoretical possibilities and challenges offered by the computational analysis of historical and philosophical texts and, more generally, by the interaction of computer science and the humanities.


Given the current situation related to the spread of Covid-19, for 2020 the seminar will take the form of a webinar hosted on the Teams platform. Participation is free but registration is required: please send an email to  and before June 1. A link to access the Teams meeting will then be sent. This event replaces the one originally planned for April 19, which we had to postpone; everyone enrolled in that seminar will receive a link to participate.


The seminar will take place at lunchtime, from 12:00 to 13:00.