Future Computing and Informatics Journal
Abstract
Visual Question Answering (VQA) is an emerging field in computer vision and natural language processing that aims to enable machines to understand the content of images and answer natural language questions about them. Recently, there has been increasing interest in integrating Semantic Web technologies into VQA systems to enhance their performance and scalability. In this context, knowledge graphs, which represent structured knowledge as entities and their relationships, have shown great potential for providing rich semantic information to VQA systems. This paper provides an overview of state-of-the-art research on VQA using Semantic Web technologies, including knowledge-graph-based VQA, medical VQA with semantic segmentation, and multimodal fusion with recurrent neural networks. The paper also highlights challenges and future directions in this area, such as improving the accuracy of knowledge-graph-based VQA, addressing the semantic gap between image content and natural language, and designing more effective multimodal fusion strategies. Overall, this paper emphasizes the importance and potential of Semantic Web technologies in VQA and encourages further research in this area.
Recommended Citation
El-Naggar, Gehad Assem (2023) "Visual Question Answering: A SURVEY," Future Computing and Informatics Journal: Vol. 8: Iss. 1, Article 1.
Available at: https://digitalcommons.aaru.edu.jo/fcij/vol8/iss1/1
Included in
Biomedical Commons, Computer and Systems Architecture Commons, Data Storage Systems Commons, Digital Communications and Networking Commons, Operational Research Commons, Other Computer Engineering Commons, Robotics Commons, Signal Processing Commons, Systems and Communications Commons