Information Sciences Letters


Abstract

Computer vision and natural language processing (NLP) are two active machine learning research areas. Their integration gives rise to an interdisciplinary field that is currently attracting growing attention from researchers. Research has been carried out to extract the text associated with an image or a video, which can make computer vision more effective. Conversely, researchers also focus on using computer vision to ground the meaning of words in NLP. This concept is widely used in robotics: although robots can perceive their surroundings through several modes of interaction, natural gestures and spoken language are the most convenient ways for humans to interact with them, and this is possible only if the robots can understand such interactions. In the present paper, the proposed integrated application is used to guide vision-impaired people. Because vision is essential to human life, an alternative source that guides blind people in their movements is highly important. For this purpose, a smartphone with vision, language, and intelligence capabilities is attached to the blind person to capture images of the surroundings. The smartphone communicates with a central server running a Faster Region-based Convolutional Neural Network (Faster R-CNN) that detects the objects in each image, so that the person can be informed about them and avoid obstacles in their way. The detection results are returned to the smartphone, which produces speech output to guide the blind user.
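The pipeline described above (smartphone frame → server-side Faster R-CNN detection → spoken guidance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `detect_objects` is a stub standing in for the real server-side Faster R-CNN call, and all labels, scores, and function names are hypothetical.

```python
# Hypothetical sketch of the guidance pipeline: the smartphone sends a frame
# to a central Faster R-CNN server, which returns detected objects; the phone
# then renders confident detections as a sentence for text-to-speech.

from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str    # object class name returned by the detector
    score: float  # detector confidence in [0, 1]

def detect_objects(frame: bytes) -> List[Detection]:
    """Stub for the server-side Faster R-CNN inference call.

    In the real system this would run the CNN on the image frame;
    the values below are illustrative only.
    """
    return [Detection("chair", 0.92), Detection("door", 0.81), Detection("cup", 0.40)]

def guidance_sentence(detections: List[Detection], threshold: float = 0.5) -> str:
    """Keep confident detections and build the sentence passed to text-to-speech."""
    names = [d.label for d in detections if d.score >= threshold]
    if not names:
        return "No obstacles detected ahead."
    return "Ahead of you: " + ", ".join(names) + "."

sentence = guidance_sentence(detect_objects(b"<jpeg bytes>"))
print(sentence)  # → Ahead of you: chair, door.
```

Filtering by a confidence threshold before speaking keeps the audio channel uncluttered, which matters when the output is the user's primary navigation cue.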
