GENERATION OF VIEWED IMAGE CAPTIONS FROM HUMAN BRAIN ACTIVITY VIA UNSUPERVISED TEXT LATENT SPACE
Saya Takada, Ren Togo, Takahiro Ogawa, Miki Haseyama
Generation of human cognitive contents based on the analysis of functional magnetic resonance imaging (fMRI) data has been actively researched. Cognitive contents such as viewed images can be estimated by analyzing the relationship between fMRI data and the semantic information of the viewed images. In this paper, we propose a new method for generating captions for viewed images from human brain activity via a novel robust regression scheme. Unlike conventional generation methods that use image feature representations, the proposed method makes use of text feature representations, which are more semantic and better suited to caption generation. We construct a text latent space in an unsupervised manner, and the fMRI data are regressed to this latent space. In addition, we newly make use of unlabeled images not used in the training phase to improve caption generation performance. Finally, the proposed method can generate captions from fMRI data measured while subjects are viewing images. Experimental results show that the proposed method enables accurate caption generation for viewed images.
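The abstract describes a pipeline of regressing fMRI data into a text latent space and then decoding captions from that space. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes placeholder fMRI features and precomputed caption embeddings, substitutes plain ridge regression for the paper's robust regression scheme, and decodes by nearest-neighbour retrieval over candidate caption embeddings (e.g., drawn from unlabeled images). All dimensions, variable names, and the similarity measure are illustrative assumptions.

```python
# Sketch: regress brain activity to a text latent space, then retrieve the
# closest caption embedding. Synthetic data stands in for real fMRI responses
# and for the unsupervised text latent space used in the paper.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Placeholder data: X would be fMRI responses to viewed images,
# Z the embeddings of their captions in the text latent space.
n_train, n_test, n_voxels, latent_dim = 200, 10, 500, 64
X_train = rng.standard_normal((n_train, n_voxels))
Z_train = rng.standard_normal((n_train, latent_dim))
X_test = rng.standard_normal((n_test, n_voxels))

# Candidate caption embeddings, e.g. obtained from unlabeled images
# that were not used in the training phase.
candidate_captions = [f"caption {i}" for i in range(1000)]
candidate_embeddings = rng.standard_normal((len(candidate_captions), latent_dim))

# Regress fMRI data into the text latent space (ridge regression is a
# stand-in for the paper's robust regression scheme).
regressor = Ridge(alpha=1.0).fit(X_train, Z_train)
Z_pred = regressor.predict(X_test)

# Decode each predicted latent vector by retrieving the most similar caption.
similarity = cosine_similarity(Z_pred, candidate_embeddings)
for i, idx in enumerate(similarity.argmax(axis=1)):
    print(f"test sample {i}: {candidate_captions[idx]}")
```

In practice the candidate embeddings and the latent space itself would come from a text encoder trained without labels, and the retrieval step could be replaced by a generative decoder over the same space; the sketch only shows the regression-then-decode structure implied by the abstract.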