Earthquake Location and Magnitude Estimation With Graph Neural Networks
Ian McBrearty, Gregory Beroza
In this paper, we propose a relation-enhanced vision-language pre-training (VLP) method for a transformer model to improve performance on vision-and-language (V+L) tasks. Existing VLP studies attempt to generate a multimodal representation with individual objects as input and rely on self-attention to learn semantic representations in a brute-force manner; the relations among the objects in an image are largely ignored. To address this problem, we generate a paired visual feature (PVF) that is organized to express the relations between objects. Prior knowledge reflecting the co-occurrences of paired objects, together with a pair-wise distance matrix, adjusts the relations of the paired objects. A triplet is used for sentence embedding. Experimental results demonstrate that the proposed method enhances the relations between objects and thereby improves performance on V+L downstream tasks.
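To make the idea of a paired visual feature adjusted by a co-occurrence prior and a pair-wise distance matrix concrete, the following is a minimal PyTorch sketch. The module name, layer sizes, and the specific gating of pairs by (co-occurrence, distance) are illustrative assumptions, not the authors' implementation; it only shows one plausible way relation-aware pair features could be built from per-object region features.

```python
import torch
import torch.nn as nn


class PairedVisualFeatures(nn.Module):
    """Illustrative sketch (not the paper's code): build paired visual
    features (PVF) from per-object region features and re-weight each
    object pair using a co-occurrence prior and a pair-wise distance."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Project a concatenated object pair (2 * feat_dim) to a relation feature.
        self.pair_proj = nn.Linear(2 * feat_dim, hidden_dim)
        # Scalar gate combining the co-occurrence prior and spatial distance
        # (hypothetical choice; the paper only states both are used to adjust relations).
        self.relation_gate = nn.Linear(2, 1)

    def forward(self, obj_feats, boxes, cooccur_prior):
        """
        obj_feats:     (N, feat_dim)  region features for N detected objects
        boxes:         (N, 4)         boxes; only the first two entries (centers) are used here
        cooccur_prior: (N, N)         prior co-occurrence scores for object pairs
        returns:       (N, N, hidden_dim) relation-weighted paired features
        """
        n = obj_feats.size(0)
        # All ordered pairs (i, j): concatenate the two object features.
        fi = obj_feats.unsqueeze(1).expand(n, n, -1)            # (N, N, D)
        fj = obj_feats.unsqueeze(0).expand(n, n, -1)            # (N, N, D)
        paired = self.pair_proj(torch.cat([fi, fj], dim=-1))    # (N, N, H)

        # Pair-wise Euclidean distance between box centers.
        centers = boxes[:, :2]
        dist = torch.cdist(centers, centers)                    # (N, N)

        # Gate each pair by its co-occurrence prior and (negated) distance.
        gate_in = torch.stack([cooccur_prior, -dist], dim=-1)   # (N, N, 2)
        gate = torch.sigmoid(self.relation_gate(gate_in))       # (N, N, 1)
        return paired * gate


if __name__ == "__main__":
    torch.manual_seed(0)
    n_obj, d, h = 6, 128, 64
    model = PairedVisualFeatures(d, h)
    pvf = model(torch.randn(n_obj, d), torch.rand(n_obj, 4), torch.rand(n_obj, n_obj))
    print(pvf.shape)  # torch.Size([6, 6, 64])
```

The resulting (N, N, hidden_dim) tensor could then be flattened and fed to the transformer alongside the sentence embedding, so that attention operates over relation-aware pair features rather than isolated object features.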