Time-Lag Aware Multi-Modal Variational Autoencoder Using Baseball Videos And Tweets For Prediction Of Important Scenes

Kaito Hirasawa, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Length: 00:05:39

21 Sep 2021

This paper presents a novel method for predicting important scenes from baseball videos and tweets posted on Twitter, based on a time-lag aware multi-modal variational autoencoder (Tl-MVAE-PIS). The paper makes two technical contributions. First, to use heterogeneous data effectively for important scene prediction, we transform the textual, visual, and audio features obtained from tweets and videos into latent features, so that Tl-MVAE-PIS can flexibly express the relationships between them in the constructed latent space. Second, since there are time lags between tweets and the multiple previous events they refer to, Tl-MVAE-PIS accounts for these time lags when estimating the relationships, which allows it to derive the latent features successfully. Together, these two contributions enable accurate important scene prediction. Experiments using actual baseball videos and their corresponding tweets show the effectiveness of Tl-MVAE-PIS.
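The abstract does not give the model's equations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that tweets are matched to an earlier event by an exponentially decaying weight on their posting lag, and that per-modality Gaussian posteriors are fused by a product of experts. The function names, the `max_lag` and `tau` parameters, and both modelling choices are hypothetical and are not taken from the paper.

```python
import numpy as np

def lag_weighted_tweet_feature(tweet_times, tweet_feats, event_time,
                               max_lag=60.0, tau=20.0):
    """Aggregate tweet features for one event, down-weighting late tweets.

    Assumption (not from the paper): a tweet posted `lag` seconds after the
    event gets weight exp(-lag / tau); tweets outside [0, max_lag] are ignored.
    """
    tweet_times = np.asarray(tweet_times, dtype=float)
    tweet_feats = np.asarray(tweet_feats, dtype=float)
    lags = tweet_times - event_time
    mask = (lags >= 0.0) & (lags <= max_lag)
    if not mask.any():
        # No tweet falls inside the lag window: return a zero feature.
        return np.zeros(tweet_feats.shape[1])
    weights = np.exp(-lags[mask] / tau)
    weights /= weights.sum()
    return weights @ tweet_feats[mask]

def fuse_gaussians(mus, logvars):
    """Fuse diagonal Gaussian posteriors (one row per modality).

    Assumption (not from the paper): product-of-experts fusion, where
    precisions add and the mean is the precision-weighted average.
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, np.log(fused_var)

# Toy usage: three tweets, the last one far outside the lag window.
text_feat = lag_weighted_tweet_feature(
    tweet_times=[5.0, 10.0, 200.0],
    tweet_feats=[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    event_time=0.0,
)
# Two modality posteriors with unit variance and different means.
mu, logvar = fuse_gaussians(mus=[[0.0, 0.0], [2.0, 2.0]],
                            logvars=[[0.0, 0.0], [0.0, 0.0]])
```

In this toy setup the tweet posted 200 seconds after the event is dropped, and the earlier of the two remaining tweets contributes more to the aggregated feature. A real model would learn the encoders and, possibly, the lag weighting itself.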

Tags: Signal Processing Society, IEEE ICIP 2021, September 19-22, virtual conference
