RECONSTRUCTING SPEECH FROM CNN EMBEDDINGS

Luca Comanducci, Paolo Bestagini, Augusto Sarti, Stefano Tubaro, Marco Tagliasacchi

SPS

Length: 00:08:21

13 May 2022

A complete understanding of the decision-making process of Convolutional Neural Networks (CNNs) is still far from being reached. Many researchers have proposed techniques to interpret what a network actually "learns" from data; nevertheless, many questions remain unanswered. In this work, we study one aspect of this problem by reconstructing speech from the intermediate embeddings computed by a CNN. Specifically, we consider a pre-trained network that acts as a feature extractor for speech audio. We investigate whether these features can be inverted to reconstruct the input signals in a black-box scenario, and we quantify the reconstruction quality through the word error rate (WER) of an off-the-shelf automatic speech recognition (ASR) model. Experiments performed using two different CNN architectures trained for six different classification tasks show that it is possible to reconstruct time-domain speech signals that preserve the semantic content whenever the embeddings are extracted before the fully connected layers.
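Reconstruction quality is scored by transcribing each reconstructed signal with an ASR model and comparing the transcript against the reference text. As a minimal sketch (not the authors' evaluation code), the word error rate is the word-level Levenshtein distance, i.e. substitutions + deletions + insertions, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the ASR step is assumed to have already produced the hypothesis string; a lower WER means more of the spoken content survived the inversion.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N over whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (single-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER is not bounded by 1: a short reference paired with a long, wrong transcript can score above 100%.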

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

PROGRESS-ICASSP 2022: Opening Speech

PROGRESS-ICASSP 2022: Introduction by Farokh Atashzar and Nancy F. Chen

AN ADAPTIVE ALL-PASS FILTER FOR TIME-VARYING DELAY ESTIMATION
