
KEYNOTE: Least Squares Support Vector Machines and Deep Learning

Johan Suykens, Katholieke Universiteit Leuven, BELGIUM

Length: 01:07:40
03 Jul 2024

ABSTRACT: While powerful architectures have been proposed in deep learning, support vector machines and kernel-based methods offer solid foundations from the perspective of statistical learning theory and optimization. Simple core models have been obtained within the least squares support vector machines framework, related to classification, regression, kernel principal component analysis, kernel canonical correlation analysis, kernel spectral clustering, recurrent models, approximate solutions to partial differential equations, optimal control problems, and more. The models are understood in terms of primal and dual representations, related to feature maps and kernels respectively. These insights have been exploited to tailor representations to given data characteristics, both for high-dimensional input data and for large-scale data sets. One can work either with explicit feature maps (e.g. convolutional feature maps) or with implicit feature maps defined through the kernel functions.

Within this talk we will mainly focus on new insights connecting deep learning and least squares support vector machines. In relation to Restricted Boltzmann machines and Deep Boltzmann machines, we show how least squares support vector machine models can be transformed into so-called Restricted Kernel Machine representations. This enables conceiving new deep kernel machines, generative models, and multi-view and tensor-based models with latent space exploration, and obtaining improved robustness and explainability. In our most recent work, we explain how the attention mechanism in transformers can be seen within the least squares support vector machine framework. More precisely, it can be represented as an extension of asymmetric kernel singular value decomposition with primal and dual model representations, related to two feature maps (queries and keys) and an asymmetric kernel. In the resulting method of "Primal-Attention", a regularized loss is employed to achieve low-rank representations for efficient training in the primal. Finally, these newly obtained synergies are very promising for reaching a bigger, unifying picture. Several future challenges will be outlined from this perspective.
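For reference, a minimal sketch of the least squares support vector machine classifier underlying the "simple core models" mentioned above, in its standard primal and dual form (the notation below is the conventional one and is not taken from the talk itself):

Primal problem, with feature map \varphi, bias b, error variables e_i and regularization constant \gamma:

\min_{w,b,e}\ \tfrac{1}{2} w^\top w + \tfrac{\gamma}{2} \sum_{i=1}^{N} e_i^2
\quad \text{s.t.} \quad y_i \left( w^\top \varphi(x_i) + b \right) = 1 - e_i, \qquad i = 1, \dots, N.

Dual problem, obtained from the Karush-Kuhn-Tucker conditions: a linear system in b and the Lagrange multipliers \alpha, with \Omega_{ij} = y_i y_j K(x_i, x_j) and kernel K(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j):

\begin{bmatrix} 0 & y^\top \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1_N \end{bmatrix},
\qquad
\hat{y}(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right).

The primal representation works with the explicit feature map \varphi, while the dual works with the kernel K, which is the primal/dual distinction the abstract refers to. For the attention connection, a rough reading (hypothetical notation, assuming only the setup sketched in the abstract) is that the attention matrix is viewed as an asymmetric kernel

\kappa(x_i, x_j) = \varphi_q(x_i)^\top \varphi_k(x_j),

built from a query feature map \varphi_q and a key feature map \varphi_k; a singular value decomposition of this asymmetric kernel then admits a dual representation in terms of \kappa and a primal one in terms of projections of \varphi_q and \varphi_k, and "Primal-Attention" trains the low-rank primal form with a regularized loss.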
