-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:12:22
Supervised learning scenarios, where labels and features are possibly mismatched, have been an emerging concern in machine learning applications. For example, researchers often need to align heterogeneous data from multiple resources to the same entities without a unique identifier in the socioeconomic study. Such a mismatch problem can significantly affect the learning performance if it is not appropriately addressed. Due to the combinatorial nature of the mismatch problem, existing methods are often designed for small datasets and simple linear models but are not scalable to large-scale datasets and complex models. In this paper, we first present a new formulation of the mismatch problem that supports continuous optimization problems and allows for gradient-based methods. Moreover, we develop a computation and memory efficient method to process complex data and models. Empirical studies on synthetic and real-world data show significantly better performance of the proposed algorithms than state-of-the-art methods.