Ordered Gradient Approach for Communication-Efficient Distributed Learning
Yicheng Chen, Brian Sadler, Rick Blum
SPS
Training machine learning models with multiple gradient-computing workers has attracted great interest recently. Communication efficiency in such distributed learning settings is an important consideration, especially when the required communications are expensive in terms of power usage. We develop a new approach that is efficient in terms of communication transmissions: only the most informative worker results are transmitted, reducing the total number of transmissions. Our ordered gradient approach provably achieves the same order of convergence rate as gradient descent for nonconvex smooth loss functions, while gradient descent always uses more communications. Experiments show significant communication savings over the best existing approaches in some cases, particularly when the system has a large number of workers and some of the local data samples at each worker are nearly identical to samples at the other workers.
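The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a simple instantiation in which "most informative" means largest gradient norm, and in which the server averages only the k selected workers' gradients per round. The function name `ordered_gradient_step` and the parameters `k` and `lr` are illustrative choices, not from the paper.

```python
import numpy as np

def ordered_gradient_step(w, local_grads, k, lr=0.1):
    """One round of an ordered-gradient-style update (sketch).

    Each worker computes its local gradient, but only the k workers
    whose gradients have the largest norms transmit; the server
    averages those k gradients and takes a descent step. Returns the
    updated parameters and the number of transmissions used.
    """
    norms = [np.linalg.norm(g) for g in local_grads]
    top_k = np.argsort(norms)[-k:]  # indices of the k most informative workers
    agg = np.mean([local_grads[i] for i in top_k], axis=0)
    # k transmissions per round instead of len(local_grads) for plain
    # distributed gradient descent
    return w - lr * agg, len(top_k)
```

With three workers and k = 2, only two gradients are sent per round; when workers hold nearly identical samples, the omitted gradients carry little extra information, which is the regime where the abstract reports the largest savings.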