Improved Step-Size Schedules For Noisy Gradient Methods
Sarit Khirirat, Xiaoyu Wang, Sindri Magnússon, Mikael Johansson
SPS
Noise is inherent in many optimization methods, such as stochastic gradient methods, zeroth-order methods, and compressed gradient methods. For such methods to converge toward a global optimum, it is intuitive to use large step-sizes in the initial iterations, when the noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient methods, but similar results are lacking for other methods that use approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient methods. Unlike previous works, our analysis framework shows that such step-size schedules enable these methods to enjoy an optimal $\mathcal{O}(1/k)$ rate. We exemplify our results on zeroth-order methods and stochastic compression methods. Our experiments validate the fast convergence of these methods under step-decay schedules.
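As a rough illustration of the idea described in the abstract, the sketch below applies a step-decay step-size schedule to a two-point zeroth-order gradient method on a simple quadratic. It is not the paper's code: the objective, the schedule constants, and names such as zo_grad, gamma0, and stage_len are illustrative assumptions.

```python
# Minimal sketch: step-decay step-sizes for a noisy (zeroth-order) gradient method.
# All constants and function names are illustrative, not taken from the paper.
import numpy as np

def f(x):
    # Simple strongly convex objective used only for demonstration.
    return 0.5 * np.dot(x, x)

def zo_grad(x, rng, mu=1e-3):
    # Two-point zeroth-order gradient estimate along a random direction.
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def step_decay_zo_gd(x0, gamma0=0.5, stage_len=100, n_iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(n_iters):
        # Step-decay schedule: halve the step-size after every stage_len iterations.
        gamma = gamma0 * 0.5 ** (k // stage_len)
        x -= gamma * zo_grad(x, rng)
    return x

if __name__ == "__main__":
    x_final = step_decay_zo_gd(np.ones(10))
    print("final f(x):", f(x_final))
```

The same schedule could be paired with other noisy gradient oracles (e.g., a compressed gradient) by swapping out zo_grad; the large initial step-size speeds up early progress, while the geometric decay suppresses the gradient noise in later stages.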
Chairs:
Konstantinos Slavakis