Optimal stochastic and distributed algorithms for machine learning