User:Beznosikov.an
Aleksandr Beznosikov
MIPT, Faculty of Control and Applied Mathematics (FUPM), group 674
email: beznosikov.an@phystech.edu
Department: "Intelligent Systems"
Specialization: "Data Mining"
Research Reports
Spring 2020, 8th semester
Gradient-Free Methods for Saddle-Point Problem
In this paper, we generalize the approach of Gasnikov et al. (2017), which allows one to solve (stochastic) convex optimization problems with an inexact gradient-free oracle, to the convex-concave saddle-point problem. The proposed approach works at least as well as the best existing approaches, and for a special set-up (simplex-type constraints and closeness of the Lipschitz constants in the 1- and 2-norms) it reduces the required number of oracle calls (function evaluations) by a factor of n/log(n). Our method uses a stochastic approximation of the gradient via finite differences, so the function must be defined not only on the optimization set itself but also in a certain neighbourhood of it. In the second part of the paper, we analyze the case when such an assumption cannot be made, propose a general approach to modifying the method for this setting, and apply it to particular cases of some classical sets.
Publication
- Aleksandr Beznosikov, Abdurakhmon Sadiev, Alexander Gasnikov. Gradient-Free Methods for Saddle-Point Problem // arXiv preprint, 2020. https://arxiv.org/abs/2005.05913
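To make the gradient-free idea above concrete, here is a minimal sketch (not the method analyzed in the paper): a randomized two-point finite-difference estimator, i.e. a zeroth-order oracle that only evaluates the function, plugged into plain gradient descent-ascent on a toy convex-concave problem. The objective f, the step size lr, the smoothing parameter tau, and the problem sizes are all illustrative assumptions.

import numpy as np

# Toy convex-concave saddle-point problem: min_x max_y f(x, y).
def f(x, y, A, bx, by):
    # Convex in x, concave in y (quadratic terms plus bilinear coupling).
    return 0.5 * x @ x + x @ A @ y - 0.5 * y @ y + bx @ x + by @ y

def two_point_grad(phi, z, tau, rng):
    # Randomized two-point finite-difference estimate of grad phi(z):
    # uses only function values, no gradients (zeroth-order oracle).
    e = rng.standard_normal(z.shape)
    e /= np.linalg.norm(e)
    return z.size * (phi(z + tau * e) - phi(z - tau * e)) / (2.0 * tau) * e

def gda_zeroth_order(A, bx, by, steps=5000, lr=0.05, tau=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(steps):
        gx = two_point_grad(lambda u: f(u, y, A, bx, by), x, tau, rng)
        gy = two_point_grad(lambda v: f(x, v, A, bx, by), y, tau, rng)
        x -= lr * gx   # descent in x
        y += lr * gy   # ascent in y
    return x, y

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) / 5
bx, by = rng.standard_normal(5), rng.standard_normal(5)
x_hat, y_hat = gda_zeroth_order(A, bx, by)
print(x_hat, y_hat)

Note that the estimator evaluates the function at z + tau*e and z - tau*e, which is exactly why the function has to be defined in a neighbourhood of the feasible set, as the abstract points out.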
On Biased Compression for Distributed Learning
In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single-node and distributed settings. Our distributed SGD method enjoys the ergodic rate O(δL exp(−K)/μ + (C + D)/(Kμ)), where δ is a compression parameter which grows when more compression is applied, L and μ are the smoothness and strong convexity constants, C captures stochastic gradient noise (C = 0 if full gradients are computed on each node), and D captures the variance of the gradients at the optimum (D = 0 for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose a new highly performing biased compressor, a combination of Top-k and natural dithering, which in our experiments outperforms all other compression techniques.
Publication
- Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan. On Biased Compression for Distributed Learning // arXiv preprint, 2020. https://arxiv.org/abs/2002.12410
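As an illustration of a biased compression operator, here is a minimal sketch assuming the common Top-k definition (keep the k largest-magnitude coordinates, drop the rest) together with a bare-bones compressed distributed SGD step; it is not the exact scheme or parameterization from the paper. The check at the end uses the standard contraction bound ||Top-k(x) - x||^2 <= (1 - k/d)||x||^2, one way to see how a compression parameter grows as k shrinks.

import numpy as np

def top_k(x, k):
    # Biased Top-k compressor: keep the k largest-magnitude coordinates,
    # zero out the rest. Unlike unbiased sparsifiers, no rescaling is applied.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def compressed_grad_step(w, grads, k, lr):
    # One step of distributed SGD with biased compression (sketch):
    # each worker compresses its local gradient, the server averages
    # the compressed messages and takes a gradient step.
    msgs = [top_k(g, k) for g in grads]
    return w - lr * np.mean(msgs, axis=0)

# Empirical check of the contraction property of Top-k.
rng = np.random.default_rng(0)
d, k = 100, 10
x = rng.standard_normal(d)
lhs = np.linalg.norm(top_k(x, k) - x) ** 2
rhs = (1 - k / d) * np.linalg.norm(x) ** 2
print(lhs <= rhs + 1e-12)  # True

In practice, biased operators like Top-k are usually paired with an error-feedback or similar correction mechanism; the sketch omits that for brevity.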
Fall 2019, 7th semester
Derivative-Free Method For Decentralized Distributed Non-Smooth Optimization
In this paper, we propose a new derivative-free method, based on the Sliding Algorithm of Lan (2016, 2019), for the convex composite optimization problem that includes two terms: a smooth one and a non-smooth one. We prove a convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space. We apply this method to decentralized distributed optimization and prove bounds on the number of communication rounds that match the lower bounds. We also prove a bound on the number of zeroth-order oracle calls per node that matches the analogous state-of-the-art bound for first-order decentralized distributed optimization, again up to a factor proportional to the dimension of the space.
Publication
- Aleksandr Beznosikov, Eduard Gorbunov, Alexander Gasnikov. Derivative-Free Method For Decentralized Distributed Non-Smooth Optimization // arXiv preprint, 2019. https://arxiv.org/abs/1911.10645
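For intuition only, here is a minimal sketch of the two ingredients such decentralized zeroth-order methods combine: a gossip (mixing) step over the communication graph and a two-point function-value estimate of each node's local gradient. The ring topology, the mixing matrix W, the quadratic local objectives, and all parameters are illustrative assumptions; this is a plain decentralized gradient-like step, not the sliding scheme analyzed in the paper.

import numpy as np

def zo_grad(phi, z, tau, rng):
    # Two-point zeroth-order estimate of grad phi(z) from function values only.
    e = rng.standard_normal(z.shape)
    e /= np.linalg.norm(e)
    return z.size * (phi(z + tau * e) - phi(z - tau * e)) / (2.0 * tau) * e

def decentralized_zo_step(X, W, local_fs, lr, tau, rng):
    # X: (n_nodes, dim) current iterates, one row per node.
    # W: doubly stochastic mixing matrix of the communication graph.
    # Each node averages with its neighbours, then moves along a
    # zeroth-order estimate of the gradient of its own local objective.
    mixed = W @ X
    G = np.stack([zo_grad(fi, xi, tau, rng) for fi, xi in zip(local_fs, mixed)])
    return mixed - lr * G

# Illustrative setup: 4 nodes on a ring, local objectives f_i(x) = 0.5*||x - c_i||^2.
rng = np.random.default_rng(0)
n, dim = 4, 3
C = rng.standard_normal((n, dim))
local_fs = [lambda x, c=c: 0.5 * np.sum((x - c) ** 2) for c in C]
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
X = np.zeros((n, dim))
for _ in range(2000):
    X = decentralized_zo_step(X, W, local_fs, lr=0.1, tau=1e-4, rng=rng)
print(X.mean(axis=0), C.mean(axis=0))  # iterates end up near the global minimizer (mean of c_i)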