Orateur : Alexander Nazin, Institute of Control Sciences RAS, Moscow

Titre : Application of the mirror descent method to a stochastic multi-armed bandit problem.

Résumé :

The mirror descent method represents a gradient-type recursive algorithm for convex optimization. It relates to primal-dual type of methods, performing the descent in a dual space and mapping the resulted points to a primal one. Recently, the approach was extended to a class of stochastic optimization problems, first in [Nesterov Yu., 2005] and then in [A.B. Juditsky, A.V. Nazin, A.B. Tsybakov, and N. Vayatis, 2005]. As an introduction of the talk, the approach is illustrated on a general convex stochastic optimization problem and the upper bound Theorem 2 from the latter article is stated and discussed. Furthermore, we address the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution of actions through a stochastic mirror descent type algorithm. A particular case of the algorithm is known as that of Exponential Weights. We mainly consider nonnegative losses and prove optimal bounds on the excess risk for the latter particular case. Finally, we discuss possible extensions and improvements.

Transparents