A penalized bandit algorithm

Damien Lamberton (Université Paris-Est)
Gilles Pagès (Université Paris 6)


We study a two-armed bandit recursive algorithm with penalty. We show that the algorithm converges towards its "target" although it always has a noiseless "trap". We then elucidate its rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a Markov process with jumps.
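The abstract does not state the update rule. The sketch below simulates one possible penalized two-armed bandit recursion under stated assumptions. Here x is the probability of playing arm A. A success on the played arm moves x toward that arm with step gamma_n. A failure moves x away from it with the smaller step gamma_n * rho_n, which is the "penalty". The exact rule, the step sizes gamma_n = (n+1)^(-0.7), the penalty weights rho_n = (n+1)^(-0.2) and the function name `penalized_bandit` are hypothetical choices for illustration. They are not taken from the paper.

```python
import random
from typing import Callable, List


def penalized_bandit(
    p_a: float,
    p_b: float,
    n_steps: int,
    x0: float = 0.5,
    gamma: Callable[[int], float] = lambda n: (n + 1) ** -0.7,  # assumed step sizes
    rho: Callable[[int], float] = lambda n: (n + 1) ** -0.2,    # assumed penalty weights
    seed: int = 0,
) -> List[float]:
    """Simulate one trajectory of a (hypothetical) penalized two-armed bandit.

    x is the probability of playing arm A; p_a, p_b are the success
    probabilities of arms A and B. Assumed update rule:
      A played, success:  x <- x + gamma * (1 - x)        (reward A)
      A played, failure:  x <- x - gamma * rho * x        (penalize A)
      B played, success:  x <- x - gamma * x              (reward B)
      B played, failure:  x <- x + gamma * rho * (1 - x)  (penalize B)
    With gamma <= 1 and gamma * rho <= 1, x stays in [0, 1].
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for n in range(1, n_steps + 1):
        g, r = gamma(n), rho(n)
        if rng.random() < x:  # play arm A with probability x
            if rng.random() < p_a:
                x += g * (1.0 - x)
            else:
                x -= g * r * x
        else:  # play arm B with probability 1 - x
            if rng.random() < p_b:
                x -= g * x
            else:
                x += g * r * (1.0 - x)
        path.append(x)
    return path
```

For example, `penalized_bandit(0.7, 0.4, 10_000)[-1]` gives the final probability of playing the better arm A on one simulated run. The default step sizes are an arbitrary choice. In this sketch, the relative decay of gamma_n and rho_n is the kind of parameter choice that governs convergence towards the target and the rate of that convergence.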

Pages: 341-373

Publication Date: March 10, 2008

DOI: 10.1214/EJP.v13-489


