A penalized bandit algorithm

Damien Lamberton (Université Paris-Est)
Gilles Pagès (Université Paris 6)


We study a two armed-bandit recursive algorithm with penalty. We show that the algorithm converges towards its ``target" although it always has a noiseless ``trap". Then, we elucidate the rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a Markov process with jumps.

Full Text: Download PDF | View PDF online (requires PDF plugin)

Pages: 341-373

Publication Date: March 10, 2008

DOI: 10.1214/EJP.v13-489


  1. Bai, Zhi-Dong; Hu, Feifang. Asymptotics in randomized urn models. Ann. Appl. Probab. 15 (2005), no. 1B, 914--940. MR2114994 (2005k:62055)
  2. Benaïm, Michel. Dynamics of stochastic approximation algorithms. Séminaire de Probabilités, XXXIII, 1--68, Lecture Notes in Math., 1709, Springer, Berlin, 1999. MR1767993 (2001d:62081)
  3. Bouton, Catherine. Approximation gaussienne d'algorithmes stochastiques à dynamique markovienne.(French) [Gauss approximation of stochastic algorithms with Markov dynamics] Ann. Inst. H. Poincaré Probab. Statist. 24 (1988), no. 1, 131--155. MR0937959 (89g:60223)
  4. Dubins, Lester E.; Freedman, David A. A sharper form of the Borel-Cantelli lemma and the strong law. Ann. Math. Statist. 36 1965 800--807. MR0182041 (31 #6265)
  5. D. Duffie, J. Pan, K. Singleton (2000), Transform Analysis and Asset Pricing for Affine Jump-Diffusions, Econometrica 68 (2000), no. 6, 1343--1376. MR1793362
  6. Duffie, D.; Filipovi'c, D.; Schachermayer, W. Affine processes and applications in finance. Ann. Appl. Probab. 13 (2003), no. 3, 984--1053. MR1994043 (2004g:60107)
  7. Duflo, Marie. Algorithmes stochastiques.(French) [Stochastic algorithms] Mathématiques & Applications (Berlin) [Mathematics & Applications], 23. Springer-Verlag, Berlin, 1996. xiv+319 pp. ISBN: 3-540-60699-8 MR1612815 (99e:62141)
  8. Jacod, Jean; Shiryaev, Albert N. Limit theorems for stochastic processes.Second edition.Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 288. Springer-Verlag, Berlin, 2003. xx+661 pp. ISBN: 3-540-43932-3 MR1943877 (2003j:60001)
  9. Karlin, Samuel; Taylor, Howard M. A second course in stochastic processes.Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1981. xviii+542 pp. ISBN: 0-12-398650-8 MR0611513 (82j:60003)
  10. Kushner, Harold J.; Clark, Dean S. Stochastic approximation methods for constrained and unconstrained systems.Applied Mathematical Sciences, 26. Springer-Verlag, New York-Berlin, 1978. x+261 pp. ISBN: 0-387-90341-0 MR0499560 (80g:62065)
  11. Kushner, Harold J.; Yin, G. George. Stochastic approximation and recursive algorithms and applications.Second edition.Applications of Mathematics (New York), 35. Stochastic Modelling and Applied Probability.Springer-Verlag, New York, 2003. xxii+474 pp. ISBN: 0-387-00894-2 MR1993642 (2004e:62005)
  12. S. Lakshmivarahan (1979), $varepsilon$-optimal Learning Algorithms-Non-Absorbing Barrier Type, Univ. Oklahoma, School of Electrical Engineering and Computing Science, Techn. Report EECS 7901.
  13. Lakshmivarahan, S. Learning algorithms.Theory and applications.Springer-Verlag, New York-Berlin, 1981. x+279 pp. ISBN: 0-387-90640-1 MR0666244 (84b:68117)
  14. D. Lamberton, G. Pages (2005), How fast is the bandit?, pre-print LPMA-1018, Univ. Paris 6, forthcoming in Stochastic Analysis and Applications.
  15. Lamberton, Damien; Pagès, Gilles; Tarrès, Pierre. When can the two-armed bandit algorithm be trusted? Ann. Appl. Probab. 14 (2004), no. 3, 1424--1454. MR2071429 (2005k:62183)
  16. P. Massart (2003), St-Flour Lecture Notes, Cours de l'ecole d'ete de Saint-Flour 2003, pre-print, Univ. Paris-Sud (France), http://www.math.u-psud.fr/~massart/flour.pdf.
  17. Narendra, Kumpati S.; Thathachar, M. A. L. Learning automata---a survey. IEEE Trans. Systems Man Cybernet. SMC-4 (1974), 323--334. MR0469583 (57 #9368)
  18. K.S. Narendra, M.A.L. Thathachar (1989), Learning Automata - An introduction, Prentice Hall, Englewood Cliffs, NJ, 476p.
  19. Norman, M. Frank. On the linear model with two absorbing barriers. J. Mathematical Psychology 5 1968 225--241. MR0226954 (37 #2540)
  20. Pemantle, Robin. Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18 (1990), no. 2, 698--712. MR1055428 (91e:60159)
  21. I.J. Shapiro, K.S. Narendra (1969), Use of Stochastic Automata for Parameter Self-Optimization with Multi-Modal Perfomance Criteria, IEEE Trans. Syst. Sci. and Cybern., SSC-5, 352-360.
  22. P. Tarres, Algorithmes stochastiques et marches aleatoires renforcees, These de l'ENS Cachan (France), novembre 2001.
  23. P. Tarres, P. Vandekerkhove, On the ergodic two armed-bandit algorithm, preprint, 2006.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.