Skip to main navigation Skip to search Skip to main content

High-probability regret bounds for bandit online linear optimization

  • Peter L. Bartlett
  • , Varsha Dani
  • , Thomas P. Hayes
  • , Sham M. Kakade
  • , Alexander Rakhlin
  • , Ambuj Tewari
  • University of California at Berkeley
  • The University of Chicago
  • Toyota Technological Institute at Chicago

Research output: Contribution to conferencePaperpeer-review

82 Scopus citations

Abstract

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n3/2) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.

Original languageEnglish
Pages335-341
Number of pages7
StatePublished - 2008
Event21st Annual Conference on Learning Theory, COLT 2008 - Helsinki, Finland
Duration: Jul 9 2008Jul 12 2008

Conference

Conference21st Annual Conference on Learning Theory, COLT 2008
Country/TerritoryFinland
CityHelsinki
Period07/9/0807/12/08

Fingerprint

Dive into the research topics of 'High-probability regret bounds for bandit online linear optimization'. Together they form a unique fingerprint.

Cite this