Abstract
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n3/2) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.
| Original language | English |
|---|---|
| Pages | 335-341 |
| Number of pages | 7 |
| State | Published - 2008 |
| Event | 21st Annual Conference on Learning Theory, COLT 2008 - Helsinki, Finland Duration: Jul 9 2008 → Jul 12 2008 |
Conference
| Conference | 21st Annual Conference on Learning Theory, COLT 2008 |
|---|---|
| Country/Territory | Finland |
| City | Helsinki |
| Period | 07/9/08 → 07/12/08 |
Fingerprint
Dive into the research topics of 'High-probability regret bounds for bandit online linear optimization'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver