Skip to main navigation Skip to search Skip to main content

Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary

  • The University of Chicago

Research output: Contribution to conferencePaperpeer-review

59 Scopus citations

Abstract

We consider "online bandit geometric optimization," a problem of iterated decision making in a largely unknown and constantly changing environment. The goal is to minimize "regret," defined as the difference between the actual loss of an online decision-making procedure and that of the best single decision in hindsight. "Geometric optimization" refers to a generalization of the well-known multi-armed bandit problem, in which the decision space is some bounded subset of ℝ d, the adversary is restricted to linear loss functions, and regret bounds should depend on the dimensionality d, rather than the total number of possible decisions. "Bandit" refers to the setting in which the algorithm is only told its loss on each round, rather than the entire loss function. McMahan and Blum [10] presented the best known algorithm in this setting, and proved that its expected additive regret is O(poly(d)T 3/4). We simplify and improve their analysis of this algorithm to obtain regret O(poly(d)T 2/3). We also prove that, for a large class of full-information online optimization problems, the optimal regret against an adaptive adversary is the same as against a non-adaptive adversary.

Original languageEnglish
Pages937-943
Number of pages7
DOIs
StatePublished - 2006
EventSeventeenth Annual ACM-SIAM Symposium on Discrete Algorithms - Miami, FL, United States
Duration: Jan 22 2006Jan 24 2006

Conference

ConferenceSeventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
Country/TerritoryUnited States
CityMiami, FL
Period01/22/0601/24/06

Fingerprint

Dive into the research topics of 'Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary'. Together they form a unique fingerprint.

Cite this