Online reinforcement learning for multimedia buffer control

  • University of California at Los Angeles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

We formulate the multimedia buffer control problem as a Markov decision process. Because the application's rate-distortion-complexity behavior is unknown a priori, the optimal buffer control policy must be learned online. To this end, we adopt a low-complexity reinforcement learning algorithm called Q-learning to learn the optimal control policy at run time. We propose an accelerated Q-learning algorithm that exploits partial knowledge about the system's dynamics to dramatically improve performance. In our experiments, we show that the proposed application-aware reinforcement learning algorithm performs significantly better than existing application-independent reinforcement learning algorithms.
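As a rough illustration of the standard Q-learning step the abstract refers to, the sketch below runs tabular Q-learning on a toy buffer. It is not the authors' model or their accelerated algorithm: the buffer capacity, the arrival process, the action set, the reward weights, and the names `q_learning_buffer` and `greedy_policy` are all assumptions made up for this example.

```python
import random


def q_learning_buffer(num_steps=20000, capacity=10, alpha=0.1,
                      gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical buffer model (not from the paper).

    State: current buffer occupancy, 0..capacity.
    Action: processing effort per step (0, 1, or 2 items drained).
    """
    rng = random.Random(seed)
    actions = (0, 1, 2)
    # One row of action values per buffer occupancy level.
    q = [[0.0 for _ in actions] for _ in range(capacity + 1)]
    state = 0
    for _ in range(num_steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.randrange(len(actions))
        else:
            a = max(range(len(actions)), key=lambda i: q[state][i])

        # Assumed dynamics: the learner does not know the arrival process.
        arrivals = rng.choice((0, 1, 2))
        drained = min(state, actions[a])
        unclipped = state - drained + arrivals
        overflow = max(0, unclipped - capacity)
        next_state = min(capacity, unclipped)

        # Assumed reward: quadratic effort cost, overflow and backlog penalties.
        reward = -(actions[a] ** 2) - 10.0 * overflow - 0.1 * next_state

        # Standard Q-learning update toward the one-step lookahead target.
        td_target = reward + gamma * max(q[next_state])
        q[state][a] += alpha * (td_target - q[state][a])
        state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]


if __name__ == "__main__":
    table = q_learning_buffer()
    print(greedy_policy(table))
```

The update uses only observed transitions and rewards, which is why it suits a setting where the system's behavior is unknown in advance. The accelerated variant described in the abstract additionally exploits partial knowledge of the dynamics; that part is not shown here.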

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1958-1961
Number of pages: 4
ISBN (Print): 9781424442966
State: Published - 2010
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: Mar 14 2010 to Mar 19 2010

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/Territory: United States
City: Dallas, TX
Period: 03/14/10 to 03/19/10

Keywords

  • Dynamic voltage scaling
  • Encoder complexity control
  • Markov decision processes
  • Multimedia buffer control
  • Reinforcement learning
