Sparse spatio-temporal representation with adaptive regularized dictionary learning for low bit-rate video coding

  • Shanghai Jiao Tong University

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

For promising vision-based video coding on low-quality data, this paper proposes a sparse spatio-temporal representation with adaptive regularized dictionary learning and develops a low bit-rate video coding scheme. In a reversed-complexity Wyner-Ziv coding manner, it selects a subset of key frames to code at the original resolution, while the remaining frames are downsampled and reconstructed by a sparse spatio-temporal approximation that uses the key frames as a training dataset. Since primitive patches (geometry) are of low dimensionality and can be well learned from the primitive patches across frames in a scale space, a video frame is divided into three layers: a primitive layer, a nonprimitive coarse layer, and a nonprimitive smooth layer. The multiscale differential feature representations are invertible to facilitate reconstruction with dictionary learning, and the target is formulated as an optimization problem by constructing a sparse representation of 2-D patches and 3-D volumes over adaptive regularized dictionaries: a set of 2-D subdictionary pairs trained from primitive patches, and a 3-D dictionary trained from nonprimitive volumes. Specifically, the nonprimitive layer is constructed as volumes in order to keep it consistent along the motion trajectory, which enables sparse representations over a learned 3-D spatio-temporal dictionary. Through hierarchical bidirectional motion estimation and adaptive overlapped block motion compensation, the 3-D low-frequency and high-frequency dictionary pair is designed by the K-SVD algorithm to update the atoms for optimal sparse representation and convergence. In reconstruction, the lost high-frequency information of the downsampled frames can be synthesized from the sparse spatio-temporal representation over the adaptive regularized dictionaries. Extensive experiments validate the compression efficiency of the proposed scheme versus H.264/AVC in terms of both objective and subjective comparisons.
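
The reconstruction idea in the abstract, sparse-coding a low-frequency patch and synthesizing the missing high-frequency detail from a paired dictionary, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the common coupled-dictionary convention that both dictionaries share one sparse code, uses orthogonal matching pursuit as the sparse coder, and replaces the K-SVD-trained dictionaries with random toy matrices (the low-frequency one made orthonormal so that recovery is exact). All names (`omp`, `D_low`, `D_high`, `a_true`) are hypothetical.

```python
import numpy as np

def omp(D, y, k, tol=1e-10):
    """Orthogonal matching pursuit: approximate y with at most k columns (atoms) of D."""
    coef = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom would improve the fit
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = y - D @ coef
        if np.linalg.norm(residual) < tol:
            break
    return coef

rng = np.random.default_rng(0)
n_atoms = 8
# Toy stand-ins for a trained dictionary pair (random, NOT learned by K-SVD).
D_low, _ = np.linalg.qr(rng.standard_normal((n_atoms, n_atoms)))  # orthonormal atoms
D_high = rng.standard_normal((n_atoms, n_atoms))

# Assumed ground-truth sparse code shared by the low- and high-frequency patches.
a_true = np.zeros(n_atoms)
a_true[[1, 5]] = [2.0, -1.0]
low_patch = D_low @ a_true    # what the decoder observes
high_true = D_high @ a_true   # detail lost by downsampling

# Sparse-code the observed patch, then synthesize detail with the same coefficients.
coef = omp(D_low, low_patch, k=2)
high_est = D_high @ coef
```

In this toy setting the two active atoms are recovered exactly, so `high_est` matches `high_true`. With an overcomplete, learned dictionary the recovery is only approximate and depends on the sparsity level and the coherence of the atoms.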

Original language: English
Article number: 6317158
Pages (from-to): 710-728
Number of pages: 19
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 23
Issue number: 4
State: Published - 2013

Keywords

  • Atom decomposition
  • dictionary learning
  • primitive patch
  • sparse representation
  • video coding
