Skip to main navigation Skip to search Skip to main content

Many Hands Make Light Work: Accelerating Edge Inference via Multi-Client Collaborative Caching

  • Wenyi Liang
  • , Jianchun Liu
  • , Hongli Xu
  • , Chunming Qiao
  • , Liusheng Huang
  • University of Science and Technology of China

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Edge inference is a technology that enables real-time data processing and analysis on clients near the data source. To ensure compliance with the Service-Level Objectives (SLOs), such as a 30% latency reduction target, caching is usually adopted to reduce redundant computations in inference tasks on stream data. Due to task and data correlations, sharing cache information among clients can improve the inference performance. However, the non-independent and identically distributed (non-IID) nature of data across different clients and the long-tail distributions, where some classes have significantly more samples than others, will reduce cache hit ratios and increase latency. To address the aforementioned challenges, we propose an efficient inference framework, CoCa, which leverages a multi-client collaborative caching mechanism to accelerate edge inference. On the client side, the model is pre-set with multiple cache layers to achieve a quick inference. During inference, the model performs sequential lookups at cache layers activated by the edge server. On the server side, CoCa uses a two-dimensional global cache to periodically aggregate information from clients, mitigating the effects of non-IID data. For client cache allocation, CoCa first evaluates the importance of classes based on how frequently and recently their samples have been accessed. CoCa then selects frequently recurring classes to address long-tail distribution challenges. Finally, CoCa dynamically activates cache layers to balance lookup overhead and accuracy. Extensive experiments demonstrate that CoCa reduces inference latency by 23.0% to 45.2% on the VGG, ResNet and AST models with a slight loss of accuracy.

Original languageEnglish
Title of host publicationProceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PublisherIEEE Computer Society
Pages72-85
Number of pages14
ISBN (Electronic)9798331536039
DOIs
StatePublished - 2025
Event41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: May 19 2025May 23 2025

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627
ISSN (Electronic)2375-0286

Conference

Conference41st IEEE International Conference on Data Engineering, ICDE 2025
Country/TerritoryChina
CityHong Kong
Period05/19/2505/23/25

Keywords

  • Collaborative Caching
  • Edge Inference
  • Long-tail Distribution
  • Non-IID Data

Fingerprint

Dive into the research topics of 'Many Hands Make Light Work: Accelerating Edge Inference via Multi-Client Collaborative Caching'. Together they form a unique fingerprint.

Cite this