
MindChat-R0: A Large Language Model for Emotionally Supportive Dialogue through Reinforcement Learning

  • Dong She
  • Chenxu Zhang
  • Xianrong Yao
  • Yang Gao
  • Zhanpeng Jin
  • South China University of Technology

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Emotional Support Conversation (ESC) systems are critical for assisting individuals facing mental health challenges. In this work, we present a reinforcement learning framework that improves ESC systems through structured emotional reasoning. We first collect and clean a dataset of 4,500 real-world support-seeking posts. To guide emotional generation, we introduce Empathetic Chain-of-Thought (ECoT), a structured reasoning format that encourages multi-turn empathy and coherence. Based on this, we train MindChat-R0, a Chinese empathetic dialogue agent that uses Qwen3-8B as its base model, with reinforcement learning optimized by ECoT-driven reward signals. LLM-as-a-judge evaluation shows that MindChat-R0 achieves the highest average score, 3.863 out of 5.0 across the fluency, empathy, and support dimensions (vs. 2.834 for Qwen3-8B-nothink and 2.547 for Qwen3-8B-think). In human preference evaluation, MindChat-R0 also outperforms strong baselines with a win rate of 71.14%, based on pairwise comparisons by human annotators.

Original language: English
Title of host publication: UbiComp Companion 2025 - Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing
Editors: Michael Beigl, Giulio Jacucci, Stephan Sigg, Yu Xiao, Jakob E. Bardram, Eirini Eleni Tsiropoulou, Chenren Xu
Publisher: Association for Computing Machinery, Inc
Pages: 1209-1216
Number of pages: 8
ISBN (Electronic): 9798400714771
State: Published - Dec 29 2025
Event: 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion 2025 - Espoo, Finland
Duration: Oct 12 2025 to Oct 16 2025

Publication series

Name: UbiComp Companion 2025 - Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing

Conference

Conference: 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion 2025
Country/Territory: Finland
City: Espoo
Period: 10/12/25 to 10/16/25

Keywords

  • chain-of-thought reasoning
  • emotional support conversation
  • empathetic dialogue systems
  • large language model
  • mental health
  • reinforcement learning
