Distributional Soft Actor-Critic with Adaptive Entropy Regularization: An Extended Theoretical Analysis

Fozi, Meysam; Ebadzadeh, Mohammad Mehdi

doi:10.22060/miscj.2026.23574.5387

Document Type: Research Article

Authors

Meysam Fozi

Mohammad Mehdi Ebadzadeh

Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Abstract

In reinforcement learning, a key challenge is overestimation of the Q-value function, which can degrade policy performance. This paper introduces an adaptive extension of the distributional soft actor-critic (DSAC) algorithm for continuous control tasks, aimed at mitigating Q-value overestimation. The proposed approach also addresses the exploration-exploitation trade-off by taking model variance into account. We develop and evaluate four versions of this extension, each using a different entropy regularization schedule: linearly decaying, exponentially decaying, linearly adaptive, and exponentially adaptive. These schedules are applied during training to balance exploration and exploitation. Experiments on the MuJoCo humanoid control tasks in OpenAI Gym show that the exponentially adaptive version of DSAC performs significantly better than both the baseline method and the other proposed variants. This result highlights the importance of adaptive entropy regularization in reinforcement learning, particularly for tasks that require fine-grained continuous control. The findings suggest that the adaptive DSAC algorithm both improves learning stability by reducing overestimation and handles the exploration-exploitation dilemma more efficiently, offering a promising direction for future research on continuous control.
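The abstract names four entropy regularization schedules but does not give their formulas. The sketch below is one plausible reading, offered only for illustration. The two decaying schedules are standard linear and geometric decay of the entropy coefficient alpha. The two "adaptive" schedules are hypothetical: they assume alpha is scaled up by some estimate of model (critic) variance, so that higher uncertainty yields more exploration. The functions `linear_decay`, `exponential_decay`, `linear_adaptive`, `exponential_adaptive`, and the gain `k` are illustrative names, not the authors' code. The final function shows where alpha enters the standard soft actor-critic target, y = r + gamma * (Q(s', a') - alpha * log pi(a' | s')).

```python
import math


def linear_decay(alpha0: float, alpha_min: float, step: int, total_steps: int) -> float:
    """Linearly interpolate alpha from alpha0 to alpha_min over total_steps, then hold."""
    frac = min(step / total_steps, 1.0)
    return alpha0 + frac * (alpha_min - alpha0)


def exponential_decay(alpha0: float, alpha_min: float, step: int, decay_rate: float) -> float:
    """Multiply alpha by decay_rate (0 < decay_rate < 1) each step, floored at alpha_min."""
    return max(alpha_min, alpha0 * decay_rate ** step)


def linear_adaptive(alpha_base: float, variance: float, k: float) -> float:
    """HYPOTHETICAL: alpha grows linearly with an estimate of model variance (gain k >= 0)."""
    return alpha_base * (1.0 + k * variance)


def exponential_adaptive(alpha_base: float, variance: float, k: float) -> float:
    """HYPOTHETICAL: alpha grows exponentially with an estimate of model variance."""
    return alpha_base * math.exp(k * variance)


def soft_q_target(r: float, gamma: float, q_next: float, log_pi_next: float,
                  alpha: float, done: bool) -> float:
    """Standard SAC soft Bellman target: the entropy bonus is -alpha * log pi."""
    if done:
        return r
    return r + gamma * (q_next - alpha * log_pi_next)


if __name__ == "__main__":
    # Compare the schedules at a few training steps (values are illustrative only).
    for step in (0, 50_000, 100_000):
        print(step,
              round(linear_decay(0.2, 0.01, step, 100_000), 4),
              round(exponential_decay(0.2, 0.01, step, 0.99997), 4))
    # A larger variance estimate yields a larger alpha under the adaptive variants.
    print(linear_adaptive(0.2, 0.5, 1.0), exponential_adaptive(0.2, 0.5, 1.0))
```

In this reading, the decaying schedules shrink exploration on a fixed timetable regardless of what the agent has learned, whereas the adaptive schedules tie exploration to the critic's uncertainty. That is one way the variance-aware trade-off described above could be realized; the paper's actual variance signal and scaling may differ.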

Keywords

Reinforcement Learning

Continuous Control

Distributional Soft Actor-Critic (DSAC)

Entropy Regularization

Q-Value Overestimation

Subjects