Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
10.22060/miscj.2026.23574.5387
Abstract
Overestimation of the Q-value function is a key challenge in reinforcement learning and can degrade policy performance. This paper introduces an adaptive extension of the distributional soft actor-critic (DSAC) algorithm for continuous control tasks, designed to mitigate Q-value overestimation. The proposed approach also addresses the exploration-exploitation trade-off by taking model variance into account. We develop and evaluate four versions of this extension, each applying a different entropy regularization scheme during training: linearly decaying, exponentially decaying, linear adaptive, and exponentially adaptive. Experiments on the OpenAI Gym MuJoCo humanoid control tasks show that the exponentially adaptive version of DSAC performs significantly better than both the baseline method and the other proposed extensions. This result highlights the importance of adaptive entropy regularization in reinforcement learning, particularly for tasks requiring fine-grained control in continuous environments. The findings suggest that the proposed adaptive DSAC algorithm both improves learning stability by reducing overestimation and handles the exploration-exploitation dilemma more efficiently, offering a promising direction for future research on reinforcement learning in continuous control settings.
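The abstract names four entropy regularization schemes but gives no formulas. As a rough illustration only, the sketch below shows what linearly and exponentially decaying entropy coefficients could look like, plus one hypothetical way an adaptive variant might react to model variance. All function names, default values, and the variance-scaling rule are assumptions made for this example; they are not taken from the paper.

```python
def _progress(step, total_steps):
    """Fraction of training completed, clipped to [0, 1]."""
    return min(max(step / total_steps, 0.0), 1.0)


def linear_decay(step, total_steps, alpha_start=0.2, alpha_end=0.01):
    """Entropy coefficient that falls linearly from alpha_start to alpha_end.

    The start and end values are illustrative defaults, not the paper's.
    """
    frac = _progress(step, total_steps)
    return alpha_start + frac * (alpha_end - alpha_start)


def exponential_decay(step, total_steps, alpha_start=0.2, alpha_end=0.01):
    """Entropy coefficient that falls geometrically between the same endpoints."""
    frac = _progress(step, total_steps)
    return alpha_start * (alpha_end / alpha_start) ** frac


def variance_scaled(alpha, variance, reference_variance=1.0, lo=0.5, hi=2.0):
    """Hypothetical adaptive rule (an assumption, not the authors' method).

    Scales the scheduled coefficient by the ratio of an estimated return
    variance to a reference value, clipped to [lo, hi], so that higher
    variance leads to a larger entropy bonus and more exploration.
    """
    ratio = variance / reference_variance
    return alpha * min(max(ratio, lo), hi)


if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        lin = linear_decay(step, total)
        exp = exponential_decay(step, total)
        # Pretend the critic reports a return variance of 1.5 at every step.
        adaptive = variance_scaled(exp, variance=1.5)
        print(step, round(lin, 4), round(exp, 4), round(adaptive, 4))
```

In this sketch the exponential schedule drops faster early in training than the linear one while reaching the same final value, and the adaptive wrapper can be combined with either schedule. How the paper actually defines its linear adaptive and exponentially adaptive variants should be taken from the full text.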
APA
Fozi, M. and Ebadzadeh, M. M. (2026). Distributional Soft Actor-Critic with Adaptive Entropy Regularization. AUT Journal of Modeling and Simulation, e6047. doi: 10.22060/miscj.2026.23574.5387
MLA
Fozi, M., and Ebadzadeh, M. M. "Distributional Soft Actor-Critic with Adaptive Entropy Regularization." AUT Journal of Modeling and Simulation, 2026, e6047. doi: 10.22060/miscj.2026.23574.5387
HARVARD
Fozi M., Ebadzadeh M. M. (2026). 'Distributional Soft Actor-Critic with Adaptive Entropy Regularization', AUT Journal of Modeling and Simulation, e6047. doi: 10.22060/miscj.2026.23574.5387
CHICAGO
M. Fozi and M. M. Ebadzadeh, "Distributional Soft Actor-Critic with Adaptive Entropy Regularization," AUT Journal of Modeling and Simulation, (2026): e6047, doi: 10.22060/miscj.2026.23574.5387
VANCOUVER
Fozi M., Ebadzadeh M. M. Distributional Soft Actor-Critic with Adaptive Entropy Regularization. AUT J Model Simul, 2026; e6047. doi: 10.22060/miscj.2026.23574.5387