<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>Amirkabir University of Technology</PublisherName>
				<JournalTitle>AUT Journal of Modeling and Simulation</JournalTitle>
				<Issn>2588-2953</Issn>
				<Volume>57</Volume>
				<Issue>2</Issue>
				<PubDate PubStatus="epublish">
					<Year>2025</Year>
					<Month>12</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Distributional Soft Actor-Critic with Adaptive Entropy Regularization: An Extended Theoretical Analysis</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>215</FirstPage>
			<LastPage>228</LastPage>
			<ELocationID EIdType="pii">6047</ELocationID>
			
<ELocationID EIdType="doi">10.22060/miscj.2026.23574.5387</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>Meysam</FirstName>
					<LastName>Fozi</LastName>
<Affiliation>Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>Mohammad Mehdi</FirstName>
					<LastName>Ebadzadeh</LastName>
<Affiliation>Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>
<Identifier Source="ORCID">0000-0001-6466-5229</Identifier>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2024</Year>
					<Month>10</Month>
					<Day>06</Day>
				</PubDate>
			</History>
		<Abstract>Overestimation of the Q-value function is a key challenge in reinforcement learning and can degrade policy performance. This paper introduces an adaptive extension of the distributional soft actor-critic (DSAC) algorithm for continuous control tasks that aims to mitigate Q-value overestimation. The proposed approach also addresses the exploration-exploitation trade-off by taking model variance into account. We develop and evaluate four versions of this extension, each applying a different entropy regularization scheme during training: linearly decaying, exponentially decaying, linearly adaptive, and exponentially adaptive. Experiments on the OpenAI Gym MuJoCo humanoid control tasks show that the exponentially adaptive version of DSAC performs significantly better than both the baseline method and the other proposed extensions. These results highlight the importance of adaptive entropy regularization in reinforcement learning, particularly for tasks that require fine-grained control in continuous environments. They suggest that the adaptive DSAC algorithm both improves learning stability by reducing overestimation and handles the exploration-exploitation dilemma more efficiently, making it a promising direction for future research on reinforcement learning in continuous control settings.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">Reinforcement Learning</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Continuous Control</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Distributional Soft Actor-Critic (DSAC)</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Entropy Regularization</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Q-Value Overestimation</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://miscj.aut.ac.ir/article_6047_98baeb82b676b662e12a7af8ad9212f6.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
