Double Deep Q Network with Adaptive Prioritized Experience Replay

Document Type: Research Article

Authors

1 Amirkabir University of Technology

2 Amirkabir University of Technology

Abstract

In deep reinforcement learning, experience replay buffers help break the correlation of sequential data and improve the efficiency of learning from past experiences. Prioritized Experience Replay (PER) enhances this process by selecting transitions based on their temporal difference (TD) error. However, PER does not account for how often a transition has already been used or for aspects of its importance beyond the TD error. To address this, we introduce an adaptive prioritization method that incorporates three additional transition-level factors: reward, usage count (counter), and policy probability, collectively termed RCP values. Each RCP value is normalized and combined with the TD error to determine the probability of selecting a transition from the replay buffer.
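As a rough illustration of this idea, the sketch below blends a single normalized RCP component with the absolute TD error to obtain sampling probabilities over a small buffer. The min-max normalization, the blending weight `lam`, and the exponent `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def normalize(values, eps=1e-6):
    """Min-max normalize a batch of RCP values to [0, 1] (assumed scheme)."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo + eps)

def rcp_priority(td_errors, rcp_values, alpha=0.6, lam=0.5):
    """Blend |TD error| with a normalized RCP component (reward, usage
    counter, or policy probability) into a sampling priority.
    alpha and lam are placeholder hyperparameters for illustration."""
    td_term = np.abs(td_errors)
    blended = (1.0 - lam) * td_term + lam * rcp_values
    return blended ** alpha  # proportional prioritization, as in standard PER

# Example: selection probabilities over a 4-transition buffer,
# using the reward component ("R") as the RCP value.
td_errors = np.array([0.8, 0.1, 0.5, 2.0])
rewards_n = normalize([1.0, 0.0, 0.0, 5.0])
priorities = rcp_priority(td_errors, rewards_n)
probs = priorities / priorities.sum()
```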



We test our approach on several Atari game environments and find that using any of the RCP values individually improves performance over standard PER. To leverage all three RCP components together, we evaluate three aggregation strategies: taking the minimum, maximum, or mean of the RCP values. Results show that although the best aggregation method varies by environment, the mean function consistently delivers stable performance gains, likely because it balances the influence of all three factors and prevents over-reliance on any single one. Overall, our findings indicate that incorporating RCP values is a straightforward and effective improvement over conventional prioritization in experience replay.
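The aggregation step can be sketched as follows, again as an illustration rather than the paper's exact procedure; the variable names and the assumption that all three RCP components are already normalized to [0, 1] are ours.

```python
import numpy as np

def aggregate_rcp(reward_n, counter_n, policy_n, mode="mean"):
    """Combine the three normalized RCP components into one value per
    transition using one of the strategies compared in the abstract."""
    stacked = np.stack([reward_n, counter_n, policy_n], axis=0)
    if mode == "min":
        return stacked.min(axis=0)
    if mode == "max":
        return stacked.max(axis=0)
    if mode == "mean":
        return stacked.mean(axis=0)  # balances the three factors
    raise ValueError(f"unknown aggregation mode: {mode}")

# Example over a 3-transition buffer (values assumed normalized to [0, 1]).
reward_n  = np.array([0.2, 1.0, 0.5])
counter_n = np.array([0.9, 0.1, 0.4])
policy_n  = np.array([0.3, 0.6, 0.8])
rcp_mean = aggregate_rcp(reward_n, counter_n, policy_n, mode="mean")
```

The aggregated value would then replace the single RCP component in the priority computation sketched above.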
