Regularized Reward-Punishment Reinforcement Learning
Key Takeaway
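A reinforcement learning paper proposing KL-Coupled Policy Regularization (KCPR), in which reward-seeking and punishment-related policies each act as a learned prior for the other. It is evaluated on grid-world and Gazebo robotic navigation tasks.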
Article Summary
We propose KL-Coupled Policy Regularization (KCPR), a policy coordination framework for Reward-Punishment Reinforcement Learning (RPRL). Based on KCPR, we derive KL-Coupled Soft Optimality (KCSO) and develop its deep realization, klDMP. Unlike existing RPRL approaches that optimize reward-seeking and punishment-related policies largely independently, KCPR enables direct interactions between companion policies by treating each as a dynamically learned prior for the other. KCSO yields coupled soft-optimal policies and KL-regularized Bellman operators, allowing reward and punishment information to jointly influence value propagation. To improve learning stability, we introduce a companion-prior softening mechanism and evaluate separate replay-buffer designs for balancing reward- and punishment-related experience. Experiments in grid-world and Gazebo robotic navigation tasks demonstrate that klDMP improves safety and learning stability while maintaining competitive task performance compared with DQN, SQL and softDMP. These results suggest that policy-level coordination provides an effective mechanism for integrating multiple behavioral objectives and may serve as a useful design principle for reinforcement learning systems with interacting motivational processes.
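The summary does not give the KCSO equations, so the following is only a minimal single-state sketch of the general idea, not the paper's klDMP algorithm. It relies on the standard result that maximizing E_pi[Q] - (1/beta) * KL(pi || prior) gives pi(a) proportional to prior(a) * exp(beta * Q(a)), with soft value (1/beta) * log sum_a prior(a) * exp(beta * Q(a)). The function names, the uniform-mixing form of prior softening, the parameters `beta` and `eps`, and the alternating update order are all assumptions made for illustration.

```python
import numpy as np


def kl_soft_policy(q, prior, beta):
    """Policy maximizing E_pi[q] - (1/beta) * KL(pi || prior)."""
    logits = np.log(prior) + beta * q
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def kl_soft_value(q, prior, beta):
    """Soft value (1/beta) * log sum_a prior(a) * exp(beta * q(a))."""
    z = np.log(prior) + beta * q
    m = z.max()
    return (m + np.log(np.exp(z - m).sum())) / beta


def soften(prior, eps):
    """Hypothetical softening: mix the prior with a uniform distribution
    so that no action receives zero prior mass."""
    return (1.0 - eps) * prior + eps / prior.size


def coupled_policies(q_reward, q_punish, beta=1.0, eps=0.1, iters=20):
    """Alternate updates in which each policy uses the softened
    companion policy as its prior (illustrative, single state)."""
    n = q_reward.size
    pi_r = np.full(n, 1.0 / n)
    pi_p = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi_r = kl_soft_policy(q_reward, soften(pi_p, eps), beta)
        pi_p = kl_soft_policy(q_punish, soften(pi_r, eps), beta)
    return pi_r, pi_p
```

In this toy setting, an action that the punishment-related values rate poorly loses prior mass in the reward policy as well, which is one way to picture reward and punishment information jointly shaping action selection. With `eps` above zero the companion prior never fully rules out an action.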