RA-L · 2023

GCR-PPO for IsaacLab

Abstract

GCR-PPO is a modification of PPO for multi-objective robot RL that:

- uses a multi-head critic to obtain per-reward advantages and gradients,
- applies priority-aware gradient surgery (PCGrad-style projection) to protect task objectives from regularisers,
- runs at massively parallel GPU scale within IsaacLab/RSL-RL.
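To make the priority-aware projection concrete, here is a minimal sketch in plain Python. It is not taken from the GCR-PPO repository. It assumes that each reward component yields its own flattened policy gradient, that components are split into "task" and "regulariser" groups, and that only regulariser gradients are modified. The function names (`project_out_conflict`, `priority_aware_combine`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Sequence[float], protected: Sequence[float]) -> Vector:
    """PCGrad-style step: if g conflicts with `protected` (negative inner
    product), remove the component of g along `protected`; otherwise
    return g unchanged."""
    d = dot(g, protected)
    norm_sq = dot(protected, protected)
    if d >= 0.0 or norm_sq == 0.0:
        return list(g)
    scale = d / norm_sq
    return [gi - scale * pi for gi, pi in zip(g, protected)]


def priority_aware_combine(
    task_grads: Sequence[Sequence[float]],
    reg_grads: Sequence[Sequence[float]],
) -> Vector:
    """Sum all gradients, projecting each regulariser gradient against every
    task gradient first. Task gradients are never altered (assumed priority
    rule for this sketch)."""
    dim = len(task_grads[0])
    combined = [0.0] * dim
    for t in task_grads:
        combined = [c + ti for c, ti in zip(combined, t)]
    for r in reg_grads:
        adjusted = list(r)
        for t in task_grads:
            adjusted = project_out_conflict(adjusted, t)
        combined = [c + ai for c, ai in zip(combined, adjusted)]
    return combined


# One task gradient and one regulariser that pulls against it.
task = [1.0, 0.0]
conflicting_reg = [-1.0, 1.0]
update = priority_aware_combine([task], [conflicting_reg])
print(update)  # [1.0, 1.0]
```

In this example the regulariser's component opposing the task direction is removed, so the combined update keeps the full task gradient. With several task gradients, the sequential projections in this sketch do not guarantee that the result is conflict-free with respect to all of them at once; a real implementation would also operate on GPU tensors rather than Python lists.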

Repository arXiv