Gradient-Based

Gradient-based algorithms optimize the policy by estimating the gradient of the expected return with respect to the policy parameters, then iteratively updating those parameters in the direction of that gradient (gradient ascent on the expected return).
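As an illustration only, the update loop can be sketched on a toy problem. The example below assumes a score-function (REINFORCE-style) gradient estimator, a softmax policy over the arms of a multi-armed bandit, and a running-average baseline to reduce variance. None of these choices come from the text above. The names `reinforce_bandit`, `arm_means`, `lr`, and `baseline` are hypothetical and chosen for this sketch.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """Gradient ascent on expected reward for a softmax bandit policy.

    Assumed setup (for illustration): each arm pays its mean plus
    Gaussian noise, and the policy parameters are one logit per arm.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(arm_means))  # policy parameters (logits)
    baseline = 0.0                    # running average of observed rewards

    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)
        reward = arm_means[action] + rng.normal(scale=0.1)

        # For a softmax policy, grad_theta log pi(action) = onehot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0

        # Sampled estimate of the gradient of the expected return,
        # followed by one ascent step on the parameters.
        theta += lr * (reward - baseline) * grad_log_pi

        # Move the baseline toward the latest reward.
        baseline += 0.05 * (reward - baseline)

    return softmax(theta)


if __name__ == "__main__":
    final_probs = reinforce_bandit(np.array([0.2, 0.8]))
    print(final_probs)
```

In this sketch the policy should shift probability toward the arm with the higher mean reward as the updates accumulate. In a full reinforcement learning setting the single reward would be replaced by a return accumulated over a trajectory, and the logits by the parameters of a state-dependent policy.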