Policy-Based

Policy-based algorithms learn the optimal policy directly by parameterizing the policy and updating the policy parameters based on the observed (or expected) rewards.