Value-Based

Value-based algorithms learn the optimal policy by estimating the value function (or the action-value function) and selecting the action with the highest value in each state.