Monte Carlo methods are a type of reinforcement learning algorithm that is used to learn the value function (or the action-value function) by using complete episodes (or “rollouts”) to estimate the expected return.
Monte Carlo methods work by sampling the returns observed from each episode and using the samples to update the value function. The value function is updated based on the average return observed from each episode, and the value function converges to the optimal values over time.