Model-based RL learns the optimal policy by using a model of the environment. A model is a mathematical representation of the environment that allows the agent to predict the next state and reward given the current state and action. Model-based algorithms use this model to plan the optimal action sequence and update the policy based on the predicted returns.