Model-free reinforcement learning (RL) learns an optimal policy without building or using a model of the environment. Instead, model-free algorithms learn directly from interactions with the environment, using methods such as temporal difference (TD) learning or policy gradient optimization. They are generally simpler and easier to implement than model-based algorithms.
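A minimal sketch of the temporal difference approach follows. It uses tabular Q-learning, which is one TD method; the text does not prescribe a specific one. The environment is a hypothetical five-state chain, and the `step` function, the constants, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions. The key point is that the agent only samples transitions by calling `step()`. It never reads a transition or reward model, and it improves its value estimates from those sampled interactions alone.

```python
import random

# Hypothetical 1-D chain environment (an assumption for illustration):
# states 0..N_STATES-1, each episode starts at 0, the goal is the last state.
# Actions: 0 = move left, 1 = move right. Reward 1.0 on reaching the goal, else 0.0.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """One interaction with the environment: return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learns action values from sampled transitions only."""
    rng = random.Random(seed)
    # Action-value estimates q[s][a], initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the current estimate for the next state;
            # a terminal transition contributes no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            # Move the estimate a fraction alpha toward the TD target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (first action wins ties)."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

After calling `q = q_learning()`, the greedy action in every non-terminal state (0 to 3) should be `1` (move right); the entry for the terminal goal state is never updated and carries no meaning. A policy gradient method would instead adjust the parameters of a policy directly, but it shares the same model-free property of learning only from sampled interactions.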