A taxonomy of reinforcement learning methods (Graphviz source):

digraph G {
    "Reinforcement Learning" -> "Policy Optimization" [label="seeks"];
    "Policy Optimization" -> "Value-Based" [label="has"];
    "Policy Optimization" -> "Policy-Based" [label="has"];
    "Policy Optimization" -> "Actor-Critic" [label="has"];
    "Value-Based" -> "Dynamic Programming" [label="has"];
    "Value-Based" -> "TD-lambda" [label="has"];
    "Policy-Based" -> "Gradient-Based" [label="has"];
    "Policy-Based" -> "Sampling-Based" [label="has"];
    "Actor-Critic" -> "Value-Based" [label="blends"];
    "Actor-Critic" -> "Policy-Based" [label="blends"];
    "Gradient-Based" -> "PILCO" [label="has"];
    "Gradient-Based" -> "LQR" [label="has"];
    "Gradient-Based" -> "Policy Gradient" [label="has"];
    "Sampling-Based" -> "CEM" [label="has"];
    "Sampling-Based" -> "NES" [label="has"];
    "Model-Based" -> "RHC/MPC" [label="with"];
    "Model-Free";
    "Dynamic Programming" -> "Value Iteration" [label="has"];
    "Dynamic Programming" -> "Policy Iteration" [label="has"];
    "Dynamic Programming" -> "Bellman Optimality Equation" [label="solves"];
    "Policy Gradient" -> "REINFORCE" [label="has"];
    "REINFORCE" -> "TRPO" [label="derive"];
    "REINFORCE" -> "PPO" [label="derive"];
    "TRPO" -> "PPO" [label="vs"];
    "Temporal Difference" -> "SARSA" [label="has"];
    "Temporal Difference" -> "Q-Learning" [label="has"];
    "SARSA" -> "On-Policy" [label="is"];
    "Q-Learning" -> "Off-Policy" [label="is"];
    "Q-Learning" -> "SARSA" [label="vs"];
    "TD-lambda" -> "Monte Carlo" [label="blends"];
    "TD-lambda" -> "Temporal Difference" [label="blends"];
}
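The SARSA-vs-Q-Learning branch of the taxonomy (temporal-difference methods, on-policy vs off-policy) can be illustrated concretely. A minimal sketch, assuming a hypothetical deterministic 2-state, 2-action MDP invented here purely for illustration: both learners see the same epsilon-greedy trajectory, but SARSA bootstraps on the action the behaviour policy actually takes next (on-policy), while Q-Learning bootstraps on the greedy action regardless of what is taken (off-policy).

```python
import numpy as np

# Hypothetical toy MDP (illustration only): P[s, a] = next state, R[s, a] = reward.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 1.0], [0.0, 2.0]])
gamma, alpha, eps = 0.9, 0.1, 0.1
rng = np.random.default_rng(0)

def eps_greedy(Q, s):
    # Behaviour policy: random action with prob. eps, else greedy w.r.t. Q.
    return int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))

Q_sarsa = np.zeros((2, 2))   # on-policy TD control
Q_qlearn = np.zeros((2, 2))  # off-policy TD control

s = 0
a = eps_greedy(Q_sarsa, s)
for _ in range(5000):
    s2, r = int(P[s, a]), R[s, a]
    a2 = eps_greedy(Q_sarsa, s2)
    # SARSA target uses Q(s', a') for the a' actually chosen next.
    Q_sarsa[s, a] += alpha * (r + gamma * Q_sarsa[s2, a2] - Q_sarsa[s, a])
    # Q-Learning target uses max_a' Q(s', a'); it learns greedy values
    # even though the data comes from another (epsilon-greedy) policy.
    Q_qlearn[s, a] += alpha * (r + gamma * Q_qlearn[s2].max() - Q_qlearn[s, a])
    s, a = s2, a2
```

In this toy MDP action 1 is clearly better in both states, so both methods should end up greedy toward it; the difference shows up only in the bootstrap target, which is exactly the on-policy/off-policy split the graph draws.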