#reinforcement-learning

Supervised learning and reinforcement learning are the same objective

Both fit a distribution over outputs conditioned on an input, and both minimize a KL divergence between the model and an optimal target. They differ only in which distribution you sample from and in which direction of the KL divergence you minimize. Entropy regularization bridges them.
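
One way to make the claim concrete is a small numerical check on a toy problem with three possible outputs. This is a sketch under stated assumptions, not necessarily the post's own derivation: the rewards, the temperature `tau`, and the model distribution are made-up values, and the target is assumed to be the Boltzmann distribution p*(y|x) ∝ exp(r(x, y) / tau). Under those assumptions, the supervised cross-entropy equals the forward KL(p* || model) plus a constant, and the entropy-regularized expected reward equals a constant minus tau times the reverse KL(model || p*).

```python
import math

def softmax(scores):
    """Normalize scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions over the same outputs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical values, chosen only for illustration.
rewards = [1.0, 0.2, -0.5]          # r(x, y) for three candidate outputs y
tau = 0.7                           # entropy-regularization temperature
model = softmax([0.3, -0.1, 0.4])   # an arbitrary model distribution

# Assumed optimal target: p*(y|x) proportional to exp(r / tau).
target = softmax([r / tau for r in rewards])
log_z = math.log(sum(math.exp(r / tau) for r in rewards))

# Supervised side: sample from the target, score with the model (forward KL).
cross_entropy = -sum(t * math.log(m) for t, m in zip(target, model))
sl_via_kl = kl(target, model) + entropy(target)

# RL side: sample from the model, score with the reward (reverse KL).
rl_objective = sum(p * r for p, r in zip(model, rewards)) + tau * entropy(model)
rl_via_kl = tau * log_z - tau * kl(model, target)

print(f"cross-entropy      = {cross_entropy:.6f}, KL form = {sl_via_kl:.6f}")
print(f"regularized reward = {rl_objective:.6f}, KL form = {rl_via_kl:.6f}")
```

In both cases the term that does not depend on the model (the target's entropy, or tau times the log normalizer) drops out of the optimization, so minimizing cross-entropy and maximizing entropy-regularized reward each reduce to minimizing a KL divergence to the same target, in opposite directions.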

April 19, 2026