How it works: why use KL divergence as a policy constraint? An information theory perspective.

KL divergence has long been used as a policy constraint in reinforcement learning (RL). For example, in online RL, where the agent interacts with the environment to update its policy, a KL divergence term is used to limit how far each update can move the policy. In fact, KL divergence is used so widely in RL that it has become the gold standard. To me, however, this always sounded a bit magical: why do we adopt KL divergence as the constraint between policies?
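To make the idea concrete, here is a minimal NumPy sketch of a KL-penalized surrogate objective in the spirit of TRPO/PPO. It is not taken from any particular paper; the function names, the `beta` coefficient, and the toy numbers are all hypothetical. The policy-gradient term pushes the policy toward higher advantage, while the KL term keeps the updated policy close to the previous one.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for a single discrete action distribution."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

def kl_penalized_objective(ratios, advantages, pi_old, pi_new, beta=0.01):
    """Surrogate policy-gradient objective minus a KL penalty (illustrative only).

    ratios:     pi_new(a|s) / pi_old(a|s) for the sampled state-action pairs
    advantages: estimated advantages for those pairs
    pi_old, pi_new: arrays of shape (batch, num_actions) holding the full
                    action distributions of the old and updated policies
    beta:       coefficient controlling how far the new policy may drift
    """
    surrogate = np.mean(ratios * advantages)
    mean_kl = np.mean([kl_divergence(p, q) for p, q in zip(pi_old, pi_new)])
    return surrogate - beta * mean_kl

# Toy example with made-up numbers: a large KL between old and new policies
# reduces the objective, discouraging overly aggressive updates.
pi_old = np.array([[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]])
pi_new = np.array([[0.4, 0.4, 0.2], [0.5, 0.3, 0.2]])
ratios = np.array([0.4 / 0.3, 0.5 / 0.6])   # assuming actions 1 and 0 were sampled
advantages = np.array([1.2, -0.4])
print(kl_penalized_objective(ratios, advantages, pi_old, pi_new))
```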

KL Divergence and Its Meaning from an Information-Theoretic Perspective

KL divergence was first introduced by Kullback and Leibler (1951) as a measure of how one probability distribution differs from another.

Definition of KL divergence
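For reference, for two discrete distributions $P$ and $Q$ over the same support $\mathcal{X}$, the standard definition is

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)},
$$

with the sum replaced by an integral over densities in the continuous case. Note that it is not symmetric: $D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P)$ in general.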

Why is KL divergence useful as a policy constraint?

Motivation for the policy constraint

Further thoughts on KL divergence