Square Root Scaling Rule
The square root scaling rule is a widely used heuristic in deep learning for adjusting the learning rate when the batch size changes.
When the batch size changes by a factor of $k$, scale the learning rate by $\sqrt{k}$: $lr_{\text{new}} = lr_{\text{old}} \cdot \sqrt{k}$.
Example:
- original: $bs = 256$, $lr = 10^{-3}$
- new: $bs = 1024$, so $k = 1024 / 256 = 4$
- new: $lr = 10^{-3} \cdot \sqrt{4} = 2 \times 10^{-3}$
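
A minimal sketch of the rule in Python (the `scale_lr` helper is illustrative, not from any library):

```python
import math

def scale_lr(base_lr: float, base_bs: int, new_bs: int) -> float:
    """Scale a learning rate by the square root of the batch-size ratio."""
    k = new_bs / base_bs           # batch-size scaling factor
    return base_lr * math.sqrt(k)  # square root scaling rule

# The example above: quadrupling the batch (256 -> 1024) doubles the lr.
print(scale_lr(1e-3, 256, 1024))  # 0.002
```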