FF's Notes
← Home

Square Root Scaling Rule

Aug 9, 2025

The square root scaling rule is a widely-used heuristic in deep learning for adjusting the learning rate when changing batch sizes.

When we change the batch size by a factor of $k$, we can scale the learning rate by $\sqrt{k}$.

Example:

  • original: $bs=256$, and $lr=1e-3$
  • new: $bs=1024$
  • new: $lr=2e-3$