Memory Optimization Tricks
Store tensors in 16-bit floating point instead of 32-bit (half the memory per element)
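A rough illustration of the size difference, using only Python's standard `struct` module as a stand-in for a tensor library (the sample values and variable names are made up for this sketch). It packs the same numbers as 32-bit and 16-bit floats and measures the round-trip error that the smaller format introduces:

```python
import struct

# Hypothetical sample data standing in for a tensor's contents.
values = [0.1 * i for i in range(1000)]
n = len(values)

# "f" is a 32-bit float, "e" is a 16-bit (half-precision) float.
buf32 = struct.pack(f"<{n}f", *values)
buf16 = struct.pack(f"<{n}e", *values)

bytes32 = len(buf32)  # 4 bytes per element
bytes16 = len(buf16)  # 2 bytes per element

# Round-trip through 16 bits to see the precision that was given up.
restored = struct.unpack(f"<{n}e", buf16)
max_abs_error = max(abs(a - b) for a, b in zip(values, restored))

print(f"32-bit: {bytes32} bytes, 16-bit: {bytes16} bytes")
print(f"largest round-trip error: {max_abs_error:.5f}")
```

The memory halves, but so does the available precision, which is why 16-bit storage is usually applied selectively rather than everywhere.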
Gradient checkpointing
Saves memory by recomputing intermediate activations during backprop instead of storing them
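The idea can be sketched in plain Python on a toy chain of scalar layers. Everything here is an assumption made for illustration (the `tanh` layer, the segment length of 4, the function names); it is not how any particular framework implements checkpointing. The baseline keeps every layer input, while the checkpointed version keeps only one input per segment and recomputes the rest when the backward pass reaches that segment:

```python
import math

def layer(w, x):
    # Toy layer: y = tanh(w * x)
    return math.tanh(w * x)

def layer_grad_input(w, x):
    # dy/dx for the toy layer: w * (1 - tanh(w * x)^2)
    y = math.tanh(w * x)
    return w * (1.0 - y * y)

def grad_full(weights, x0):
    """Baseline: store the input of every layer, then backpropagate."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad_input(w, x_in)
    return x, grad, len(inputs)

def grad_checkpointed(weights, x0, segment=4):
    """Store one input per segment; recompute the others during backprop."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # keep only the segment's first input
        x = layer(w, x)
    output = x

    grad = 1.0
    peak_stored = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute this segment's activations from its checkpoint.
        seg_inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            seg_inputs.append(x)
            x = layer(w, x)
        # Rough count of values held at once: checkpoints plus one segment.
        peak_stored = max(peak_stored, len(checkpoints) + len(seg_inputs))
        # Backpropagate through the segment, last layer first.
        for w, x_in in zip(reversed(seg_weights), reversed(seg_inputs)):
            grad *= layer_grad_input(w, x_in)
    return output, grad, peak_stored

weights = [0.5 + 0.01 * i for i in range(16)]
out_full, g_full, stored_full = grad_full(weights, 0.3)
out_ckpt, g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3, segment=4)
print(f"stored values: full={stored_full}, checkpointed={stored_ckpt}")
```

Both versions produce the same gradient; the checkpointed one holds fewer values at a time and pays for it with an extra forward computation per segment.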