Causal Attention
Causal attention is a type of self-attention mechanism that only allows a model to consider tokens that appear at or before the current position in a sequence, preventing it from "seeing" or attending to future tokens.
This is crucial for autoregressive tasks like language modeling, where the model must predict the next token based only on the preceding ones, much as a person reads a sentence from left to right. It is often referred to as "masked attention" because the future positions are masked out when the attention scores are computed, typically by setting them to negative infinity before the softmax so they receive zero weight.
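To make the masking step concrete, here is a minimal NumPy sketch of single-head causal attention. It is an illustrative example under simple assumptions (one sequence, no batching, no learned projections), and the names causal_attention, q, k, and v are chosen for this sketch rather than taken from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """Single-head causal (masked) self-attention.

    q, k, v: arrays of shape (seq_len, d_model).
    Returns an array of shape (seq_len, d_model) in which position i
    attends only to positions 0..i.
    """
    seq_len, d = q.shape
    # Scaled dot-product attention scores.
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: True above the diagonal marks "future" positions.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    # Masked positions get -inf, so the softmax assigns them zero weight.
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = causal_attention(x, x, x)
print(out.shape)  # (4, 8)
```

The key design point is that the mask is applied to the scores before the softmax: replacing future positions with negative infinity drives their attention weights to exactly zero, so each output position is a weighted average of the current and earlier value vectors only.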