In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the “amount of information” (in units such as shannons (bits), nats or hartleys) obtained about one random variable by observing the other random variable. The concept of mutual information is intimately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected “amount of information” held in a random variable.

The mutual information can be defined as:

\begin{equation*} I(X;Y) = D_{KL}( P_{X,Y} || P_{X} P_{Y} ) \end{equation*}

where $D_{KL}$ is the KL divergence, $P_{X,Y}$ is the joint distribution of $X$ and $Y$, and $P_X P_Y$ is the product of their marginal distributions.

If the two variables are independent, then $P_{X,Y} = P_X P_Y$, which means we cannot obtain any information about $X$ by observing $Y$, and consequently the KL divergence is 0. Thus mutual information actually measures the "distance" between $P_{X,Y}$ and $P_X P_Y$.

Specifically, for discrete distributions:

\begin{equation*} I(X;Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} P_{X,Y}(x,y) \log \frac{P_{X,Y}(x,y)}{P_X(x)\, P_Y(y)} \end{equation*}
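To make the discrete case concrete, here is a minimal sketch in Python (the `mutual_information` helper and the example joint tables are my own illustration, not part of any standard library). It computes the double sum above, i.e. $D_{KL}(P_{X,Y} \| P_X P_Y)$ in nats, and shows that an independent joint table gives exactly 0, matching the remark above.

```python
import numpy as np

def mutual_information(p_xy):
    """MI (in nats) of a discrete joint distribution given as a 2-D array of probabilities.

    Computes D_KL(P_XY || P_X P_Y) = sum_xy P(x,y) log( P(x,y) / (P(x) P(y)) ).
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X (rows)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (columns)
    mask = p_xy > 0                          # skip zero cells (0 log 0 := 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

# Dependent joint distribution: X and Y tend to agree, so MI > 0.
p_dependent = [[0.4, 0.1],
               [0.1, 0.4]]
print(mutual_information(p_dependent))    # ~0.193 nats

# Independent joint distribution: P(x, y) = P(x) P(y), so MI = 0.
p_independent = [[0.25, 0.25],
                 [0.25, 0.25]]
print(mutual_information(p_independent))  # 0.0
```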

For continuous distributions:

\begin{equation*} I(X;Y) = \int_{\mathcal{Y}} \int_{\mathcal{X}} p_{X,Y}(x,y) \log \frac{p_{X,Y}(x,y)}{p_X(x)\, p_Y(y)} \, dx \, dy \end{equation*}

where $p_{X,Y}$ is the joint density and $p_X$, $p_Y$ are the marginal densities.
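As a standard closed-form instance of the continuous case (the derivation is not carried out here), if $X$ and $Y$ are jointly Gaussian with correlation coefficient $\rho$, evaluating the integral gives

\begin{equation*} I(X;Y) = -\frac{1}{2} \ln\!\left(1 - \rho^2\right) \end{equation*}

so the mutual information is 0 when $\rho = 0$ and grows without bound as $|\rho| \to 1$.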

The relation between mutual information, entropy, and conditional entropy is:

\begin{equation*} I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) \end{equation*}
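This identity is easy to check numerically. The sketch below (again my own illustration, reusing the joint table from the earlier snippet) computes $H(X)$, $H(Y)$ and $H(X,Y)$ directly from the definition of Shannon entropy; the value of $H(X) + H(Y) - H(X,Y)$ agrees with the KL-divergence computation above.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector or array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                          # 0 log 0 := 0
    return float(-np.sum(p * np.log(p)))

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

h_x  = entropy(p_xy.sum(axis=1))          # H(X), from the row marginal
h_y  = entropy(p_xy.sum(axis=0))          # H(Y), from the column marginal
h_xy = entropy(p_xy)                      # H(X, Y), from the joint table

# I(X;Y) = H(X) + H(Y) - H(X,Y); equivalently H(X) - H(X|Y) with H(X|Y) = H(X,Y) - H(Y).
print(h_x + h_y - h_xy)                   # ~0.193 nats, matching mutual_information(p_xy) above
```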