The Fundamental Theorem of Markov Chains (Proof, Intuition + Uniqueness)
Explore the Fundamental Theorem of Markov Chains with a rigorous proof of stationary distribution existence and uniqueness using linear algebra and convex combinations.
References: Foundations of Data Science by A. Blum, J. Hopcroft, and R. Kannan
This article explains the fundamental theorem of Markov chains, including the existence and uniqueness of stationary distributions. We walk through the formal proof using linear algebra concepts such as rank, null space, and convex combinations.
Stationary distribution
Let $p_t$ be the probability distribution of the chain at time $t$. Then

$$p_{t+1} = p_t P,$$

where $P$ is the transition probability matrix of the chain. The long-term average probability distribution is given by

$$a_t = \frac{1}{t}\left(p_0 + p_1 + \cdots + p_{t-1}\right).$$
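As a concrete illustration, the one-step update and the running average can be sketched in Python. The two-state matrix below is a hypothetical toy example of my own, not one from the reference:

```python
import numpy as np

# Hypothetical 2-state transition matrix; each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

p = np.array([1.0, 0.0])       # start deterministically in state 0
history = [p]
for _ in range(1000):
    p = p @ P                  # one step of the chain: p_{t+1} = p_t P
    history.append(p)

# Long-term average distribution a_t = (p_0 + p_1 + ... + p_{t-1}) / t
a_t = np.mean(history[:-1], axis=0)
print(a_t)
```

For this particular matrix the unique distribution solving $\pi P = \pi$ is $\pi = (5/6, 1/6)$, and the running average $a_t$ gets close to it.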
The Fundamental Theorem of Markov Chains
Let $P$ be a regular transition matrix of a finite state Markov chain. Then there exists a unique stationary distribution $\pi$ such that

$$\pi P = \pi.$$
Before moving to the theorem's proof, we shall discuss an important lemma.
Lemma 1.1: Let $P$ be the $n \times n$ transition probability matrix for a connected Markov chain. Then the $n \times (n+1)$ matrix $A = [P - I \mid \mathbf{1}]$, obtained by augmenting the matrix $P - I$ with an additional column of ones, has rank $n$.
Proof:
We shall approach the proof by contradiction.
Suppose the rank of the matrix $A$ were less than $n$, say $n - 1$. By the rank–nullity theorem, the null space of $A$ would then be at least 2-dimensional:

$$\dim \mathrm{null}(A) = (n+1) - \mathrm{rank}(A) \ge (n+1) - (n-1) = 2,$$

since the rank of $A$ is at most $n - 1$ and the number of columns in $A$ is $n + 1$.
This means that there are at least 2 linearly independent vectors in the null space of $A$. Let us find those vectors. The vector $x = (1, 1, \ldots, 1, 0)^T$ is clearly in the null space of $A$: each row of $P$ sums to 1 since it is a probability distribution, the subtracted identity $I$ removes 1 from each row, which makes every row of $P - I$ sum to 0, and the 0 in the last entry of $x$ cancels the column of ones. So the product of $A$ and $x$ is $0$.
Since we are done with one vector in the null space of the matrix $A$, we shall find the other. Because the null space has dimension at least 2, it contains a nonzero vector perpendicular to the vector $x = (1, 1, \ldots, 1, 0)^T$ found above; exhibiting two such independent vectors is exactly what the assumed 2-dimensional null space allows. Call this second vector $y = (x_1, x_2, \ldots, x_n, \alpha)^T$.
Perpendicularity of $y$ to $x$ means the dot product is 0, and since the last coordinate of $x$ is 0, this amounts to $x_1 + x_2 + \cdots + x_n = 0$. Not all $x_i$ can be zero (otherwise $Ay = \alpha \mathbf{1} = 0$ would force $\alpha = 0$ and $y$ would be the zero vector), so some of the values $x_i$ are negative and some are positive.
If we multiply the matrix $A$ with the vector $y$, the null-space condition $Ay = 0$ reads, for each row $i$,

$$\sum_{j} p_{ij} x_j - x_i + \alpha = 0, \qquad \text{i.e.} \qquad x_i = \sum_{j} p_{ij} x_j + \alpha.$$

The extra $-x_i$ comes from the identity matrix $I$ being subtracted from the matrix $P$. From this, we can see that each $x_i$ is a convex combination of the values $x_1, \ldots, x_n$, weighted by the $i$-th row of $P$, plus $\alpha$.
A convex combination of objects $z_1, z_2, \ldots, z_n$ is defined as

$$\sum_{i=1}^{n} \lambda_i z_i, \qquad \text{where } \lambda_i \ge 0 \text{ for all } i \text{ and } \sum_{i=1}^{n} \lambda_i = 1,$$

and here the $\lambda_j$'s are none other than the entries $p_{i1}, p_{i2}, \ldots, p_{in}$ of the $i$-th row of the matrix $P$.
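As a quick numerical aside (with toy values of my own choosing), a convex combination whose weights come from a stochastic-matrix row always lands between the smallest and largest of the combined values:

```python
import numpy as np

# Values to combine, and one row of a hypothetical stochastic matrix as weights.
x = np.array([3.0, -1.0, 4.0, 0.0])
weights = np.array([0.2, 0.3, 0.4, 0.1])   # lambda_i >= 0, summing to 1

combo = float(weights @ x)                 # sum_i lambda_i * x_i
print(combo)                               # lies between min(x) and max(x)
```

This betweenness is exactly the property the proof exploits for the maximal and minimal entries of $y$ below.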
Let $S$ be the set of all indices $i$ for which $x_i$ attains the maximum value $x_{\max}$, and let $T$ be the set of indices attaining the minimum value $x_{\min}$. Since not all $x_i$ are equal, $S$ is not the whole state space, and connectedness of the Markov chain implies that some state of maximum value has positive transition probability to some state of strictly lower value. For such a state $i \in S$ we can say that

$$\sum_{j} p_{ij} x_j < x_{\max}.$$

This is because a weighted average (convex sum) that places positive weight on a value strictly below the maximum must itself be strictly less than the maximum. This forces

$$\alpha = x_{\max} - \sum_{j} p_{ij} x_j > 0.$$
If we turn the tables and do the same for the minimal values, some state $i \in T$ has positive transition probability to a state of strictly larger value, so

$$\sum_{j} p_{ij} x_j > x_{\min},$$

which forces $\alpha = x_{\min} - \sum_{j} p_{ij} x_j < 0$.
Having $\alpha > 0$ and $\alpha < 0$ simultaneously is a contradiction. It stems from our assumption that there are 2 independent vectors in the null space of $A$, i.e., that the null space of $A$ is 2-dimensional because the rank of $A$ is at most $n - 1$.
Hence, we can say that the rank of the matrix $A$ is $n$.
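The lemma is easy to sanity-check numerically. Below is a minimal sketch using a hypothetical connected 3-state chain (my own example, not from the reference): we build $A = [P - I \mid \mathbf{1}]$, confirm its rank is $n$, and confirm that $(1, \ldots, 1, 0)^T$ lies in its null space:

```python
import numpy as np

# Hypothetical connected 3-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
n = P.shape[0]

# Augment P - I with a column of ones: A is n x (n + 1).
A = np.hstack([P - np.eye(n), np.ones((n, 1))])

rank = np.linalg.matrix_rank(A)
x = np.append(np.ones(n), 0.0)    # the null vector (1, ..., 1, 0)^T
print(rank)                       # n, as the lemma asserts
print(A @ x)                      # the zero vector
```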
Let us come back to the theorem:
Let $P$ be a regular transition matrix of a finite state Markov chain. Then there exists a unique stationary distribution $\pi$ such that $\pi P = \pi$. Moreover, for any starting distribution, $\lim_{t \to \infty} a_t = \pi$.
Here we need to note that $a_t$ is the long-term average probability distribution, not the probability distribution $p_t$ at time $t$.
Let us run one step of the Markov chain starting with the distribution $a_t$: the distribution after the step is $a_t P$. Let us calculate the change in probabilities due to this step:

$$b_t = a_t P - a_t = \frac{1}{t}\left(p_1 + p_2 + \cdots + p_t\right) - \frac{1}{t}\left(p_0 + p_1 + \cdots + p_{t-1}\right) = \frac{1}{t}\left(p_t - p_0\right).$$

Thus $b_t$ is the difference between the average distribution over steps $1, \ldots, t$ and the average distribution over steps $0, \ldots, t-1$. It satisfies $|b_t(i)| \le \frac{2}{t}$ for all $i$, since by the triangle inequality $|p_t(i) - p_0(i)| \le |p_t(i)| + |p_0(i)| \le 2$. In particular, $b_t \to 0$ as $t \to \infty$.
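The proof uses the fact that each entry of $b_t = a_t P - a_t$ shrinks like $2/t$; the sketch below (again a hypothetical two-state chain of my own) tracks $t \cdot |b_t|$ and confirms it never exceeds 2:

```python
import numpy as np

# Hypothetical 2-state chain (my own toy example).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
p0 = np.array([1.0, 0.0])

dists = [p0]                            # p_0, p_1, ..., p_t
ratios = []
p = p0
for t in range(1, 201):
    p = p @ P
    dists.append(p)
    a_t = np.mean(dists[:t], axis=0)    # average of p_0 .. p_{t-1}
    b_t = a_t @ P - a_t                 # equals (p_t - p_0) / t
    ratios.append(t * np.abs(b_t).max())

print(max(ratios))                      # stays below the bound 2
```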
By Lemma 1.1, we know that the rank of the matrix $A = [P - I \mid \mathbf{1}]$ is $n$. Let us also give a closed-form solution for $\pi$. Take the matrix $A$ and remove its first column to obtain an $n \times n$ submatrix, which we call $B$. $B$ is invertible: since each row of $P - I$ sums to 0, the columns of $P - I$ add up to the zero vector, so the first column of $A$ is just minus the sum of the remaining columns of $P - I$. Removing this linearly dependent column leaves the rank unchanged, so $B$ has rank $n$ (had $B$ not been invertible, the rank of $A$ would have been less than $n$, contradicting Lemma 1.1).

Let $c_t$ be obtained by removing the first entry of $b_t$. Since $a_t A = a_t [P - I \mid \mathbf{1}] = [b_t, 1]$, dropping the first coordinate gives

$$a_t B = [c_t, 1], \qquad \text{so} \qquad a_t = [c_t, 1]\,B^{-1}.$$

Then, since $b_t \to 0$ (and hence $c_t \to 0$) as $t \to \infty$,

$$\lim_{t \to \infty} a_t = [0, 0, \ldots, 0, 1]\,B^{-1},$$

establishing the theorem with $\pi = [0, 0, \ldots, 0, 1]\,B^{-1}$. As a limit of probability vectors, $\pi$ is itself a probability vector, and $\pi P = \pi$ because $\pi (P - I) = \lim_{t \to \infty} b_t = 0$. Uniqueness follows from the same closed form: any stationary distribution must satisfy $\pi B = [0, 0, \ldots, 0, 1]$, and since $B$ is invertible this equation has exactly one solution.
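Finally, the closed form $\pi = [0, \ldots, 0, 1]\,B^{-1}$ can be evaluated directly. A minimal sketch, using a hypothetical connected 3-state matrix of my own choosing:

```python
import numpy as np

# Hypothetical connected 3-state transition matrix.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
n = P.shape[0]

# A = [P - I | 1]; dropping the first column yields the invertible matrix B.
A = np.hstack([P - np.eye(n), np.ones((n, 1))])
B = A[:, 1:]

e = np.zeros(n)
e[-1] = 1.0                       # the row vector [0, ..., 0, 1]
pi = e @ np.linalg.inv(B)

print(pi)                         # the stationary distribution
print(pi @ P)                     # equals pi, i.e. pi P = pi
```

For this matrix the result matches the stationary distribution obtained by solving $\pi P = \pi$ directly, namely $\pi = (28/73, 20/73, 25/73)$.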