Yu-Liang Wu

The Method of Types


1. The Method of Types

The method of types refers to a set of combinatorial arguments used to establish Large Deviation Principles (LDP) for i.i.d. random variables taking values in a finite alphabet $\Sigma$.

To demonstrate the argument, we consider i.i.d. random variables $Y_1, Y_2, \dots$ taking values in $\Sigma = \{a_1, \dots, a_{|\Sigma|}\}$. Let $M_1(\Sigma)$ (which can be identified as a subset of $\mathbb{R}^{|\Sigma|}$) denote the space of all probability measures (laws) on $\Sigma$. Assuming the law of each $Y_j$ is $\mu \in M_1(\Sigma)$, we denote by $\mu^n$ the joint law of $\mathbf{Y} \triangleq (Y_1, \dots, Y_n)$ on the product space $\Sigma^n$, and write $\mathbb{P}$ for the underlying probability.

Definition 1.1. The empirical measure of a finite sequence $\mathbf{y} = (y_1, \dots, y_n) \in \Sigma^n$ is the measure $L_n^{\mathbf{y}}$ in $M_1(\Sigma)$ that records the frequency of occurrences of each $a_i \in \Sigma$; namely, $L_n^{\mathbf{y}} = \sum_{i=1}^{|\Sigma|} L_n^{\mathbf{y}}(a_i) \, \delta_{a_i}$ with

$$L_n^{\mathbf{y}}(a_i) = \frac{1}{n} \sum_{j=1}^{n} \mathbb{1}_{\{y_j = a_i\}}, \qquad i = 1, \dots, |\Sigma|.$$

Below are some related notions:

  • (Support) $\operatorname{supp} \nu \triangleq \{a \in \Sigma : \nu(a) > 0\}$ for a measure $\nu \in M_1(\Sigma)$.
  • (Empirical distributions) $\mathcal{L}_n \triangleq \{L_n^{\mathbf{y}} : \mathbf{y} \in \Sigma^n\}$, the set of all empirical measures of sequences of length $n$.
  • (Types) $T_n(\nu) \triangleq \{\mathbf{y} \in \Sigma^n : L_n^{\mathbf{y}} = \nu\}$ for $\nu \in \mathcal{L}_n$, the type class of $\nu$.
  • (Variational distance) For $\nu, \nu' \in M_1(\Sigma)$, $d(\nu, \nu') \triangleq \sup_{A \subset \Sigma} |\nu(A) - \nu'(A)| = \frac{1}{2} \sum_{a \in \Sigma} |\nu(a) - \nu'(a)|$.

We have the following fundamental estimates of the sizes of $\mathcal{L}_n$ and $T_n(\nu)$.

Lemma 1.2.

  1. $|\mathcal{L}_n| \le (n+1)^{|\Sigma|}$.
  2. $|T_n(\nu)| = \dfrac{n!}{\prod_{a \in \Sigma} (n\nu(a))!}$ for every $\nu \in \mathcal{L}_n$.

Proof. (1) It follows from a combinatorial argument: each $\nu \in \mathcal{L}_n$ is determined by the vector $(n\nu(a_1), \dots, n\nu(a_{|\Sigma|}))$, each coordinate of which takes one of the $n+1$ values $0, 1, \dots, n$.

(2) Note that

$$T_n(\nu) = \big\{ \mathbf{y} \in \Sigma^n : \text{each letter } a \in \Sigma \text{ occurs exactly } n\nu(a) \text{ times in } \mathbf{y} \big\},$$

where the latter set has cardinality

$$\binom{n}{n\nu(a_1), \dots, n\nu(a_{|\Sigma|})} = \frac{n!}{\prod_{a \in \Sigma} (n\nu(a))!},$$

as desired. ∎
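
As a quick sanity check, the following Python snippet (a minimal sketch; the alphabet, $n$, and variable names are arbitrary illustrative choices, not anything from the text) enumerates $\Sigma^n$ for small parameters and verifies both counts by brute force.

```python
from itertools import product
from math import factorial

alphabet = ("a", "b", "c")   # a small alphabet, |Sigma| = 3
n = 5

# Group all sequences in Sigma^n by their type, i.e., by the vector
# of occurrence counts (n nu(a))_{a in Sigma}.
classes = {}
for seq in product(alphabet, repeat=n):
    counts = tuple(seq.count(a) for a in alphabet)
    classes[counts] = classes.get(counts, 0) + 1

# Lemma 1.2(1): the number of types is at most (n + 1)^{|Sigma|}.
assert len(classes) <= (n + 1) ** len(alphabet)

# Lemma 1.2(2): each type class has multinomial cardinality.
for counts, size in classes.items():
    multinomial = factorial(n)
    for c in counts:
        multinomial //= factorial(c)
    assert size == multinomial

print(f"|L_n| = {len(classes)}, bound = {(n + 1) ** len(alphabet)}")
```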

In the following, we estimate the large deviation probabilities.

Definition 1.3 (entropy). Let $\nu, \mu \in M_1(\Sigma)$.

  • The entropy of $\nu$ is
    $$H(\nu) \triangleq -\sum_{a \in \Sigma} \nu(a) \log \nu(a).$$

  • The relative entropy (or Kullback–Leibler divergence) of $\nu$ with respect to $\mu$ is
    $$H(\nu \mid \mu) \triangleq \sum_{a \in \Sigma} \nu(a) \log \frac{\nu(a)}{\mu(a)}.$$

We adopt the conventions that $0 \log 0 = 0$ and $x \log \frac{x}{0} = \infty$ for $x > 0$; in particular, $H(\nu \mid \mu) = \infty$ unless $\operatorname{supp} \nu \subset \operatorname{supp} \mu$.
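
For concreteness, here is a minimal Python sketch of the two quantities under the conventions above; representing measures as tuples of weights is an implementation choice, not notation from the text.

```python
import math

def entropy(nu):
    """H(nu) = -sum_a nu(a) log nu(a), with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in nu if p > 0)

def relative_entropy(nu, mu):
    """H(nu|mu) = sum_a nu(a) log(nu(a)/mu(a)); equals +infinity when
    supp(nu) is not contained in supp(mu)."""
    h = 0.0
    for p, q in zip(nu, mu):
        if p > 0:
            if q == 0:
                return math.inf  # convention: x log(x/0) = +infinity
            h += p * math.log(p / q)
    return h

mu = (0.5, 0.25, 0.25)
print(entropy(mu))               # entropy of mu (natural logarithm)
print(relative_entropy(mu, mu))  # 0: H(nu|mu) vanishes iff nu = mu
```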

Remark 1.4. Let $M_1(\Sigma)$ be endowed with the variational distance $d$. Then,

  1. $M_1(\Sigma)$ is compact.
  2. $\nu \mapsto H(\nu)$ is continuous on $M_1(\Sigma)$.
  3. $\nu \mapsto H(\nu \mid \mu)$ is continuous on the compact set $\{\nu \in M_1(\Sigma) : \operatorname{supp} \nu \subset \operatorname{supp} \mu\}$.

Lemma 1.5. If $\nu \in \mathcal{L}_n$, then the following hold.

  1. $\mu^n(\{\mathbf{y}\}) = e^{-n(H(\nu) + H(\nu \mid \mu))}$ for every $\mathbf{y} \in T_n(\nu)$.
  2. $(n+1)^{-|\Sigma|} e^{nH(\nu)} \le |T_n(\nu)| \le e^{nH(\nu)}$.
  3. $(n+1)^{-|\Sigma|} e^{-nH(\nu \mid \mu)} \le \mu^n(T_n(\nu)) \le e^{-nH(\nu \mid \mu)}$.

Proof. (1) The case $\operatorname{supp} \nu \not\subset \operatorname{supp} \mu$ is trivial, as $\mu^n(\{\mathbf{y}\}) = 0$ and $H(\nu \mid \mu) = \infty$. Assuming $\operatorname{supp} \nu \subset \operatorname{supp} \mu$, we have by definition

$$\mu^n(\{\mathbf{y}\}) = \prod_{a \in \Sigma} \mu(a)^{n\nu(a)} = \exp\Big( n \sum_{a \in \Sigma} \nu(a) \log \mu(a) \Big) = e^{-n(H(\nu) + H(\nu \mid \mu))}.$$

(2) The second inequality follows immediately from applying (1) with $\mu = \nu$:

$$1 \ge \nu^n(T_n(\nu)) = |T_n(\nu)| \, e^{-nH(\nu)}.$$

To prove the first, we claim that if $\nu' \in \mathcal{L}_n$, then

$$\frac{\nu^n(T_n(\nu))}{\nu^n(T_n(\nu'))} = \prod_{a \in \Sigma} \frac{(n\nu'(a))!}{(n\nu(a))!} \, \nu(a)^{n\nu(a) - n\nu'(a)} \ge 1.$$

Indeed, the last expression consists of terms of the form $\frac{m!}{\ell!} p^{\ell - m}$ with $m = n\nu'(a)$, $\ell = n\nu(a)$, $p = \nu(a)$, where the inequality is validated using the fact $\frac{m!}{\ell!} \ge \ell^{m - \ell}$:

$$\prod_{a \in \Sigma} \frac{(n\nu'(a))!}{(n\nu(a))!} \, \nu(a)^{n\nu(a) - n\nu'(a)} \ge \prod_{a \in \Sigma} (n\nu(a))^{n\nu'(a) - n\nu(a)} \, \nu(a)^{n\nu(a) - n\nu'(a)} = \prod_{a \in \Sigma} n^{n\nu'(a) - n\nu(a)} = 1.$$

To prove the desired inequality, observe that

$$1 = \sum_{\nu' \in \mathcal{L}_n} \nu^n(T_n(\nu')) \le |\mathcal{L}_n| \, \nu^n(T_n(\nu)) = |\mathcal{L}_n| \, |T_n(\nu)| \, e^{-nH(\nu)}.$$

The proof is concluded by applying Lemma 1.2(1).

(3) Combining (1) and (2), we arrive at the estimate

$$\mu^n(T_n(\nu)) = |T_n(\nu)| \, e^{-n(H(\nu) + H(\nu \mid \mu))} \le e^{-nH(\nu \mid \mu)},$$

whereas the other inequality follows from the lower bound on $|T_n(\nu)|$. ∎
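
The three estimates can be confirmed numerically by enumerating a small product space; the law $\mu$ and the parameters below are arbitrary illustrative choices, and the script is only a sketch of the check.

```python
import math
from itertools import product

mu = (0.5, 0.3, 0.2)       # an arbitrary law with full support
n = 6
S = range(len(mu))

H  = lambda nu: -sum(p * math.log(p) for p in nu if p > 0)
KL = lambda nu: sum(p * math.log(p / mu[i]) for i, p in enumerate(nu) if p > 0)

# Accumulate, for each type, the class size and its mu^n-probability.
classes = {}
for y in product(S, repeat=n):
    counts = tuple(y.count(i) for i in S)
    size, prob = classes.get(counts, (0, 0.0))
    classes[counts] = (size + 1, prob + math.prod(mu[i] for i in y))

c = (n + 1) ** len(mu)     # the polynomial factor (n+1)^{|Sigma|}
for counts, (size, prob) in classes.items():
    nu = tuple(k / n for k in counts)
    # (1): every y in T_n(nu) has probability e^{-n(H(nu)+H(nu|mu))}.
    assert abs(prob - size * math.exp(-n * (H(nu) + KL(nu)))) < 1e-9
    # (2): (n+1)^{-|S|} e^{nH(nu)} <= |T_n(nu)| <= e^{nH(nu)}.
    assert math.exp(n * H(nu)) / c <= size <= math.exp(n * H(nu)) * (1 + 1e-9)
    # (3): (n+1)^{-|S|} e^{-nH(nu|mu)} <= mu^n(T_n(nu)) <= e^{-nH(nu|mu)}.
    b = math.exp(-n * KL(nu))
    assert b / c <= prob <= b * (1 + 1e-9)

print("Lemma 1.5 bounds verified for all", len(classes), "types")
```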

Theorem 1.6 (Sanov). For every set $\Gamma \subset M_1(\Sigma)$ of probability measures,

$$-\inf_{\nu \in \Gamma^{\circ}} H(\nu \mid \mu) \le \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( L_n^{\mathbf{Y}} \in \Gamma \big) \le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( L_n^{\mathbf{Y}} \in \Gamma \big) \le -\inf_{\nu \in \overline{\Gamma}} H(\nu \mid \mu),$$

where $\Gamma^{\circ}$ and $\overline{\Gamma}$ denote the interior and the closure of $\Gamma$, respectively.

Proof. By Lemma 1.5, we obtain the upper bound

$$\mathbb{P}\big( L_n^{\mathbf{Y}} \in \Gamma \big) = \sum_{\nu \in \Gamma \cap \mathcal{L}_n} \mu^n(T_n(\nu)) \le \sum_{\nu \in \Gamma \cap \mathcal{L}_n} e^{-nH(\nu \mid \mu)} \le (n+1)^{|\Sigma|} \exp\Big( -n \inf_{\nu \in \Gamma} H(\nu \mid \mu) \Big),$$

which yields

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( L_n^{\mathbf{Y}} \in \Gamma \big) \le -\inf_{\nu \in \Gamma} H(\nu \mid \mu) \le -\inf_{\nu \in \overline{\Gamma}} H(\nu \mid \mu).$$

To prove the lower bound, we begin with an analogous estimate

$$\mathbb{P}\big( L_n^{\mathbf{Y}} \in \Gamma \big) \ge \sup_{\nu' \in \Gamma \cap \mathcal{L}_n} \mu^n(T_n(\nu')) \ge (n+1)^{-|\Sigma|} \sup_{\nu' \in \Gamma \cap \mathcal{L}_n} e^{-nH(\nu' \mid \mu)}.$$

Since $\bigcup_{n \ge 1} \mathcal{L}_n$ is dense in $M_1(\Sigma)$, for every $\nu \in \Gamma^{\circ}$ with $\operatorname{supp} \nu \subset \operatorname{supp} \mu$, one can find for every sufficiently large $n$ some $\nu_n \in \Gamma^{\circ} \cap \mathcal{L}_n$ such that $\operatorname{supp} \nu_n \subset \operatorname{supp} \mu$ and $d(\nu_n, \nu) \to 0$. By the continuity of the relative entropy on the restricted domain,

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( L_n^{\mathbf{Y}} \in \Gamma \big) \ge \liminf_{n \to \infty} \Big( -\frac{|\Sigma|}{n} \log(n+1) - H(\nu_n \mid \mu) \Big) = -H(\nu \mid \mu).$$

The theorem follows since $\nu \in \Gamma^{\circ}$ is arbitrary as long as $\operatorname{supp} \nu \subset \operatorname{supp} \mu$, and $H(\nu \mid \mu) = \infty$ whenever $\operatorname{supp} \nu \not\subset \operatorname{supp} \mu$. ∎
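
To see the exponential decay rate emerge, consider the Bernoulli case $\Sigma = \{0, 1\}$ with $\mu(1) = 1/2$ and the closed set $\Gamma = \{\nu : \nu(1) \ge 0.7\}$ (arbitrary illustrative choices). The tail probability is an exact binomial sum, and $-\frac{1}{n} \log \mathbb{P}(L_n^{\mathbf{Y}} \in \Gamma)$ should approach $\inf_{\nu \in \Gamma} H(\nu \mid \mu)$, attained here at $\nu(1) = 0.7$. A sketch:

```python
import math

p, t = 0.5, 0.7   # mu(1) = p; Gamma = {nu : nu(1) >= t}

def kl_bernoulli(a, b):
    """H(nu|mu) for Bernoulli measures with nu(1) = a, mu(1) = b."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def log_tail(n):
    """log P(L_n(1) >= t) for Binomial(n, p), computed stably in log space."""
    def log_pmf(k):
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1 - p))
    logs = [log_pmf(k) for k in range(math.ceil(t * n), n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

rate = kl_bernoulli(t, p)   # the infimum over Gamma sits at nu(1) = t
for n in (10, 100, 1000, 10000):
    print(n, round(-log_tail(n) / n, 5), "->", round(rate, 5))
```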

As an application of Sanov’s theorem, we prove a version of Cramér’s theorem concerning Large Deviations for the empirical mean. Let $f : \Sigma \to \mathbb{R}$ be a function. We identify $f$ with the vector $(f(a_1), \dots, f(a_{|\Sigma|})) \in \mathbb{R}^{|\Sigma|}$ so that the empirical average $\frac{1}{n} \sum_{j=1}^{n} f(Y_j)$ can be written as the inner product $\langle f, L_n^{\mathbf{Y}} \rangle$. Sanov’s theorem then implies the following result.

Theorem 1.7 (Cramér). For any set $A \subset \mathbb{R}$,

$$-\inf_{x \in A^{\circ}} I_f(x) \le \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \langle f, L_n^{\mathbf{Y}} \rangle \in A \big) \le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \langle f, L_n^{\mathbf{Y}} \rangle \in A \big) \le -\inf_{x \in \overline{A}} I_f(x),$$

where $I_f(x) \triangleq \inf\{ H(\nu \mid \mu) : \nu \in M_1(\Sigma),\ \langle f, \nu \rangle = x \}$. The rate function $I_f$ is continuous on the interval $[f_{\min}, f_{\max}]$, where $f_{\min} \triangleq \min_{a \in \operatorname{supp} \mu} f(a)$ and $f_{\max} \triangleq \max_{a \in \operatorname{supp} \mu} f(a)$, and satisfies

$$I_f(x) = \Lambda^*(x) \triangleq \sup_{\lambda \in \mathbb{R}} \big( \lambda x - \Lambda(\lambda) \big),$$

where

$$\Lambda(\lambda) \triangleq \log \mathbb{E}\big[ e^{\lambda f(Y_1)} \big] = \log \sum_{a \in \Sigma} \mu(a) e^{\lambda f(a)}.$$

Proof. We provide two proofs of the theorem.

(1) The formula

$$I_f(x) = \inf\{ H(\nu \mid \mu) : \nu \in M_1(\Sigma),\ \langle f, \nu \rangle = x \}$$

for the rate function of the empirical mean follows from that $\{ \langle f, L_n^{\mathbf{Y}} \rangle \in A \} = \{ L_n^{\mathbf{Y}} \in \Gamma_A \}$ with $\Gamma_A \triangleq \{ \nu \in M_1(\Sigma) : \langle f, \nu \rangle \in A \}$, and that $\nu \mapsto \langle f, \nu \rangle$ is a continuous function (so that $\Gamma_{A^{\circ}} \subset (\Gamma_A)^{\circ}$ and $\overline{\Gamma_A} \subset \Gamma_{\overline{A}}$, which lets Sanov's theorem transfer to the stated bounds). Moreover, in view of the formula, $I_f$ is clearly continuous on its effective domain $[f_{\min}, f_{\max}]$. We note that the optimization problem here is a classical problem in convex optimization where strong duality holds (see, for example, [1, Section 5]); specifically, one can swap $\sup$ and $\inf$ without changing the value of the expression:

$$I_f(x) = \inf_{\nu \in M_1(\Sigma)} \sup_{\lambda \in \mathbb{R}} \big( H(\nu \mid \mu) + \lambda (x - \langle f, \nu \rangle) \big) = \sup_{\lambda \in \mathbb{R}} \inf_{\nu \in M_1(\Sigma)} \big( H(\nu \mid \mu) + \lambda (x - \langle f, \nu \rangle) \big).$$

The inner minimization can then be solved using Gibbs' inequality (or Jensen's inequality applied to the convex function $t \mapsto t \log t$), yielding

$$\inf_{\nu \in M_1(\Sigma)} \big( H(\nu \mid \mu) - \lambda \langle f, \nu \rangle \big) = -\log \sum_{a \in \Sigma} \mu(a) e^{\lambda f(a)} = -\Lambda(\lambda),$$

with the infimum attained at the tilted measure $\nu_{\lambda}(a) \triangleq \mu(a) e^{\lambda f(a) - \Lambda(\lambda)}$. This establishes the Fenchel–Legendre transform representation $I_f(x) = \sup_{\lambda \in \mathbb{R}} (\lambda x - \Lambda(\lambda)) = \Lambda^*(x)$.

(2) Alternatively, this result is a special case of Cramér's theorem for the real-valued random variables $f(Y_1), f(Y_2), \dots$, and its rate function, a fortiori, is the Fenchel–Legendre transform $\Lambda^*$ of the logarithmic moment generating function

$$\Lambda(\lambda) = \log \mathbb{E}\big[ e^{\lambda f(Y_1)} \big].$$

It remains to show that $I_f = \Lambda^*$, for, once established, the continuity of $I_f$ follows from that of $\Lambda^*$ immediately. To this end, it suffices to show that $I_f \le \Lambda^*$, since by Gibbs' inequality,

$$H(\nu \mid \mu) \ge \lambda \langle f, \nu \rangle - \Lambda(\lambda)$$

for all $\nu \in M_1(\Sigma)$ and $\lambda \in \mathbb{R}$, implying $I_f \ge \Lambda^*$.

Recall that $\Lambda$ is convex and differentiable in the interior of its effective domain (here, all of $\mathbb{R}$, as $\Sigma$ is finite). By asymptotic analysis of the derivative $\Lambda'(\lambda) = \langle f, \nu_{\lambda} \rangle$, for every $x \in (f_{\min}, f_{\max})$, there exists $\lambda_x \in \mathbb{R}$ such that

$$\Lambda'(\lambda_x) = x, \qquad \Lambda^*(x) = \lambda_x x - \Lambda(\lambda_x).$$

Choosing $\nu = \nu_{\lambda_x}$, we have $\langle f, \nu_{\lambda_x} \rangle = \Lambda'(\lambda_x) = x$ and

$$I_f(x) \le H(\nu_{\lambda_x} \mid \mu) = \sum_{a \in \Sigma} \nu_{\lambda_x}(a) \big( \lambda_x f(a) - \Lambda(\lambda_x) \big) = \lambda_x x - \Lambda(\lambda_x) = \Lambda^*(x),$$

as desired. For the boundary point $x = f_{\max}$ (the case for $x = f_{\min}$ is similar), we have that

$$\Lambda^*(f_{\max}) = \lim_{\lambda \to \infty} \big( \lambda f_{\max} - \Lambda(\lambda) \big) = -\log \mu(F), \qquad F \triangleq \{ a \in \operatorname{supp} \mu : f(a) = f_{\max} \},$$

and, through direct evaluation with $\nu^* \triangleq \mu(\cdot \mid F)$, that

$$I_f(f_{\max}) \le H(\nu^* \mid \mu) = -\log \mu(F).$$

Combining these results proves $I_f = \Lambda^*$ on $[f_{\min}, f_{\max}]$. Finally, for $x \notin [f_{\min}, f_{\max}]$, one can show $I_f(x) = \Lambda^*(x) = \infty$. This concludes the proof. ∎
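
The identity $I_f = \Lambda^*$ and the tilting argument above can be checked numerically: solve $\Lambda'(\lambda) = x$ by bisection and compare $\lambda x - \Lambda(\lambda)$ with $H(\nu_{\lambda} \mid \mu)$. The law $\mu$ and the function $f$ below are arbitrary choices, and the script is a sketch rather than a general-purpose implementation.

```python
import math

mu = (0.5, 0.3, 0.2)     # an arbitrary law with full support
f  = (0.0, 1.0, 3.0)     # an arbitrary function on the alphabet

def Lam(lam):            # logarithmic moment generating function Lambda
    return math.log(sum(m * math.exp(lam * v) for m, v in zip(mu, f)))

def tilted(lam):         # nu_lam(a) = mu(a) e^{lam f(a) - Lambda(lam)}
    z = math.exp(Lam(lam))
    return tuple(m * math.exp(lam * v) / z for m, v in zip(mu, f))

def KL(nu):              # relative entropy H(nu|mu)
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

def legendre(x, lo=-60.0, hi=60.0):
    """Solve Lambda'(lam) = <f, nu_lam> = x by bisection and return
    (lam_x, Lambda^*(x))."""
    for _ in range(200):
        mid = (lo + hi) / 2
        mean = sum(p * v for p, v in zip(tilted(mid), f))
        lo, hi = (mid, hi) if mean < x else (lo, mid)
    lam = (lo + hi) / 2
    return lam, lam * x - Lam(lam)

for x in (0.5, 1.0, 2.0):            # points in (f_min, f_max) = (0, 3)
    lam, Lstar = legendre(x)
    # The two printed values agree, as in the proof of Theorem 1.7.
    print(x, round(Lstar, 6), round(KL(tilted(lam)), 6))
```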

References

  • [1] Boyd, Stephen, and Vandenberghe, Lieven, Convex Optimization, Cambridge University Press, 2004.