
I am trying to learn more about Probability Generating Functions. Here is my basic understanding:

For a discrete random variable $X$ taking values in $\{0, 1, 2, \dots\}$, the probability generating function $G_X(s)$ is defined as:

$$ G_X(s) = E(s^X) = \sum_{x=0}^{\infty} s^x P(X = x) $$

  • $E(s^X)$ is the expected value of $s^X$
  • $P(X = x)$ is the probability that the random variable $X$ equals $x$

It seems to me that the Probability Generating Function is similar to a Moment Generating Function, just that one is for the discrete case (PGF) and the other is for the continuous case (MGF). The derivatives of the PGF can be used to recover the mean and variance of the original random variable, as well as its probabilities:

$$ G'_X(1) = E[X] $$ $$ G''_X(1) = E[X(X-1)] $$ $$ P(X=k) = \frac{1}{k!} \frac{d^k G_X(s)}{ds^k} \Bigg|_{s=0} $$
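As a quick sanity check of these formulas on the simplest case, take a Bernoulli($p$) variable, whose PGF is $G_X(s) = (1-p) + ps$:

$$ G'_X(1) = p = E[X], \qquad G''_X(1) = 0 = E[X(X-1)], $$

so that $\operatorname{Var}(X) = G''_X(1) + G'_X(1) - \left(G'_X(1)\right)^2 = p - p^2 = p(1-p)$, as expected.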

This leads me to my question: Why are Probability Generating Functions useful in the real world? In what kinds of problems/applications can PGFs be helpful? Why would life be difficult without PGFs?

Doing some reading on this, I came across some materials indicating that PGFs are useful for working out the distributions of functions of random variables. I tried to create an example to test my understanding:

Suppose we have:

  • X1: A regular six-sided die (Die 1), where each face {1, 2, 3, 4, 5, 6} has an equal probability of 1/6.
  • X2: An irregular six-sided die (Die 2), where the faces {1, 2, 3, 4, 5, 6} have probabilities {1/12, 1/12, 1/12, 1/12, 1/6, 1/2}.

The PGF for each of these dice is given by:

$$ G_X(s) = \sum_{x=1}^{6} s^x P(X = x) $$

$$ G_{X1}(s) = \frac{1}{6}s + \frac{1}{6}s^2 + \frac{1}{6}s^3 + \frac{1}{6}s^4 + \frac{1}{6}s^5 + \frac{1}{6}s^6 $$

$$ G_{X2}(s) = \frac{1}{12}s + \frac{1}{12}s^2 + \frac{1}{12}s^3 + \frac{1}{12}s^4 + \frac{1}{6}s^5 + \frac{1}{2}s^6 $$

If we want to find the distribution of the sum of the two dice (assuming the two rolls are independent; note that the dice are not identically distributed, since Die 2 is biased), we can use the fact that the PGF of the sum of two independent random variables is the product of their individual PGFs. Without PGFs, it seems like this problem would be much harder and would involve manually enumerating all outcomes and their probabilities.

If I understand this correctly, if we define $Y = X1 + X2$, then:

$$ G_Y(s) = G_{X1}(s) \cdot G_{X2}(s) $$

$$ G_Y(s) = \left(\frac{1}{6}s + \frac{1}{6}s^2 + \frac{1}{6}s^3 + \frac{1}{6}s^4 + \frac{1}{6}s^5 + \frac{1}{6}s^6\right) \cdot \left(\frac{1}{12}s + \frac{1}{12}s^2 + \frac{1}{12}s^3 + \frac{1}{12}s^4 + \frac{1}{6}s^5 + \frac{1}{2}s^6\right) $$

Suppose I want to find the probability that the sum equals 3. As I understand it, if I expand this multiplication and take the coefficient of $s^3$, this should correspond to the probability that $Y = 3$.

I used symbolic multiplication in R to do this:

    # https://stackoverflow.com/questions/39979884/calculate-the-product-of-two-polynomial-in-r
    library(polynom)
    
    GX1 <- polynomial(c(0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6))
    GX2 <- polynomial(c(0, 1/12, 1/12, 1/12, 1/12, 1/6, 1/2))
    
    result <- GX1 * GX2
    
    print(result)
    0.01388889*x^2 + 0.02777778*x^3 + 0.04166667*x^4 + 0.05555556*x^5 + 0.08333333*x^6 + 0.1666667*x^7 + 0.1527778*x^8 + 0.1388889*x^9 + 0.125*x^10 + 0.1111111*x^11 + 0.08333333*x^12

Therefore, the probability that the sum equals 3 is the coefficient of $s^3$, which is about 0.0278. As a hand check, $P(Y = 3) = P(X1 = 1)P(X2 = 2) + P(X1 = 2)P(X2 = 1) = \frac{1}{6}\cdot\frac{1}{12} + \frac{1}{6}\cdot\frac{1}{12} = \frac{1}{36} \approx 0.0278$, which matches.

Did I get the right idea? Is this the main application of Probability Generating Functions?

  • PS: I tried to confirm this result using a simulation in R

It looks like I got the same number:

    dice1 <- c(1, 2, 3, 4, 5, 6)
    dice2 <- c(1, 2, 3, 4, 5, 6)
    
    # Define the probabilities
    prob1 <- rep(1/6, 6) # equal probabilities for dice1
    prob2 <- c(1/12, 1/12, 1/12, 1/12, 1/6, 1/2) # different probabilities for dice2
    
    # simulate 1,000,000 rolls of the two dice and record each sum
    sums <- numeric(1000000)
    
    for (i in 1:1000000) {
        roll1 <- sample(dice1, size = 1, prob = prob1)
        roll2 <- sample(dice2, size = 1, prob = prob2)
        sums[i] <- roll1 + roll2
    }
    
    # relative frequency (proportion) of each possible sum
    relative_percent <- table(sums) / length(sums)
    
    result_df <- data.frame('Sum' = as.numeric(names(relative_percent)), 
                            'Relative_Percent' = as.numeric(relative_percent))
    
    print(result_df, row.names = FALSE)

     Sum Relative_Percent
       2         0.013672
       3         0.027447
       4         0.041687
       5         0.055431
       6         0.083465
       7         0.166814
       8         0.152676
       9         0.137949
      10         0.125130
      11         0.111553
      12         0.084176
  • Many discrete distributions also have Moment Generating Functions. All distributions have Characteristic Functions. Which you find useful in what circumstances will depend on the particular question. Your example is indeed one particular case.
    – Henry, Jul 6 at 7:34

2 Answers


As you noticed, the characteristic, moment generating and probability generating functions are related to each other, in that they represent different faces of the same coin. Indeed, they correspond, respectively, to the Fourier, Laplace and Mellin transforms of the probability mass/density function, which are themselves related through changes of variable.

As always in mathematics, they are tools which you may take advantage of whenever the context requires them. It is worth highlighting that each of them uniquely determines the distribution, so they serve as a criterion for distinguishing/recognizing the distribution behind a random variable.

Nonetheless, their main purpose is very down-to-earth in the end: they simplify some computations. For example, you have most certainly already computed moments of a random variable; the computations are more or less lengthy for the mean or variance, but they become more involved for higher moments. This is where the probability generating function provides a quicker route: all the moments are, in a way, contained in that single function and can be extracted once and for all by differentiation.
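To make this concrete, here is a minimal sketch in base R (an added illustration, not tied to any distribution from the question: it uses the symbolic derivative function D() and the Poisson($\lambda$) PGF $G(s) = e^{\lambda(s-1)}$):

    # Recover E[X] and E[X(X-1)] from a PGF by symbolic differentiation,
    # using the Poisson(lambda) PGF G(s) = exp(lambda * (s - 1)) as an example.
    G   <- expression(exp(lambda * (s - 1)))
    dG  <- D(G, "s")     # G'(s)
    d2G <- D(dG, "s")    # G''(s)
    
    lambda <- 2
    eval(dG,  list(s = 1, lambda = lambda))   # E[X]      = lambda   = 2
    eval(d2G, list(s = 1, lambda = lambda))   # E[X(X-1)] = lambda^2 = 4

From these, $\operatorname{Var}(X) = G''_X(1) + G'_X(1) - G'_X(1)^2 = \lambda$ follows without summing any series.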

Another well-known example is given by linear combinations of random variables. In that case, the probability mass/density function is the ($n$-fold) convolution product of the individual probability functions, which may turn out to be quite hard to handle. However, thanks to the convolution theorems associated with the Fourier/Laplace/Mellin transforms, the linear combination is described simply by the ordinary product of the characteristic / moment generating / probability generating functions respectively. For example, the probability generating function of $X = \lambda_1X_1 + \ldots + \lambda_nX_n$, with $X_1,\ldots,X_n$ independent, is given by $G_X(z) = G_{X_1}\left(z^{\lambda_1}\right) \cdots G_{X_n}\left(z^{\lambda_n}\right)$. When the number of random variables is itself a random variable $N$ (cf. compound processes) and the $X_i \sim X$ are iid, we even have $G_{X_1+\ldots+X_N}(z) = G_N(G_X(z))$.
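For instance, a quick numerical check of the last identity (a sketch assuming $N \sim \operatorname{Poisson}(\lambda)$ and $X_i \sim \operatorname{Bernoulli}(p)$, a convenient pair because the compound sum is then known to be $\operatorname{Poisson}(\lambda p)$):

    # Check G_{X_1+...+X_N}(z) = G_N(G_X(z)) with N ~ Poisson(lambda) and
    # X_i ~ Bernoulli(p); the compound sum is then Poisson(lambda * p).
    lambda <- 3; p <- 0.4; z <- 0.7
    
    G_X <- function(z) (1 - p) + p * z        # PGF of Bernoulli(p)
    G_N <- function(z) exp(lambda * (z - 1))  # PGF of Poisson(lambda)
    
    G_N(G_X(z))                # compound formula
    exp(lambda * p * (z - 1))  # PGF of Poisson(lambda * p), evaluated directly

Both lines return the same value, $e^{-0.36} \approx 0.698$, as the identity predicts.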

Finally, it is worth noting that some distributions involve such unwieldy expressions that they do not even have a closed-form probability mass/density function; consequently, the characteristic/generating function is the only way to describe them.

  • Thank you for this answer! I had a question about something you wrote: "some distributions involve such unwieldy expressions that they do not even have a closed-form probability mass/density function; consequently, the characteristic/generating function is the only way to describe them." Can you please give some examples of this if you have time? Thank you so much!
    – konofoso, Jul 6 at 17:22
  • @konofoso For example, it occurs when describing stable distributions (see en.wikipedia.org/wiki/…. or en.wikipedia.org/wiki/Stable_distribution), whose linear combinations follow the same distribution. Another common example is given by random variables with divergent moments, such as the Cauchy distribution (cf. en.wikipedia.org/wiki/Cauchy_distribution).
    – Abezhiko, Jul 6 at 20:16
  • Thank you so much for your answer! Do you know if it is possible to go backwards from a PGF to a PDF? math.stackexchange.com/questions/4942630/…
    – konofoso, Jul 7 at 2:00

I'd say the importance of the PGF boils down to two things.

  1. Some formulas are simpler to write with the PGF than with other transforms.
  2. The PGF of a nonnegative integer random variable has a nice probabilistic interpretation.

Here's a bit more about each.

Nicer formulas

Let $N$ be a nonnegative integer random variable, let $X_1, X_2, \dots \sim X $ be i.i.d. random variables (not necessarily integers), and let $S = \sum_{i=1}^N X_i$. Then $$ G_S(z) = G_N(G_X(z)). $$

If we let $M_X(\theta) = G_X(e^\theta)$ be the MGF, then we also have $$ M_S(\theta) = G_N(M_X(\theta)). $$ Notice that we still have a $G_N$. If we wanted to use $M_N$, we'd get the slightly messier $$ M_S(\theta) = M_N(\log M_X(\theta)). $$
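A Monte Carlo sanity check of the first identity (a sketch, with $N \sim \operatorname{Poisson}(\lambda)$ and the $X_i$ exponential so that they are not integer-valued):

    # Check M_S(theta) = G_N(M_X(theta)) by simulation, with N ~ Poisson(lambda)
    # and X_i ~ Exponential(rate); the MGF exists only for theta < rate.
    set.seed(1)
    lambda <- 2; rate <- 3; theta <- 1
    
    M_X <- function(theta) rate / (rate - theta)  # MGF of Exponential(rate)
    G_N <- function(z) exp(lambda * (z - 1))      # PGF of Poisson(lambda)
    
    n_sim <- 1e5
    N <- rpois(n_sim, lambda)
    S <- sapply(N, function(k) sum(rexp(k, rate)))  # S = X_1 + ... + X_N
    
    mean(exp(theta * S))  # Monte Carlo estimate of M_S(theta)
    G_N(M_X(theta))       # formula value: exp(lambda * (M_X(theta) - 1)) = e

The two numbers should agree up to simulation error (both near $e \approx 2.718$).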

Probabilistic interpretation

Again let $N$ be a nonnegative integer random variable. Suppose also that $z\in [0, 1]$. Then $G_N(z)$ is the probability that a weighted coin with heads probability $z$ (call this a $z$-coin) comes up heads on every one of $N$ flips.

This intuition lets one quickly compute some PGFs. For instance, suppose $N \sim \operatorname{Geometric}(p)$ (with support $1, 2, 3, \dots$), i.e. $N$ is the number of flips of a $p$-coin until a heads occurs. Then we can compute $G_N(z)$ by thinking about flipping a $z$-coin before each $p$-coin.

  • If the $z$-coin is tails, then we know that no matter how big $N$ is, we'll have at least one $z$-coin tails. So let's stop flipping and call this stopping with failure.
  • If the $z$-coin is heads, then flip the next $p$-coin.
    • If the $p$-coin is tails, then we move on to the next flip. Call this continuing.
    • If the $p$-coin is heads, then we've done $N$ total $p$-coin flips, and each had an accompanying $z$-coin heads. Call this stopping with success.

The process will eventually stop, and $G_N(z)$ is the probability of eventually stopping with success. But this is the same as an individual round stopping with success given that it didn't continue, so $$ G_N(z) = \frac{p z}{1 - (1-p) z}. $$
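A quick simulation check of this result (a sketch; note that R's rgeom counts failures before the first success, so rgeom(...) + 1 has the support $1, 2, 3, \dots$ used here):

    # Check G_N(z) = p*z / (1 - (1-p)*z) for N ~ Geometric(p) on 1, 2, 3, ...
    set.seed(1)
    p <- 0.3; z <- 0.8
    N <- rgeom(1e6, p) + 1     # shift so that the support starts at 1
    mean(z^N)                  # Monte Carlo estimate of E[z^N] = G_N(z)
    p * z / (1 - (1 - p) * z)  # closed form from the z-coin argument

Both should be close to $6/11 \approx 0.545$.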

Exercise. Explain the $G_S(z) = G_N(G_X(z))$ formula using the $z$-coin intuition. (You can assume $z\in [0, 1]$ and that $X$ is a nonnegative integer.)

