The key step in the Evans et al. proof (discussed in the previous post) is to construct a cross-embedded experiment with two ancillary statistics. In this post I explain how that construction works.

Let’s start with a simple example (from Evans et al. 1985, p. 3) to illustrate the fact that an experiment *can* have two ancillary statistics. Consider an experiment whose sampling distribution is represented by the 2x2 table below:

y\x | 1 | 0 |

1 | θ | 1-θ |

0 | 1-θ | θ |

relative to the normalizing constant 2. The variable *y* is ancillary with respect to *x*: the unconditional probability distribution of *y* is Bernoulli with Pr(y=1)=1/2 independent of θ. Conditioning on *y* changes the probability distribution of *x* from Bernoulli with Pr(x=1)=1/2 to Bernoulli with either Pr(x=1|y=1)=θ or Pr(x=1|y=0)=1-θ depending on the value of *y*. The 2x2 table is symmetric with respect to interchange between *x* and *y*, so exactly the same holds with *x* and *y* interchanged. Thus, *y* is ancillary with respect to *x*, and *x* is ancillary with respect to *y*.
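The ancillarity claims for this example can be checked numerically. The following is a minimal sketch, assuming the table entries θ, 1-θ, 1-θ, θ implied by the conditional probabilities stated above; the function names `joint`, `marginal`, and `x1_given_y` are my own and not from Evans et al.

```python
from fractions import Fraction

def joint(theta):
    """Joint probabilities Pr(x, y) for the 2x2 example, keyed by (x, y)."""
    half = Fraction(1, 2)  # divide each table entry by the normalizing constant 2
    return {
        (1, 1): theta * half,
        (0, 1): (1 - theta) * half,
        (1, 0): (1 - theta) * half,
        (0, 0): theta * half,
    }

def marginal(p, axis, value):
    """Sum the joint over the other variable; axis 0 is x, axis 1 is y."""
    return sum(prob for cell, prob in p.items() if cell[axis] == value)

def x1_given_y(p, y):
    """Conditional probability Pr(x=1 | y)."""
    return p[(1, y)] / marginal(p, 1, y)

for theta in (Fraction(1, 10), Fraction(1, 3), Fraction(9, 10)):
    p = joint(theta)
    # Both marginals are 1/2 whatever theta is, so each variable is ancillary.
    assert marginal(p, 0, 1) == Fraction(1, 2)
    assert marginal(p, 1, 1) == Fraction(1, 2)
    # Conditioning on y turns x into Bernoulli(theta) or Bernoulli(1 - theta).
    assert x1_given_y(p, 1) == theta
    assert x1_given_y(p, 0) == 1 - theta
```

Exact rational arithmetic is used so that the equalities hold exactly rather than up to floating-point error.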

Note that this experiment is mathematically equivalent to a mixture experiment that involves first observing *x* to decide whether to perform the experiment represented by column 1 (renormalized) or the experiment represented by column 2 (renormalized) and then performing that experiment. It is also mathematically equivalent to a mixture experiment that involves first observing *y* to decide whether to perform the experiment represented by row 1 (renormalized) or the experiment represented by row 2 (renormalized) and then performing that experiment. However, a physical instantiation of this experiment can’t actually be both types of mixture experiment simultaneously. If one were to restrict (C) so that it applied only to genuine mixture experiments and not to experiments that are mathematically equivalent to mixture experiments, the Evans et al. proof would not succeed. Whether this restriction can be made precise and defended remains to be seen.

After constructing the above example, Evans et al. consider scaling down the probabilities in the table and reallocating the deleted probability in a θ-free way. Such a modification does not affect the ancillarity of x and y. For instance, consider multiplying each cell probability by 2c/(1+c), where 0≤c≤1, and then inserting the deleted 1-2c/(1+c) probability mass into the lower left cell. The following table results:

y\x | 1 | 0 |

1 | cθ | c-cθ |

0 | 1-cθ | cθ |

relative to the normalizing constant 1+c. Again, the variable y is ancillary with respect to x and vice versa, because each has an unconditional distribution that is independent of θ: *y* is Bernoulli with Pr(y=1)=c/(1+c), and *x* is Bernoulli with Pr(x=1)=1/(1+c).
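As a sanity check on the arithmetic, here is a small sketch that carries out the scaling and reallocation step by step and compares the result with the table as stated. It assumes the unscaled table has entries θ, 1-θ, 1-θ, θ over the normalizing constant 2; the function names are illustrative only.

```python
from fractions import Fraction

def base_table(theta):
    """Unscaled 2x2 joint probabilities, keyed by (x, y)."""
    return {(1, 1): theta / 2, (0, 1): (1 - theta) / 2,
            (1, 0): (1 - theta) / 2, (0, 0): theta / 2}

def scaled_table(theta, c):
    """Shrink every cell by 2c/(1+c), then move the removed mass to (x=1, y=0)."""
    factor = 2 * c / (1 + c)
    p = {cell: factor * prob for cell, prob in base_table(theta).items()}
    p[(1, 0)] += 1 - factor  # lower-left cell of the table
    return p

def stated_table(theta, c):
    """Entries as given in the table, divided by the normalizing constant 1+c."""
    n = 1 + c
    return {(1, 1): c * theta / n, (0, 1): (c - c * theta) / n,
            (1, 0): (1 - c * theta) / n, (0, 0): c * theta / n}

for theta in (Fraction(1, 4), Fraction(3, 5)):
    for c in (Fraction(1, 10), Fraction(1, 2), Fraction(1)):
        p = scaled_table(theta, c)
        assert p == stated_table(theta, c)
        # Marginals depend on c but not on theta.
        assert p[(1, 1)] + p[(0, 1)] == c / (1 + c)   # Pr(y=1)
        assert p[(1, 1)] + p[(1, 0)] == 1 / (1 + c)   # Pr(x=1)
```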

Consider the result (x,y)=(1,1) from the experiment above. According to (C), this result is evidentially equivalent to the result x=1 from the conditional experiment given by y=1, which is Bernoulli(θ), and to the result y=1 from the conditional experiment given by x=1, which is Bernoulli(cθ) for arbitrary 0≤c≤1.
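The two conditional experiments can be read off the table by dividing the (1,1) cell by the relevant row or column total. A minimal sketch of that computation follows; the function names are my own, and c is taken to be strictly positive so that conditioning on y=1 is well defined.

```python
from fractions import Fraction

def table(theta, c):
    """Joint probabilities of the scaled 2x2 experiment, keyed by (x, y)."""
    n = 1 + c  # normalizing constant
    return {(1, 1): c * theta / n, (0, 1): (c - c * theta) / n,
            (1, 0): (1 - c * theta) / n, (0, 0): c * theta / n}

def x1_given_y1(theta, c):
    """Pr(x=1 | y=1): the (1,1) cell over the y=1 row total."""
    p = table(theta, c)
    return p[(1, 1)] / (p[(1, 1)] + p[(0, 1)])

def y1_given_x1(theta, c):
    """Pr(y=1 | x=1): the (1,1) cell over the x=1 column total."""
    p = table(theta, c)
    return p[(1, 1)] / (p[(1, 1)] + p[(1, 0)])

for theta in (Fraction(1, 5), Fraction(7, 10)):
    for c in (Fraction(1, 1000), Fraction(1, 2), Fraction(1)):
        assert x1_given_y1(theta, c) == theta      # Bernoulli(theta)
        assert y1_given_x1(theta, c) == c * theta  # Bernoulli(c * theta)
```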

I’m not sure how to interpret the consequence that outcome x=1 from Bernoulli(θ) is evidentially equivalent to outcome y=1 from Bernoulli(cθ). I suppose it would apply if one had a choice between flipping Coin A and Coin B, and all one knew about the biases of the two coins was that the bias of Coin B was some particular fraction of that of Coin A. The result purports to show that a flip of Coin A that lands heads tells you the same thing about the bias of Coin A as a flip of Coin B that lands heads. Intuitively, this result strikes me as wrong. Make the fraction very, very small: suppose we know that the bias of Coin B is one trillionth that of Coin A. It seems that a head on a flip of Coin B would provide very strong evidence that the bias of Coin A is very close to one, while a head on Coin A would not be so telling. If I’m interpreting correctly the claim that outcome x=1 from Bernoulli(θ) is evidentially equivalent to outcome y=1 from Bernoulli(cθ), and my intuitions in this case are sound, then this example could provide an argument against (C). Unfortunately, Evans et al. seem to have discussed this result only in an unpublished manuscript.

Evans et al. next move beyond the binary case to construct a more general “discrete embedding model.” They start with an experiment with probability function f(x; θ). They then “cross” f(x; θ) with Bernoullis to yield the following joint distribution:

y\x | 1 | 2 | … |

1 | f(1; θ) | f(2; θ) | … |

0 | g(1) - f(1; θ) | g(2) - f(2; θ) | … |

relative to the normalizing constant G=Σg(x), where the sum runs over all values of *x*. Again, *x* and *y* are mutually ancillary: *x* has unconditional probability distribution Pr(x=i)=g(i)/G, while *y* has unconditional distribution Bernoulli(1/G), because the f(x; θ) in the top row sum to 1 for every θ.
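To see the embedding at work, here is a small sketch with made-up ingredients. The three-point family `f` and the weights `g` below are hypothetical choices of mine, not taken from Evans et al.; the only requirement assumed is that g(x) ≥ f(x; θ) for every x and θ, so that no cell is negative. The check computes both marginals directly from the table; since f(·; θ) sums to one, Pr(y=1) comes out as 1/G.

```python
from fractions import Fraction

def f(x, theta):
    """Hypothetical probability function on x in {1, 2, 3}; sums to 1 over x."""
    return {1: theta / 2, 2: theta / 2, 3: 1 - theta}[x]

# Hypothetical theta-free weights with g(x) >= f(x; theta) for 0 <= theta <= 1.
g = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1)}
G = sum(g.values())  # normalizing constant

def embedded(theta):
    """Joint probabilities of the crossed experiment, keyed by (x, y)."""
    p = {}
    for x in g:
        p[(x, 1)] = f(x, theta) / G
        p[(x, 0)] = (g[x] - f(x, theta)) / G
    return p

for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    p = embedded(theta)
    pr_y1 = sum(p[(x, 1)] for x in g)
    assert pr_y1 == 1 / G                        # y is ancillary
    for x in g:
        assert p[(x, 1)] + p[(x, 0)] == g[x] / G  # x is ancillary
        assert p[(x, 1)] / pr_y1 == f(x, theta)   # y=1 recovers f(x; theta)
```

Conditioning on y=1 returns the original experiment f(x; θ), which is the sense in which that experiment is embedded in the crossed one.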

Evans et al. also consider a continuous embedding model, but that need not concern us here; continuous models are an idealization, so issues that arise only for continuous models are not relevant for practice.

Evans et al. develop a modified version of the discrete embedding model to prove that (C) entails (L). In my next post, I will discuss that model and the Evans et al. proof.