# How is p2 calculated for Mahalanobis distances?

The following table of Mahalanobis distances was obtained from an analysis of data with 73 cases. Only the first five rows of the table are shown here.

| Observation number | Mahalanobis d-squared | p1 | p2 |
|---|---|---|---|
| 42 | 18.7468824 | .0046132 | .2864768 |
| 20 | 17.2011378 | .0085718 | .1299040 |
| 3 | 13.2641516 | .0390278 | .5461262 |
| 35 | 12.9541160 | .0437704 | .3973690 |
| 28 | 12.7304279 | .0475222 | .2662369 |
| ... | ... | ... | ... |

In what follows, I will write d^{2} for d-squared, p_{1}
for p1 and p_{2} for p2.

## The meaning of p_{1} and p_{2}

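The first row of the table shows that p_{1} = .0046132 and p_{2} = .2864768 for case 42, which, of the 73 cases, is the one furthest from the centroid in Mahalanobis d^{2} units. This means that

p_{1} = P(d^{2} for case 42 > 18.7468824) = .0046132

and

p_{2} = P(The largest d^{2} > 18.7468824) = .2864768

In other words, p_{1} is the probability that a single case has a d^{2} larger than 18.7468824, whereas p_{2} is the probability that the largest of all 73 d^{2} values is larger than 18.7468824.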

## Calculating p_{2} for the case with the largest d^{2}

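Here is how p_{2} was calculated for the case furthest from the centroid:

p_{2} = P(The largest d^{2} > 18.7468824)

= 1 - P(The largest d^{2} <= 18.7468824)

= 1 - P(All 73 d^{2} values are <= 18.7468824)

= 1 - (1 - .0046132)^{73} = .28648

The last step treats the 73 d^{2} values as independent, so that each one is <= 18.7468824 with probability 1 - p_{1} = 1 - .0046132.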
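The arithmetic in the last step can be checked numerically. The following is a minimal Python sketch; the variable names are illustrative and are not part of the original analysis.

```python
# Check p2 for the case with the largest d-squared (case 42).
n_cases = 73      # number of cases in the analysis
p1 = 0.0046132    # p1 for case 42, copied from the table

# P(all 73 d-squared values <= 18.7468824), treating the cases as independent
p_all_below = (1 - p1) ** n_cases

# P(the largest d-squared > 18.7468824)
p2_largest = 1 - p_all_below
print(f"p2 for case 42: {p2_largest:.5f}")
```

Because p_{1} is copied from the table with seven decimal places, the result can be expected to agree with the tabled .2864768 only up to that rounding.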

## Calculating p_{2} for the case with the second largest d^{2}

p_{2} for the case that is second-furthest from the centroid (the case in the second row of the table) was calculated as follows.

p_{2} = P(The second-largest d^{2} > 17.2011378)

= 1 - P(The second-largest d^{2} <= 17.2011378)

= 1 - P(exactly 72 or 73 cases have d^{2} <= 17.2011378)

= 1 - P(exactly 72 cases have d^{2} <= 17.2011378) - P(exactly 73 cases have d^{2} <= 17.2011378)

= 1 - _{73}C_{72}(1 - .0085718)^{72}(.0085718)^{1} - _{73}C_{73}(1 - .0085718)^{73}(.0085718)^{0}

= .12990

where _{N}C_{k} = N!/(k!(N-k)!) is the number of k-element subsets of a set of N objects.
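The two binomial terms above can be evaluated directly. This is a minimal Python sketch with illustrative variable names; it assumes Python 3.8 or later, where `math.comb` is available for the binomial coefficient.

```python
from math import comb

# Check p2 for the case with the second-largest d-squared (case 20).
n_cases = 73      # number of cases in the analysis
p1 = 0.0085718    # p1 for case 20, copied from the table

# P(exactly 72 of the 73 cases have d-squared <= 17.2011378)
p_exactly_72 = comb(n_cases, 72) * (1 - p1) ** 72 * p1 ** 1

# P(all 73 cases have d-squared <= 17.2011378)
p_exactly_73 = comb(n_cases, 73) * (1 - p1) ** 73 * p1 ** 0

# P(the second-largest d-squared > 17.2011378)
p2_second = 1 - p_exactly_72 - p_exactly_73
print(f"p2 for case 20: {p2_second:.5f}")
```

The printed value can be compared with the .1299040 shown in the second row of the table.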

## Calculating p_{2} for the case with the k-th largest d^{2}

In general, for the case that is k-th furthest from the centroid
(meaning that there are k-1 cases further from the centroid), p_{2} is
calculated by first evaluating p_{1} for that case and then
calculating

p_{2} = 1 - _{N}C_{N-0}(1-p_{1})^{N}(p_{1})^{0} - _{N}C_{N-1}(1-p_{1})^{N-1}(p_{1})^{1} - _{N}C_{N-2}(1-p_{1})^{N-2}(p_{1})^{2} - ... - _{N}C_{N-k+1}(1-p_{1})^{N-k+1}(p_{1})^{k-1}

where N is the number of cases.
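Since _{N}C_{N-j} = _{N}C_{j}, the general formula is 1 minus the sum, for j = 0, ..., k-1, of the binomial probability that exactly j of the N cases have a d^{2} above the observed value. The Python sketch below implements that sum and compares it with the five tabled rows. The function name `p2_for_rank` and the `table_rows` list are hypothetical names introduced for illustration, and the code assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def p2_for_rank(p1: float, k: int, n_cases: int) -> float:
    """Return p2 for the case that is k-th furthest from the centroid.

    p1 is that case's own p1; n_cases is N. The function name and
    signature are illustrative only.
    """
    if not 1 <= k <= n_cases:
        raise ValueError("k must satisfy 1 <= k <= n_cases")
    # Sum of P(exactly j cases exceed the observed d-squared), j = 0..k-1,
    # i.e. P(at least N-k+1 cases are at or below the observed d-squared).
    p_fewer_than_k_exceed = sum(
        comb(n_cases, j) * (1 - p1) ** (n_cases - j) * p1 ** j
        for j in range(k)
    )
    return 1 - p_fewer_than_k_exceed


# (observation number, p1, tabled p2), in order of decreasing d-squared
table_rows = [
    (42, 0.0046132, 0.2864768),
    (20, 0.0085718, 0.1299040),
    (3, 0.0390278, 0.5461262),
    (35, 0.0437704, 0.3973690),
    (28, 0.0475222, 0.2662369),
]

for rank, (case, p1, tabled_p2) in enumerate(table_rows, start=1):
    computed = p2_for_rank(p1, rank, n_cases=73)
    print(f"case {case}: computed p2 = {computed:.5f}, tabled p2 = {tabled_p2:.5f}")
```

With k = 1 the sum has a single term and the function reduces to 1 - (1 - p_{1})^{N}, the calculation used for the case with the largest d^{2}.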