How is p2 calculated for Mahalanobis distances?
The following table of Mahalanobis distances was obtained from an analysis of data with 73 cases. Only the first five rows of the table are shown here.
Observation number | Mahalanobis d-squared | p1 | p2 |
42 | 18.7468824 | .0046132 | .2864768 |
20 | 17.2011378 | .0085718 | .1299040 |
3 | 13.2641516 | .0390278 | .5461262 |
35 | 12.9541160 | .0437704 | .3973690 |
28 | 12.7304279 | .0475222 | .2662369 |
... | ... | ... | ... |
In what follows, d2 stands for the Mahalanobis d-squared value, and p1 and p2 stand for the entries in the p1 and p2 columns of the table.
The meaning of p1 and p2
The first row of the table shows that p1 = .0046132 and p2 = .2864768 for case 42, which is the one case out of 73 cases that is furthest from the centroid in Mahalanobis d2 units. This means that
p1 = P(d2 for case 42 > 18.7468824) = .0046132
and
p2 = P(The largest d2 > 18.7468824) = .2864768
Calculating p2 for the case with the largest d2
Here is how p2 was calculated for the case furthest from the centroid:
\[
\begin{aligned}
p_2 &= P(\text{the largest } d^2 > 18.7468824) \\
    &= 1 - P(\text{the largest } d^2 \le 18.7468824) \\
    &= 1 - P(\text{all 73 } d^2 \text{ values are} \le 18.7468824) \\
    &= 1 - (1 - .0046132)^{73} = 0.28648
\end{aligned}
\]
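The arithmetic in this derivation can be checked numerically. The following is a minimal Python sketch, not part of the original calculation; the variable names are illustrative, and it treats the 73 cases as independent, as the derivation does.

```python
# Check of p2 for the case with the largest d2 (first row of the table).
n_cases = 73        # number of cases in the analysis
p1 = 0.0046132      # P(d2 > 18.7468824) for a single case, from the table

# P(all 73 d2 values are <= 18.7468824), treating the cases as independent
prob_none_exceed = (1 - p1) ** n_cases

# P(the largest d2 > 18.7468824)
p2_largest = 1 - prob_none_exceed

print(f"{p2_largest:.5f}")  # prints 0.28648
```

Because p1 is entered here rounded to seven decimals, the result agrees with the tabled p2 only to about five decimal places.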
Calculating p2 for the case with the second largest d2
p2 for the case that is second-furthest from the centroid (the case in the second row of the table) was calculated as follows.
\[
\begin{aligned}
p_2 &= P(\text{the second-largest } d^2 > 17.2011378) \\
    &= 1 - P(\text{the second-largest } d^2 \le 17.2011378) \\
    &= 1 - P(\text{exactly 72 or 73 cases have } d^2 \le 17.2011378) \\
    &= 1 - P(\text{exactly 72 cases have } d^2 \le 17.2011378) \\
    &\qquad - P(\text{exactly 73 cases have } d^2 \le 17.2011378) \\
    &= 1 - \binom{73}{72}(1 - .0085718)^{72}(.0085718)^{1} - \binom{73}{73}(1 - .0085718)^{73}(.0085718)^{0} \\
    &= .12990
\end{aligned}
\]
where \binom{N}{k} (N choose k, sometimes written NCk) is the number of subsets of k objects in a set of N objects.
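The two-term calculation for the second-largest d2 can be reproduced with the standard-library function `math.comb`. This is a sketch for checking the arithmetic only; the variable names are illustrative, and the cases are treated as independent, as in the derivation.

```python
from math import comb

# Check of p2 for the case with the second-largest d2 (second row of the table).
n_cases = 73
p1 = 0.0085718      # P(d2 > 17.2011378) for a single case, from the table
q = 1 - p1          # P(d2 <= 17.2011378) for a single case

# P(exactly 72 cases have d2 <= 17.2011378): one case exceeds the value
prob_exactly_72 = comb(n_cases, 72) * q**72 * p1**1

# P(exactly 73 cases have d2 <= 17.2011378): no case exceeds the value
prob_exactly_73 = comb(n_cases, 73) * q**73 * p1**0

p2_second = 1 - prob_exactly_72 - prob_exactly_73

print(p2_second)    # compare with the tabled value .1299040
```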
Calculating p2 for the case with the k-th largest d2
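In general, for the case that is k-th furthest from the centroid (meaning that there are k-1 cases further from the centroid), p2 is the probability that at least k of the N cases have a d2 exceeding the value observed for that case. It is calculated by first evaluating p1 for that case and then subtracting from 1 the probabilities that exactly 0, 1, ..., k-1 cases exceed that value: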
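\[
\begin{aligned}
p_2 = 1 &- \binom{N}{N-0}(1-p_1)^{N}(p_1)^{0} \\
        &- \binom{N}{N-1}(1-p_1)^{N-1}(p_1)^{1} \\
        &- \binom{N}{N-2}(1-p_1)^{N-2}(p_1)^{2} \\
        &\;\;\vdots \\
        &- \binom{N}{N-k+1}(1-p_1)^{N-k+1}(p_1)^{k-1}
\end{aligned}
\]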
where N is the number of cases.
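The general formula can be sketched as a small Python function. The function name `p2_for_rank` and its parameter names are hypothetical, chosen for illustration; this is a check of the formula as stated here under its independence assumption, not the code of any statistics package. The rows below are the five rows of the table shown at the top.

```python
from math import comb

def p2_for_rank(p1: float, k: int, n_cases: int) -> float:
    """Return p2 for the case with the k-th largest d2.

    p1 is the single-case probability of exceeding that case's d2.
    The result is 1 minus the probability that at most k-1 of the
    n_cases cases exceed it, with the cases treated as independent.
    """
    if not 1 <= k <= n_cases:
        raise ValueError("k must be between 1 and n_cases")
    prob_at_most_k_minus_1 = sum(
        # comb(n, n - j) equals comb(n, j): exactly j cases exceed the value
        comb(n_cases, n_cases - j) * (1 - p1) ** (n_cases - j) * p1 ** j
        for j in range(k)
    )
    return 1 - prob_at_most_k_minus_1

# (rank k, p1, tabled p2) for the first five rows of the table
table_rows = [
    (1, 0.0046132, 0.2864768),
    (2, 0.0085718, 0.1299040),
    (3, 0.0390278, 0.5461262),
    (4, 0.0437704, 0.3973690),
    (5, 0.0475222, 0.2662369),
]

for k, p1, tabled_p2 in table_rows:
    computed = p2_for_rank(p1, k, n_cases=73)
    print(f"k={k}: computed {computed:.7f}, tabled {tabled_p2:.7f}")
```

Small differences in the last digits are expected, because the p1 values entered here are rounded to seven decimals.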