How is p2 calculated for Mahalanobis distances?

The following table of Mahalanobis distances was obtained from an analysis of data with 73 cases. Only the first five rows of the table are shown here.

Observation number   Mahalanobis d-squared   p1         p2
42                   18.7468824              .0046132   .2864768
20                   17.2011378              .0085718   .1299040
3                    13.2641516              .0390278   .5461262
35                   12.9541160              .0437704   .3973690
28                   12.7304279              .0475222   .2662369
...                  ...                     ...        ...

In what follows, d2 denotes Mahalanobis d-squared, and p1 and p2 denote the values in the last two columns of the table.

The meaning of p1 and p2

The first row of the table shows that p1 = .0046132 and p2 = .2864768 for case 42, which is the one case out of 73 cases that is furthest from the centroid in Mahalanobis d2 units. This means that

p1 = P(d2 for case 42 > 18.7468824) = .0046132

and

p2 = P(The largest d2 > 18.7468824) = .2864768

Calculating p2 for the case with the largest d2

Here is how p2 was calculated for the case furthest from the centroid:

p2 = P(The largest d2 > 18.7468824)

= 1 -  P(The largest d2 <= 18.7468824)

= 1 -  P(All 73 d2 values are <= 18.7468824)

= 1 - (1 - .0046132)^73 = 0.28648
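As a quick check, the arithmetic in the last step can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative, and the step from "all 73 d2 values" to a 73rd power treats the cases as independent with the same tail probability p1, which is what the formula implies.

```python
# Values taken from the first row of the table.
n_cases = 73
p1 = 0.0046132

# P(all 73 d2 values are <= 18.7468824), treating the cases as independent
# with a common tail probability p1 (an assumption implied by the formula).
p_all_below = (1 - p1) ** n_cases

# P(the largest d2 > 18.7468824)
p2_largest = 1 - p_all_below
print(round(p2_largest, 5))  # 0.28648
```

The small difference from the tabled .2864768 in the sixth decimal place is presumably because the p1 shown in the table is itself rounded.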

Calculating p2 for the case with the second largest d2

For the case that is second-furthest from the centroid (the case in the second row of the table), p2 was calculated as follows.

p2 = P(The second-largest d2 > 17.2011378)

= 1 - P(The second-largest d2 <= 17.2011378)

= 1 - P(exactly 72 or 73 cases have d2 <= 17.2011378)

= 1 - P(exactly 72 cases have d2 <= 17.2011378)

- P(exactly 73 cases have d2 <= 17.2011378)

= 1 - C(73, 72) (1 - .0085718)^72 (.0085718)^1 - C(73, 73) (1 - .0085718)^73 (.0085718)^0

= .12990

where C(N, k) is the number of subsets of k objects in a set of N objects.
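The two binomial terms can be evaluated directly with `math.comb` from the Python standard library. The sketch below uses illustrative variable names and, as in the derivation, treats the 73 cases as independent.

```python
from math import comb

# Values taken from the second row of the table.
n_cases = 73
p1 = 0.0085718
q = 1 - p1  # P(a single case has d2 <= 17.2011378)

# P(exactly 72 of the 73 cases have d2 <= 17.2011378)
p_exactly_72 = comb(n_cases, 72) * q**72 * p1**1
# P(all 73 cases have d2 <= 17.2011378)
p_exactly_73 = comb(n_cases, 73) * q**73 * p1**0

# P(the second-largest d2 > 17.2011378)
p2_second = 1 - p_exactly_72 - p_exactly_73
print(round(p2_second, 4))  # 0.1299
```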

Calculating p2 for the case with the k-th largest d2

In general, for the case that is k-th furthest from the centroid (meaning that there are k-1 cases further from the centroid), p2 is calculated by first evaluating p1 for that case and then calculating

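p2 = 1 - C(N, N-0) (1 - p1)^N (p1)^0

- C(N, N-1) (1 - p1)^(N-1) (p1)^1

- C(N, N-2) (1 - p1)^(N-2) (p1)^2

...

- C(N, N-k+1) (1 - p1)^(N-k+1) (p1)^(k-1)

where N is the number of cases and C(N, j) is the number of subsets of j objects in a set of N objects.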
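The general formula is one minus a sum of k binomial terms, so it can be written as a short function. The following is a minimal sketch rather than the implementation used by any particular program: the function name `p2_for_rank` and the `table_rows` list are hypothetical, and the p1 and p2 values are copied from the five rows of the table shown earlier.

```python
from math import comb


def p2_for_rank(p1: float, k: int, n_cases: int) -> float:
    """P(the k-th largest d2 exceeds the observed value), given that case's p1."""
    q = 1 - p1
    # Probability that at most k-1 of the n_cases d2 values exceed the
    # observed value, i.e. the terms j = 0, 1, ..., k-1 of the formula.
    p_at_most_k_minus_1_exceed = sum(
        comb(n_cases, n_cases - j) * q ** (n_cases - j) * p1 ** j
        for j in range(k)
    )
    return 1 - p_at_most_k_minus_1_exceed


# (observation number, p1, p2 as printed in the table), ordered from the
# furthest case (rank 1) to the fifth-furthest case (rank 5).
table_rows = [
    (42, 0.0046132, 0.2864768),
    (20, 0.0085718, 0.1299040),
    (3, 0.0390278, 0.5461262),
    (35, 0.0437704, 0.3973690),
    (28, 0.0475222, 0.2662369),
]

for rank, (obs, p1, p2_table) in enumerate(table_rows, start=1):
    p2 = p2_for_rank(p1, rank, n_cases=73)
    print(f"case {obs}: computed p2 = {p2:.5f}, table p2 = {p2_table:.5f}")
```

With k = 1 the sum has a single term and the function reduces to 1 - (1 - p1)^N, the calculation for the case with the largest d2. The computed values should agree with the p2 column up to rounding of the tabled p1.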