Among the methods for testing deviations from Hardy-Weinberg equilibrium (HWE), the chi-square goodness-of-fit test is the most popular. For a single nucleotide polymorphism (SNP), the test is usually applied to a 2x3 table of observed and expected genotype counts, like the one shown below:
|          | CC  | CT  | TT |
|----------|-----|-----|----|
| Observed | 200 | 100 | 25 |
| Expected | 192 | 115 | 17 |
When there is no departure from HWE (the null hypothesis), the test statistic is asymptotically distributed as a central χ2. Assessing the significance of a departure should therefore be straightforward. However, although the chi-square value is easy to calculate, the appropriate number of degrees of freedom is less obvious. As pointed out by Andreas Ziegler and Inke König in their book published in 2010, two scenarios can be distinguished: if the allelic frequencies of the analyzed polymorphism are estimated from the current data, the number of degrees of freedom is 1 (number of expected genotypes – 1 – 1, where the second 1 accounts for the estimated allele frequency); if the allelic frequencies are specified in advance, the number of degrees of freedom is 2 (number of expected genotypes – 1).
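The degrees-of-freedom rule and the resulting critical values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `hwe_degrees_of_freedom` and `chi2_critical_value` are hypothetical, and the critical values rely on two standard closed forms (a chi-square with 1 degree of freedom is a squared standard normal; a chi-square with 2 degrees of freedom has upper tail exp(-x/2)), so only the standard library is needed.

```python
import math
from statistics import NormalDist


def hwe_degrees_of_freedom(n_genotypes: int, n_estimated_params: int) -> int:
    """Degrees of freedom: genotype classes - 1 - parameters estimated from the data."""
    return n_genotypes - 1 - n_estimated_params


def chi2_critical_value(df: int, alpha: float = 0.05) -> float:
    """Upper-alpha critical value of a central chi-square (df = 1 or 2 only)."""
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal variable,
        # so its (1 - alpha) quantile is the squared (1 - alpha/2) normal quantile.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z * z
    if df == 2:
        # Chi-square with 2 df: P(X > x) = exp(-x / 2), inverted for x.
        return -2.0 * math.log(alpha)
    raise ValueError("this sketch only covers df = 1 and df = 2")


# Allele frequency estimated from the data: one estimated parameter.
df_estimated = hwe_degrees_of_freedom(3, 1)
# Allele frequencies specified in advance: nothing estimated.
df_specified = hwe_degrees_of_freedom(3, 0)

print(df_estimated, round(chi2_critical_value(df_estimated), 2))
print(df_specified, round(chi2_critical_value(df_specified), 2))
```

For a biallelic SNP only one allele frequency is free (the other is its complement), which is why a single parameter is subtracted in the first scenario.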
Suppose we want to test for HWE at a generic SNP for which the observed genotype counts were 200 for genotype CC, 100 for genotype CT and 25 for genotype TT. We consider two cases: (a) the allelic frequencies of the analyzed polymorphism are estimated from the current data; (b) the allelic frequencies are specified in advance.
Case A
The frequencies of the C and T alleles are given by:
Fr_C = (2x200 + 100)/[2x(200+100+25)] = 0.77
Fr_T = 1-Fr_C = (2x25 + 100)/[2x(200+100+25)] = 0.23
The expected counts under HWE are 192 (0.77x0.77x325), 115 (2x0.77x0.23x325) and 17 (0.23x0.23x325) for the CC, CT and TT genotypes, respectively. The 0.95 quantile of the chi-square distribution with a single degree of freedom is 3.84. From these data we obtain a test statistic of 5.78 (computed from the unrounded expected counts). Since this value is higher than the critical value, we reject the null hypothesis of HWE for the analysed SNP.
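The Case A arithmetic can be reproduced with a short script. This is a minimal sketch, not a reference implementation: the function name `hwe_test_estimated` is hypothetical, and the p-value uses the closed form for a chi-square with one degree of freedom, P(X > x) = erfc(sqrt(x/2)), so no external library is assumed.

```python
import math


def hwe_test_estimated(n_cc: int, n_ct: int, n_tt: int):
    """Chi-square HWE test with allele frequencies estimated from the sample (1 df)."""
    n = n_cc + n_ct + n_tt
    # Each individual carries two alleles; heterozygotes contribute one C each.
    freq_c = (2 * n_cc + n_ct) / (2 * n)
    freq_t = 1.0 - freq_c
    # Expected genotype counts under HWE: p^2, 2pq, q^2 times the sample size.
    expected = (freq_c ** 2 * n, 2 * freq_c * freq_t * n, freq_t ** 2 * n)
    observed = (n_cc, n_ct, n_tt)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return freq_c, expected, statistic, p_value


freq_c, expected, statistic, p_value = hwe_test_estimated(200, 100, 25)
print(f"Fr_C = {freq_c:.2f}")
print("expected counts:", [round(e, 1) for e in expected])
print(f"chi-square = {statistic:.2f}, reject at 5%: {p_value < 0.05}")
```

The expected counts are kept unrounded inside the function; rounding them to whole numbers before computing the statistic would change the result noticeably, mainly through the small TT class.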
Case B
Suppose that we know in advance (e.g. from literature data) that the allelic frequencies of the previously analyzed SNP are 0.77 for the C allele and 0.23 for the T allele. As in the previous case, the expected counts under HWE based on these frequencies are 192, 115 and 17 for the CC, CT and TT genotypes, respectively. The 0.95 quantile of the chi-square distribution with two degrees of freedom is 5.99. Using the same genotype counts as in the previous example, we obtain a test statistic of 5.78. Since this value is below the critical value, we do not reject the null hypothesis of HWE for the analysed SNP.
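Case B can be sketched in the same way. The function name `hwe_test_specified` is hypothetical, and the p-value uses the closed form for a chi-square with two degrees of freedom, P(X > x) = exp(-x/2). Note one assumption: plugging in the rounded frequency 0.77 exactly, rather than the sample estimate, gives a statistic of about 5.8, marginally different from the 5.78 quoted in the text; the conclusion is the same.

```python
import math


def hwe_test_specified(n_cc: int, n_ct: int, n_tt: int, freq_c: float):
    """Chi-square HWE test with allele frequencies fixed in advance (2 df)."""
    n = n_cc + n_ct + n_tt
    freq_t = 1.0 - freq_c
    # Expected genotype counts under HWE with the pre-specified frequencies.
    expected = (freq_c ** 2 * n, 2 * freq_c * freq_t * n, freq_t ** 2 * n)
    observed = (n_cc, n_ct, n_tt)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 2 df (nothing was estimated from the data).
    p_value = math.exp(-statistic / 2)
    return expected, statistic, p_value


expected, statistic, p_value = hwe_test_specified(200, 100, 25, freq_c=0.77)
print("expected counts:", [round(e, 1) for e in expected])
print(f"chi-square = {statistic:.2f}, reject at 5%: {p_value < 0.05}")
```

The same observed counts therefore lead to opposite decisions at the 5% level depending only on whether the allele frequencies were estimated from the sample or fixed beforehand.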