Tuesday, 29 May 2012

On the use of the chi-square test for checking Hardy-Weinberg equilibrium

Among the different methods for testing deviations from Hardy-Weinberg equilibrium (HWE), the chi-square goodness-of-fit test is the most popular. For a Single Nucleotide Polymorphism (SNP), the test is usually applied to a 2x3 table of observed and expected genotype counts, like the one shown below:


Genotype   CC    CT    TT
Observed   200   100   25
Expected   192   115   17


Under the null hypothesis (no departure from HWE), the test statistic is asymptotically distributed as a central χ2, so assessing the significance of a departure should be straightforward. However, although the chi-square value is easy to calculate, the number of degrees of freedom to use is less obvious. As pointed out in the book by Andreas Ziegler and Inke König published in 2010, two scenarios must be distinguished: if the allelic frequencies of the analyzed polymorphism are estimated from the current data, the number of degrees of freedom is 1 (number of expected genotypes – 1 – 1, where the second subtraction accounts for the estimated allele frequency); if the allelic frequencies are specified in advance, the number of degrees of freedom is 2 (number of expected genotypes – 1).
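To make the effect of the degrees of freedom concrete, the short Python sketch below computes the 5% critical value of the chi-square distribution for 1 and for 2 degrees of freedom. It uses only the standard library and relies on two known special cases (a chi-square with 1 df is a squared standard normal; a chi-square with 2 df is an exponential with mean 2). The names ALPHA, crit_df1 and crit_df2 are illustrative and not taken from the book.

```python
import math
from statistics import NormalDist

ALPHA = 0.05  # significance level assumed throughout this post

# df = 1: a chi-square variable with 1 df is the square of a standard normal,
# so its 0.95 quantile is the square of the normal 0.975 quantile.
crit_df1 = NormalDist().inv_cdf(1 - ALPHA / 2) ** 2

# df = 2: the chi-square distribution is exponential with mean 2,
# so P(X > x) = exp(-x / 2) and the 0.95 quantile is -2 * ln(alpha).
crit_df2 = -2 * math.log(ALPHA)

print(f"critical value, 1 df: {crit_df1:.2f}")  # 3.84
print(f"critical value, 2 df: {crit_df2:.2f}")  # 5.99
```

Any test statistic falling between these two values is significant at the 5% level with 1 degree of freedom but not with 2, which is why the choice matters.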

Suppose we want to test for HWE at a generic SNP for which the observed genotype counts are 200 for genotype CC, 100 for genotype CT and 25 for genotype TT. We consider two cases: (a) the allelic frequencies of the analyzed polymorphism are estimated from the current data; (b) the allelic frequencies are specified in advance.



Case A
The frequencies of the C and T allele are given by:


Fr_C = (2x200 + 100)/[2x(200+100+25)] = 0.77


Fr_T = 1-Fr_C = (2x25 + 100)/[2x(200+100+25)] = 0.23


The expected counts under HWE are 192 (0.77x0.77x325), 115 (2x0.77x0.23x325) and 17 (0.23x0.23x325) for the CC, CT and TT genotypes, respectively. The 0.95 quantile of the chi-square distribution with a single degree of freedom is 3.84. With these data we obtain a test statistic of 5.78 (computed from the unrounded expected counts). Since this value is higher than the critical value, we reject the null hypothesis of HWE for the analysed SNP.
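The calculation for Case A can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the p-value uses the closed form of the chi-square survival function for 1 degree of freedom.

```python
import math

# Observed genotype counts from the example: CC, CT, TT.
observed = {"CC": 200, "CT": 100, "TT": 25}
n = sum(observed.values())

# Allele frequencies estimated from the data by allele counting.
freq_c = (2 * observed["CC"] + observed["CT"]) / (2 * n)
freq_t = 1 - freq_c

# Expected genotype counts under HWE (kept unrounded).
expected = {
    "CC": freq_c ** 2 * n,
    "CT": 2 * freq_c * freq_t * n,
    "TT": freq_t ** 2 * n,
}

# Pearson goodness-of-fit statistic over the three genotypes.
chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}")  # 5.78
print(f"p-value (1 df) = {p_value:.4f}")
```

Note that the statistic matches 5.78 only when the expected counts are left unrounded; plugging in the rounded counts gives a noticeably different value.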


Case B
Suppose that we know in advance (e.g. from literature data) that the allelic frequencies of the SNP analyzed above are 0.77 for the C allele and 0.23 for the T allele. As in the previous case, the expected counts under HWE are 192, 115 and 17 for the CC, CT and TT genotypes, respectively. The 0.95 quantile of the chi-square distribution with two degrees of freedom is 5.99. Using the same genotype data as in the previous example, we again obtain a test statistic of 5.78. Since this value is below the critical value, we do not reject the null hypothesis of HWE for the analysed SNP.
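Case B can be sketched in the same way, this time fixing the allele frequencies instead of estimating them and using 2 degrees of freedom. Again the names are illustrative and only the standard library is used. With the frequencies taken as exactly 0.77 and 0.23 the statistic comes out at about 5.81 rather than 5.78, because 0.77 is a rounded value; the conclusion is the same.

```python
import math

# Same observed genotype counts as in Case A: CC, CT, TT.
observed = {"CC": 200, "CT": 100, "TT": 25}
n = sum(observed.values())

# Allele frequencies specified in advance (not estimated from these data).
freq_c, freq_t = 0.77, 0.23

# Expected genotype counts under HWE for the fixed frequencies.
expected = {
    "CC": freq_c ** 2 * n,
    "CT": 2 * freq_c * freq_t * n,
    "TT": freq_t ** 2 * n,
}

chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# Survival function of a chi-square with 2 df: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}")  # 5.81
print(f"p-value (2 df) = {p_value:.4f}")
```

The same data therefore lead to rejection in Case A and non-rejection in Case B, purely because of the different number of degrees of freedom.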

Thursday, 24 May 2012

Robust association tests

To detect a potential association between the variability of a set of genetic markers and a given complex trait, several statistical tests are commonly employed. Among these, Fisher’s exact test, the chi-square test and the Cochran-Armitage trend test are the most popular and the most frequently used in case-control genetic association studies. All these tests and their characteristics are presented in the recent book by Andreas Ziegler and Inke König, "A Statistical Approach to Genetic Epidemiology", published in 2010. However, in addition to several other important factors (power, assumptions, etc.), a critical aspect in the application of such tests is the coding scheme adopted in the association analysis. Indeed, the power of these tests also depends on the coding scheme used to handle the available genetic information (categorical data). This is especially true for complex traits, for which the mode of action of the analyzed genetic variants (dominant, recessive, additive, etc.) is generally unknown. As a result, to take the genetic model uncertainty into account, some authors have tested each marker under several coding schemes rather than a single one. Although this approach increases the power to detect possible associations, it also dramatically increases the number of false-positive results. To address both the genetic model uncertainty and the multiple comparisons problem, several “robust” tests have been developed. Among these, the most popular is the MAX test (or MAX3), originally proposed by Freidlin and co-workers (2002) and subsequently modified and improved by Zang and co-workers. This last version was implemented in the R packages SNPassoc and Rassoc (Zang et al., 2010). The most recent versions of these tests, RobustSNP and the Robust Mantel-Haenszel Test, also allow adjustment for covariate effects (So and Sham, 2011; Zang and Fung, 2011).
Because they are robust with respect to the assumed genetic model (dominant, additive or recessive) and can handle genome-wide association (GWA) studies, these tests should become the standard methodology in future case-control genetic association studies.
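As a rough illustration of the idea behind MAX3, the Python sketch below computes the Cochran-Armitage trend statistic under the recessive, additive and dominant coding schemes and takes the largest absolute value. It is a minimal sketch under stated assumptions, not the implementation found in Rassoc or SNPassoc: the function names trend_z and max3, the SCORES table and the genotype counts are all hypothetical, and only the statistic is computed. Obtaining a valid p-value requires the joint null distribution of the three correlated statistics, which is what the algorithms of Zang et al. (2010) address.

```python
import math

# Genotype scores (0, 1 or 2 copies of the risk allele) for each coding scheme.
SCORES = {
    "recessive": (0, 0, 1),
    "additive": (0, 0.5, 1),
    "dominant": (0, 1, 1),
}

def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic Z for a 2x3 genotype table."""
    r, s = sum(cases), sum(controls)      # total cases, total controls
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    u = sum(x * (s * a - r * b) for x, a, b in zip(scores, cases, controls))
    sum_x2 = sum(x * x * t for x, t in zip(scores, totals))
    sum_x = sum(x * t for x, t in zip(scores, totals))
    # Assumes the scored genotype classes are not empty (nonzero variance).
    return math.sqrt(n) * u / math.sqrt(r * s * (n * sum_x2 - sum_x ** 2))

def max3(cases, controls):
    """Return the MAX3 statistic and the Z value under each coding scheme."""
    z = {model: trend_z(cases, controls, x) for model, x in SCORES.items()}
    return max(abs(v) for v in z.values()), z

# Hypothetical genotype counts, ordered by number of risk alleles (0, 1, 2).
cases = (120, 130, 50)
controls = (150, 120, 30)

statistic, z_by_model = max3(cases, controls)
for model, z in z_by_model.items():
    print(f"{model:>9}: Z = {z:.3f}")
print(f"MAX3 = {statistic:.3f}")
```

Taking the maximum avoids committing to one genetic model, but the result must not be compared with the usual standard normal critical value, since that would ignore the multiple comparisons discussed above.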