TFE 2011-2012 (final year project)

The influence of linkage disequilibrium between markers on the outcome of an epistasis screening

Many common human diseases and traits are believed to be influenced by several genetic and environmental factors, each of which may modify the effect of the others. Understanding the interplay between genetic and non-genetic factors that underlies these complex diseases and traits is one of the major goals of genetic epidemiology. In genetic association studies of common complex diseases, single nucleotide polymorphisms (SNPs) are the most commonly used type of genetic marker (Marnellos, 2003), in part because of their dense distribution across the genome and their low mutation rate. Genome-wide association (GWA) analysis, using a dense map of SNPs, has become one of the standard approaches for disentangling the genetic basis of complex diseases (Hardy & Singleton, 2009). Although GWA studies have convincingly identified important genetic variants influencing a wide variety of common diseases and traits (Manolio et al., 2008; Seng & Seng, 2008), most of the heritability cannot be explained by the (major) genetic loci discovered so far (Manolio et al., 2009). This raises the question of whether combinations of multiple loci could explain more.

This thesis investigates the performance of MB-MDR (Calle et al., 2008) and Epiblaster, two relatively new epistasis screening methods, in scenarios of indirect association, that is, scenarios in which one or all of the actual causal loci are not observed directly but are in linkage disequilibrium (LD) with typed genetic markers. To simplify the study, only dichotomous traits (binary outcomes: diseased or not diseased) will be considered.
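As an illustration of what "in LD with a typed marker" means quantitatively, the strength of LD between an unobserved causal locus and a marker is commonly summarised by the coefficient D and the squared correlation r². The sketch below computes both from haplotype and allele frequencies. The function name `ld_measures` and its parameter names are hypothetical and chosen for this example only; they are not part of MB-MDR or Epiblaster.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at the first
           locus and allele B at the second locus
    p_a  : frequency of allele A at the first locus (0 < p_a < 1)
    p_b  : frequency of allele B at the second locus (0 < p_b < 1)
    """
    # D is the deviation of the haplotype frequency from its value
    # under independence of the two loci.
    d = p_ab - p_a * p_b
    # r2 is D squared, scaled by the product of the allele variances.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2
```

A value of r² close to 1 means the marker is an almost perfect proxy for the causal locus; a value close to 0 means the marker carries almost no information about it.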

In practice, the work proceeds in four steps. First, data are simulated from different epistasis models (Ritchie et al., 2001) covering the relevant scenarios. Second, the simulated data are used to estimate the power of the epistasis detection methods in each scenario. Third, the results are compared with those obtained when the actual causal loci are observed directly. Fourth, the results are placed in a broader context: how do they alter the view on the use of tagging SNPs or imputed data sets (using LD-based haplotype blocks) in epistasis analysis? A useful publication in this context is Chapman et al. (2007).
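The simulate-then-estimate-power workflow can be sketched as a small Monte Carlo experiment. This is a minimal sketch under several assumptions, none of which comes from the project description: two independent causal loci, each paired with one marker at the same allele frequency; a cohort (not case-control) sample; a hypothetical penetrance table indexed by carrier status; and a plain Pearson chi-square test of joint association as a stand-in for the screening method. It does not implement MB-MDR or Epiblaster, and the stand-in test does not separate main effects from interaction.

```python
import random

def draw_haplotype(rng, p_causal, p_marker, d):
    """Draw (causal allele, marker allele) for one haplotype, given LD coefficient d."""
    # Haplotype frequencies implied by the allele frequencies and D.
    p11 = p_causal * p_marker + d
    p10 = p_causal * (1 - p_marker) - d
    p01 = (1 - p_causal) * p_marker - d
    u = rng.random()
    if u < p11:
        return 1, 1
    if u < p11 + p10:
        return 1, 0
    if u < p11 + p10 + p01:
        return 0, 1
    return 0, 0

def simulate_person(rng, maf, d, penetrance):
    """Return (causal carrier pattern, marker carrier pattern, disease status)."""
    causal, marker = [], []
    for _ in range(2):  # two independent causal-locus/marker pairs
        h1 = draw_haplotype(rng, maf, maf, d)
        h2 = draw_haplotype(rng, maf, maf, d)
        causal.append(int(h1[0] + h2[0] > 0))  # carrier of the causal allele
        marker.append(int(h1[1] + h2[1] > 0))  # carrier of the marker allele
    # Disease risk depends on the causal loci only (assumed penetrance table).
    status = int(rng.random() < penetrance[causal[0]][causal[1]])
    return tuple(causal), tuple(marker), status

def chi2_stat(cells, statuses):
    """Pearson chi-square statistic for a (carrier pattern) x (status) table."""
    n = len(statuses)
    counts, row, col = {}, {}, {}
    for c, s in zip(cells, statuses):
        counts[(c, s)] = counts.get((c, s), 0) + 1
        row[c] = row.get(c, 0) + 1
        col[s] = col.get(s, 0) + 1
    stat = 0.0
    for c in row:
        for s in col:
            expected = row[c] * col[s] / n
            stat += (counts.get((c, s), 0) - expected) ** 2 / expected
    return stat

def estimate_power(n_rep, n_ind, maf, d, penetrance, use_markers, seed=1):
    """Fraction of simulated data sets in which the stand-in test rejects at 5%."""
    rng = random.Random(seed)
    critical = 7.815  # 0.95 quantile of chi-square with 3 df (4 patterns x 2 statuses)
    hits = 0
    for _ in range(n_rep):
        people = [simulate_person(rng, maf, d, penetrance) for _ in range(n_ind)]
        cells = [p[1] if use_markers else p[0] for p in people]
        statuses = [p[2] for p in people]
        if chi2_stat(cells, statuses) > critical:
            hits += 1
    return hits / n_rep
```

Calling `estimate_power` once with `use_markers=False` and once with `use_markers=True` for several values of `d` gives the direct-versus-indirect comparison described in the third step. With equal allele frequencies, `d` must lie between 0 and `maf * (1 - maf)` for the haplotype frequencies to be valid.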

This thesis may lead to a first scientific publication.

References:

Calle, M.L., Urrea, V., Vellalta, G., Malats, N. & Van Steen, K. (2008) Model-Based Multifactor Dimensionality Reduction for detecting interactions in high-dimensional genomic data. U.O.V. Department of Systems Biology (ed.).
Dixon, M.S., Golstein, C., Thomas, C.M., Van Der Biezen, E.A. & Jones, J.D. (2000) Genetic complexity of pathogen perception by plants: the example of Rcr3, a tomato gene required specifically by Cf-2. Proc Natl Acad Sci U S A, 97, 8807-14.
Hardy, J. & Singleton, A. (2009) Genomewide association studies and human disease. N Engl J Med, 360, 1759-68.
Manolio, T.A., Brooks, L.D. & Collins, F.S. (2008) A HapMap harvest of insights into the genetics of common disease. J Clin Invest, 118, 1590-605.
Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff, L.A., Hunter, D.J., Mccarthy, M.I., Ramos, E.M., Cardon, L.R., Chakravarti, A., Cho, J.H., Guttmacher, A.E., Kong, A., Kruglyak, L., Mardis, E., Rotimi, C.N., Slatkin, M., Valle, D., Whittemore, A.S., Boehnke, M., Clark, A.G., Eichler, E.E., Gibson, G., Haines, J.L., Mackay, T.F., Mccarroll, S.A. & Visscher, P.M. (2009) Finding the missing heritability of complex diseases. Nature, 461, 747-53.
Marnellos, G. (2003) High-throughput SNP analysis for genetic association studies. Curr Opin Drug Discov Devel, 6, 317-21.
Ritchie, M.D., Hahn, L.W., Roodi, N., Bailey, L.R., Dupont, W.D., Parl, F.F. & Moore, J.H. (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet, 69, 138-47.
Ritchie, M.D., Hahn, L.W. & Moore, J.H. (2003) Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol, 24, 150-7.
Seng, K.C. & Seng, C.K. (2008) The success of the genome-wide association approach: a brief story of a long struggle. Eur J Hum Genet, 16, 554-64.

Further information:

Kristel Van Steen (kristel.vansteen@ulg.ac.be)