TFE 2010-2011 (final year project)

Comparison of MB-MDR with Chapman omnibus testing

Many common human diseases and traits are believed to be influenced by several genetic and environmental factors, each factor potentially modifying the effect of the others. Understanding the interplay between genetic and non-genetic factors that underlies these complex diseases and traits is one of the major goals of genetic epidemiology. In genetic association studies of common complex diseases, single nucleotide polymorphisms (SNPs) are the most commonly used type of genetic marker (Marnellos, 2003). This is partly explained by their dense distribution across the genome and their low mutation rate. Genome-wide association (GWA) analysis, using a dense map of SNPs, has become one of the standard approaches for disentangling the genetic basis of complex diseases (Hardy & Singleton, 2009). Although GWA studies have convincingly identified important genetic variants influencing a wide variety of common diseases and traits (Manolio et al., 2008; Seng & Seng, 2008), much of the heritability cannot be explained by the (major) genetic loci discovered so far (Manolio et al., 2009). One possible explanation is that the true genetic architecture involves many small effects, whereas the statistical techniques commonly used in this context only have sufficient power to detect moderate to large associations. In addition, looking beyond single-locus effects and beyond additive inheritance models for SNPs should better reflect the biological pathways involved in disease etiology (Dixon et al., 2000).

Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both a statistical and a computational perspective, even with a relatively small number of genetic and non-genetic exposures. Several data mining methods have been proposed for interaction analysis, among them Model-Based Multifactor Dimensionality Reduction (MB-MDR; Calle et al., 2008). This relatively new MDR-based technique aims to combine the strengths of non-parametric and parametric approaches. It was developed to address some of the remaining concerns associated with an MDR analysis. One of its major advantages is that it offers a general yet flexible framework that handles both dichotomous and continuous traits.

The aim of this thesis is to compare the performance of the non-parametric or semi-parametric MB-MDR technique with that of a promising parametric method, the omnibus test of Chapman et al. (2007), in a variety of epistasis settings. This will involve the following steps. First, study the MDR and MB-MDR literature to fully understand the dimensionality reduction aspect and the pros and cons of the MB-MDR approach (Figure 1). Second, read Chapman et al. (2007) and implement the omnibus approach; this will be the most time-consuming part. Third, apply MB-MDR and the Chapman method to simulated data with 500 replicates. These data will be provided, so that you can focus on extracting the results and summarizing the power of both methods. Fourth, discuss your findings.
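
As an illustration of the third step, the power of a method in a given simulation setting is commonly estimated as the fraction of replicates in which the method rejects the null hypothesis at a chosen significance level. The sketch below is only a minimal example under stated assumptions: it assumes that one p-value per replicate has already been extracted for the causal SNP pair, and the function names, the significance level of 0.05 and the example p-values are hypothetical and not part of the project material.

```python
import math
from typing import Sequence


def empirical_power(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Return the fraction of replicates with p-value <= alpha."""
    if len(p_values) == 0:
        raise ValueError("at least one replicate is required")
    rejections = sum(1 for p in p_values if p <= alpha)
    return rejections / len(p_values)


def power_standard_error(power: float, n_replicates: int) -> float:
    """Binomial standard error of an estimated power."""
    return math.sqrt(power * (1.0 - power) / n_replicates)


# Hypothetical p-values for four replicates (illustration only, not real results).
example_p_values = [0.01, 0.20, 0.04, 0.50]
power = empirical_power(example_p_values, alpha=0.05)
print(power)  # 0.5
print(power_standard_error(power, len(example_p_values)))  # 0.25
```

With 500 replicates, the binomial standard error of such a power estimate is at most about 0.022, which gives a rough sense of how large a difference in power between the two methods must be before it can be considered meaningful.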

Depending on the progress made in this project, the work may lead to a genuine scientific publication.

See van_steen_4_2011_comparison_mb-mdr_chapman_omnibus_testing.doc for references and figures.

Information, Supervisor: