Pathol Biol 2001; 49: 405 © 2001 Éditions scientifiques et médicales Elsevier SAS. All rights reserved. S0369-8114(01)00192-4/ABS
Abstract
Fine-structure of linkage disequilibrium in candidate genes for cardiovascular disease
A. Clark 1, K.M. Weiss 2, S.M. Fullerton 2, D.A. Nickerson 3, C.F. Sing 4
1 Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University, University Park, PA, USA; 2 Departments of Biology and Anthropology, Penn State University, University Park, PA; 3 Department of Molecular Biotechnology, University of Washington, Seattle, WA; 4 Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI
Dense arrays of SNP genotypes in candidate genes provide an excellent resource for testing methods to detect statistical associations among genetic variants. Sequences from multiple alleles of lipoprotein lipase, apolipoprotein E, and the apolipoprotein AI-CIII-AIV cluster will be used to quantify the patterns of linkage disequilibrium and to model the inference of disease association by linkage disequilibrium. The first approach to modeling the association between disease risk and genetic variation is simply to assume that one of the segregating nucleotides is directly causal to disease. We have clear evidence that reality is more complicated than this, but it makes a tractable starting point. If we arbitrarily assign one of the SNPs to be causal to a disease, we can ask which methods make optimal use of flanking SNPs to identify the state at the unscored, causal site. We do this in the context of complete haplotype (linkage phase) knowledge. Theory predicts a close relationship between the pattern of linkage disequilibrium among sites in a sample of gene sequences and the gene genealogy of that sample. For example, only pairs of mutations that fall on the same branch of a gene tree are expected to show absolute association. This relationship extends to multiple sites as well, so that sets of three or four sites that all fall in the same part of the gene tree are expected to exhibit statistical association. An important problem is to determine how likely it is that a sample of SNPs can identify unmeasured nucleotide sites that cause increased risk. The above considerations suggest that a multi-site approach may have merit. We explored this possibility by simulation and by using the LPL and ApoE data of Nickerson et al., whose sample size now exceeds 2000 individuals. We built 2 × n contingency tables representing counts of haplotypes with the two states at a causal site versus the n haplotypes at the scored SNP sites. Log-linear models and Monte Carlo methods were used to assess the tail probability of these tables. We find that the multi-site models do provide a gain in the probability of detecting associations. However, despite the gain in power afforded by a multiple-site approach, it is still sobering how often samples of SNPs within a candidate gene may fail to show any association with an unscored causal site within the gene. This may happen, for example, whenever the causal site is a recurrently mutating CpG site, or for sites that fall in recombination hotspots. It is an over-simplification to imagine that the cause of elevated disease risk may be additive over unitary ‘causal’ sites, so more complex scenarios of causation (two or more sites) will also be considered.
Supported by NHLBI grants HL39107, HL58238, HL58239, and HL58240.
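As an illustration of the 2 × n contingency-table idea described in the abstract, the following minimal Python sketch tabulates the state at a designated causal site against the haplotypes formed by the scored SNPs and estimates a Monte Carlo tail probability by permuting the causal states. It is a sketch under stated assumptions, not the authors' procedure: the Pearson chi-square statistic stands in for the log-linear models mentioned above, the function names (`build_table`, `chi_square`, `monte_carlo_p`) are hypothetical, and the haplotype data are invented toy values rather than LPL or ApoE data.

```python
import random
from collections import Counter


def build_table(causal_states, haplotypes):
    """Return (labels, table): a 2 x n table of counts, rows = causal state 0/1."""
    labels = sorted(set(haplotypes))
    counts = Counter(zip(causal_states, haplotypes))
    table = [[counts[(state, h)] for h in labels] for state in (0, 1)]
    return labels, table


def chi_square(table):
    """Pearson chi-square statistic (assumed here in place of a log-linear fit)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(causal_states, haplotypes, n_perm=2000, seed=0):
    """Estimate the tail probability by shuffling causal states across haplotypes."""
    rng = random.Random(seed)
    _, observed_table = build_table(causal_states, haplotypes)
    observed = chi_square(observed_table)
    shuffled = list(causal_states)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, table = build_table(shuffled, haplotypes)
        if chi_square(table) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy sample: 30 chromosomes, three flanking-SNP haplotypes,
# with the hypothetical causal allele (state 1) found only on "AC".
haplotypes = ["AC"] * 10 + ["AT"] * 10 + ["GT"] * 10
causal_states = [1] * 10 + [0] * 20

stat, p_value = monte_carlo_p(causal_states, haplotypes)
print(f"chi-square = {stat:.2f}, Monte Carlo p = {p_value:.4f}")
```

In this toy case the causal allele is perfectly associated with one haplotype, so the permutation tail probability is small. When the causal states are scattered across haplotypes, as might be expected for a recurrently mutating site, the same test would tend to return a large tail probability.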