Genome-Wide Association Studies in Nephrology Research


IN TRANSLATION

Anna Köttgen, MD, MPH1,2

Kidney diseases constitute a serious public health burden worldwide, with substantial associated morbidity and mortality. The role of a genetic contribution to kidney disease is supported by heritability studies of kidney function measures, the presence of monogenic diseases with renal manifestations, and familial aggregation studies of complex kidney diseases, such as chronic kidney disease. Because complex diseases arise from the combination of multiple genetic and environmental risk factors, the identification of underlying genetic susceptibility variants has been challenging. Recently, genome-wide association studies have emerged as a method to conduct searches for such susceptibility variants. They have successfully identified genomic loci that contain variants associated with kidney diseases and measures of kidney function. For example, common variants in the UMOD and PRKAG2 genes are associated with risk of chronic kidney disease; variants in CLDN14 with risk of kidney stone disease; and variants in or near SHROOM3, STC1, LASS2, GCKR, NAT8/ALMS1, TFDP2, DAB2, SLC34A1, VEGFA, FAM122A/PIP5K1B, ATXN2, DACH1, UBE2Q2/FBXO22, and SLC7A9, with differences in glomerular filtration rate. The purpose of this review is to provide an overview of the genome-wide association study method as it relates to nephrology research and summarize recent findings in the field. Results from genome-wide association studies of renal phenotypes represent a first step toward improving our knowledge about underlying mechanisms of kidney function and disease and ultimately may aid in the improved treatment and prevention of kidney diseases.

Am J Kidney Dis 56:743-758. © 2010 by the National Kidney Foundation, Inc.

INDEX WORDS: Genome-wide association study; kidney; chronic kidney disease (CKD); glomerular filtration rate; complex disease; single-nucleotide polymorphism; diabetic nephropathy.

BACKGROUND

Kidney diseases pose a significant global disease burden.1,2 The most common form, chronic kidney disease (CKD), affects an estimated 10% of adults in many countries and the prevalence is increasing.1,3-5 Individuals with impaired kidney function are at increased risk of disease progression to end-stage renal disease (ESRD),6,7 as well as increased risk of cardiovascular morbidity and mortality.8,9 Differences in known socioeconomic and cardiovascular risk factors for kidney disease, such as diabetes and hypertension, contribute to the observed differences in estimated glomerular filtration rate (eGFR) and CKD progression.7,10,11 However, CKD progression in individuals with hypertension and/or diabetes is variable, pointing toward the importance of additional factors, including genetic risk factors.

A genetic contribution to kidney function and kidney disease is supported by the presence of monogenic (Mendelian) diseases of the kidney, such as autosomal-dominant polycystic kidney disease (OMIM [Online Mendelian Inheritance in Man] #173900), heritability studies of kidney function measures, and familial aggregation studies of complex diseases, such as CKD. Heritability estimates12,13 for the most commonly used measure of kidney function, GFR, range from 0.33 to 0.82, indicating that 33%-82% of the interindividual variation in GFR estimates in these studies could be explained by additive genetic effects. Familial aggregation studies show that ESRD and earlier stages of CKD cluster in families.14-16

In contrast to monogenic diseases, common complex diseases, such as CKD, are thought to arise from a combination of multiple genetic and environmental factors, with harmful environmental exposures acting on genetically susceptible individuals.17 Despite intensive research, it has been challenging to identify genetic determinants and pathophysiologic mechanisms underlying complex diseases. Many results from candidate gene studies have been difficult to replicate for a number of reasons, including the selection of appropriate candidate genes.18,19 Challenges related to candidate gene selection do not apply to genome-wide approaches, in which the entire genome is surveyed for the presence of genetic susceptibility loci free of prior biological hypotheses.

Genome-wide association studies (GWAS) have emerged during the past few years as a novel technique to identify risk variants for complex traits and diseases20-22 and have been applied successfully in the field of nephrology. Through gene discovery, GWAS can lay the foundation for improved understanding of physiologic and pathophysiologic mechanisms. Better understanding of these mechanisms, together with potential newly identified disease biomarkers and drug targets, holds the promise of eventually improving disease prevention and treatment.

This review focuses on the GWAS method and its application in nephrology research; studies using other methods to identify genetic risk variants for kidney diseases and underlying pathogenic mechanisms recently have been reviewed elsewhere.23,24 A glossary of terms, which are shown in italic font at first use in text, is provided as Box 1.

From the 1Renal Division, University Hospital Freiburg, Germany; and 2Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD. Received January 7, 2010. Accepted in revised form May 11, 2010. Originally published online as doi:10.1053/j.ajkd.2010.05.018 on August 23, 2010. Address correspondence to Anna Köttgen, MD, MPH, Department of Internal Medicine IV, University Hospital Freiburg, Hugstetter Str 55, 79106 Freiburg, Germany. E-mail: [email protected] © 2010 by the National Kidney Foundation, Inc. 0272-6386/10/5604-0018$36.00/0 doi:10.1053/j.ajkd.2010.05.018

American Journal of Kidney Diseases, Vol 56, No 4 (October), 2010: pp 743-758

Box 1. Glossary

● Allele: Alternative DNA sequences at the same physical position in the genome. SNPs typically have 2 alleles (biallelic) that correspond to the 2 variants at this position found in the population
● Complex disease: Diseases thought to arise as a combination of genetic effects and environmental influences; often contrasted to Mendelian disease
● Data cleaning: Process of detecting and removing errors or inconsistencies from data, such as genotyping errors or samples of low quality DNA, with the goal of improving data quality
● Exome: The subset of the genome formed by exons. Exons are genomic sequences that are expressed, ie, used for synthesis of proteins and other gene products
● Heritability: The proportion of phenotypic variance in a population that can be attributed to additive genetic variation
● Imputation methods: Techniques to infer missing genotype data in a study population based on the genotype data available and the known correlation patterns between variants in a reference population (eg, provided by the HapMap Project)
● International HapMap Project: An international effort to identify and catalogue the common patterns of human genetic variation for several groups of different ancestries, the HapMap study samples
● Linkage disequilibrium (LD): Alleles at nearby loci occurring together more often than would be expected by chance alone and therefore providing some degree of information about each other
● Mendelian disease: Disease caused by a mutation in a single gene
● Minor allele frequency: For a SNP, the frequency of the less common allele in a population
● Phenotype: Any observable characteristic or trait of an organism
● Polymorphism: Genetic variation in which each allele occurs in ≥1% of the population
● SNP: DNA sequence variation resulting from a change of a single nucleotide base
● Trait: A feature or quantity in an individual that can be measured and differs between individuals. Disease status is a dichotomous trait (eg, presence of ESRD), many clinical measurements are continuous traits (eg, GFR)

Abbreviations: ESRD, end-stage renal disease; GFR, glomerular filtration rate; SNP, single-nucleotide polymorphism.

CASE VIGNETTE

Two separate patients both have important known risk factors for the development and progression of CKD: one is a nonsmoking 55-year-old man of European ancestry with a body mass index of 33 kg/m2, fasting glucose level of 127.0 mg/dL (7.05 mmol/L), and fasting serum triglyceride level of 131.98 mg/dL (1.49 mmol/L). The other patient is a nonsmoking 58-year-old African American woman with a body mass index of 32 kg/m2, fasting glucose level of 96.0 mg/dL (5.33 mmol/L) on chlorpropamide therapy, and fasting serum triglyceride level of 55.8 mg/dL (0.63 mmol/L). Both patients are normotensive, the first on treatment. Despite the presence of CKD risk factors, such as untreated or treated diabetes and obesity, in both patients, different rates of CKD progression are observed. Although the first patient has stable kidney function (eGFR, 58 mL/min/1.73 m2 [0.97 mL/s/1.73 m2] in year 1 and 60 mL/min/1.73 m2 [1.0 mL/s/1.73 m2] after 10 years), kidney function of the second patient decreases from eGFR of 56 mL/min/1.73 m2 (0.93 mL/s/1.73 m2) to <15 mL/min/1.73 m2 (<0.25 mL/s/1.73 m2) during the same period.

To identify genetic risk factors that contribute to the variable course of CKD, large numbers of unrelated individuals, such as the 2 patients in the case vignette, can be studied using GWAS as a technique. Common genetic variants associated with several kidney diseases were identified successfully during the past 2 years using this approach.

RECENT ADVANCES

Since 2005, when the first GWAS using a high-throughput single-nucleotide polymorphism (SNP) genotyping array was published,25 a multitude of genetic risk variants for many complex diseases and traits have been identified using this method. By June of 2009, a total of 439 GWAS reporting SNP-phenotype associations at P < 5 × 10⁻⁸ had been published,22 illustrating the feasibility of this approach.

GWAS Method

GWAS use naturally occurring genetic variation in the form of SNP markers to identify regions of the genome in which genetic variation associates with disease status or variation of a clinical measure in a study population. Table 1 lists characteristics of monogenic and complex diseases, their underlying genetic risk variants, and suitable methods to detect them. As opposed to mutations causing monogenic diseases, risk variants for complex diseases such as CKD mostly confer moderate disease risk (odds/risk ratio <1.5; Table 1). In this setting, association studies in large study populations of related or unrelated individuals are a powerful method for gene discovery.26 The availability of large cohort studies with information for many well-defined observable or measurable characteristics (phenotypes) promotes efficient GWAS study design.

Table 1. Characteristics of Monogenic and Complex Diseases and Their Underlying Genetic Risk Variants

Characteristic | Monogenic Diseases | Complex Diseases
Disease prevalence | Mostly rare | Mostly common
Public health impact | Small to moderate | Large
Example | Polycystic kidney disease | Chronic kidney disease
Magnitude of associated disease risk | High (often relative risk >5) | Moderate (often relative risk/odds <1.5)
Frequency of genetic risk variants | Rare | Common; ability to detect rare variants of moderate effect is still limited
Cause of disease | Single gene mutations, highly penetrant | Multifactorial, many genetic and environmental risk factors jointly cause disease
Mendelian inheritance pattern observed | Yes | No
Study samples to identify risk variants | Family studies: usually small study size | Large studies of related or unrelated individuals
Genome-wide method to identify risk variantsa | Linkage | Association

a Other methods exist.

GWAS have become possible as a result of resources created by the international multidisciplinary research community in a tremendous effort during the past decade: (1) sequencing of the human genome to provide a reference sequence27,28; (2) systematic cataloging of SNPs in public databases29; (3) the HapMap Project and International HapMap Consortium to characterize common genomic sequence variants, their frequencies, and the correlation between nearby SNPs in various populations30-32; (4) advances in technology to allow affordable massive parallel genotyping of more than 1 million SNPs; (5) advances in statistical analysis methods and computational resources33-37; and (6) the availability of large well-characterized disease-specific and/or population-based study samples.

Specific steps to conduct a GWAS are outlined in Box 2. As a start, it is critical to have access to a study population of adequate size with appropriate phenotype measures. Second, DNA has to be available to conduct genome-wide genotyping. Typical genotyping arrays assay about 1 million common SNPs per individual and are manufactured mostly by Illumina (www.illumina.com) and Affymetrix (www.affymetrix.com). Data from the International HapMap Consortium show that most of the more than 10 million common SNPs in the genome known at the time have highly correlated nearby SNPs.32 The nonrandom co-occurrence of the alleles at these SNPs is termed linkage disequilibrium; information from a genotyped SNP therefore can provide information about a variant nearby by exploiting this correlation (Fig 1). Thus, genotyping a subset of SNPs on a genotyping array is sufficient to provide information about much of the untyped common genomic variation. Genotypes at SNPs that are not directly genotyped can be inferred: imputation methods combine genotype data from each sample with a more densely genotyped reference sample of known SNP correlations, such as the HapMap samples, to then infer genotypes at untyped SNPs probabilistically.34,36 Current GWAS therefore examine approximately 2.5 million genotyped and imputed SNPs per individual. Because of the enormous amount of data generated, careful data cleaning and quality control are indispensable.38,39

Box 2. Steps to Conduct a GWAS

1. Assemble adequately phenotyped study population
2. Obtain DNA and genotype
3. Careful data cleaning and quality control, SNP imputation
4. Conduct statistical association tests
5. Inference: an associated SNP itself or a correlated unknown genetic variant causes disease
6. Replicate findings externally
7. Follow up projects (clinical and epidemiologic characterization, identification of causal variants, functional studies)

Abbreviations: GWAS, genome-wide association study; SNP, single-nucleotide polymorphism.

Figure 1. Genome-wide association studies use single-nucleotide polymorphism (SNP) markers to assess association of genotypic variation with phenotypic variation. Linkage disequilibrium (LD) is the nonrandom co-occurrence of certain alleles at nearby SNPs. If SNP1 and SNP2 are perfectly correlated, a person with allele T at SNP1 will carry allele C at SNP2. It therefore is sufficient to only genotype SNP2 if alleles at SNP1 and correlation are known from a reference source. Abbreviation: CKD, chronic kidney disease.

After sufficient data quality is ensured, association tests can be conducted. Mean values of a clinical measure or proportions of affected individuals can be compared across the 3 genotype classes that exist for biallelic SNPs in a population (Fig 1). A statistical summary measure for association is obtained, and this procedure is repeated for every available SNP. Because the large number of conducted tests would give rise to many false-positive findings based on a conventional significance threshold, it is necessary to use strict thresholds to indicate statistical significance. Different procedures are used in the field,26,40 most commonly a conservative Bonferroni correction for the number of independent SNPs tested. For individuals of European ancestry,41 this results in a threshold of 5 × 10⁻⁸. In other words, significant association is claimed only if P for association is <5 × 10⁻⁸. An essential point is that association does not necessarily imply a causal relationship between the SNP and the phenotype. Rather, the SNP association implicates the importance of a genomic region, where the true causal variant can be the SNP or, more likely, is unknown and correlated with the known SNP.

Increases in the initial sample size to detect associated variants lead to an increase in the number of identified risk loci.42,43 Therefore, GWAS discovery now commonly is conducted as a meta-analysis of GWAS results from many individual studies, with total sample sizes approaching or exceeding 100,000 individuals. Independent replication of findings is essential for GWAS results from single studies and desirable for results from GWAS meta-analyses to minimize the chance of false-positive associations, but also to obtain more generalizable risk estimates and characterize genetic risk variants in different populations.
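The single-SNP association test and the genome-wide threshold described above can be sketched in a few lines. This is a minimal illustration only, not the analysis pipeline of any study cited here: it assumes an additive model (trait regressed on minor-allele count), simulated data, roughly 1 million independent tests for the Bonferroni correction, and a normal approximation for the P value. The function names `additive_test` and `simulate_snp` are hypothetical.

```python
import math
import random

# Assumption: ~1 million independent common-variant tests (European ancestry).
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_ALPHA = 0.05 / N_INDEPENDENT_TESTS  # Bonferroni threshold, 5e-8


def additive_test(allele_counts, trait):
    """Regress a continuous trait on minor-allele count (0, 1, or 2).

    Returns (beta, standard_error, two_sided_p). The P value uses a normal
    approximation to the t statistic, which is reasonable only for large samples.
    """
    n = len(allele_counts)
    mean_g = sum(allele_counts) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(allele_counts, trait))
    beta = sxy / sxx                       # per-allele effect estimate
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(allele_counts, trait))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


def simulate_snp(n, maf, beta, rng):
    """Simulate genotypes (two independent allele draws) and a trait with unit noise."""
    counts = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    trait = [beta * g + rng.gauss(0.0, 1.0) for g in counts]
    return counts, trait


rng = random.Random(0)
for label, true_beta in [("associated SNP", 0.2), ("null SNP", 0.0)]:
    counts, trait = simulate_snp(20_000, 0.2, true_beta, rng)
    beta, se, p = additive_test(counts, trait)
    print(f"{label}: beta={beta:.3f}, P={p:.2e}, significant={p < GENOME_WIDE_ALPHA}")
```

In a real GWAS the same test is repeated for every genotyped or imputed SNP, usually with covariate adjustment and dedicated software; the sketch only shows why a per-test threshold as strict as 5 × 10⁻⁸ follows from the number of tests.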
Common Ways to Summarize GWAS Findings

GWAS results often are summarized in the form of graphs. One such graph is a quantile-quantile plot in which ordered values of the observed −log10(P values) for association are compared with their expected distribution under the null hypothesis of no association. Figure 2A shows such a plot from a recently published GWAS of eGFR.43 The deviation from the line of identity for extremely small P values indicates the presence of SNPs that show stronger association than expected by chance alone. At the same time, no deviation from the expected distribution over the main part of the distribution suggests the absence of systematic biases. Another commonly generated plot is the so-called Manhattan plot, which shows the observed −log10(P values) by genomic position for each evaluated SNP. Figure 2B shows the Manhattan plot corresponding to Fig 2A. Genome-wide significant associations (P < 5 × 10⁻⁸) with GFR are found on most chromosomes in this large study of more than 65,000 individuals.43
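As a rough illustration of how the points of a quantile-quantile plot are obtained, the sketch below pairs sorted observed −log10(P values) with their expected values under the null hypothesis. It also computes the genomic inflation factor, a common one-number summary of deviation over the bulk of the distribution; that statistic is not described in this review and is included here only as an assumption about typical practice. The function names and the plotting position (i − 0.5)/n are illustrative choices.

```python
import math
import random
from statistics import NormalDist, median


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted from smallest to largest."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, P values are uniform; the i-th smallest is about (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic divided by its null median (~0.455)."""
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]  # two-sided P -> z^2
    return median(chi2) / 0.4549364


# Simulated null P values in (0, 1]; real input would be one P value per SNP.
rng = random.Random(1)
null_p = [1.0 - rng.random() for _ in range(10_000)]
points = qq_points(null_p)
print("largest expected -log10(P):", round(points[-1][0], 2))
print("genomic inflation factor:", round(genomic_inflation(null_p), 3))
```

With P values drawn from the null, the points lie close to the line of identity and the inflation factor is close to 1; truly associated SNPs would show up as a departure only in the upper tail, as described for Fig 2A.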

Figure 2. (A) Quantile-quantile (QQ) plot comparing observed and expected ordered −log10(P values) from a genome-wide association study (GWAS) of glomerular filtration rate (GFR) estimated from serum creatinine. Lower P values correspond to higher −log10(P values). (B) Manhattan plot of −log10(P values) by chromosomal location from a GWAS of GFR estimated from serum creatinine. Reproduced from Köttgen et al43 with permission of Nature Publishing Group.

RECENT RESULTS

Association Results With Kidney Diseases

Tables 2 and 3 provide an overview of genomic regions identified in GWAS of kidney diseases and measures of kidney function to date. Table 2 summarizes results for studies with a dichotomous outcome (disease).

Table 2. Genomic Regions Identified in GWAS of Kidney Diseases

Discovery Sample Size (no. of cases) | SNP; Chromosomal Location | Implicated Gene | OR; P | Allele Frequency | Reference | Comment

Type 1 Diabetic Nephropathy
1,705 (820) | rs10868025; chr9: 85353996 | FRMD3 | 1.45; 5 × 10⁻⁷ | G: 0.56-0.59 (controls), 0.66 (cases) | 44 | European ancestry
1,705 (820) | rs451041; chr11: 3017301 | CARS | 1.36; 3 × 10⁻⁶ | A: 0.46-0.48 (controls), 0.54-0.56 (cases) | 44 | See above
260 (112) | rs11886047; chr2: 43704094 | PLEKHH2 | 1.4; 8 × 10⁻³ (CT vs CC + TT) | C: 0.78 | 45 | European ancestry; f/u in larger sample size
1,069 (547) | rs9298190; chr8: 73006888 | MSC | 1.56; 2 × 10⁻⁵ | T: 0.34 | 46 | European ancestry, ESRD, discovery in pooled data
1,069 (547) | rs174982; chr10: 80593868 | ZMIZ1 | 0.47; 8 × 10⁻⁵ | G: 0.40 | 46 | See above

Type 2 Diabetic Nephropathy
207 (105) | rs2648875; chr8: 129141343 | PVT1 | 2.97; 2 × 10⁻⁶ | A: 0.53 (controls), 0.77 (cases) | 47, 48a | Pima Indians; ESRD, pooled samples
188 (94) | SNP +78b; chr 16q13 | SLC12A3 | 2.53; 2 × 10⁻⁵ | G: 0.93 (controls), 0.97 (cases) | 49, 50a | Japanese population; f/u in larger sample size, gene-based SNPs
188 (94) | SNP +9170b; chr 7p14 | ELMO1 | 2.67; 8 × 10⁻⁶ (GG vs GA & AA) | G: 0.30 (controls), 0.39 (cases) | 46,a 51, 52,a 53a | Japanese population; f/u in larger sample size; gene-based SNPs

IgA Nephropathy
Unclear | PIGR-17; chr1:q31-q41 | PIGR | 1.59; 3 × 10⁻⁴ | C: 0.79 (cases), 0.86 (controls) | 54 | Japanese population; f/u in 854 (389 cases)
728 (94) | rs2275996; chr11: 68462396 | IGHMBP2 | 1.85; 3 × 10⁻⁵ | A: 0.14 (cases), 0.08 (controls) | 55, 56 | Japanese population; f/u in 1,099 (465 cases)

CKD
1,010 | rs6495446; chr15: 77942037 | MTHFS | 1.24; 1 × 10⁻³ | C: 0.73 overall | 57, 58 | European ancestry; f/u in 15,747
19,877 (2388) | rs12917707; chr16: 20275191 | UMOD | 0.8; 2 × 10⁻¹² | T: 0.18 overall | 43,a 59 | European ancestry
62,237 (5807) | rs7805747; chr7: 151038734 | PRKAG2 | 1.19; 9 × 10⁻¹⁴ | A: 0.24 overall | 43 | European ancestry, f/u in 22,503

Kidney Stones
35,540 (1507) | rs219778; chr21: 36755177 | CLDN14 | 1.25; 4 × 10⁻¹² | T: 0.75 (controls), 0.80 (cases) | 60 | European ancestry populations

ESRD, FSGS
Study 1: 1,475 (669); study 2: 412 (190) | rs4821481; chr22: 35025888 | MYH9 | Study 1: 2.4; 3 × 10⁻⁸ (CC vs CT/TT); study 2: 4.1; 1 × 10⁻¹³ (CC vs CT/TT) | C: 0.65 (study 1), 0.62 (controls, study 2) | Study 1: 61, study 2: 62; 63a-68a | African Americans, MALD method; f/u in larger samples in both studies; study 1: nondiabetic ESRD, study 2: FSGS

Note: Only the most significant variant at each locus is reported, variants that did not replicate in the initial report are not included. SNP position based on dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/) build 130 when not provided in the article. Abbreviations: chr, chromosome; ESRD, end-stage renal disease; FSGS, focal segmental glomerulosclerosis; f/u, follow-up; GWAS, genome-wide association study; IgA, immunoglobulin A; MALD, mapping by admixture linkage disequilibrium; rs, reference single-nucleotide polymorphism identification number; SNP, single-nucleotide polymorphism.
a Replication study.
b SNP number is provided as reported by the authors and refers to the nucleotide position. A reference SNP (rs) identification number is not available.

Several studies were conducted to identify genetic risk variants for diabetic nephropathy. For example, a study of type 1 diabetic nephropathy was conducted in individuals in the GoKinD (Genetics of Kidneys in Diabetes) collection.44 Although no SNP showed significant association after correction for multiple testing, the investigators were able to replicate suggestive results for SNPs in the FRMD3 and CARS gene regions in an independent population. Other candidate genes for diabetic nephropathy identified through GWAS include PLEKHH2 (reference 45), MSC, and ZMIZ1 (reference 46) for type 1 diabetic nephropathy and PVT1 (reference 47), SLC12A3 (reference 49), and ELMO1 (reference 51) for type 2 diabetic nephropathy (Table 2). Several of these findings still await replication.

Early GWAS of immunoglobulin A nephropathy examined only a small study sample,54,55 and association between variants in the IGHMBP2 gene and immunoglobulin A nephropathy did not replicate in a recent study of individuals of Chinese and European ancestry.56

In contrast to these mostly moderately sized studies of individuals with advanced kidney disease, the first GWAS of CKD was conducted in a large number of individuals with mostly CKD stage 3 from population-based studies.59 This study examined almost 20,000 participants of European ancestry within the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) Consortium.72 The CHARGE investigators identified SNPs in the UMOD gene region as associated with CKD (defined as eGFR <60 mL/min/1.73 m2, when GFR was estimated from serum creatinine using the Modification of Diet in Renal Disease [MDRD] Study equation). The UMOD gene encodes uromodulin, also known as Tamm-Horsfall protein.73 Tamm-Horsfall protein is the most abundant protein in urine of healthy individuals and is transcribed exclusively in the thick ascending limb of the loop of Henle.74 Interestingly, rare mutations in UMOD cause autosomal-dominant forms of kidney disease: medullary cystic kidney disease type 2 (OMIM #603860), familial juvenile hyperuricemic nephropathy (OMIM #162000), and glomerulocystic kidney disease (OMIM #609886).75-77 The association of the UMOD risk variant with CKD was observed in the presence and absence of major known risk factors for CKD, including diabetes and hypertension (Fig 3). This suggests that the study of a disease of heterogeneous cause, such as CKD, may aid in the identification of disease mechanisms common to different disease subgroups. However, further research in larger numbers of individuals with advanced forms of very specific kidney disease diagnoses is likely to identify additional CKD risk genes.

Figure 3. Meta-analysis of the odds of chronic kidney disease (CKD) per each additional copy of the minor T allele at the UMOD single-nucleotide polymorphism rs12917707 in the absence and presence of major kidney disease risk factors. Abbreviation: rs, reference single-nucleotide polymorphism identification number. Reproduced from Köttgen et al59 with permission of Nature Publishing Group.

In addition to the UMOD gene, a subsequent effort of the international CKDGen Consortium led to the identification of the PRKAG2 gene locus as influencing CKD risk in a very large population-based study including more than 5,000 CKD cases.43 PRKAG2 encodes a subunit of adenosine monophosphate–activated protein kinase; potential connections to kidney disease are less clear than for the UMOD gene.

In the first GWAS of kidney stone disease, Thorleifsson et al60 identified genetic variants at the CLDN14 locus in individuals from Europe (Table 2). Claudin 14 is expressed in the kidney, where it contributes to the regulation of paracellular permeability at epithelial tight junctions. The investigators found that the genetic risk variant was associated not only with increased risk of kidney stone disease, but also with decreased bone mineral density, and provided evidence for a role of altered calcium homeostasis. Rare mutations in CLDN14 cause a form of autosomal-recessive deafness without known renal involvement.78

Finally, genetic risk variants for nondiabetic ESRD and focal segmental glomerulosclerosis in the MYH9 gene region were identified by 2 studies.61,62 This association has been replicated in numerous study populations of different ancestries and various forms of kidney disease, confirming the MYH9 locus as a very important kidney disease risk locus. The discovery studies were conducted in African Americans using a different genome-wide method based on the recent genetic admixture in this population.79 The protein encoded by MYH9 is expressed in podocytes, and rare MYH9 mutations cause autosomal-dominant syndromes (May-Hegglin [OMIM #155100], Sebastian [OMIM #605249], Fechtner [OMIM #153640], and Epstein [OMIM #153650]), which can involve glomerular disease.80 The studies by Kao et al61 and Kopp et al62 are remarkable because they show that a large proportion of the higher observed risk of ESRD in African Americans compared with individuals of European ancestry may be related to risk variants in MYH9. In contrast to the usual scenario presented in Table 1, MYH9 variants appear to both be common in African Americans and confer substantial disease risk. The discovery of MYH9 may aid in the reclassification of glomerular diseases: recent findings indicate that the entity “hypertensive glomerulosclerosis” in African Americans may be attributable predominantly to genetic variation in MYH9 and falls within the spectrum of focal segmental glomerulosclerosis.24,61,62

Table 3. Genomic Regions Identified in GWAS of Measures of Kidney Function and Damage

Discovery Sample Size | SNP; Chromosomal Location | Proximate Gene(s) | Effect Size; P | Allele Frequency | Reference | Comment

eGFR (based on SCr)
18,127 | rs12917707; chr16: 20275191 | UMOD | 0.018; 5 × 10⁻¹⁶ | T: 0.18 | 43,a 59, 69,a 70a | European ancestry; age- and sex-adjusted residual of ln(eGFR) in mL/min/1.73 m2 per minor allele
18,127 | rs17319721; chr4: 77587871 | SHROOM3 | −0.012; 1 × 10⁻¹² | A: 0.44 | 43,a 59, 69a | See above
18,127 | rs2467853; chr15: 43486085 | GATM | −0.013; 6 × 10⁻¹⁴ | G: 0.38 | 43,a 59, 69,a 70a | Positive control; see above
67,093 | rs267734; chr1: 149218101 | ANXA9 | 0.010; 1 × 10⁻¹² | C: 0.20 | 43 | European ancestry; age- and sex-adjusted residual of ln(eGFR) in mL/min/1.73 m2 per minor allele
67,093 | rs1260326; chr2: 27584444 | GCKR | 0.009; 3 × 10⁻¹⁴ | T: 0.41 | 43 | See above
67,093 | rs13538; chr2: 73721836 | NAT8, ALMS1 | 0.009; 5 × 10⁻¹⁴ | G: 0.23 | 43 | See above
67,093 | rs7422339; chr2: 211248752 | CPS1 | −0.009; 1 × 10⁻¹⁵ | A: 0.32 | 43 | See above
67,093 | rs347685; chr3: 143289827 | TFDP2 | 0.009; 3 × 10⁻¹¹ | C: 0.28 | 43 | See above
67,093 | rs11959928; chr5: 39432889 | DAB2 | −0.009; 1 × 10⁻¹⁷ | A: 0.44 | 43 | See above
67,093 | rs6420094; chr5: 176750242 | SLC34A1 | −0.011; 1 × 10⁻¹⁴ | G: 0.34 | 43 | See above
67,093 | rs881858; chr6: 43914587 | VEGFA | 0.011; 9 × 10⁻¹⁴ | G: 0.28 | 43 | See above
67,093 | rs2279463; chr6: 160588379 | SLC22A2 | −0.013; 6 × 10⁻¹² | G: 0.12 | 43 | See above
67,093 | rs6465825; chr7: 77254375 | TMEM60 | −0.008; 2 × 10⁻⁹ | C: 0.39 | 43 | See above
67,093 | rs7805747; chr7: 151038734 | PRKAG2 | −0.012; 1 × 10⁻¹⁸ | A: 0.25 | 43 | See above
67,093 | rs4744712; chr9: 70624527 | PIP5K1B, FAM122A | −0.008; 8 × 10⁻¹⁴ | A: 0.39 | 43 | See above
67,093 | rs10794720; chr10: 1146165 | WDR37 | −0.014; 1 × 10⁻⁸ | T: 0.08 | 43 | See above
67,093 | rs10774021; chr12: 219559 | SLC6A13 | 0.008; 1 × 10⁻⁹ | C: 0.36 | 43 | See above
67,093 | rs626277; chr13: 71245697 | DACH1 | 0.009; 3 × 10⁻¹¹ | C: 0.40 | 43 | See above
67,093 | rs491567; chr15: 51733885 | WDR72 | 0.009; 3 × 10⁻¹³ | C: 0.22 | 43 | See above
67,093 | rs1394125; chr15: 73946038 | UBE2Q2, FBXO22 | −0.009; 3 × 10⁻¹⁷ | A: 0.35 | 43 | See above
67,093 | rs9895661; chr17: 56811371 | TBX2/BCAS3 | −0.011; 1 × 10⁻¹⁵ | C: 0.19 | 43 | See above
67,093 | rs12460876; chr19: 38048731 | SLC7A9 | 0.008; 3 × 10⁻¹⁵ | C: 0.39 | 43 | See above

eGFR (based on serum cystatin C)
12,266 | rs13038305; chr20: 23558262 | CST3 | 0.076; 2 × 10⁻⁸⁸ | T: 0.21 | 43,a 59 | European ancestry; effect is the change in age- and sex-adjusted residual of ln(eGFR) per minor allele
12,266 | rs1731274; chr8: 23822264 | STC1 | −0.017; 5 × 10⁻⁸ | G: 0.43 | 43,a 59 | See above
20,957 | rs653178; chr12: 110492139 | ATXN2 | 0.013; 4 × 10⁻¹¹ | T: 0.50 | 43 | See above

SCr
14,345 | rs10518733; chr15: 51727599 | WDR72 | −0.068; 2 × 10⁻⁸ | C: 0.40 | 71 | Japanese population; ln(SCr) in mg/dL
3,999 | rs4588898; chr8: 139815390 | COL22A1 | 0.079; 2 × 10⁻⁴ | A: 0.30 | 69 | Population isolates, European ancestry; residuals of age- and sex-adjusted normalized SCr (mg/dL)
3,997 | rs12300068; chr12: 78005502 | SYT1 | −0.116; 6 × 10⁻⁵ | A: 0.13 | 69 | See above
3,995 | rs2064831; chr6: 90089661 | GABRR2 | 0.102; 7 × 10⁻⁵ | C: 0.18 | 69 | See above
23,812 | rs10206899; chr2: 73754408 | NAT8/ALMS1 | −1.0; 1 × 10⁻¹⁵ | G: 0.22 | 70 | Effect size corresponds to percentage of change in SCr; European ancestry
21,857 | rs3127573; chr6: 160601383 | SLC22A2 | 1.1; 7 × 10⁻¹⁰ | G: 0.13 | 70 | See above
23,812 | rs8068318; chr17: 56838548 | TBX2/BCAS3 | 0.8; 3 × 10⁻¹⁰ | G: 0.27 | 70 | See above
23,812 | rs4805834; chr19: 38145499 | SLC7A9 | −1.0; 5 × 10⁻¹¹ | G: 0.13 | 70 | See above

Serum Cystatin C
981 | rs1158167; chr20: 23526189 | CST3-CST9 region | P = 8 × 10⁻⁹ | G: 0.21 | 57 | European ancestry; positive control

Note: Only the most significant variant at each locus is reported, variants that did not replicate in the initial report are not included. SNP position based on dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/) build 130 when not provided in the article. Abbreviations: chr, chromosome; eGFR, estimated glomerular filtration rate; f/u, follow-up; GWAS, genome-wide association study; rs, reference single-nucleotide polymorphism identification number; SCr, serum creatinine; SNP, single-nucleotide polymorphism.
a Replication study.

Association Results With Measures of Kidney Function

In Table 3, results from GWAS studying continuous kidney function measures are summarized. In comparison to GWAS results for dichotomous disease outcomes, GWAS of continuous renal traits, such as eGFR, have identified many more associated genomic regions.
This highlights the power of using continuous clinical measures rather than a dichotomous disease trait to identify regions of interest.81

In the first large GWAS of eGFR, investigators of the CHARGE Consortium were able to identify common risk variants in or near UMOD, SHROOM3, and STC1.59 SHROOM3 encodes an actin-binding protein expressed in the kidney, where it may have an important role in the morphogenesis of epithelial tissues during development.82 STC1 encodes stanniocalcin 1, which is highly expressed in the nephron.83 It may regulate calcium concentrations in a paracrine fashion and has been reported to have cytoprotective and anti-inflammatory roles in a rodent model of glomerulonephritis.84

Very recently, investigators of the CKDGen Consortium identified an additional 20 genomic regions related to GFR estimated from serum creatinine level (Table 3). In this largest study of kidney function to date, the investigators combined data from more than 65,000 individuals from 20 population-based studies of European-ancestry participants and replicated their findings in more than 20,000 additional individuals.43 They identified variants at 13 new loci associated with kidney function (in or near LASS2, GCKR, NAT8/ALMS1, TFDP2, DAB2, SLC34A1, VEGFA, PRKAG2, FAM122A/PIP5K1B, ATXN2, DACH1, UBE2Q2/FBXO22, and SLC7A9) and 7 loci suspected to affect creatinine production and secretion (CPS1, SLC22A2, TMEM60, WDR37, SLC6A13, WDR72, and TBX2/BCAS3). Different measures of kidney function were used to distinguish genetic variants affecting kidney function from those related to creatinine metabolism, as detailed next.

Most of the identified genes have not been reported previously as associated with kidney disease. Important findings include the following: (1) the presence of known monogenic kidney disease syndromes caused by rare mutations in SLC7A9 (cystinuria, OMIM #220100), SLC34A1 (hypophosphatemic nephrolithiasis/osteoporosis, OMIM #612286), and ALMS1 (Alström syndrome, OMIM #203800); (2) the interaction of the DAB2 and MYH9 products on a protein level85; and (3) the role of vascular endothelial growth factor A (encoded by VEGFA) in animal models of glomerulogenesis.86

SNPs in the GATM and CST3 gene regions have been associated repeatedly with GFR.
Variants in the GATM gene, involved in creatine synthesis,87 associate with GFR estimated from serum creatinine (Fig 2B),43,59,69,70 and variants in the CST3 gene, encoding cystatin C,88 associate with GFR estimated from cystatin C.43,57,59 These positive controls are valuable for 2 reasons. First, when data from different studies are combined, the consistent presence of their significant association in all data sets can help detect possible data formatting errors. Second, Fig 4 illustrates how the availability of different GFR estimation markers can be used to discriminate genes truly related to decreased kidney function from those related merely to serum concentrations of the estimation marker.43,59 Only a variant that shows association with GFR estimated from serum creatinine and with GFR estimated from serum cystatin C, such as the UMOD risk variant, likely is related to true GFR.

Figure 4. Comparison of the strength of association with glomerular filtration rate (GFR) estimated from serum creatinine (eGFRcrea) to GFR estimated from serum cystatin C (eGFRcys). *Units on the x and y axes correspond to age- and sex-adjusted residuals of ln-transformed eGFR; an effect measure of 0.02 corresponds to approximately 1 mL/min/1.73 m2 of GFR. Error bars represent 95% confidence intervals. The grey line indicates associations of equal magnitude with both measures of kidney function, which would be expected for a genomic risk variant for decreased kidney function. Data from Köttgen et al.59

Several GWAS of serum creatinine concentrations also have been published. A study in a sample of the general Japanese population identified variants in WDR72 (Table 3).71 This genomic region also was associated with eGFR in the recent study by the CKDGen Consortium.43 Pattaro et al69 identified variants in COL22A1, SYT1, and GABRR2 in association with serum creatinine levels in European population isolates, pointing toward the potential role of GABAA receptors in kidney function. Finally, a large study by Chambers et al70 very recently identified variants in SLC7A9, NAT8, SLC22A2, and TBX2 in association with serum creatinine, loci also reported in the study of eGFR by the CKDGen Consortium. Chambers et al70 emphasize the NAT8 gene at the chromosome 2p13 locus, which also contains the ALMS1 gene. They show that the encoded protein, N-acetyltransferase 8, is expressed in tubular cells of the renal cortex and highlight the role of a nonsynonymous SNP in NAT8, which may influence acetylation pathways in the kidney.70 However, results from the CKDGen Consortium investigators indicate that a SNP in NAT8 may influence expression levels of the neighboring ALMS1.43 This exemplifies that GWAS are suitable to identify genomic regions of interest, but that additional studies, including studies in model organisms, are warranted to pinpoint the specific genes and mutations involved, as well as their mechanism of action.

General Observations From GWAS of Kidney Diseases and Kidney Function Measures

Several general observations can be made from GWAS of renal traits and kidney diseases, similar to other complex conditions.20,21 First, GWAS are a suitable method to successfully and reproducibly identify common risk variants associated with kidney diseases. Second, the associated variants each confer only moderate changes in disease risk (relative risk <1.5) or mean levels of kidney function measures (see Tables 2 and 3; Fig 4). However, the combination of small effects across multiple independent risk variants can be additive.43,59 Third, the number of identified genomic susceptibility regions is dependent on the sample size used for gene discovery.43,59,70 Fourth, several of the implicated genomic loci, such as UMOD, SLC7A9, and SLC34A1, contain common variants of moderate effect identified through GWAS, as well as rare mutations of large effect, which were already known to cause monogenic diseases of the kidney. The “rediscovery” of so-called Mendelian genes through GWAS also has been observed for numerous other traits.89 This observation points toward the potential value of sequencing GWAS-discovered risk genes to identify rare variants of large effect, which cannot be detected using commercially available GWAS genotyping arrays.
Fifth, although some genomic susceptibility regions were known through rare monogenic diseases, most of the identified associated regions have not been linked to kidney disease previously. Sixth, some of the genomic variants identified also show association with other common complex traits, such as variants in PRKAG2 with hemoglobin concentration and hematocrit.90 It therefore will be an interesting and important question to study whether this represents common underlying mechanisms of diseases that currently are thought of as separate entities. Finally, it is important to keep in mind that in most cases, the associated SNPs are not causing disease themselves, but are merely markers of correlated causal variants. Even if the associated SNPs lead to nonsynonymous amino acid changes in the encoded proteins, experimental studies are necessary to determine whether these variants are functional and how they exert their effect.

Strengths and Limitations of GWAS

The main strength of a GWAS is its potential for gene discovery, identifying novel disease-associated regions and laying the foundation for novel hypotheses to be tested.89 This carries the potential to identify unknown underlying mechanisms of disease, extend current knowledge about physiologic and pathophysiologic processes, and ultimately lead to novel approaches for diagnosis, treatment, and prevention. The required concerted effort of multidisciplinary research teams can open the door to fruitful collaborations across disciplines.

It recently has become apparent that the multiple novel risk variants discovered using GWAS can account for only a small proportion of the familial clustering for most diseases.91 Interesting future questions therefore are to what extent genetic variation not currently captured through GWAS (rare variants and non-SNP variants), as well as gene-environment interactions, can add to the explanation of the “missing heritability.”91

A potential limitation to a GWAS is population stratification: the subdivision of a population into different groups with different marker allele frequencies and different disease prevalences potentially can cause spurious associations.
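To make the concern about population stratification concrete, the following minimal Python sketch computes a genomic control inflation factor in the spirit of the genomic control approach (reference 92): the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null hypothesis. The function names and the example statistics are hypothetical and chosen only for illustration; this is a sketch under those assumptions, not the procedure used in any of the studies cited here.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom,
# i.e. the expected median of single-SNP test statistics under the null.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to its null expectation."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def apply_genomic_control(chi2_stats):
    """Divide each statistic by lambda; a lambda below 1 is treated as 1."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [stat / lam for stat in chi2_stats]


# Hypothetical 1-df chi-square statistics, for illustration only.
example_stats = [0.1, 0.3, 0.6, 0.9, 2.5]
lam = genomic_inflation_factor(example_stats)
corrected = apply_genomic_control(example_stats)
print(f"lambda = {lam:.2f}")
```

An inflation factor close to 1 suggests little systematic inflation of the test statistics, whereas values clearly above 1 may point to stratification or other sources of confounding.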
However, statistical methods to address the effects of population stratification exist.33,92 Moreover, little evidence of population stratification was found, and statistical correction was applied, in published GWAS of kidney diseases such as CKD or kidney stone disease.43,44,59,60,70

In addition to potential false-positive findings (type I errors), false-negative findings (type II errors) are a common concern in GWAS because many true findings do not reach the conservative threshold used to indicate statistical significance.

Limitations specifically in nephrology research relate to phenotype definition. Kidney function is difficult to measure directly, and GFR therefore is estimated from filtration markers in serum, which are influenced to some degree by nonrenal determinants and may not increase until advanced stages of disease.93
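Because filtration markers are influenced by nonrenal determinants, the comparison of effects on eGFRcrea and eGFRcys shown in Fig 4 can be turned into a simple decision rule. The Python sketch below is one hedged way to express that logic: a variant whose 95% confidence intervals exclude zero for both markers, with effects in the same direction, is labeled as consistent with an effect on kidney function, whereas an association with only one marker is labeled as possibly marker specific. The class, the function, the labels, and all effect sizes are hypothetical illustrations, and the rule is a simplification rather than the method applied in the cited studies.

```python
from dataclasses import dataclass

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


@dataclass
class Effect:
    """Per-allele effect on ln-transformed eGFR with its standard error."""
    beta: float
    se: float

    def ci95(self):
        return (self.beta - Z_95 * self.se, self.beta + Z_95 * self.se)

    def excludes_zero(self):
        lower, upper = self.ci95()
        return lower > 0 or upper < 0


def classify(crea: Effect, cys: Effect) -> str:
    """Label a variant by its association pattern across both markers."""
    crea_assoc = crea.excludes_zero()
    cys_assoc = cys.excludes_zero()
    same_direction = crea.beta * cys.beta > 0
    if crea_assoc and cys_assoc and same_direction:
        return "consistent with kidney function"
    if crea_assoc and not cys_assoc:
        return "possibly creatinine-specific"
    if cys_assoc and not crea_assoc:
        return "possibly cystatin C-specific"
    return "inconclusive"


# Hypothetical effect estimates, for illustration only.
print(classify(Effect(-0.020, 0.003), Effect(-0.020, 0.004)))
print(classify(Effect(-0.020, 0.003), Effect(0.001, 0.004)))
```

In practice, a formal comparison would also consider differences in sample size and precision between the two markers, which this sketch ignores.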

FOLLOW-UP OF GWAS FINDINGS, CLINICAL UTILITY, AND FUTURE PROSPECTS

After initial gene discovery, follow-up on GWAS findings is essential to establish gene function, identify causal variants, characterize the genetic effect in diverse populations and under various exposures, and improve biological insights. Although it remains to be determined if and how knowledge obtained through GWAS may best be translated into clinical practice, follow-up projects of some of the findings have been initiated with exciting early results.

For example, GWAS have led to major advances in understanding uric acid metabolism in humans. Initial GWAS of serum urate concentrations identified 2 associated genomic regions containing the genes SLC2A9 and ABCG2.94-97 Subsequent experiments showed that both genes encode proteins that function as previously unknown urate transporters.96,98-100 In addition, it was possible to identify a functional variant in ABCG2 that causes deficient elimination of urate and increases gout susceptibility in humans.100

Another example is a follow-up project to the identification of CKD risk variants in the UMOD gene.59 In this study, the protein encoded by UMOD, Tamm-Horsfall protein, was measured in urine of study participants and related to incident CKD case-control status 10 years later, as well as to genotype at the UMOD CKD risk variant. This analysis showed that individuals with the genetic UMOD risk variant had higher levels of Tamm-Horsfall protein in urine, and increased Tamm-Horsfall protein levels in urine were associated with increased CKD risk.101 This initial small study can serve as a proof of principle that despite moderate disease risk conferred by genetic variants identified through GWAS, potential new markers of disease can be identified based on biological insights. This is of special interest in the field of nephrology research because serum concentrations of the most commonly used marker to estimate kidney function, serum creatinine, increase only after GFR has decreased by approximately 50%.

Potential future applications could include measures on a population level (novel kidney disease biomarkers and innovative therapies based on novel target proteins and pathways), as well as on an individual level (risk prediction and identification of high-risk individuals for targeted prevention and intervention). Because genetic sequence information is believed to be stable over the life course and can be measured with little error, screening for the presence of high-risk variants could already occur early in life. In individuals at high “genetic” risk, other risk factors for the respective disease could be monitored more closely and modified more aggressively. The next few years will provide additional insight about the extent to which an individual’s entire genomic sequence needs to be known to truly be able to practice individualized medicine.

The next steps in kidney disease genetics research include the exploration of additional phenotypes; for example, CKD progression, acute kidney injury, transplant rejection, or cancers of the kidney. Other steps include the extension of GWAS to study populations of non-European ancestry. Moreover, to better understand how genetic variation gives rise to the GWAS signals, it will be important to identify causal variants at the implicated loci through sequencing. These causal variants can help understand molecular mechanisms of disease through subsequent functional studies. Experimental studies also are needed to provide insights into gene function for new discoveries.
Gene expression studies are a first step toward understanding how identified variants may exert their function; ideally, these studies should be conducted using gene expression data generated from renal tissues. It will be interesting to extend current statistical models to incorporate nonadditive genetic effects and specifically model gene-gene and gene-environment interaction. Other important steps include the identification of rare variants of large effect, which will be facilitated by exome and whole-genome sequencing, as well as targeted resequencing of genomic loci identified through GWAS. In this context, we will gather important insights about the role of rare variants,102 as well as about the importance of non-SNP genetic variation for kidney diseases. Finally, in the future, whole-genome sequencing in affected families may directly identify the underlying cause of rare kidney diseases.

In summary, GWAS of kidney diseases and kidney function have provided exciting novel insights into genomic risk loci for diabetic nephropathy, CKD, kidney stone disease, decreased GFR, and other renal phenotypes during the past few years. Multidisciplinary research teams are required to follow up GWAS signals to identify and understand the underlying causal variants, as well as characterize the effect of these risk variants across populations and over time. GWAS represent a first step toward an improved understanding of biological mechanisms and carry the potential to eventually open up new avenues to the prevention and treatment of kidney diseases.
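As an illustration of how the small, potentially additive effects of independent risk variants might feed into individual-level risk prediction, the following Python sketch sums per-allele log odds ratios weighted by the number of risk alleles carried. It assumes independent variants whose effects combine additively on the log-odds scale; the variant names, odds ratios, and genotype are hypothetical and are not taken from the studies discussed in this review.

```python
import math


def additive_risk_score(allele_counts, log_odds_ratios):
    """Sum of per-allele log odds ratios times risk allele counts (0, 1, or 2)."""
    missing = set(log_odds_ratios) - set(allele_counts)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(log_odds_ratios[v] * allele_counts[v] for v in log_odds_ratios)


# Hypothetical per-allele odds ratios, each below 1.5, for illustration only.
weights = {
    "variant_A": math.log(1.2),
    "variant_B": math.log(1.1),
    "variant_C": math.log(1.3),
}

# Hypothetical genotype: number of risk alleles carried at each variant.
genotype = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = additive_risk_score(genotype, weights)
combined_odds_ratio = math.exp(score)
print(f"combined odds ratio relative to zero risk alleles: {combined_odds_ratio:.3f}")
```

Whether a score of this kind improves prediction beyond established clinical risk factors would need to be evaluated empirically.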

ACKNOWLEDGEMENTS

Support: The author was supported by the Emmy Noether Programme of the German Research Foundation.

Financial Disclosure: The author declares that she has no relevant financial interests.

REFERENCES

1. Meguid El Nahas A, Bello AK. Chronic kidney disease: the global challenge. Lancet. 2005;365(9456):331-340.
2. Levey AS, Atkins R, Coresh J, et al. Chronic kidney disease as a global public health problem: approaches and initiatives - a position statement from Kidney Disease: Improving Global Outcomes. Kidney Int. 2007;72(3):247-259.
3. Coresh J, Selvin E, Stevens LA, et al. Prevalence of chronic kidney disease in the United States. JAMA. 2007;298(17):2038-2047.
4. Zhang QL, Rothenbacher D. Prevalence of chronic kidney disease in population-based studies: systematic review. BMC Public Health. 2008;8:117.
5. Hallan SI, Coresh J, Astor BC, et al. International comparison of the relationship of chronic kidney disease prevalence and ESRD risk. J Am Soc Nephrol. 2006;17(8):2275-2284.
6. Hallan SI, Ritz E, Lydersen S, Romundstad S, Kvenild K, Orth SR. Combining GFR and albuminuria to classify CKD improves prediction of ESRD. J Am Soc Nephrol. 2009;20(5):1069-1077.
7. Bash LD, Astor BC, Coresh J. Risk of incident ESRD: a comprehensive look at cardiovascular risk factors and 17 years of follow-up in the Atherosclerosis Risk in Communities (ARIC) Study. Am J Kidney Dis. 2010;55(1):31-41.
8. Go AS, Chertow GM, Fan D, McCulloch CE, Hsu CY. Chronic kidney disease and the risks of death, cardiovascular events, and hospitalization. N Engl J Med. 2004;351(13):1296-1305.
9. Hillege HL, Fidler V, Diercks GF, et al. Urinary albumin excretion predicts cardiovascular and noncardiovascular mortality in general population. Circulation. 2002;106(14):1777-1782.
10. Fox CS, Larson MG, Leip EP, Culleton B, Wilson PW, Levy D. Predictors of new-onset kidney disease in a community-based population. JAMA. 2004;291(7):844-850.
11. Lash JP, Go AS, Appel LJ, et al. Chronic Renal Insufficiency Cohort (CRIC) Study: baseline characteristics and associations with kidney function. Clin J Am Soc Nephrol. 2009;4(8):1302-1311.
12. Fox CS, Yang Q, Cupples LA, et al. Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J Am Soc Nephrol. 2004;15(9):2457-2461.
13. Bochud M, Elston RC, Maillard M, et al. Heritability of renal function in hypertensive families of African descent in the Seychelles (Indian Ocean). Kidney Int. 2005;67(1):61-69.
14. Lei HH, Perneger TV, Klag MJ, Whelton PK, Coresh J. Familial aggregation of renal disease in a population-based case-control study. J Am Soc Nephrol. 1998;9(7):1270-1276.
15. Freedman BI, Volkova NV, Satko SG, et al. Population-based screening for family history of end-stage renal disease among incident dialysis patients. Am J Nephrol. 2005;25(6):529-535.
16. Satko SG, Sedor JR, Iyengar SK, Freedman BI. Familial clustering of chronic kidney disease. Semin Dial. 2007;20(3):229-236.
17. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008;118(5):1590-1605.
18. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of genetic association studies. Genet Med. 2002;4(2):45-61.
19. Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet. 2003;361(9360):865-872.
20. McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9(5):356-369.
21. Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008;322(5903):881-888.
22. Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106(23):9362-9367.
23. McKnight AJ, Currie D, Maxwell AP. Unravelling the genetic basis of renal diseases; from single gene to multifactorial disorders. J Pathol. 2010;220(2):198-216.

24. Divers J, Freedman BI. Susceptibility genes in common complex kidney disease. Curr Opin Nephrol Hypertens. 2010;19(1):79-84.
25. Klein RJ, Zeiss C, Chew EY, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308(5720):385-389.
26. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273(5281):1516-1517.
27. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860-921.
28. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291(5507):1304-1351.
29. Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucl Acids Res. 2001;29(1):308-311.
30. International HapMap Consortium. The International HapMap Project. Nature. 2003;426(6968):789-796.
31. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437(7063):1299-1320.
32. Frazer KA, Ballinger DG, Cox DR, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449(7164):851-861.
33. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904-909.
34. Li Y, Willer C, Sanna S, Abecasis G. Genotype Imputation. Annu Rev Genomics Hum Genet. 2009;10:387-406.
35. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23(10):1294-1296.
36. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39(7):906-913.
37. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-575.
38. Neale BM, Purcell S. The positives, protocols, and perils of genome-wide association. Am J Med Genet B Neuropsychiatr Genet. 2008;147B(7):1288-1294.
39. Teo YY. Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure. Curr Opin Lipidol. 2008;19(2):133-143.
40. Gordon A, Glazko G, Qui X, Yakovlev A. Control of the mean number of false discoveries, Bonferroni and stability of multiple testing. Ann Appl Stat. 2007;1(1):179-190.
41. Pe’er I, Yelensky R, Altshuler D, Daly MJ. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol. 2008;32(4):381-385.
42. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008;40(5):638-645.

43. Kottgen A, Pattaro C, Boger CA, et al. New loci associated with kidney function and chronic kidney disease. Nat Genet. 2010;42(5):376-384.
44. Pezzolesi MG, Poznik GD, Mychaleckyj JC, et al. Genome-wide association scan for diabetic nephropathy susceptibility genes in type 1 diabetes. Diabetes. 2009;58(6):1403-1410.
45. Greene CN, Keong LM, Cordovado SK, Mueller PW. Sequence variants in the PLEKHH2 region are associated with diabetic nephropathy in the GoKinD study population. Hum Genet. 2008;124(3):255-262.
46. Craig DW, Millis MP, DiStefano JK. Genome-wide SNP genotyping study using pooled DNA to identify candidate markers mediating susceptibility to end-stage renal disease attributed to type 1 diabetes. Diabet Med. 2009;26(11):1090-1098.
47. Hanson RL, Craig DW, Millis MP, et al. Identification of PVT1 as a candidate gene for end-stage renal disease in type 2 diabetes using a pooling-based genome-wide single nucleotide polymorphism association study. Diabetes. 2007;56(4):975-983.
48. Millis MP, Bowen D, Kingsley C, Watanabe RM, Wolford JK. Variants in the plasmacytoma variant translocation gene (PVT1) are associated with end-stage renal disease attributed to type 1 diabetes. Diabetes. 2007;56(12):3027-3032.
49. Tanaka N, Babazono T, Saito S, et al. Association of solute carrier family 12 (sodium/chloride) member 3 with diabetic nephropathy, identified by genome-wide analyses of single nucleotide polymorphisms. Diabetes. 2003;52(11):2848-2853.
50. Ng DP, Nurbaya S, Choo S, Koh D, Chia KS, Krolewski AS. Genetic variation at the SLC12A3 locus is unlikely to explain risk for advanced diabetic nephropathy in Caucasians with type 2 diabetes. Nephrol Dial Transplant. 2008;23(7):2260-2264.
51. Shimazaki A, Kawamura Y, Kanazawa A, et al. Genetic variations in the gene encoding ELMO1 are associated with susceptibility to diabetic nephropathy. Diabetes. 2005;54(4):1171-1178.
52. Leak TS, Perlegas PS, Smith SG, et al. Variants in intron 13 of the ELMO1 gene are associated with diabetic nephropathy in African Americans. Ann Hum Genet. 2009;73(2):152-159.
53. Pezzolesi MG, Katavetin P, Kure M, et al. Confirmation of genetic associations at ELMO1 in the GoKinD collection supports its role as a susceptibility gene in diabetic nephropathy. Diabetes. 2009;58(11):2698-2702.
54. Obara W, Iida A, Suzuki Y, et al. Association of single-nucleotide polymorphisms in the polymeric immunoglobulin receptor gene with immunoglobulin A nephropathy (IgAN) in Japanese patients. J Hum Genet. 2003;48(6):293-299.
55. Ohtsubo S, Iida A, Nitta K, et al. Association of a single-nucleotide polymorphism in the immunoglobulin mu-binding protein 2 gene with immunoglobulin A nephropathy. J Hum Genet. 2005;50(1):30-35.
56. Lou T, Zhang J, Gale DP, et al. Variation in IGHMBP2 is not associated with IgA nephropathy in independent studies of UK Caucasian and Chinese Han patients. Nephrol Dial Transplant. 2010;25(5):1547-1554.

57. Hwang SJ, Yang Q, Meigs JB, Pearce EN, Fox CS. A genome-wide association for kidney function and endocrine-related traits in the NHLBI’s Framingham Heart Study. BMC Med Genet. 2007;8(suppl 1):S10.
58. Kottgen A, Kao WH, Hwang SJ, et al. Genome-wide association study for renal traits in the Framingham Heart and Atherosclerosis Risk in Communities Studies. BMC Med Genet. 2008;9:49.
59. Kottgen A, Glazer NL, Dehghan A, et al. Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet. 2009;41(6):712-717.
60. Thorleifsson G, Holm H, Edvardsson V, et al. Sequence variants in the CLDN14 gene associate with kidney stones and bone mineral density. Nat Genet. 2009;41(8):926-930.
61. Kao WH, Klag MJ, Meoni LA, et al. MYH9 is associated with nondiabetic end-stage renal disease in African Americans. Nat Genet. 2008;40(10):1185-1192.
62. Kopp JB, Smith MW, Nelson GW, et al. MYH9 is a major-effect risk gene for focal segmental glomerulosclerosis. Nat Genet. 2008;40(10):1175-1184.
63. Freedman BI, Hicks PJ, Bostrom MA, et al. Polymorphisms in the non-muscle myosin heavy chain 9 gene (MYH9) are strongly associated with end-stage renal disease historically attributed to hypertension in African Americans. Kidney Int. 2009;75(7):736-745.
64. Pattaro C, Aulchenko YS, Isaacs A, et al. Genomewide linkage analysis of serum creatinine in three isolated European populations. Kidney Int. 2009;76(3):297-306.
65. Freedman BI, Hicks PJ, Bostrom MA, et al. Nonmuscle myosin heavy chain 9 gene MYH9 associations in African Americans with clinically diagnosed type 2 diabetes mellitus-associated ESRD. Nephrol Dial Transplant. 2009;24(11):3366-3371.
66. Franceschini N, Voruganti VS, Haack K, et al. The association of the MYH9 gene and kidney outcomes in American Indians: the Strong Heart Family Study. Hum Genet. 2010;127(3):295-301.
67. Nelson GW, Freedman BI, Bowden DW, et al. Dense mapping of MYH9 localizes the strongest kidney disease associations to the region of introns 13 to 15. Hum Mol Genet. 2010;19(9):1805-1815.
68. Behar DM, Rosset S, Tzur S, et al. African ancestry allelic variation at the MYH9 gene contributes to increased susceptibility to non-diabetic end-stage kidney disease in Hispanic Americans. Hum Mol Genet. 2010;19(9):1816-1827.
69. Pattaro C, De Grandi A, Vitart V, et al. A meta-analysis of genome-wide data from five European isolates reveals an association of COL22A1, SYT1, and GABRR2 with serum creatinine level. BMC Med Genet. 2010;11(1):41.
70. Chambers JC, Zhang W, Lord GM, et al. Genetic loci influencing kidney function and chronic kidney disease. Nat Genet. 2010;42(5):373-375.
71. Kamatani Y, Matsuda K, Okada Y, et al. Genomewide association study of hematological and biochemical traits in a Japanese population. Nat Genet. 2010;42(3):210-215.
72. Psaty BM, O’Donnell CJ, Gudnason V, et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: design of prospective meta-analyses of genome-wide association studies from five cohorts. Circulation Cardiovasc Genet. 2009;2(1):73-80.

73. Tamm I, Horsfall FL Jr. Characterization and separation of an inhibitor of viral hemagglutination present in urine. Proc Soc Exp Biol Med. 1950;74(1):106-108.
74. Serafini-Cessi F, Malagolini N, Cavallone D. Tamm-Horsfall glycoprotein: biology and clinical relevance. Am J Kidney Dis. 2003;42(4):658-676.
75. Hart TC, Gorry MC, Hart PS, et al. Mutations of the UMOD gene are responsible for medullary cystic kidney disease 2 and familial juvenile hyperuricaemic nephropathy. J Med Genet. 2002;39(12):882-892.
76. Vylet’al P, Kublova M, Kalbacova M, et al. Alterations of uromodulin biology: a common denominator of the genetically heterogeneous FJHN/MCKD syndrome. Kidney Int. 2006;70(6):1155-1169.
77. Rampoldi L, Caridi G, Santon D, et al. Allelism of MCKD, FJHN and GCKD caused by impairment of uromodulin export dynamics. Hum Mol Genet. 2003;12(24):3369-3384.
78. Wilcox ER, Burton QL, Naz S, et al. Mutations in the gene encoding tight junction claudin-14 cause autosomal recessive deafness DFNB29. Cell. 2001;104(1):165-172.
79. Patterson N, Hattangadi N, Lane B, et al. Methods for high-density admixture mapping of disease genes. Am J Hum Genet. 2004;74(5):979-1000.
80. Dong F, Li S, Pujol-Moix N, et al. Genotype-phenotype correlation in MYH9-related thrombocytopenia. Br J Haematol. 2005;130(4):620-627.
81. Plomin R, Haworth CM, Davis OS. Common disorders are quantitative traits. Nat Rev Genet. 2009;10(12):872-878.
82. Lee C, Le MP, Wallingford JB. The shroom family proteins play broad roles in the morphogenesis of thickened epithelial sheets. Dev Dyn. 2009;238(6):1480-1491.
83. Chang AC, Janosi J, Hulsbeek M, et al. A novel human cDNA highly homologous to the fish hormone stanniocalcin. Mol Cell Endocrinol. 1995;112(2):241-247.
84. Huang L, Garcia G, Lou Y, et al. Anti-inflammatory and renal protective actions of stanniocalcin-1 in a model of anti-glomerular basement membrane glomerulonephritis. Am J Pathol. 2009;174(4):1368-1378.
85. Hosaka K, Takeda T, Iino N, et al. Megalin and nonmuscle myosin heavy chain IIA interact with the adaptor protein Disabled-2 in proximal tubule cells. Kidney Int. 2009;75(12):1308-1315.
86. Eremina V, Quaggin SE. The role of VEGF-A in glomerular development and function. Curr Opin Nephrol Hypertens. 2004;13(1):9-15.
87. Wyss M, Kaddurah-Daouk R. Creatine and creatinine metabolism. Physiol Rev. 2000;80(3):1107-1213.
88. Abrahamson M, Olafsson I, Palsdottir A, et al. Structure and expression of the human cystatin C gene. Biochem J. 1990;268(2):287-294.
89. Hirschhorn JN. Genomewide association studies - illuminating biologic pathways. N Engl J Med. 2009;360(17):1699-1701.
90. Ganesh SK, Zakai NA, van Rooij FJ, et al. Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat Genet. 2009;41(11):1191-1198.
91. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747-753.

92. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55(4):997-1004.
93. Stevens LA, Coresh J, Greene T, Levey AS. Assessing kidney function–measured and estimated glomerular filtration rate. N Engl J Med. 2006;354(23):2473-2483.
94. Li S, Sanna S, Maschio A, et al. The GLUT9 gene is associated with serum uric acid levels in Sardinia and Chianti cohorts. PLoS Genet. 2007;3(11):e194.
95. Doring A, Gieger C, Mehta D, et al. SLC2A9 influences uric acid concentrations with pronounced sex-specific effects. Nat Genet. 2008;40(4):430-436.
96. Vitart V, Rudan I, Hayward C, et al. SLC2A9 is a newly identified urate transporter influencing serum urate concentration, urate excretion and gout. Nat Genet. 2008;40(4):437-442.
97. Dehghan A, Kottgen A, Yang Q, et al. Association of three genetic loci with uric acid concentration and risk of gout: a genome-wide association study. Lancet. 2008;372(9654):1953-1961.

98. Anzai N, Ichida K, Jutabha P, et al. Plasma urate level is directly regulated by a voltage-driven urate efflux transporter URATv1 (SLC2A9) in humans. J Biol Chem. 2008;283(40):26834-26838.
99. Caulfield MJ, Munroe PB, O’Neill D, et al. SLC2A9 is a high-capacity urate transporter in humans. PLoS Med. 2008;5(10):e197.
100. Woodward OM, Kottgen A, Coresh J, Boerwinkle E, Guggino WB, Kottgen M. Identification of a urate transporter, ABCG2, with a common functional polymorphism causing gout. Proc Natl Acad Sci U S A. 2009;106(25):10338-10342.
101. Kottgen A, Hwang SJ, Larson MG, et al. Uromodulin levels associate with a common UMOD variant and risk for incident CKD. J Am Soc Nephrol. 2010;21(2):337-344.
102. Dickson SP, Wang K, Krantz I, Hakonarson H, Goldstein DB. Rare variants create synthetic genome-wide associations. PLoS Biol. 2010;8(1):e1000294.