Introduction

Serum urate concentrations represent a complex genetic trait, influenced by genetic variation in many genomic regions1,2,3, and by environmental influences such as intake of purine-rich food, alcohol, fructose, and certain diuretic medications4,5. Elevated serum urate levels, hyperuricemia, can lead to monosodium urate crystal deposition and thereby cause gout6, the most common type of inflammatory arthritis among adults. Moreover, observational studies show that serum urate levels are associated with various cardiometabolic risk factors including hypertension and obesity7,8,9. Gaining insights into the regulation of serum urate levels is important not only to uncover targets to develop urate-lowering and anti-inflammatory therapies to treat gout, but may also provide insights into shared pathways with cardiometabolic risk factors.

Previous studies reported the estimated heritability of serum urate levels in the general population as 30–70%3,10,11. Large-scale genome-wide association studies (GWAS) of serum urate levels have identified urate-associated genetic variants at >200 loci1,2,3,12. Genes in the identified loci suggest two major themes: the first is related to urate transport in the kidney and intestine, which determines net urate excretion, and the second is related to genetic co-regulation. The genes in loci most strongly associated with serum urate levels encode for urate transporters (e.g., SLC2A9, SLC22A12, ABCG2, SLC22A11, SLC17A1) or of substrates that may be exchanged for urate (e.g., SLC16A9), as well as for regulatory proteins of urate transporters (e.g., PDZK1)13,14. While the underlying causal variant rs2231142 at ABCG2, a missense variant causing reduced function, has been identified15, the causal mechanisms underlying the strongest genetic association signal with serum urate, which maps to the urate transporter gene SLC2A9, remain largely unknown16.

Regarding the potential genetic co-regulation between serum urate levels and multiple cardiometabolic risk factors such as triglyceride levels, blood pressure, obesity, and insulin resistance, we observed high genetic correlation between these risk factors and serum urate based on genome-wide genetic association statistics3. Such genetic co-regulation could explain the observed association of serum urate and cardiometabolic risk factors from epidemiological studies. It may, at least in part, be mediated by transcription factors with major regulatory roles in both liver and kidney such as HNF1A and HNF4A3. Genetic variants in these transcription factors are associated not only with serum urate levels, but also with impaired glucose handling and type 2 diabetes mellitus as well as serum cholesterol and triglyceride levels17,18,19, suggesting that coordinated gene regulation may have joint effects on serum urate and hepatic metabolism.

DNA methylation is an important mechanism of gene regulation and may reveal new insights into the biological processes that influence serum urate levels. We thus performed epigenome-wide association studies (EWAS) of serum urate levels with two scientific aims: first, to detect CpGs associated with serum urate levels and to investigate whether differential methylation may connect genetic risk variants of unknown molecular mechanism with serum urate levels. Second, to evaluate whether urate-associated differentially methylated CpGs are associated with cardiometabolic risk factors, pointing toward shared regulation of the implicated genes.

Here we perform meta-analyses of EWAS of serum urate levels followed by replication, using data from 17,996 individuals from four ancestry groups. Downstream characterization of urate-associated CpGs includes association with differential gene expression, Mendelian randomization (MR), and mediation as well as enrichment analyses. We show that significantly associated CpGs explain 11.6% of the serum urate variance in a dataset not included in the EWAS. Differential methylation of CpGs in SLC2A9 has causal effects on serum urate levels and mediates the effect of known urate-associated genetic variants. Urate-associated CpGs show associations with several cardiometabolic traits, consistent with their observational relationships with serum urate. Our study generates both independent and complementary insights to those obtained from GWAS of serum urate levels.

Results

Characteristics of the study participants

The discovery analysis included up to 12,474 participants from 16 cohorts (European ancestry [EA]: 6968, African Americans [AA]: 2101, South Asian ancestry [SA]: 2720, and Sub-Saharan Africans [SSA]: 685). The replication analysis included 5522 participants from 8 cohorts with one study (the Normative Aging Study) including only men (EA: 3338, AA: 2184). Across these cohorts, the median of the average age within each cohort was 56 (25th, 75th percentile: 51, 63), the median of the proportion of men was 47% (25th, 75th percentile: 39%, 49%), and the median of mean serum urate levels was 5.3 mg/dL (25th, 75th percentile: 5.1, 5.6; Supplementary Data 1 and Supplementary Note 1). Study-specific methods and information on blood cell type proportions used to estimate associations between urate and DNA methylation independent of cell type composition are reported in the Methods section and Supplementary Data 2 and 3. A flowchart of the meta-analyses and follow-up characterization is presented in Fig. 1.

Fig. 1: Flowchart of analyses.
figure 1

MR Mendelian randomization, GO Gene Ontology. Icons were downloaded from smart.servier.com under the Creative Commons Attribution 3.0 Unported License.

Discovery and replication of urate-associated CpGs

The discovery analysis identified 140 significant CpGs (p < 1.1E–7 = 0.05/441,854 CpGs analyzed), of which 100 replicated (consistent effect direction, p < 0.05 in the replication analysis, and overall meta-analysisp < 1.1E–7; Supplementary Data 4 and Fig. 2). The estimated inflation factor of the meta-analysis combining discovery and replication was 1.06 (Supplementary Fig. 1). The study-specific results had a median inflation factor of 0.99 (min: 0.86, max: 1.06, Supplementary Data 2), and were corrected prior to any meta-analysis when >1. The heterogeneity of most replicated CpGs was low to moderate: the median I2 from the meta-analysis of the 17 EA studies was 11% (25th, 75th percentile: 0.0, 31.3), from the five AA studies was 0% (25th, 75th percentile: 0.0, 35.1), and from the overall meta-analysis of the four ancestry groups was 30.2% (25th, 75th percentile: 0.0, 60.0, Supplementary Data 4). Of the 32 CpGs that showed heterogeneity >50% in the meta-analysis of the four ancestry groups, 90% (29 CpGs) had effect estimates in the same direction across the ancestry groups (Supplementary Fig. 2A–AF). Ancestry-specific results of CpGs with p < 1.1E–7 are reported in Supplementary Data 5 [EA], 6 [AA], and 7 [SA] if present, and with p < 1E–5 otherwise (Supplementary Data 8 [SSA]).

Fig. 2: CpGs in the EWAS of serum urate from the combined meta-analysis of discovery and replication results (n = 17,996).
figure 2

The CpGs are ordered by their chromosomal position on the x-axis with their –log10(p value) of the association on the y-axis. CpGs with positive and negative effect estimates are plotted in the upper and lower panels, respectively. The dotted horizontal lines represent the level of significance corrected for multiple testing (two-sided p < 1.1E–7). Black: replicated CpGs; brown: gene with replicated CpGs associated with gene expression in monocytes and significant causal effects for serum urate; blue: replicated CpGs with significant causal effects on serum urate or gout; red: replicated CpGs associated with gene expression in monocytes or whole blood. Gene names are displayed if the gene had at least one CpG with p value < 1E–14. MR Mendelian randomization.

The follow-up analyses focused on the replicated CpGs from the overall meta-analysis, in which the CpG with the lowest p value was at SLC7A11 (cg06690548, p = 2.18E–59). Among the ten CpGs with the lowest p values, two mapped to PHGDH (cg14476101, p = 5.87E–38; cg16246545, p = 1.77E–24), which encodes for 3-phosphoglycerate dehydrogenase, and five to SLC2A9 (cg21795255, cg20479063, cg11266682, cg00071950, cg13841979; p < 1E–25), which encodes the urate transporter GLUT9 and is the locus with the largest effect size in GWAS of serum urate3. The probe cg21795255 at SLC2A9 had a common SNP at the extension base of the DNA methylation probe (rs3796839, minor allele frequency of 46% in GnomAD)20 and was excluded from all follow-up analyses, which therefore included 99 replicated CpGs annotated to 81 genes. Of these, 24 genes contained replicated CpGs with significant findings from our follow-up analyses and are featured in Table 1.

Table 1 Genes having replicated CpG(s) associated with gene expression in whole blood or monocytes, with significant causal effects for urate or gout, associated with cardiometabolic traits, or annotated as a GWAS locus for urate or gout.

Heritability of urate-associated CpGs and explained serum urate variance

DNA methylation can be influenced by genetics and environmental factors21. Heritability of DNA methylation levels at a CpG estimates the proportion of variance in DNA methylation levels at the CpG that can be attributed to additive genetics; it does not refer to germline inheritance of DNA methylation. Across three datasets from populations of EA, the heritability estimates of the replicated CpGs varied (Supplementary Data 9). The mean heritability estimates for the 99 replicated, urate-associated CpGs was higher than the mean heritability across all CpGs assessed in each of the three studies: 0.36 (urate-associated CpGs) vs. 0.16 (HM450K array) based on Hannon et al.22, 0.54 vs. 0.19 for van Dongen et al.23, and 0.52 vs. 0.19 for McRae et al.24. Among the seven replicated sites at SLC2A9, the mean heritability combining the three datasets ranged from 0.39 at cg03725404 to 0.93 at cg11266682. These observations suggest that the DNA methylation at some of the urate-associated CpGs may reflect the effect of common genetic variants on serum urate levels.

The proportion of serum urate variance explained by the replicated, urate-associated CpGs was estimated in a separate study sample using the coefficient of determination (R2) obtained from linear regression (Methods). Compared to an age- and sex-adjusted model, the replicated CpGs explained an additional 11.6% of the serum urate variance. In a recent GWAS of serum urate based on data from 457,690 individuals, replicated genetic index variants explained 7.7% of the age- and sex-adjusted serum urate variance3. The higher proportion of urate variance explained by the replicated CpGs as compared to common genetic variants may partly reflect environmental influences on serum urate levels.

Causal effects of CpGs on serum urate and gout

To evaluate whether the urate-associated CpGs might causally affect serum urate levels, we used two-sample MR analysis, where genetic variants associated with DNA methylation of a CpG (methylation quantitative trait locus, meQTL) in cis (<500 kb) were used as proxies or genetic instruments of the CpGs. Cis meQTLs were available for 27 of the 99 replicated CpGs from a meta-analysis of cohorts of EA in the Genetics of DNA Methylation Consortium (GoDMC, Methods). Combined with summary statistics of a GWAS of serum urate among EA individuals, we found evidence for significant causal effects of DNA methylation on serum urate levels at four CpGs at SLC2A9 (p < 1.9E–3 = 0.05/27) based on the multiplicative random effect inverse-variance weighted method, our primary method (Methods, Table 2). In sensitivity analysis, all significant causal effects were supported by three or more other methods that are robust to pleiotropy (Supplementary Data 10). After additionally removing the meQTLs that were correlated with any of the five GWAS index SNPs at SLC2A9 among EA persons (Methods), cg13841979 was the only CpG with ≥4 meQTLs for MR analysis and showed nominally significant causal effects on serum urate levels (p = 2.25E–2, Supplementary Data 10). This attenuation of causal effect is consistent with the observation that DNA methylation at cg13841979 mediated the effects of two urate GWAS index SNPs on serum urate, as reported below.

Table 2 CpGs with significant causal effects on serum urate or gout from Mendelian randomization analysis.

As expected, the causal effects of the four CpGs on serum urate were consistent with their effect direction from EWAS. At cg11266682, a promoter-associated CpGs, higher DNA methylation levels were associated with higher serum urate levels (effect size: 0.21 mg/dL per standard deviation [SD] of DNA methylation beta value, p = 8.8E–04). In contrast, for three CpGs further downstream of the transcription, higher DNA methylation levels were associated with lower serum urate levels (effect size: −0.65 to −0.46 mg/dL per SD of DNA methylation beta value, p < 2.1E–4). At all four CpGs, little evidence of pleiotropy, a threat to MR assumptions, was detected based on the Egger intercept test (all p > 0.14). The observed heterogeneity of the meQTL effects on serum urate levels (p heterogeneity <7.54E–239; Table 2 and Supplementary Figs. 3A to 3D) suggests complex genetic influences on DNA methylation in this region. Together, these observations at SLC2A9 are consistent with a causal effect of differential DNA methylation on serum urate levels.

The same 27 CpGs that were tested for causal effects on serum urate were also tested for their causal effects on gout. The CpG cg03725404 at SLC2A9 showed a significant causal effect on gout (p < 1.9E–3 = 0.05/27, Table 2). Consistent with its causal effect on lower serum urate, cg03725404 conferred lower odds for gout, with an odds ratio of 0.43 per SD of DNA methylation beta value (p = 2.98E–9). In sensitivity analysis, the causal effect of cg03725404 for gout was supported by three MR methods robust to pleiotropy (Supplementary Data 11). The scatter plots of the effects of the meQTLs on DNA methylation and gout along with the regression slope from the MR methods showed consistency among MR methods (Supplementary Fig. 4). The meQTLs included in the MR analysis for serum urate and gout were independent of each other (Supplementary Figs. 5A–D and 6, respectively). Leave-one-out analysis showed that the significant causal effects of DNA methylation on serum urate and gout were not driven by any single meQTL at any of the CpGs (Supplementary Fig. 7A–D for serum urate and Supplementary Fig. 8 for gout). Forest plots of the effects of the meQTLs showed that the majority of the meQTLs supported the significant causal effects, despite the presence of heterogeneity (Supplementary Fig. 9A–D for serum urate and Supplementary Fig. 10 for gout). Together, these findings show robust results for the performed MR analyses and support a causal effect of differential DNA methylation at SLC2A9 on serum urate levels and gout risk. The reverse MR analysis that assessed the potential causal effects of serum urate on DNA methylation levels did not detect any significant findings among the 99 tested CpGs (Supplementary Note 2).

CpGs at SLC2A9 mediate genetic effects

Given that four CpGs at SLC2A9 had significant causal effects on serum urate, we hypothesized that these causal effects may mediate genetic effects at this locus. We previously identified four independent intronic or intergenic independent index SNPs in two neighboring 1 Mb regions at SLC2A9 in a large-scale GWAS meta-analysis of serum urate of EA populations3. Therefore, we first tested whether the four CpGs mediated the effects of any of four independent SNPs on serum urate among EA participants of the ARIC study (n = 637). Two CpGs (cg11266682 and cg13841979) had significant mediating effects (p < 0.0125) for two of the independent index SNPs (rs10017305 and rs6825187; Supplementary Data 12). We next assessed these findings in two additional studies, KORA (Cooperative Health Research in the Region of Augsburg, n = 1636) and SHIP (Study of Health in Pomerania, n = 223), and observed significant evidence for DNA methylation as a mediator of the effects of these SNPs on serum urate levels. From the meta-analysis of all three studies, the mediating effects ranged from 21% by cg13841979 for rs10017305 (mediating effect: 0.05 mg/dL, p = 4.7E–07) to 43% by cg11266682 for rs6825187 (mediating effect: 0.10 mg/dL, p = 3.0E–5, Supplementary Data 13).

Urate-associated CpGs, gene expression, and chromatin accessibility

An important function of DNA methylation is the regulation of gene expression25. To determine whether the urate-associated CpGs were associated with gene expression in blood cells, we used summary statistics of DNA methylation and gene expression from monocytes and whole blood. Gene expression in monocytes is particularly relevant for SLC2A9, which is mainly expressed in monocytes and myeloid dendritic cells among the blood cell types (Supplementary Fig. 11)26. Of the 99 replicated CpGs, 26 CpGs were significantly associated with gene expression in cis in monocytes (p < 5E–4; Supplementary Data 14)27. Some of the genes harbored multiple urate-associated CpGs (two at PHGDH, five at SLC2A9, six at SLC1A5). At PHGDH and SLC1A5, higher DNA methylation levels were associated with lower gene expression, with CpGs mapping to intron 1 or intron 2 of the gene, depending on the reference isoform (Fig. 3). The inverse association between DNA methylation levels in intron 1 and gene expression has been reported to be conserved across vertebrates and is found in multiple human tissues28. Interestingly, at SLC2A9, the effect direction on gene expression for the five CpGs differed by genomic location (Fig. 4A, B). The three CpGs annotated as promoter-associated and located at introns 1 or 2 of isoform 1 (cg02734326, cg00071950, cg11266682) showed inverse associations with gene expression (e.g., cg11266682: effect −0.037, p = 4.16E–15; Fig. 4A and Supplementary Data 14). These three promoter-associated CpGs were associated with higher serum urate levels in EWAS (Fig. 4C). In contrast, cg03725404, located further downstream in the direction of transcription, also showed inverse associations with gene expression (effect −0.024, p = 4.75E–6, Fig. 4B) but was associated with lower serum urate levels in EWAS (Fig. 4C). Both cg11266682 and cg03725404 were also shown to have significant causal effects on serum urate, as reported above. The different directions of association across CpGs in SLC2A9 indicate complex regulation. Assuming that DNA methylation affects gene expression levels, which in turn influence serum urate levels, the observation that higher DNA methylation levels at the three promoter-associated CpGs in SLC2A9 were associated with lower gene expression in monocytes and higher serum urate levels (Fig. 4B, C) allows for the inference that SLC2A9 expression in monocytes due to differential methylation at these promoter-associated CpGs has an inverse relationship with serum urate levels. As depicted in Fig. 5A, cg11266682 was associated with lower gene expression levels but showed a positive, causal effect on serum urate levels. In contrast, cg03725404, a CpG further downstream in the direction of transcription, was also associated with lower gene expression but had an inverse, causal effect on serum urate levels (Fig. 5B).

Fig. 3: Associations of CpGs at PHGDH and SLC1A5 with serum urate levels.
figure 3

For both PHGDH (A) and SLC1A5 (B), the upper part shows chromosomal positions on the x-axis and the –log10(p value) on the y-axis, and the lower part shows the correlations between DNA methylation levels of the CpGs. The replicated CpGs are labeled. Promoter-associated annotation was based on the HM450K annotation file. The gene models were based on RefSeq curated genes. Associations of DNA methylation with monocyte gene expression were from Kennedy et al. BMC Genomics 2018 and with whole blood from the KORA study (Methods). The correlations between the CpGs were generated using DNA methylation data from 804 European American participants of the ARIC study. Gene models were based on Genbank RefSeq (Accession numbers. PHGDH, NM_006623; SLC1A5 isoform 1, NM_005628; isoform 2, NM_001145144; isoform 3 NM_001145145). assoc. associated.

Fig. 4: Associations of CpGs at SLC2A9 with serum urate levels and gene expression in monocytes.
figure 4

CpGs in the SLC2A9 region (A) with the effect size of DNA methylation on gene expression in monocytes at five replicated CpGs (B) and the effect size of serum urate on DNA methylation levels at these same five CpGs (C). All labeled CpGs were replicated. In the legend, MR indicates CpGs with a significant causal effect on serum urate levels based on Mendelian randomization analysis. Colors of the estimates in panels B and C match the color legend in panel A. Promoter-associated annotation was based on the Illumina HM450K annotation file. The gene models were based on RefSeq curated genes. The correlations between the CpGs were generated using DNA methylation data from 804 European Americans participants of the ARIC study. The association between DNA methylation and monocyte gene expression in panel B is based on Kennedy et al. BMC Genomics 2018 (n = 1202). The DNA methylation estimates in panel C are based on the meta-analysis combining discovery and replication cohorts in the present study (n = 17,996). Genbank RefSeq accession numbers: isoform 1 (NM_020041), isoform 2 (NM_001001290). MR Mendelian randomization, assoc. associated.

Fig. 5: Conceptual figure summarizing patterns of relationships between DNA methylation at SLC2A9, its expression in monocytes, and serum urate levels.
figure 5

The figure focuses on the two CpGs with significant causal effects on serum urate and association with gene expression in monocytes. The inferred relationship between gene expression and serum urate levels is inverse for the promoter-associated CpG cg11266682 (A, orange shading), and concordant for CpG cg03725404 (B, blue shading). Solid compound arrows indicate both observed cross-sectional association and causal effect of DNA methylation on serum urate based on Mendelian randomization analysis. Solid arrows indicate observed cross-sectional associations. Dashed arrows indicate inferred relationships.

For gene expression in whole blood, eight CpGs showed significant associations (Supplementary Data 15). For all seven CpGs with significant associations with gene expression in both monocytes and whole blood (two in PHGDH, one in PARP3, two in UAP1L1, and two in SLC1A5), the directions of effect in the two gene expression datasets were consistent. CpGs at SLC2A9 were not associated with gene expression in whole blood.

There are two main transcripts of SLC2A9. Isoform 1 encodes the long version of the GLUT9 protein, which has been reported to localize to the basolateral membrane of renal proximal tubule cells, where it mediates urate reabsorption29. We investigated whether the DNA methylation pattern of the urate-associated CpGs in the kidney may be similar to those observed in whole blood. Whole genome bisulfite sequencing (WGBS) of kidney samples (n = 5) showed lower DNA methylation levels at the urate-associated CpGs in the promoter-associated region of isoform 1, and the pattern of DNA methylation level aligned with gene expression levels from RNA-sequencing (Fig. 6A). We further studied whether the urate-associated CpGs mapped into open chromatin regions, where transcription factor binding may occur. ATAC-seq data from primary human kidney tissue (n = 3 each from micro-dissected cortex and medulla) showed that the three promoter-associated CpGs, and—to a lesser extent cg20479063—mapped to DNA sequences with high chromatin accessibility in the kidney (Fig. 6B). Taken together, the data from the kidney samples suggest that the urate-associated CpGs at SLC2A9 are in a regulatory region with potential effects on the expression of SLC2A9.

Fig. 6: Whole genome bisulfite sequencing and ATAC-sequencing data of kidney tissue at SLC2A9.
figure 6

Whole genome bisulfite sequencing of kidney samples (n = 5) showed that lower DNA methylation levels at CpGs localized to the promoter-associated region in isoform 1 on SLC2A9 and aligned with gene expression levels from RNA-sequencing (A). ATAC-sequencing data from primary human kidney tissue (n = 3 from micro-dissected cortex and medulla each) showed that the three promoter-associated CpGs, and—to a lesser extent cg20479063—mapped to DNA sequences with high chromatin accessibility in the kidney (B). Gene models were based on Genbank RefSeq (accession numbers: isoform 1, NM_020041; isoform 2, NM_001001290).

Lastly, we investigated differential gene expression in kidney tissue using two resources. Of the six urate-associated CpGs in SLC2A9 that were associated with differential SLC2A9 expression in monocytes, four did not show significant associations based on DNA methylation and gene expression levels quantified from micro-dissected human tubule tissue, and two did not meet quality control criteria (Methods, Supplementary Data 16). In the future, investigation of specific SLC2A9 isoforms and/or of specific kidney cell types may shed light on the potential different effects of DNA methylation on gene expression levels in the kidney and monocyte. Second, we evaluated whether the genes implicated by the urate-associated CpGs from our study showed evidence for differential expression in kidney or intestine of a humanized mouse model of hyperuricemia (Methods). Of the genes nearest to the urate-associated CpGs, four genes were differentially expressed at false discovery rate (FDR) < 0.05 (ANKRD11, CEBPB, SLC7A11, and UAP1L1). SLC7A11 had lower expression in the intestine of mice with hyperuricemia (log2 fold change: −1.34, FDR = 2.9E–2). ANKRD11 had lower expression in the kidney (log2 fold change: −0.24, FDR = 5.6E–3; Supplementary Data 17). These findings are consistent with the urate-associated CpGs being downstream of factors influencing serum urate levels in mice.

Enrichment of urate-associated CpGs in DNase I hypersensitive sites and transporters

The urate-associated CpGs were enriched in DNase I hypersensitive sites of all blood cells tested (FDR q < 0.01), with the strongest enrichment observed in CD3 cells (q < 1e–9). Among histone modifications, H3K4me1, often found at active and primed enhancers, and H3K4me3, often found at active promoters, were enriched in all blood cells tested. While the former were most enriched in primary monocytes and primary hematopoietic stem cells, H3K4me3 showed the strongest enrichment in primary hematopoietic stem cells (Supplementary Figs. 12 and 13)30,31. In contrast, H3K9me3 and H3K27me3, generally associated with heterochromatin30,31,32,33, were not significantly enriched in any blood cells or tissues tested (Supplementary Figs. 14 and 15). In addition, among transcription factor binding sites, the urate-associated CpGs were most enriched in POLR2A, the largest subunit of RNA polymerase II, the major enzyme synthesizing mRNA in eukaryotes (p < 1E–11, Supplementary Fig. 16). Together, the observed enrichments suggest a role of the urate-associated CpGs in transcriptional regulation.

We also assessed the enrichment of genes implicated by urate-associated CpGs in Gene Ontology (GO) terms, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathways (Methods). After correction for multiple testing, significant enrichment was observed for 55 terms (Supplementary Data 18). The terms or pathways with the strongest enrichment were related to transmembrane transport of organic acids, carboxylic acids, amino acids, organic and inorganic anions, as well as processes in leukocytes and myeloid cells (Fig. 7). These terms suggest that differentially methylated CpGs capture information about small molecule transport in urate homeostasis.

Fig. 7: GO terms and KEGG and Reactome pathways that were enriched for urate-associated CpGs.
figure 7

Significance level was set at false discovery rate <0.05. GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes.

Urate-associated CpG sites and other traits

Our prior work on the genetics of serum urate revealed significant genetic correlations between serum urate and many cardiometabolic traits3. Using published EWAS summary statistics, we investigated whether these relationships were also observed when studying epigenetic variation by examining the association between replicated urate-associated CpGs and other cardiometabolic traits (Methods, Supplementary Data 19). Liver and kidney traits were also included due to the important role of these two organs in urate production and excretion34. Most of these traits were included as covariates in the EWAS of serum urate with the goal of detecting urate-specific DNA methylation signals (Methods). Of the replicated urate-associated CpGs, 17 had significant associations with at least one cardiometabolic trait (Fig. 8 and Supplementary Data 20). Interestingly, the directions of association were always consistent with the correlation between serum urate and these traits in epidemiological studies (Fig. 8)7,35,36,37. A few urate-associated CpGs showed associations with a particularly high number of cardiometabolic traits: cg14476101 annotated to PHGDH (six traits), cg06690548 annotated to SLC7A11 (seven traits), and cg02711608 annotated to SLC1A5 (six traits). Lower DNA methylation level were associated with higher serum urate levels, as were as higher levels of body mass index (BMI), blood pressure, triglycerides, liver enzymes, C-reactive protein (CRP), and incident diabetes in their respective EWAS (Fig. 8). These traits represent components of the metabolic syndrome or are strongly correlated with the components38, as is serum urate37. Given that the EWAS of the cardiometabolic traits adjusted for white blood cell composition in their analysis, the associations between DNA methylation and traits at the CpGs are independent of cell type composition. Together, these associations at these CpGs may reflect DNA methylation signatures of metabolic syndrome in blood. In contrast, none of the 99 replicated urate-associated CpGs were associated with fasting glucose and insulin among individuals without diabetes, who are less likely to be affected by the metabolic syndrome. The urate-associated CpGs were also not found among those associated with high-density and low-density cholesterol (HDL-C, LDL-C) and estimated glomerular filtration rate (eGFR). In addition, 4 of the 17 CpGs that were associated with serum urate and cardiometabolic traits (cg16246545 at PHGDH, cg19693031 at NBPF20/TXNIP, cg06690548 at SLC7A11, and cg06690548 at CPT1A) were associated with mostly lipid-related metabolites, quantified using nuclear magnetic resonance (Supplementary Data 21)39. The associations of these four CpGs were largely observed with triglycerides in various lipid subfractions or lipid metabolites that are part of very low-density lipoprotein and had concordant effect directions as their associations with serum urate.

Fig. 8: Urate-associated CpGs that were associated with cardiometabolic and kidney traits.
figure 8

All traits that were included in the lookup are shown, including those without reported associations with urate-associated CpGs. Blue: positive association between DNA methylation and trait levels; Red: inverse association between DNA methylation and trait levels. BMI body mass index, SBP systolic blood pressure, DBP diastolic blood pressure, TG triglycerides, HDL-C high-density lipoprotein cholesterol, LDL-C low-density lipoprotein cholesterol, inc. incident, GGT Gamma-glutamyl transferase, ALT alanine aminotransferase, CRP C-reactive protein, eGFR estimate glomerular filtration rate. Genetic correlations were reported in Tin et al. Nat Genet 2019. *Genetic correlation between serum urate and hypertension was 0.39. No EWAS of hypertension was available; SBP and DBP were used instead.

Discussion

This large-scale EWAS, based upon data from up to 17,996 participants largely drawn from population-based studies, identified and replicated 99 CpGs, at which differential DNA methylation was significantly associated with serum urate levels. The genes implicated by these CpGs were strongly enriched in terms and/or pathways related to small molecule transport, including organic anions and amino acids. MR analyses supported significant causal effects of some CpGs on serum urate levels and gout at SLC2A9, the strongest GWAS locus for serum urate that encodes a major urate transport protein in the kidney. Mediation analyses supported that genetic variants at SLC2A9 affect serum urate levels partly via epigenetic mechanisms. Moreover, 17 urate-associated CpGs showed significant associations with numerous cardiometabolic traits, with the direction of association always consistent with the ones between serum urate and the respective cardiometabolic trait reported from observational studies. Differential DNA methylation at these sites may therefore reflect an epigenetic signature of cardiometabolic traits in whole blood.

Results from dedicated EWAS of serum urate levels have not been reported previously. One previous EWAS of blood metabolite levels in one of our contributing studies, KORA F4, examined urate as one of 649 metabolites, and reported significant associations between DNA methylation at cg00071950 in SLC2A9 and serum urate levels40. This EWAS meta-analysis of measured serum urate levels lends further support to the reproducibility and validity of this prior finding through successful replication in different study populations, association with gene expression, and biological plausibility.

In addition to replicated associations between DNA methylation and serum urate, our study reveals significant causal effects of DNA methylation at some CpGs of SLC2A9 on serum urate and gout. Prior exposure to serum urate in its soluble or crystal form has been shown to heighten the proinflammatory response of myeloid cells in vitro and in animal models potentially through epigenetic mechanisms41. This is also known as urate-induced training immunity. In our study, MR analysis did not identify significant causal effects of serum urate on DNA methylation, but it is conceivable that serum urate might act on other forms of epigenetic mechanisms, such as histone modification42.

Serum urate levels show high genetic and observational correlations with kidney function measures and with cardiometabolic traits. Previous large EWAS of kidney function measures focused on eGFR in population-based studies43 or on changes in kidney function among patients with kidney disease44. Despite the strong, inverse genetic and observational correlations between serum urate and eGFR and the well-established role of the kidney in serum urate homeostasis, the replicated urate-associated CpGs in our study did not show significant associations when assessed in EWAS of eGFR. In contrast, many urate-associated CpGs showed significant associations in results from EWAS of cardiometabolic traits, including triglycerides45, blood pressure46, BMI47, liver enzymes48, CRP49, as well as diabetes50. Many of these traits are either conditions that define the metabolic syndrome, or directly related to such conditions (high triglyceride levels, high blood pressure, high BMI, high blood glucose levels)38. There has been no robust evidence supporting causal effects of serum urate on cardiometabolic traits51,52. Instead, our observations are consistent with shared gene regulatory programs resulting in a common DNA methylation signature of serum urate and metabolic syndrome in whole blood.

Our study also provides additional insights into a major transporter of serum urate. SLC2A9 has consistently been detected as the strongest locus in GWAS of serum urate with multiple independent signals1,3,16,53,54,55,56,57, but the mechanism underlying the GWAS signal is largely unknown. The association between genetic variants at SLC2A9 and serum urate levels may be mediated by transcript- or tissue-specific regulatory mechanisms. SLC2A9 has two described isoforms whose gene products have equivalent urate transport activity but differ in the N-terminal cytosolic portion of the protein58. SLC2A9 shows high complexity in human kidney tissue; the two isoforms of SLC2A9 have been reported to differ in cellular localization possibly affecting their function in urate homeostasis. Isoform 1 was found to be expressed at the basolateral side of the proximal tubule, while isoform 2 was expressed at the apical side of the collecting duct29. The functional consequences of this complexity are not yet clear. In our study, we observed complex relations of DNA methylation levels at SLC2A9 with gene expression in monocytes and serum urate. Our findings support an inferred inverse relationship between DNA methylation-relatedSLC2A9 expression in monocytes and serum urate levels at some CpGs. An inverse relationship between GLUT9 expression in the kidney and serum urate would be difficult to explain. Loss-of-function mutations at SLC2A9 in humans result in type 2 familial renal hypouricemia (OMIM# 612076) supporting a role for GLUT9 as a high-capacity urate transporter in tubular urate reuptake from urine into blood. Therefore, a positive association between SLC2A9 gene expression in the kidney and serum urate levels would be expected58.

However, SLC2A9 is expressed in a range of extra-renal tissues, with high abundance in the liver and lower abundance in the heart, lung, and intestine59,60,61. In mouse hepatocytes, GLUT9 is expressed in the basolateral membrane60, across which urate uptake from blood into the cell can occur62. Therefore, the role of GLUT9-mediated transport of urate may potentially affect serum urate levels in different directions, depending on transcript isoform and tissue localization. For example, loss of function mutations in dogs and liver-specific inactivation of SLC2A9 in a mouse model result in hyperuricemia62,63. In addition, basolateral localization of GLUT9 has also been reported in mouse enterocytes of the small intestine64, and enterocyte-specificSlc2a9knock-out resulted in reduced intestinal urate excretion and hyperuricemia65. A role for SLC2A9 in intestinal excretion would also be consistent with an inverse relationship between DNA methylation-relatedSLC2A9 expression and serum urate levels. Therefore, it is conceivable that the DNA methylation signals we detected at SLC2A9 in blood represent a signature for urate transport into enterocytes, hepatocytes, and/or other cell types, as a constituent of a urate secretion pathway, a mechanism that would lower serum urate levels. Roles for SLC2A9 in urate uptake into cells in both secretory and reabsorption pathways may also help explain some of the heterogeneity of associations of SNPs in the SLC2A9 locus3. Regardless of the mechanism, the finding that urate-associated genetic variants at SLC2A9 could affect serum urate levels via epigenetic mechanisms may be of potential clinical relevance, as each SD higher DNA methylation at CpGs at SLC2A9 conferred up to 57% lower odds of gout, the most common form of inflammatory arthritis in adults.

Terms or pathways enriched for urate-associated CpGs were mainly related to transmembrane transport of organic acids, carboxylic acids, amino acids, and organic and inorganic anions. This does not only align with uric acid as an organic acid, but also with several genes encoding enzymes and transporters related to amino acid generation and movement across membranes. For example, PHGDH encodes 3-phosphoglycerate dehydrogenase, which catalyzes the oxidation of 3-phosphoglycerate to 3-phosphohydroxypyruvate, the committed step in the biosynthesis of L-serine. Inhibition of PHGDH alters nucleotide metabolism via disruption of mass balance within central carbon metabolism, leading to changes in the pentose phosphate pathway and the tri-carboxylic acid cycle66. In addition, PHGDH also has a role in lipid metabolism. Knockdown of PHGDH in liver cell lines resulted in lower expression of LPL, the gene encoding lipoprotein lipase, and higher expression of LDLR and ABCA1, both important in lipid transport67. The genes SLC1A5, SLC7A11, and SLC43A1 encode transport proteins for neutral amino acids68, cysteine as well as glutamate69, and for branched-chain amino acids70, respectively. The CpGs in genes involved in the transport of small molecules such as amino acids, represented the CpGs with the strongest associations in this EWAS of serum urate. These CpGs were also associated with other cardiometabolic traits. Taken together, these results suggest that shared epigenetic regulatory mechanisms may have an important role in urate homeostasis15,71,72,73.

Strengths of our study include its large sample size, the replication of findings in separate study samples, and the use of rigorous statistical methods. The agreement of findings based on multiple layers of complementary evidence and the biological plausibility of many of the results support the validity of our findings. These findings provide new insights into the epigenetic regulation of serum urate levels that are complementary to genetic association studies of serum urate. Potential limitations of our study relate to the quantification of DNA methylation from whole blood or from specific blood cell populations, which may not be representative of other important organs of urate homeostasis such as the kidney or liver. However, serum urate was also measured in blood, where its levels may be influenced by metabolic functions beyond simply reflecting purine metabolism. In addition, serum urate concentrations play an important role in gout, making blood an attractive target tissue for studies of urate homeostasis and gout.

In summary, meta-analyses of EWAS of serum urate levels and downstream characterization identify urate-associated CpGs that explain a substantial proportion of serum urate variance, mediate some of the genetic effects on serum urate levels, and show associations with several cardiometabolic traits consistent with the observed relationships between serum urate and these traits. These findings constitute complementary evidence to insights from GWAS of serum urate levels.

Methods

Study samples and workflow

The discovery and replication analyses were conducted based on a pre-specified analysis plan. A script (https://github.com/genepi-freiburg/ckdgen-pheno/tree/ckdgen-ewas-pheno) was circulated to all participating cohorts (Supplementary Data 1) to generate summary files that were inspected by a central analysis team. All studies were community-based or non-clinical populations. Each cohort conducted study-specific EWAS and uploaded epigenome-wide summary statistics. Before the meta-analysis, we conducted quality control of study-specific results, which included the assessment of the distributions of effect sizes, standard errors, and p values within each study and across studies. Supplementary Data 2 reports study-specific methods, including the DNA methylation detection p value and processing pipeline. Supplementary Data 3 reports the white blood cell proportions in each cohort, which were used as covariates to estimate associations between urate and DNA methylation independent of cell type composition. Study research protocols were approved by the respective ethics committees. All participants in all studies provided written informed consent.

DNA methylation quantification and quality control

Genomic DNA was extracted from peripheral blood. Levels of DNA methylation were quantified using the Infinium MethylationEPIC BeadChip array (EPIC, six studies), the Illumina Infinium HumanMethylation450K BeadChip array (HM450K, 17 studies), or the Illumina Infinium HumanMethylation27 BeadChip array (HM27K, one study). DNA methylation data preprocessing was performed according to individual study protocols, which included background correction, quantile normalization, probe filtering, sample filtering, SNP matching to the SNP control probe locations, outlier filtering, and assay type correction (Supplementary Data 2).

Study-specific EWAS

Ancestry-specific association analyses were conducted within each study. Among the 24 contributing cohorts, 14 cohorts excluded participants who were on urate-lowering medications, whereas the other cohorts did not have this information available (Supplementary Data 1). The methylation levels at each CpG probe were represented as beta values, which can be interpreted as the proportion of CpG sites that were methylated. The beta values were analyzed as the dependent variable with serum urate as the independent variable in a linear regression model adjusting for age, sex, BMI, current smoking, eGFR, HDL-C, systolic blood pressure (SBP), log-transformed levels of CRP and triglycerides, genetic principal components (PC), and estimated or measured blood cell type proportions in order to allow for detecting associations between serum urate and DNA methylation levels independent of differences in cell type composition74. Additional study-specific technical covariates included control probe PCs5, study center, processing batch, sentrix ID, and sentrix position, as applicable.

Discovery and replication meta-analysis

Studies were separated into discovery and replication cohorts by chronological order of contribution to this project (Supplementary Data 1). Only CpG sites common to both the EPIC and HM450K were included in the meta-analysis (n = 441,854). CpG probes overlapping with SNPs were annotated using information from Illumina.

Prior to meta-analysis, each set of study-specific results was adjusted for test statistic inflation using the BACON method, which was developed to control for bias and inflation in EWAS using an empirical null distribution approach and assumes that the test statistics are a mixture of three distributions: negative, positive and null associations75. The inflation factor is estimated from the distribution of null association. We conducted inverse-variance weighted fixed effects meta-analysis using the R package metafor (version 2.1-0) for discovery, replication, and for combining the results of discovery and replication76. CpG sites were excluded if their available sample size was less than half of the sample size in the respective meta-analyses, or if their I2 heterogeneity estimate was >95%.

The statistical significance threshold for the discovery step was p < 1.1E–7 (=0.05/441,854). The replication criteria were: Pdiscovery < 1.1E–7, Preplication < 0.05 with consistent effect direction, and Pcombined < 1.1E–7. For the replicated CpGs, we quantified study heterogeneity within an ancestry using the I2 from a meta-analysis including all studies within an ancestry, namely EA (17 studies) and AA (5 studies). We further quantified heterogeneity among the four ancestry groups using the I2 from a transethnic meta-analysis that combined the summary statistics of the meta-analyses of EA, and AA with those from the LOLIPOP study (SA), and the RODAM study (SSA). Given that there was only a single study each including participants with SA and SSA ancestry, the I2 from this transethnic meta-analysis might represent both ancestry and study heterogeneities.

Heritability of replicated CpG sites and explained urate variance

We used three sources of DNA methylation heritability estimates from whole blood to characterize the replicated urate-associated CpGs: (a) a twin study of 1464 individuals from Great Britain [Hannon et al.], (b) a twin study of 2386 individuals from the Netherlands [van Dongen et al.], and (c) a family-based study of 614 individuals from Australia [McRae et al.]22,23,24. Hannon et al. included 426 monozygotic (MZ) and 306 same-sex dizygotic (DZ) twin pairs to estimate the proportion of the variance of DNA methylation explained by additive genetic (A), shared environment (C), and unshared environmental (E) factors22. Van Dongen et al. included 769 MZ and 424 DZ twin pairs for the ACE model analysis23. McRae et al. studied 614 individuals from 117 families and used intraclass correlation analysis to partition the variance of DNA methylation into additive genetics and environmental components24. DNA methylation levels of these three studies were quantified using the HM450K array.

The proportion of serum urate variance explained was calculated based on data from 1832 participants of the KORA-FF4 study with DNA methylation levels quantified using blood samples collected in 2014. Among the participants of the KORA-FF4 study, 988 were also participants of the KORA F4 cohort, one of the discovery cohorts, with DNA methylation levels quantified using blood samples collected in 2007, 7 years earlier than those in the KORA-FF4 study. Data were available for 97 of the 99 replicated, urate-associated CpGs (missing observations for cg08257009 and cg05201185). To enable comparison of the proportion of explained urate variance to the one obtained from age- and sex-adjusted GWAS summary statistics3, two models were computed: a base model of the proportion of urate variance explained by age and sex, and an extended model that additionally included residuals from a regression of the CpGs on sex, age, and cell type composition. The proportion of serum urate variance explained by the replicated CpGs was then calculated as the difference in coefficient of determination (R2) between the two models.

Forward Mendelian randomization (MR) analysis: causal effects of DNA methylation on serum urate and gout

To evaluate whether the urate-associated CpGs might have a causal role in influencing serum urate levels or gout, we conducted two-sample MR analyses using meQTLs as genetic instruments of the replicated CpGs. The random assignment of alleles during meiosis mimics treatment allocation in randomized controlled trials. Using meQTLs as proxies or genetic instruments of DNA methylation enables inferences on the causal effects of DNA methylation on serum urate or gout, provided that the meQTLs meet the instrumental variable assumptions: (1) the meQTLs are associated with DNA methylation, (2) the meQTLs are not associated with potential confounders of the association between DNA methylation and serum urate or gout, and (3) the meQTLs are only associated with serum urate or gout through DNA methylation, i.e., without pleiotropic effects77. While assumptions 2 and 3 cannot be fully verified, we have addressed these assumptions in our selection criteria of meQTLs and analysis methods as detailed below.

The meQTLs of the CpGs were selected two steps. In the first step, we addressed the three instrumental variable assumptions. To address assumption 1, we selected independent cis meQTLs as genetic instruments with strong association with DNA methylation (p < 5E–8). To address assumptions 2 and 3, we removed those with genome-wide significant association (p < 5E–8) with potential confounders, and with stronger associations with the outcome than with DNA methylation, which suggests pleiotropic effects of a meQTL or may lead to reverse causation. In this first step of instrument selection, we selected cis meQTLs for each CpG (within 500 kb of the CpG) from the meta-analysis of the GoDMC with MAF > 1% and p < 5e–8 to avoid weak instrument bias78. The meQTLs were further pruned based on the following criteria: (a) a clumping threshold of r2 < 0.05, (b) associated with potential confounders (BMI, current smoking, HDL-C, SBP, CRP, and triglycerides) at p < 5E–8, (c) failed Steiger filtering, i.e., displayed stronger association with the outcome than the exposure, and (d) failed harmonization checks of the harmonise_data function in the TwoSampleMR package, which matches the effect allele of the SNPs associated with the exposure and the outcome and flags palindromic SNPs with allele frequency close to 0.5 as ambiguous79,80. The sources of the GWAS summary statistics of the potential confounders are reported in Supplementary Data 22. In the second step, to ensure that the meQTLs for each CpG were indeed conditionally independent, we conducted conditional analysis using Genome-wide Complex Trait Analysis (GCTA) to obtain the conditional p value for each meQTL controlling for other meQTLs of the same CpG (command: cojo-cond)81. As the reference panel, we used 13,558 randomly selected individuals of British descent, and 16,969,363 SNPs with MAF > 0.01% after quality control as reported previously3. We retained only meQTLs with conditional p < 5E–8 as genetic instruments of the CpGs. The median numbers of meQTLs among the 92 replicated CpGs outside of SLC2A9 was 2 (25th, 75th percentile: 0, 5), whereas it was 20 (25th, 75th percentile: 8, 21) among the 7 CpGs at SLC2A9.

The genetic association summary statistics for meQTLs, serum urate, and gout were generated among EA individuals. The meQTL summary statistics were obtained from a meta-analysis of the GoDMC Consortium (max. n = 27,750 from 36 EA studies), which included some of the participating studies of the EWAS of serum urate82. DNA methylation in GoDMC was measured in whole blood or cord blood using the HM450K or EPIC BeadChips. The 1000 Genomes reference panel was used for genotype imputation. The analysis in each cohort had two steps. First, residuals of DNA methylation beta values were generated by adjusting out age, sex, predicted white blood cell type proportions, predicted smoking status based on published results of smoking-associated DNA methylation, and genetic PCs. Next, these residuals were rank transformed and standardized and used for genetic association analysis with adjustment for relatedness in family-based studies. The SNP effect sizes can be interpreted as SD higher DNA methylation per allele. The meta-analysis was conducted in two phases. First, each GoDMC study analyzed the association between all SNPs and all CpGs, returning only associations with p < 1E–5. Next, these associations with p < 1e–5 were combined to create a candidate list of meQTL associations, which included cis associations found in at least one dataset and trans associations in at least two datasets. The association statistics of these candidate meQTLs (n = 120,212,413) were obtained from all cohorts, and then combined using fixed-effect meta-analyses82.

The GWAS summary statistic for serum urate (n = 288,649) and gout (n = 692,537) were obtained from recent GWAS meta-analyses of EA, samples from the CKDGen Consortium3. We required each CpG to have at least four cis (<500 kb) meQTLs as genetic instruments. As the primary MR analysis method, we used the multiplicative random effect inverse-variance weighted method as recommended by the guidelines for MR investigation77. We used the Egger intercept to evaluate potential pleiotropy, and Cochran’s Q test for assessing heterogeneity78,83,84. Sensitivity analysis using other MR methods that are robust to pleiotropy were conducted: MR Egger, simple mode, weighted median, and weighted mode85,86,87. To assess whether a causal estimate may be largely driven by a single genetic instrument, we conducted leave-one-out analysis. To assess whether the causal effects of the CpGs at SLC2A9 on serum urate were partly due to GWAS signals at this locus, we conducted an additional sensitivity analysis excluding meQTLs with r2 > 0.05 with any of five GWAS index SNPs at SLC2A9 (rs4447862, rs10017305, rs6825187, rs62286563, and rs73224492) among individuals of EA, reported in Tin et al.3. The SNP rs4447862 had the lowest p value in the SLC2A9 region, and the other four SNPs were detected as genome-wide significant independent SNPs by the GCTA stepwise model selection command (cojo-slct)88. All MR analyses were conducted using the TwoSampleMR package79. Calculations of the minimum detectable effect size in the forward MR analysis of DNA methylation on serum urate or gout supported that the MR analysis was well powered (Supplementary Note 3).

Reverse MR analysis: causal effect of serum urate on DNA methylation

To evaluate the potential causal effects of serum urate on DNA methylation levels, we conducted two-sample reverse MR analysis. The genetic instruments for serum urate were 123 index SNPs from a recent GWAS meta-analysis of 288,649 EA participants3. The associations between these SNPs and DNA methylation levels were generated among 3866 EA participants of the Framingham Heart Study, a sample independent of the EWAS of serum urate89. The ARIC study was not included in the reverse MR because most meQTL data were from AA participants, whereas the urate summary statistics were based on data from EA individuals. The DNA methylation levels were quantified as beta values from the HM450K array. The analysis of the association between SNPs and DNA methylation levels adjusted for age, sex, the top 50 methylation PCs, predicted blood cell fractions, and used linear mixed model to account for family structure. The reverse MR did not use the GoDMC meQTL results employed in the forward MR analysis, because the GoDMC meQTL results only contained SNPs associated with DNA methylation levels below a certain significance level, as reported above. Given that the urate index SNPs were selected in separate 1-Mb regions across the genome, we did not conduct GCTA conditional analysis to reassess the independence among these index SNPs. Otherwise, the selection of the SNPs for serum urate, their harmonization, and MR methods for primary and sensitivity analysis were the same as those in the forward MR analysis.

Mediation analysis

Given that four CpGs at SLC2A9 showed significant causal effects for serum urate, we evaluated whether these CpGs may mediate the genetic effects of the EA index SNPs of serum urate at the SLC2A9 locus3. As reported previously, four independent SNPs in two neighboring 1 Mb intervals in the SLC2A9 region were identified based on GCTA stepwise model selection3,81,88. Mediation analyses of the four CpGs for the four index SNPs were first conducted among 637 EA ARIC participants, controlling for age, sex, study center, and ten genetic PCs using the mediation package version 4.5 in R90. This mediation method uses simulation to estimate the average causal mediation effect in a potential outcome framework. The significance threshold was set at 0.0125 (0.05/4 independent index SNPs). For the significant index SNP-CpG pairs, mediation analyses were performed in the SHIP-Trend (n = 223) and KORA (n = 1636) cohorts. The mediation effects of each CpG were then combined using fixed-effect inverse-variance meta-analysis as implemented in metafor76.

Association with gene expression and chromatin accessibility

We investigated the association of the urate-associated CpGs with gene expression in cis in monocytes and whole blood. Monocyte data were obtained from Kennedy et al., who assessed these associations among 1202 individuals from the Multi-Ethnic Study of Atherosclerosis27. DNA methylation levels were quantified using the HM450K array, and gene expression levels were quantified using Illumina HumanHT-12 version 3.0 and 4.0 Expression BeadChips. The regression analysis used log-transformed expression levels as the dependent variable and DNA methylation beta values as the independent variable controlling for age, race, sex, and study site.

The association of DNA methylation levels with gene expression in whole blood was obtained from 713 participants of the KORA study. The DNA methylation levels were quantified using the HM450K array, and gene expression levels were quantified using Illumina HumanHT-12 v3 Expression BeadChips. The log2-transformed gene expression values were regressed on the DNA methylation beta values adjusted for sex and age. Prior to the analysis the technical factors as well as the blood cell type proportions were regressed out of the mRNA and DNA methylation levels, and their residuals were included in the final association model. Statistical significance for the association between urate-associated CpGs and gene expression was defined as p < 5E–4 (=0.05/99).

To evaluate whether urate-associated CpGs at SLC2A9 might be associated with gene expression in the kidney, we selected five CpGs at the SLC2A9 locus that were significantly associated with SLC2A9 gene expression in monocytes (cg03725404, cg20479063, cg02734326, cg00071950, cg11266682). The DNA methylation and gene expression levels were quantified from tubulo-interstitial kidney tissue of 314 persons. The analysis used gene expression (transcripts per million) as the dependent variable and DNA methylation beta values as the independent variable controlling for age, sex, five genetic PCs, prevalent diabetes and hypertension, BMI, batch factors, bisulfite conversion rate, RNA-sequencing batch, and RNA integrity number.

To explore the epigenetic and gene expression landscape in the SLC2A9 locus in kidney tissues, we looked up the DNA methylation profiles detected by WGBS and gene expression profiles from RNA-sequencing in five human normal kidneys (GEO accession GSE115098)91. The methylation beta values from WGBS and read counts from RNA-sequencing at the SLC2A9 locus (GRCh37/hg19, chr4:9884000-10067000) were converted to bigwig format. The bigwig files were imported into IGV genome browser for visualization. In addition, we investigated whether the urate-associated CpGs at SLC2A9 might be mapped into open chromatin regions in the kidney using data generated from ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing). Kidney cortex and medulla tissues were macro-dissected from uninvolved portions of tumor nephrectomy specimens and snap-frozen. ATAC-seq was performed by ActiveMotif (Carlsbad, California) and the subsequent data were mapped to GRCh38 and normalized by read depth. Peak calling was performed using MACS 2.1.0 and filtered against the ENCODE blacklist. Master list generation and set operations with the resulting bed files were performed using bedops. GRCh38 positions were mapped back to GRCh37/hg19 using the online liftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver)

Differential gene expression in humanized mouse model of hyperuricemia

To investigate whether the closest genes of the urate-associated CpGs might be differentially expressed in hyperuricemic condition, we obtained gene expression data from a humanized mouse model of hyperuricemia. The orthologous mouse variant to the human ABCG2 Q141K (ABCG2 Q140K) was knocked in to the endogenous Abcg2 locus using CRISPR/Cas9 on a C57BL/6J background, resulting in hyperuricemia in adult male Q140K+/+ mice as described previously64. Animal studies were performed in adherence to the NIH Guide for the Care and Use of Laboratory Animals and approved by the University of Maryland School of Medicine Institutional Animal Care and Use Committee. Mice were housed in groups of two to five per cage on a 12:12 h light/dark cycle; with lights on at 6 a.m., only male mice were used for the analysis. Animal organs were harvested and preserved in RNALater (Sigma R0901) and gradually cooled from 4 °C to −80 °C. RNA was isolated using the RNeasy Plus Mini Kit (Qiagen 74134) per the manufacturer’s protocol. RNA was suspended in molecular biology grade water and then quantified with a CLARIOstar plate reader and stored at −80 °C. Illumina Sequencing Libraries were prepared using manufacturer’s protocol for NEB Ultra II Directional RNA Library Prep kit with poly-A enrichment (NEB E7760). Samples were sequenced on four flowcell lanes of an Illumina HiSeq4000 75 bp paired end run. Five samples were sequenced in each flowcell lane. Samples were grouped per lane by tissue type. Sample quality assessment, RNA-Seq library preparation, and sequencing were performed by the Genomics Resource Center at the University of Maryland School of Medicine. RNA-Seq data was stored and analyzed on BasePair (https://app.basepairtech.com/) for expression count (STAR) and differential expression (DESeq2) analyses. Mouse gene names were mapped to human gene names in HUGO Gene Nomenclature using the Mouse Genome Informatics database. FDR for differential expression was calculated using the Benjamini–Hochberg method. We consider FDR < 0.05 as being significant.

Enrichment analyses

To inform the potential functional effects of the urate-associated CpGs, we assessed the enrichment of these CpGs in sites of DNase I or histone modification (H3K4me1, H3K4me3, H3K9me3, H3K27me3), gene sets based on GO terms and pathways in the KEGG and Reactome databases92,93,94,95. The enrichment analyses of DNase I or histone modification were performed using a local version of eForge version 2.0 (experimentally derived Functional element Overlap analysis of ReGions from EWAS)96. Briefly, eForge used date sources from either the ENCODE (125 samples) or Roadmap Epigenomics (299 samples) projects generated by the Hotspot method97,98,99. The overlaps between the input CpGs and the data sources were compared with those from 1000 random sets of CpGs with matching gene relationship and CpG island annotation. FDR was obtained based on the binomial distribution and the Benjamini–Yekutieli method for multiple testing correction. Our input included 1018 urate-associated CpGs with p < 1E–05 in the combined meta-analysis of the discovery and replication cohorts. We performed 10,000 resampling runs with an active proximity filter and considered FDR < 0.01 as significant, i.e., enrichment >99 percentile.

Enrichment in gene sets or pathways was performed using the methylGSA package and R version 3.6.1100. The enrichment test method (methylglm) was a functional class scoring method implemented using logistic regression accounting for the number of CpG sites per gene and the autosomal background that overlaps the HM450K and EPIC arrays. Gene sets or pathways with 100–500 genes were tested (default setting). We considered a gene set or pathway to be significantly enriched at FDR < 0.05 correcting for multiple testing within each database using the Benjamini and Hochberg method101.

Relationship of urate-associated CpGs with cardiometabolic traits

Our prior work on the genetics of serum urate revealed significant genetic correlations between serum urate and many cardiometabolic traits and showed that genes mapping into urate-associated loci were highly enriched for expression in kidney and liver3. Other than gout, cardiometabolic traits with high genetic correlations (r2 > 0.35) included triglycerides, HDL cholesterol, diabetes, fasting insulin, hypertension, and BMI. To gain insights on whether these relationships were also observed when studying DNA methylation in blood, we investigated whether the urate-associated CpGs were also identified in EWAS of the respective cardiometabolic traits as well as in EWAS of eGFR and liver enzymes. We also included CRP, given that CRP is a marker of inflammation, a component of metabolic syndrome38. When more than one EWAS were available for a trait, EWAS were prioritized for the larger sample size. The primary focus was placed on the interpretation of the direction of association of DNA methylation with serum urate and the other traits. The CpGs included in the lookup were those replicated in the EWAS of urate and also considered significant in the EWAS of the other traits based on criteria of the respective study (Supplementary Data 19). All studies included in this lookup controlled for age, sex, and white blood cell type proportions for detecting independent association between DNA methylation and the trait. More details are provided in Supplementary Data 19. When a study included replication and provided meta-analysis results of discovery and replication, the effects from the meta-analysis combining discovery and replication were used.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Disclaimer

The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute, the National Institutes of Health, or the US Department of Health and Human Services.