Introduction

Overweight and obesity are described as an excessive adiposity or fat accumulation, often eliciting multiple health conditions leading to health impairments or even death1,2,3. Trends of overweight and obesity have been increasing over the past three decades, globally and across all age4. In a systematic analysis of global studies, the prevalence of overweight and obesity was estimated to have rose by 27.5% for adults and 47.1% for children and adolescents (2–19 years) between 1980 and 2013, hence, leading to an estimated increase in the number of overweight and obese individuals from 857 million to 2.1 billion during the same period4. In 2016, at least 340 million children and adolescents (5–19 years) were overweight or obese as per the World Health Organization (WHO) estimates3. In the United States alone, prevalence estimates of overweight and obesity among children and adolescents (2–19 years) also increased across race/ethnicities from 1966 to 1967 through 2017–2018, reaching 16% and 19%, respectively, or approximately 14.4 million in total5.

The physiological complications of overweight and obesity in children, adolescents and adults are well documented in the literature6,7. Adult onset of type 2 diabetes, hypertension, cardiovascular diseases, pulmonary, endocrine and gastrointestinal disorders as well as premature death were identified as major consequences of childhood and adult obesity7,8.

Obesity in adolescents tends to persist in adulthood. In a 12-year prospective study in Slovenia, more than half of 7-year-old overweight and obese children remained overweight or obese at the age of 189. In a longitudinal follow-up of individuals enrolled in 1996 through 2007–2009 surveys of U.S. National Longitudinal Study of Adolescent Health, obese adolescents were 16 times more likely to develop severe adulthood obesity than normal or overweight adolescents10. Very high adolescent BMI (≥ 85th percentile) was associated with 30–40% higher adult mortality rate, as shown in a longitudinal assessment of national health surveys conducted in Norway between 1963 and 199911.

Overweight and obesity are multifactorial with numerous identified risk factors. Genetic susceptibility is considered one of these risk factors, yet monogenic or polygenic forms of obesity are rare and complex interactions between genetic and environmental factors are biologically plausible7,12,13. While the genetic factors are measured in genome-wide association studies (GWAS), exposome studies are needed to capture all environmental factors complementary to the genome that influence health14. In the case of obesity and overweight, dietary behaviors (e.g., increased fat and calorie intake), lifestyle factors (e.g., physical inactivity and sedentary behavior), and psychological states (e.g., emotional stress and maladaptive coping strategies) are all considered possible risk factors of obesity onset and projection and their comprehensive examination is warranted for developing effective disease prevention programs7.

In an approach similar to GWAS, exposome-wide association studies (ExWAS) are often based on an agnostic, untargeted and hypothesis-generating approach aiming to identify associations between environmental factors and diseases outcomes15,16,17,18. The ExWAS approach was firstly used to explore the association of numerous nutrition and environmental factors with various health outcomes in adults, such as abdominal obesity19, blood pressure20, diabetes mellitus15 and telomere length21. The application of ExWAS in studying excess weight in younger age groups is still limited to a single study investigating the association between polycyclic aromatic hydrocarbons with childhood obesity among 6–17 years participants using data from the 1999–2016 National Health and Nutrition Examination Survey (NHANES) waves22.

Given the public health significance of the rising trends of overweight and obesity in younger age groups worldwide, comprehensive, data-driven and hypothesis-generating approaches are warranted to elucidate the role of environmental origins of obesity. Using data from the NHANES 2003–2004 and 2013–2014 waves, this study aimed to explore and describe the association between multiple lifestyle, clinical or other exposures and body mass index (BMI) of U.S. adolescents aged 12–18 years, using the exposome framework.

Materials and methods

Study population

We used the National Health and Nutrition Examination Survey (NHANES) data collected in the 2003–2004 and the 2013–2014 waves. We included participants between 12 and 18 years old at the time of the interview, who reported not being pregnant and not being told by a doctor or health professional they have diabetes, with non-missing BMI value. The selection of two surveys 10 years apart aimed to allow the description of exposome profile variations among adolescents in the United States.

The NHANES is a program of studies conducted every 2 years by the U.S. Center for Disease Control and Prevention (CDC) in a nationally representative multistage probability sample of non-institutionalized population living in the United States23. Each of the cross-sectional datasets includes demographic, dietary and health-related questions collected during the interview, in addition to laboratory tests, medical, dental and physiological measurements collected during the physical examination at the mobile examination centers23. All methods were performed in accordance with the relevant guidelines and regulations. The NCHS Ethics Review Board approved the survey protocols and written parental informed consent and child assent for participants ages 12–18 years were obtained for examination at the mobile examination center24.

Study outcome

The study outcome was age- and sex-standardized body mass index (BMI) z-scores calculated using standard deviation scores from the U.S. CDC 2000 growth chart25. To describe weight status category, we used the 2010 CDC terminology for children overweight and obesity. The BMI z-score percentiles were used to classify children as being “underweight” (< 5th percentile); “healthy weight” (5th percentile < BMI z-score < 85th percentile); “overweight” (85th percentile ≤ BMI z-score ≤ 95th percentile); and “obese” (> 95th percentile)26.

Selection of explanatory variables

We screened the data files from each of the five NHANES sections (i.e., demographics; dietary; examination; laboratory; and questionnaire data) of 2003–2004 and 2013–2014 surveys. In the initial screening, we selected variables: (i) available in both surveys with similar response format; (ii) targeting adolescents (12–18 years old); (iii) being relevant to the objective of the study (i.e., excluding: language spoken at home, audiometry examinations, dermatology, etc.); and (iv) for the laboratory variables, we included only parameters measured in individual samples (i.e., excluding brominated flame retardants (BFRs) that were measured in pooled samples). As a result, we retained 139 variables belonging to the following NHANES sections: laboratory data (75, i.e., clinical and biomarkers of environmental chemical exposures), dietary (63), and questionnaire (1) (Fig. 1 and Supplementary material).

Figure 1
figure 1

Initial selection of exposome variables from 2003–2004 and 2013–2014 NHANES waves.

Laboratory variables comprised the following measurements: serum and urinary albumin, serum cotinine, standard biochemistry, complete blood count and glycohemoglobin (n = 45); urinary measurements of phthalates (n = 9); perfluoroalkyl and polyfluoroalkyl substances (polyfluoroalkyl chemicals) (n = 7); urinary polyaromatic hydrocarbons (n = 6); blood cadmium, lead and mercury (n = 3); urinary environmental phenols (n = 1); urinary arsenic (n = 1); urinary iodine (n = 1); urinary mercury (n = 1); urinary perchlorate (n = 1). The 63 dietary variables referred to those collecting information about the dietary intake consumed during the 24-h period prior to the interview. The physical activity variable was retained from the questionnaire section.

Variables transformation

All variables were evaluated as either continuous or categorical. For dietary and laboratory continuous variables, zero values were substituted with 1e−10; and all values were log-transformed, then scaled and centered. When necessary, categorical variables were recoded for a harmonized coding between the two survey datasets. For physical activity, a positive answer to vigorous physical activity “over past 30 days” (in 2003–2004 dataset) and “during a typical week” (in 2013–2014 dataset) were treated similarly. Particularly, participant’s educational level was categorized according to NHANES coding available in 2003–2004 demographic questionnaire: “Less than high school”, “High school including GED” and “More than high school”. We kept the grouping of race/ethnicity as included in NHANES questionnaire: Mexican American, Non-Hispanic White, Non-Hispanic Black, other Hispanic and other ethnicities. Age and poverty income ratio (PIR)—a ratio of family income to poverty threshold—was treated as a continuous variable. Household smoking was coded as “Yes/No”; with an answer of 1 or more smoking household members recoded as “Yes” in the 2013–2014 dataset to match the variable assessment in the 2003–2004 dataset.

Statistical analysis

To account for the complex survey design, non-response and post-stratification, survey-weighted analysis was conducted using the appropriate sample weight as per NHANES analytical guidelines27. For each survey dataset, we created survey design datasets comprising the variables retained from each section along with their assigned sample weight: the demographic variables with the interview weight (wtint2yr); the laboratory variables and the physical activity with the Mobile Examination Center (MEC) exam weight (wtmec2yr); and the dietary variables with the dietary day 1 sample weight (wtdrd1). Specifically, for 2003–2004 dataset, polyfluoroalkyl, arsenic and mercury  variables using sub-sample A weight (wtsa2yr); phthalates and polyaromatic hydrocarbons variables using sub-sample B weight (wtsb2yr); environmental phenols, iodine and perchlorate variables using sub-sample C weight (wtsc2yr). As for 2013–2014 dataset, sub-sample A weight was used for arsenic, iodine, urine mercury, polyaromatic hydrocarbons and perchlorate variables; sub-sample B weight was used for environmental phenols, polyfluoroalkyl chemicals and phthalates data. For cadmium, lead and blood mercury variables, the MEC exam weight (wtmec2yr) and blood metal weight (wtsh2yr) were used for 2003–2004 and 2013–2014 datasets, respectively. For correlation between laboratory (standard biochemistry) and dietary explanatory variables, we applied the least common denominator approach and hence we used the weights of the dietary variables.

We considered the 2003–2004 dataset for discovery of associations between study population characteristics and zBMI, and replicated the analysis on the 2013–2014 dataset. Descriptive statistics and correlations between all explanatory variables were calculated separately for the discovery and replication datasets. Survey-weighted regressions were conducted to explore the associations between each of the explanatory variables and BMI z-score (univariable) and after adjusting for age, sex, race/ethnicity, household smoking and income to poverty ratio (multivariable). We applied the Benjamini–Hochberg method28 to generate false discovery rate (FDR) adjusted p-values separately for the univariable and multivariable analyses per dataset (FDR correction). Predictors with FDR adjusted p-value < 0.05 were considered significant. Next, we identified predictors that were commonly significant in the multivariable analysis of both the discovery and replication datasets.

To evaluate the possible interaction between sex, the statistically significant predictors retained in the previous step and zBMI, we repeated the multivariable analysis by adding the interaction term, separately for the discovery and replication datasets.

Additionally, a sensitivity analysis was conducted by excluding adolescents classified as obese and repeating univariable and multivariable regression on each of 2003–2004 and 2013–2014 datasets. All of the analysis was conducted using R version 4.1.229, and the R studio version 2022.02.130. The scripts used in the analysis as well as the detailed output of the analysis are available in the Supplementary Information. All survey-weighted descriptive and explanatory analyses were conducted using R survey package31,32,33. Data, scripts and outputs are available here: https://doi.org/10.5281/zenodo.6558979.

Ethics approval and consent to participate

The NCHS Ethics Review Board approved the survey protocols and informed consent was obtained for all subjects.

Consent for publication

No individual data that consent for publication shall be acquired is present in this manuscript.

Results

Descriptive characteristics

Following the selection criteria, 1899 and 1224 participants in total, were eligible from the 2003–2004 and 2013–2014 survey datasets, respectively (Fig. 2). The weighted mean age of adolescents was similar across the two, datasets with a mean of 15 (standard error (SE) 0.1) years of age. Similarly, almost half of our study population was males; non-Hispanic whites (64.5% and 54.2% in 2003–2004 and 2013–2014 survey datasets, respectively) having less than high school education (90.5% and 91.8% in 2003–2004 and 2013–2014 survey datasets, respectively). The mean poverty income ratio (PIR) was similar between the two surveys with a weighted mean of 2.6 (0.1 SE) and 2.4 (0.1 SE) in 2003–2004 and 2013–2014, respectively. The weighted proportion of reported household smoking in 2003–2004 was 25.1% (95% CI 19.9%, 30.4%) and 23.6% (95% CI 17.5%, 29.6%) in 2013–2014.

Figure 2
figure 2

Flowchart of the participant selection from 2003–2004 and 2013–2014 surveys.

The weighted proportions of BMI categories were similar across the two surveys: “underweight” (2% in both survey waves); “healthy weight” (61.4% and 58.5% in 2003–2004 and 2013–2014, respectively); “overweight” (18% in both survey waves), and “obese” (18.1% and 20.6% in 2003–2004 and 2013–2014, respectively) (Table 1). No statistically significant (p > 0.05) differences were noted in the distribution of BMI categories between sexes in both survey waves (Supplementary Information S1).

Table 1 Estimated (weighted) background characteristics and categories of Body Mass Index of the study population from the 2003–2004 and 2013–2014 NHANES survey datasets.

Exploratory ExWAS analysis using the 2003–2004 and 2013–2014 dataset

The univariate zBMI regression models for each of the 139 variables in the 2003–2004 dataset resulted in 54 predictors with a p-value < 0.05, of which only 27 were significant after FDR correction: 18 variables of laboratory variables (alanine aminotransferase (ALT); albumin; bicarbonate; total bilirubin; cholesterol; gamma glutamyl transferase (GGT); serum iron lactate dehydrogenase; lymphocyte number; mean cell hemoglobin; mean cell volume; platelet count; red blood cell count; segmented neutrophils number; sodium ; triglycerides; serum uric acid, and white blood cell count) and 9 dietary variables (folate as dietary folate equivalents; iron; lutein and zeaxanthin; riboflavin (vitamin B2); thiamin (vitamin B1); total folate; total sugars; vitamin B6, and retinol). All significant predictors (FDR adjusted p-value < 0.05) were negatively associated with zBMI, except for the following: ALT; cholesterol; GGT; lactate dehydrogenase; number of lymphocytes; platelet count; red blood cell count; segmented neutrophils number; triglycerides; serum uric acid, and white blood cell count (Supplementary Information S2).

Multivariable regression resulted in 45 predictors below the p-value cutoff of 0.05 of which only 12 significant after FDR correction: 11 laboratory variables (ALT; GGT; mean cell hemoglobin; mean cell volume; phosphorus; platelet count; red blood cell count; segmented neutrophils number; triglycerides; serum uric acid; white blood cell count) and 1 dietary variable (riboflavin (Vitamin B2)). Eight of these significant predictors were positively associated with BMI z-score (ALT; GGT); platelet count; red blood cell count; segmented neutrophils number; triglycerides; uric acid and white blood cell count) (Table 2).

Table 2 Variables significant in multivariable linear regressions adjusted for age, sex, race/ethnicity, household smoking and poverty income ratio in 2003–2004 and 2013–2014 survey datasets.

For the replication dataset (2013–2014), the univariable regression models resulted in 39 predictors with a p-value < 0.05 of which only 22 were significant after FDR correction: 17 laboratory variables (ALT; albumin; GGT; globulin; serum iron; lactate dehydrogenase; lymphocyte number; mean cell hemoglobin; mean cell volume; monocyte number; platelet count; red cell distribution width; segmented neutrophils number; total bilirubin; triglycerides; serum uric acid; white blood cell count); 2 PAHs (1-hydroxyphenanthrene; 2-hydroxynaphthalene); 2 polyfluoroalkyl (perfluorodecanoic acid; perfluorobutane sulfonic acid) and 1 dietary variable (total sugars) (Supplementary Information S2). Multivariable regression resulted in 27 significant predictors of which only 9 laboratory variables were significant after FDR correction: 2-hydroxynaphthalene; ALT; GGT; lactate dehydrogenase; monocyte number; segmented neutrophils number; triglycerides; serum uric acid and white blood cell count (Table 2). All of these 9 variables had a positive significant association with zBMI. Volcano plots showing the distribution of the model estimates per predictor used in the ExWAS analysis are available in Figures 3 and 4.

Figure 3
figure 3

Volcano plots for the univariate and multivariable regressions in the 2003–2004 dataset.

Figure4
figure 4

Volcano plots for the univariate regression and multivariable regressions in the 2013–2014 dataset.

Six variables were commonly significant after FDR correction (FDR-adjusted p-values < 0.05) in multivariable analysis of both the discovery and replication datasets: ALT; GGT; segmented neutrophils number; triglycerides; serum uric acid and white blood cell count. In the second model for multivariable analysis that accounts for interaction between sex and each of the aforementioned predictors, the interaction term was significant for serum uric acid (estimate = 0.245, p = 0.022) and ALT (estimate = 0.248, p = 0.035) in the discovery dataset, and for ALT (estimate = 0.185, p = 0.03) only in the replication dataset (Figures 5 and 6). The full model estimates for variables with significant interaction terms are available in the Supplementary Information. Details on the regression analysis results can be found in the Supplementary Information.

Figure 5
figure 5

Scatter plot of the association between zBMI and uric acid in the 2003–2004 survey dataset: multivariable analysis including sex*uric acid as interaction term.

Figure 6
figure 6

Scatter plots of the zBMI and alanine aminotransferase in the 2003–2004 and the 2013–2014 survey datasets including the line of the predicted values from the multivariable regression models that include the interaction term.

Sensitivity analysis

Sensitivity analysis was conducted on a sample of 1540 and 966 non-obese adolescents from the 2003–2004 and 2013–2014 survey datasets. None of the predictors were significant at univariable and multivariable analysis in 2003–2004 dataset. For 2013–2014 dataset, 6 laboratory variables (ALT; albumin; blood urea nitrogen; GGT; triglycerides and white blood cell count) and 3 dietary variables (cholesterol; folic acid and vitamin B12) were significant at the univariable analysis after FDR correction; however, none remain significant after adjusting for covariates in the multivariable analysis.

Correlations

2003–2004 dataset

When examining correlations between dietary and laboratory variables, weak coefficients were found overall; a maximum coefficient of r = 0.217 was noted for the correlation between selenium and blood urea nitrogen.

Looking specifically at the correlation between serum uric acid and both laboratory and dietary predictors; the correlation between uric acid and each of serum creatinine and hemoglobin showed the highest coefficients of 0.48 and 0.44, respectively, followed by a coefficient of 0.33 for the correlation between uric acid and GGT. On the other hand, the correlation between GGT and ALT had a coefficient of 0.52. A strong correlation was found between mean cell hemoglobin and mean cell volume (r = 0.94). A medium correlation was found for the following associations: albumin and total calcium (r = 0.56); albumin and total protein (r = 0.56); total bilirubin and iron (r = 0.43); triglycerides and total cholesterol (r = 0.35), while the association between GGT and platelet count was weak (r = 0.0265) (Supplementary Information S2).

2013–2014 dataset

Similarly, weak coefficients were found when examining the correlation between dietary and laboratory variables, as noted in the correlation between selenium and each of protein (r = 0.9131); phosphorus (r = 0.8376) and blood urea nitrogen (r = 0.27). Medium correlations were noted for: GGT and ALT (r = 0.54); uric acid and each of creatinine (r = 0.45), hemoglobin (r = 0.39) and hematocrit (r = 0.39) (Supplementary Information S2).

Discussion

We conducted an exposome-wide association study exploring 139 explanatory variables with respect to sex-specific BMI-for-age in non-diabetic and non-pregnant adolescents aged 12–18 years old in the 2003–2004 NHANES survey and used the 2013–2014 survey for validation. After adjusting for age, sex, race-ethnicity, household smoking, and poverty income ratio, we found ALT, GGT, segmented neutrophils number, triglycerides, serum uric acid and white blood cell count to have statistically significant associations with zBMI after adjusting for FDR in both the discovery and replication datasets, being in line with literature findings for adolescents.

Uric acid is the end product of purine metabolism in humans and its magnitude depends on dietary purines (from animal proteins, meat, seafood, beer and fructose sources), the degradation of endogenous purines as well as the renal and intestinal excretion of urate34. Although serum uric acid levels increase differently by sex from birth till adolescence, there is still no universally accepted threshold for defining hyperuricemia, or excess concentrations of serum uric acid in children and adolescents35. Elevated levels of uric acid have been associated with obesity and non-communicable diseases, such as kidney and cardiovascular diseases in children and adolescents35,36. Recent literature emphasized the association between uric acid and metabolic syndrome (MetS) outcomes in children and adolescents, such as glucose intolerance, central obesity, hypertension, and dyslipidemia36,37,38. Uric acid was associated with the prevalence of metabolic syndrome and its components, as shown in a cross-sectional analysis of 1370 adolescents (12–17 years of age) using data from NHANES 1999–2002; the unweighted prevalence of metabolic syndrome was ≈ 21% in the highest quartile (> 339 μmol/L) as compared to ≈ 10% in the third quartile (≤ 339 μmol/L), ≈ 4% in the second quartile (≤ 291 μmol/L) and < 1% among participants in the unweighted lowest quartile of serum concentrations of uric acid (≤ 250 μmol/L)39. A similar distribution of serum uric acid levels was observed in this study targeting non-diabetic adolescents for the 2003–2004 dataset (median: 297 μmol/L, interquartile range (IQR): [250 μmol/L, 351 μmol/L]) and 2013–2014 dataset (median: 297 μmol/L, IQR: [244 μmol/L, 351 μmol/L]). In the adjusted multivariable analysis, the strongest adjusted effect size of the association between various exposomic variables with zBMI was observed for uric acid in both discovery (estimate = 0.452) and replication (estimate = 0.707) datasets. After excluding non-obese participants, the association between zBMI and uric acid was no longer statistically significant, although it still showed the strongest effect size in both discovery and replication subsets (adjusted estimates of 0.24 and 0.39, respectively). Thus, our findings highlight the pathogenic role of elevated concentrations of uric acid in young obese age groups, as showcased in different studies. In a case–control study conducted in Italy among 120 children and adolescents with primary obesity (zBMI ≥ 97th percentile) and 50 healthy controls, carotid intima-media thickness was significantly correlated (r = 0.61; 95% CI 0.58–0.64) with the fourth quartile of uric acid among obese children regardless of the presence of metabolic syndrome, defined in the study as ≥ 3 or more of the following criteria: obesity, hypertension, low HDL cholesterol, elevated triglycerides, and impaired fasting glucose and/or insulin resistance40. On the other hand, the association between serum uric acid and cardiovascular diseases, irrespectively of BMI, has also been documented in a study conducted on an 1999–2006 unweighted NHANES sample of 12–17 years old adolescents; after adjusting for age, sex, race/ethnicity and BMI, the odds of having elevated blood pressure (mean systolic and/or diastolic blood pressure percentile ≥ 95th percentile) was 1.38 (95% CI 1.16–1.65) for each 0.1 mg/dL increase in uric acid41. In a randomized, double-blinded trial among pre-hypertensive obese adolescents (11–17 years old), patients treated with two mechanisms of urate reduction (allopurinol and probenecid) did not continue to gain weight during the 3-months study period and showed a similar and significant reduction in their systolic blood pressure by 10.2 mmHg and their diastolic by 9.0 mmHg in the two treatment groups as compared to the placebo group; thus, highlighting the role of uric acid as a biochemical mediator of increased blood pressure42.

The observed association of both uric acid and GGT with obesity and other cardiovascular risk factors has been previously documented in the literature. In a cross-sectional study of 2067 children and adolescents (6–20 years) in Hong Kong, a combined effect of the upper quartiles of both uric acid and GGT on obesity, low high-density lipoprotein cholesterol (HDL-C) levels and high blood pressure (adjusted odds ratios ranged from 1.63 to 5.82, all p < 0.005) was documented37. GGT, a liver enzyme implicated in the degradation of glutathione is associated with BMI, total cholesterol, diabetes mellitus (all components of MetS) as well as cardiovascular disease and all-cause mortality in adults43. The correlation between GGT and MetS and hypertension among younger age groups was demonstrated in a 10-year longitudinal study in Taiwan, where subjects (10–15 years) with higher baseline levels of GGT were at least twice more likely to develop MetS and hypertension during the follow-up period44. In our analysis, we showed a positive correlation, albeit weak, between uric acid and GGT in both discovery and replication datasets, without adjustment for zBMI.

Also, ALT had a statistically significant association with zBMI in the multivariable analysis using both the discovery and replication datasets. ALT is a liver enzyme related to fat liver accumulation and considered a useful biomarker for non-alcoholic fatty liver disease (NAFLD)45. The association between serum ALT and zBMI was previously documented in a study among adolescents (12–18 years) from NHANES III (1988–1994), in which overweight and obese study subjects were three to six times more likely to have higher levels of ALT (> 30 U/L) as compared to those with normal weight46. Similarly, a significant correlation between ALT, zBMI and metabolic syndrome was found among 5411 adolescents aged 12–19 years from NHANES 1999–2014; yet, with no significance increase in the prevalence of increased ALT over time47. On the other hand, elevated serum ALT levels (> 40 U/L) were also associated with markers of metabolic syndrome, as demonstrated in a study among adolescents 10–19 years old from the Korean National Health and Nutrition Examination Survey in 199848.

Elevated triglycerides or hypertriglyceridemia is common among obese children and adolescents, and this component of metabolic syndrome is a known biomarker of cardiovascular disease risk49. In our analysis, a positive association between triglycerides and zBMI was found in each of the discovery (estimate = 0.285) and replication datasets (estimate = 0.444), but not in the sensitivity analysis in non-obese adolescents. This positive association found in our ExWAS study between triglycerides and zBMI is in line with the findings of a study on abnormal lipid levels among adolescents (12–19 years) in NHANES 1999–2006, in which 22% of overweight and 43% of obese had at least one abnormal lipid level including elevated triglycerides, the most common lipid abnormality associated with excess weight50.

Our analysis also showed the association between zBMI and inflammatory markers, such as white blood cell count and segmented neutrophils number. Such findings were also documented among adolescents51,52,53, suggesting that obesity-induced inflammation could start in childhood54.

None of the dietary variables remained significant at multivariable analysis in the discovery and replication datasets. Additionally, weak correlations were found between the laboratory and dietary variables in each of the discovery and replication datasets. On the other hand, none of the biomarkers of environmental chemical exposures remained significant after FDR correction at multivariable analysis in the discovery and replication datasets. Particularly, 2-hydroxynaphthalene was no longer significant at multivariable analysis in the replication dataset after FDR correction. Although the implicated biological mechanisms remain unclear, further studies are warranted to better understand the role of PAHs and other environmental pollutants in obesity pathogenesis.

The associations found between uric acid, GGT or ALT with zBMI using the whole study population were no longer statistically significant in the sensitivity analysis that included non-obese adolescents. This observation warrants for further investigation on the potential use of these biochemical parameters as biomarkers in the early stages of obesogenesis in adolescence or childhood. The sex specific trends observed in the association between the three aforementioned biochemical parameters and zBMI are also worth of detailed investigation in other population studies as they might be useful in future obesity screening and prevention programs in adolescence and/or earlier life stages.

The strength of this study lies in the agnostic nature of the ExWAS approach which allows for the simultaneous assessment of multiple parameters and their associations with different outcomes. The NHANES dataset is considered representative of the U.S. population; in effect, the weighted estimates of overweight and obesity prevalence in this U.S. study population (12–18 years old) were similar in the 2003–2004 (18.4% and 18.1%, respectively) and 2013–2014 survey datasets (18.5% and 20.6%, respectively). Moreover, the obesity prevalence estimates in both datasets were similar to the weighted estimates by the U.S. CDC of 17.4% (13.9–21.3%) for the 2003–2004 survey and 20.6% (16.2–25.6%) for the 2013–2014 survey55. The created models are considered as robust, being tested on two NHANES datasets, 10 years apart. Another strong feature of this ExWAS study was the inclusion of variables belonging to all exposome domains; the general external (individual household income), the specific external (dietary variables; education; household smoking and physical activity) and the internal domain (intrinsic and laboratory variables). Yet, in order to fully explore the exposome’s utility, it is encouraged to include additional groups of environmental components in relation to the studied health outcome17.

Due to the cross-sectional study design of NHANES, causal associations cannot be established and although the approach was as inclusive as possible, not all exposome parameters were available or could be included, e.g. chemical exposure data were not fully available in these NHANES surveys. On the other hand, the lack of observed associations between dietary factors and BMI might be interpreted by the use of a self-reported recall of the food items consumed during the past 24 hours; which might be subject to recall bias, under-reporting, over reporting, or omission of foods56. Additionally, our analysis did not account for the “dietary recall status” variable and hence subjects classified as not having a “reliable recall” were not excluded. In this study, we used only dietary intake measurements for 1 day as an adequate metric of describing population mean nutrient intake. Earlier studies have shown no significant differences between the 2 days’ dietary intake measurements available in NHANES for estimating population means57,58. However, multiple measures of daily dietary intake are recommended, particularly in exposome studies, to ensure sufficient capture of the within- and between- subject variability59. Another possible limitation is the use of only two surveys out of the total available NHANES year surveys; ExWAS studies integrating additional NHANES datasets as well as a bigger number of environmental exposure variables are warranted to improve our knowledge in the environmental determinants of obesogenesis. Moreover, we focused on the analysis of parameters such as dietary habits that are more directly linked with obesity and not in the analysis of external drivers of variability in these parameters such as mental health, food security, socioeconomic determinants. Although the poverty income ratio was used in the models as a proxy of socioeconomic status, more detailed exposome domain assessments including external parameters in the future will allow for the identification of effective population programs/interventions that tackle adolescent obesity.

Adolescence is a critical life window of susceptibility to metabolic diseases, such as, overweight and obesity during which ongoing children’s development may be perturbed by a suite of environmental stressors, including lifestyle/behavioral factors and dietary habits58. The methodological framework of the human exposome and its tools allow for a comprehensive assessment of multiple factors with respect to disease outcomes through an agnostic, untargeted and hypothesis-generating approach. The NHANES-based discovered and replicated predictors of zBMI among U.S. adolescents seem to be in line with the global literature and further highlight their importance as potential early-stage biomarkers of excess weight. Additional studies at younger age groups are warranted to better explore a broader suite of lifestyle, dietary and environmental determinants in association with these biomarkers of effect and their implications with metabolic disease process.