Introduction

Body composition is associated with cardiorespiratory fitness and longitudinal health outcomes1,2. Excess adiposity impairs functional performance, is a major risk factor for developing chronic diseases, and is often accompanied by poor self-esteem3,4,5,6,7. The increased risk of chronic diseases that accompany excessive fat accumulation is the leading cause of death globally and contributes to an estimated $210 billion in medical costs in the US annually8,9.

In clinical practice, health risk levels are defined using body mass index (BMI), where adults with BMIā€‰ā‰„ā€‰25 andā€‰ā‰„ā€‰30Ā kg/m2 are classified as overweight and obese, respectively10,11,12. However, BMI cannot discern the fat component of body mass from lean tissues, which often leads to risk level misclassification13,14.

Alternative biomarkers are being designed that focus on global or regional body fat and lean mass, rather than weight. Some of these biomarkers use raw measurements such as waist circumference (WC)15. Others combine measurements together into ratios or more complex formulae, as for example Percentage Body Fat (PBF)16, Waist-to-Hip Ratio (WHR)3, fat-free mass index (FFMI)17, A Body Shape Index (ABSI)18 and Relative Fat Mass (RFM)19. However, the important question remains which biomarker or combination of biomarkers is best at predicting health risks, and for which condition.

This paper presents a new, simple and effective technique to assess existing biomarkers and their combinations, in terms of their risk predictive power.

Popular techniques to assess biomarkers use a simple disease classifier/detector obtained by thresholding a biomarker value, and measurements extracted from the associated confusion matrix (aka contingency table)20. Specificity, sensitivity and area under the ROC curve (also called c-statistics) are some of the most common such measurements21,22,23,24,25. Those approaches try to design or assess biomarkers that work well for detecting disease. In such prior work, the focus is on answering the question ā€œDoes my patient have condition C or not?ā€.

In contrast, our study aimed to answer a different question: ā€œWhat body composition biomarker should I change, and by how much to achieve the largest reduction in my health risks?ā€. This question is most naturally answered by using differential calculus26. We introduce the Normalized Sensitivity score (NORSE), a new measure of risk change that is based on the established mathematical tool of sensitivity analysis27. Note that here the term ā€œsensitivityā€ is intended as the rate of change of a dependent variable with respect to an independent one28,29 and is different from sensitivity as the True Positive Rate of a classifier21. Because of its focus on risk changes, the NORSE tool may be used to help people to take up healthier lifestyles and behaviors.

Unlike existing detection-based approaches, NORSE is designed to measure the rate of change of a health risk (e.g., prevalence of hypertension) with respect to changes in an input biomarker (e.g., body fat mass). We do not propose new biomarkers, rather a new way of evaluating and ranking existing biomarkers based on their sensitivity of prevalence.

Traditionally, researchers have tried to create strong biomarkers by combining together simpler ones through various hand-designed formulae22,30,31; This is the case for ABSI18 and RFM19. In contrast, here we propose to combine multiple raw biomarkers together through joint, multi-dimensional statistical models. We discover that 2D models (association of two biomarkers with one condition) yield higher discrimination and personalization than 1D models (association of one biomarker with one condition); and that they enable the separation of the negative effects of fat mass from the positive effects of muscle mass on peopleā€™s health risks.

Our results are validated on large datasets of participants (subsets of NHANES Nā€‰~ā€‰100Ā K) and explained via new, compact, and intuitive visualizations.

Methods

Participants

All analyses in this study were conducted using the NHANES dataset32, collected by the Center for Disease Control and Prevention (CDC) between the years 1999 and 2020. The dataset comprises a total of more than 100,000 unique participants with data related to: demographics, body composition, fitness habits, eating habits and medical conditions. Our analysis focuses on the adult population only (ages between 20 and 110). The Supplementary Material available online presents a detailed accounting of the NHANES study design, participant selection, sample size and participants demographic characteristics.

Health conditions

This study considers 6 common health conditions: hypertension, diabetes, high cholesterol, arthritis, coronary heart disease and cancer (general malignancy). Being positive to a condition is assessed via participantsā€™ own answers to questions like: ā€œHas a doctor ever told you that you have diabetes?ā€, as defined in the NHANES protocol (see Supplementary Material). The lack of an official diagnosis likely adds noise to our results, but aggregating statistics over a relatively large number of participants mitigates that issue.

Body composition biomarkers

This study analyzes the predictive power of 23 biomarkers, amongst which: BMI, WHR, ABSI, PBF and RFM. The full list of biomarkers and their description is in the Supplementary Material. We have organized all biomarkers into three groups: global body composition biomarkers (e.g., BMI, PBF, total body weight), composition biomarkers based on regional measurements (e.g., percent trunk fat, waist circumference, WHR), and biomarkers that are less strongly associated with body composition (e.g., standing height and leg length).

Statistical models for risk change prediction

Here we present two different types of risk change prediction models. 1D models are those where we study the association between one medical condition and one input biomarker. In 2D models, we have one medical condition and two input biomarkers. Multi-dimensional models combine multiple biomarkers using a joint statistical model, rather than trying to compress their information into a single, scalar output. Examples of 1D and 2D biomarker models are illustrated in Fig.Ā 1. Notice that in theory, it is possible to extend our models to a dimensionality higher than 2. However, the limited amount of data in NHANES and the so called ā€œcurse of dimensionalityā€ would yield noisier results33.

Figure 1
figure 1

1D and 2D biomarker models. (A) A 1D d-map for a given population. (B) The corresponding 1D p-map for a condition C of interest. (C) A 2D d-map for a given population. (D) The corresponding 2D p-map. NORSE scores are indicated at the end of each row and column.

Distribution and prevalence maps

Our biomarker models are visualized via two types of visualizations: ā€œpopulation distribution mapsā€ (aka d-maps) and ā€œcondition prevalence mapsā€ (aka p-maps).

d-map

A distribution map reports the probability distribution of a population as a function of input biomarkers (Fig.Ā 1A, C). Each cell in a d-map reports the total number of people with the biomarker \(B\) within a given range (e.g.,\(B \in\) [18.5, 25]), both in absolute terms (e.g., \(n_{cell} = 5166\) participants) and as a percentage of the total population (e.g., \(\frac{{n_{cell} }}{{n_{tot} }} = 21.0\%\)). A d-map is visualized via a white-blue-purple colormap where white denotes 0% and purple denotes the maximum probability for that map.

p-map

A prevalence map reports the prevalence of a given medical condition \(C\) as a function of a biomarker \(B\) (Fig.Ā 1B, D). We have \(n_{cell}\) participants in a cell, out of which \(n_{cond}\) are positive for the condition \(C\). The cell reports the condition prevalence \(P_{C} = \frac{{n_{cond} }}{{n_{cell} }}\) as a percentage. A p-map is visualized via a grey-green-yellowā€“red colormap, with green indicating low prevalence and vice-versa for red. In d-maps and p-maps cells with \(n_{cell} < 40\) or \(\frac{{n_{cell} }}{{n_{tot} }} < 0.2\%\) are left empty to reduce noise associated with small counts.

Sensitivity of prevalence with respect to input biomarkers

In this study we explore associations between changes in biomarkers (the independent variable \(B\)), and the corresponding change in condition prevalence (the dependent variable \(P\)). For this, we use derivative-based sensitivity analysis34. Illustrative examples are presented in Fig. S2 of the Supplementary Material. For a given amount of change \(\Delta B\) in the \(X\) axis, the corresponding change \(\Delta P_{C}\) in \(Y\) depends on the slope of the curve at that point (it is a local analysis). Sensitivity is defined as the partial derivative \(S_{CB} = \frac{{ \partial P_{c} }}{ \partial B}\) (in the continuous domain). The higher the \(S_{CB}\) value, the larger the influence of the input biomarker onto the condition prevalence. More generally, \({\varvec{B}} \in {\text{R}}^{{\text{n}}}\) is an n-dimensional biomarker vector, and \({\varvec{S}}_{CB}\) is the associated gradient vector \({\varvec{S}}_{CB} = \left[ {\frac{{ \partial P_{c} }}{{ \partial B_{1} }},\frac{{ \partial P_{c} }}{{ \partial B_{2} }}, \ldots ,\frac{{ \partial P_{c} }}{{ \partial B_{n} }}} \right]\).

Derivatives and gradients capture only local sensitivity of functions with respect to independent variables. However, our experiments show a roughly linear relationship between disease prevalence and various biomarkers, thus justifying our approach (see examples in Fig. S3 of the Supplementary Material).

Normalized sensitivity to predict risk changes

In general, different biomarkers have different measurement units and vary in their value ranges. For example, for the standing height biomarker, we typically have \(B \in \left[ {140,{ }220} \right]\;{\text{cm}}\) for adults, while for BMI we have \(B \in \left[ {10,{ }60} \right]\;{\text{kg/m}}^{2}\). To compare sensitivities of diverse biomarkers with one another we first need to map their values to a canonical range. We do so via a normalized sensitivity score (namely NORSE) which we define as follows. A biomarker \(B\) measured in our population has mean \(\upmu _{B}\) and standard deviation \(\sigma_{B}\). Thus, its z-score35 is \(X = \frac{{B -\upmu _{B} }}{{\sigma_{B} }}\). The z-score of a measurement represents its distance (in terms of number of standard deviations) from the mean. By the chain rule, the sensitivity with respect to the z-score \(X\) (i.e., the NORSE measurement \(N_{CB}\)) is defined as \(N_{CB} = \frac{{ \partial P_{c} }}{ \partial B}\frac{ \partial B}{{ \partial X}} = \sigma_{B} S_{CB}\). The NORSE score is a unit-less number and can now be used to compare the risk predictive power of diverse biomarkers with respect to one another.

Normalized sensitivity in maps

For consistency and to aid comparisons, the length of the sides of each cell in our visualization maps are set to \(\frac{1}{2} \sigma_{B }\) (see Fig.Ā 1). The blue and brown numbers on the side of a p-map are the NORSE scores computed for each row and column, respectively. Small NORSE values (\(\left| {N_{CB} } \right| < 2\)) are hidden to remove noise in the visualizations. Notice how in the example in Fig.Ā 1D the \(N_{{CB_{1} }}\) sensitivities (blue) are negative, while the \(N_{{CB_{2} }}\) ones (brown) are positive. This important effect will be discussed in detail in the results section.

All methods were performed in accordance with the relevant guidelines and regulations.

Meeting presentation

This work has not been published or presented elsewhere.

Results

Our modeling approach yielded five main new findings: (1) waist and hip circumferences used either in a ratio or within a 2D joint model yield the strongest predictive power; (2) fat-derived biomarkers have a stronger predictive power than weight-related ones such as BMI and total body weight; (3) regional body fat biomarkers are more predictive of health risks than global fat measurements; (4) 2D biomarker models produce smaller and more homogeneous cohorts than 1D ones which, in turn, leads to higher sensitivity, discrimination and personalization of health risks; and (5) 2D biomarker models help us explain the ā€œobesity paradoxā€ as the effect of controlling separately for fat mass and lean mass. These observations are enumerated upon in the sections that follow.

Predicting health risk changes from individual biomarkers

The NORSE scores for 23 biomarkers and 6 medical conditions for adult men and women are shown in Table 1. The last column reports NORSE scores averaged across conditions and genders. Such values are used to rank list all biomarkers.

Table 1 Ranking of 1D biometric models.

According to these results, WHR is the strongest health predictor, in the sense that normalized changes to WHR are associated with the largest changes in condition prevalence. The waist-to-thigh ratio is second, ABSI is third and RFM is fourth. BMI is in the middle of the table and total body weight lower still. Standing height and leg length have slightly negative NORSE scores, suggesting that tall people with long legs are statistically associated with lower health risks.

Interestingly, the top performing biomarkers are all regional ones; specifically, measurements associated with abdominal fat (e.g., WHR, ABSI, RFM, PTF). In the middle of the table we have global composition biomarkers (e.g. PBF, FMI, BMI); and at the bottom, biomarkers that do not correlate much with body composition (e.g., standing height, leg length). NORSE scores were able to cluster all biomarkers into these three groups automatically. Note that PBF is the strongest of the global adiposity biomarkers.

The bottom row in the table reports column-wise average NORSE scores. Their value indicates which health conditions are ā€œeasierā€ to predict from individual biomarkers. In our results, hypertension shows the largest average NORSE, and cancer the lowest.

Age stratification analysis

As an example, the tables in Fig.Ā 2 show diabetes prevalence with respect to WHR, for men and women and for different age brackets. As age increases diabetes prevalence increases, on average. The NORSE values follow a curve; they are low for young and elderly people, and they are higher in the middle. Very young people tend to have low diabetes risk even for high WHR values, and older people tend to have high prevalence, independent of WHR. People in the middle are those where changing WHR may have the greatest influence on their diabetes risk.

Figure 2
figure 2

Age stratification of NORSE for Cā€‰=ā€‰diabetes in adult men (A) and women (B). (C) NORSE curve as a function of age.

Predicting health risk changes from joint, 2D biomarker models

A 2D model associates two distinct biomarkers with the prevalence of a given condition. The example in Fig.Ā 3 shows d-maps and p-maps for Xā€‰=ā€‰weight, Yā€‰=ā€‰waist, Cā€‰=ā€‰diabetes for adult men and women. Notice that when fixing the weight coordinate (e.g., 66ā€‰<ā€‰weightā€‰<ā€‰76Ā kg for men), diabetes prevalence increases considerably (from 0.5% to 25.3%) with increasing waist. Also, for a fixed waist (e.g., 100Ā cmā€‰<ā€‰waistā€‰<ā€‰108Ā cm for men), prevalence decreases (from 25.3 to 3.8%) for increasing weight. This shows two things: (1) 2D biomarker models can discriminate different levels of risk better than using only one biomarker at a time, and (2) There are cases where increases in body weight correspond to improvements in health risks. Notice how all x-sensitivities (in blue) are negative, and all y-sensitivities (in gold) are positive.

Figure 3
figure 3

2D biometric models for Xā€‰=ā€‰weight, Yā€‰=ā€‰waist circumference, Cā€‰=ā€‰diabetes for adult men (A, B) and women (C, D).

Separating the effects of abdominal fat and lean mass

The effect of reduced health risks with (apparent) increased obesity goes under the name of the ā€œobesity paradoxā€36,37. Here we explain the negative weight-risk correlation by separating the negative effects of fat from the positive effects of lean mass. In our 2D model, such separation happens naturally by controlling for waist circumference. All participants within the same row have a similar waist circumference. We hypothesize that for those people, residual weight increases are mostly due to increases in lean muscle tissue, which tends to be associated with better health38,39,40. With this interpretation the observed prevalence trends remain explained and there is no paradox.

Risk discrimination in 2D models

Two biomarkers can be combined together by e.g. taking a ratio (as for WHR) or through a joint 2D statistical model. In the former approach, some information is lost. In fact, imagine two people, one has waistā€‰=ā€‰81Ā cm, hipā€‰=ā€‰90Ā cm and the other has waistā€‰=ā€‰108Ā cm, hipā€‰=ā€‰120Ā cm. They have the same WHRā€‰=ā€‰0.9 but very different risk levels (see Fig. S5 in Supplementary Material). Generally, Multi-dimensional models yield higher risk discrimination than 1D ones, as shown next.

Ranking 2D biomarker models based on NORSE scores and NORSE separation

Our 23 biomarkers combine into 253 valid pairs. Each pair defines a 2D model, for which we measure its NORSE scores, across two genders and 6 health conditions. NORSE scores are calculated for both biomarkers (both along the x and along the y dimensions). For many models, one of those scores tends to be strongly negative (increasing biomarker correlates with reduced risks) and the other strongly positive (increasing biomarker correlates with increased risks). We hypothesize that their difference (namely NORSE separation) relates to the modelā€™s ability to discriminate the negative effect of fat from the positive effect of lean mass.

Table 2 presents results for the 10 models with the largest NORSE separation. The right-most column reports average NORSE separations across conditions and genders. Those values are used to rank all biomarker pairs. Notice that for many 2D models, their average NORSE scores are higher than those of the 1D models (max avg NORSE isā€‰<ā€‰10 in Table 1, andā€‰>ā€‰19 in Table 2). In fact, keeping the input biomarkers separate (as opposed to fusing them together into a single output) allows us to subdivide the participants population into smaller and more homogeneous cohorts, for higher risk discrimination.

Table 2 Ranking of 2D biometric models.

For both men and women, the largest NORSE separation is achieved by the hipā€“waist joint model. This confirms the power of using waist and hip circumferences for risk prediction (see Table 1).

The weight-waist 2D model

The NHANES dataset does not contain many measurements of hip circumferences (nā€‰=ā€‰2402 for men, nā€‰=ā€‰2523 for women valid measurements when intersected with Cā€‰=ā€‰hypertension). The pair weight-waist is amongst the best in terms of NORSE separation, but with one order of magnitude more measurements (nā€‰=ā€‰23,726 for men, nā€‰=ā€‰25,437 for women for Cā€‰=ā€‰hypertension). More data ensures lower measurement noise and more confident results. For that reason, our next example focuses on the weight-waist model.

FigureĀ 4 shows p-maps for Cā€‰=ā€‰cancer (A, B) and Cā€‰=ā€‰hypertension (C, D) for adult men. In panel A, for a fixed weight the cancer prevalence increases with increasing waist circumference. When fixing the waist, the prevalence decreases with increasing weight. Panel B shows the same trends even after removing smokers from our analysis. Smokers here are detected through the SMQ020 NHANES code (ā€œSmoked at least 100 cigarettes in lifeā€). Similar results apply to hypertension (panel C, D), and same trends have been observed for the other four conditions, with or without smokers in the analysis. Age stratification results are presented in the Supplementary Material.

Figure 4
figure 4

Prevalence maps and average NORSE scores for Xā€‰=ā€‰Weight, Yā€‰=ā€‰Waist, Cā€‰=ā€‰cancer, for adult men. (A) including smokers. (B) excluding smokers. (C, D) Same as above but for hypertension.

Discussion

This study introduces a new way of assessing the strength of biomarkers as predictors of health risk changes. In contrast to AUC-ROC type techniques, here we estimate how much changes in input biomarkers affect changes in health risks. We achieve that through a new normalized sensitivity score.

The results in this paper show that when used in isolation, WHR is the biomarker with the strongest ā€œeffectā€ (in a sensitivity sense) on the risks of common health conditions. However, a high sensitivity also means that a small error in the input measurement is likely to have a large, detrimental effect on the accuracy of the output health risk.

For example, imagine that someone has waistā€‰=ā€‰85Ā cm and hipā€‰=ā€‰100Ā cm (thus WHRā€‰=ā€‰0.85), but those quantities are measured as waistā€‰=ā€‰87.5Ā cm, hipā€‰=ā€‰97.5Ā cm. Therefore, the WHR is erroneously measured as WHRā€‰=ā€‰0.9. A 2.5Ā cm error on the input biomarkers translates into a 0.05 error on the output WHR, which for adult men (Fig.Ā 2A) translates into a large, 9% error on hypertension risk. These observations, exposed by the analyses reported herein, lead us to argue that to benefit from the increased sensitivity of our models, it is necessary to use state-of-the-art digital anthropometrics technology to increase input accuracy and thus the accuracy of risk predictions. Much literature discusses errors of measurements obtained using a measuring tape for example41,42,43. Recent progress in computer vision and photogrammetry offers accurate and inexpensive tools for measuring body composition and anthropometrics through optical scanners or even conventional smartphones44,45,46,47,48,49,50.

Limitations

Limitations of the analysis presented here include: examination of cross-sectional data only, no longitudinal studies; establishing statistical associations rather than mechanistic understanding of cause and effect; lack of an official diagnosis for health conditions with reliance only on participants self-reported answers to a questionnaire; limited population size; treating diabetes as a single condition without distinction between type I and type II (by far the most common); and use of disease prevalence as a proxy for health risks.

Conclusions

This study advances a new way of estimating the power of different body composition biomarkers when predicting health risk changes. Our results indicate that waist and hip circumferences, either used in a ratio or in a joint 2D model, hold the strongest predictive power. In general, regional body composition biomarkers produce the best results. We also show how joint biomarker models provide further resolution, prediction accuracy and the possibility to separate the negative effects of body fat from the positive effects of muscle mass. Our joint models help explain the ā€œobesity paradoxā€ via conventional statistical analysis.

We believe that our findings will lead to a better understanding of obesity, its causes and its effects on peopleā€™s health. Also, focusing on sensitivity measures may help individuals understand what behavior changes affect their health the most, and embrace healthier habits. Finally, combining our findings with emerging technology for body scanning and anthropometrics measurements promises to advance the way we assess obesity and associated health risks for everyone.