1 Introduction

Infectious diseases such as influenza and Ebola pose a serious threat to global public health. In the United States, it is estimated that each year seasonal influenza causes 31.4 million outpatient visits, over 200,000 hospitalizations, and 3,300-49,000 deaths, and is responsible for 44.0 million days of lost productivity [20, 34, 39, 41]. Worldwide, influenza causes about three to five million cases of severe illness and about 250,000 to 500,000 deaths [45].

To control the spread of influenza, the Advisory Committee on Immunization Practices (ACIP) recommends seasonal influenza vaccination annually for individuals aged 6 months and older without contraindications [15]. However, this recommendation assumes that an unlimited supply of vaccines is available, which is not always the case, especially during an influenza pandemic [16, 20, 32].

This work focuses on the study of health disparities and vaccination priorities among different subpopulations. Health disparities can arise due to differences in age, gender, geographic location, income level, and other socioeconomic variables [23] since personal attributes influence behavior, activities, and social interactions, which in turn affect an individual’s vulnerability and infectivity. To effectively control the spread of a disease it is important to understand these differences so scarce public health resources can be distributed efficiently [1, 4, 21, 24, 36, 43].

Health disparity, as measured by case outcome and economic return, varies by the way a population is partitioned. For instance, the disparity may be high among age groups but not so much among income groups, implying that age is a bigger determinant of disparity than income. It is important to understand health disparity with respect to demographic features and identify the most significant ones. This would help identify cohorts that are at a disadvantage in terms of encountering more infections and may require more resources to achieve herd immunity not only within their own cohorts but among the entire population. Ultimately it will guide the distribution of health care resources so that overall infection rates can be curtailed and economic loss can be minimized. The literature lacks this kind of detailed, individual-based study of health disparities and their consequences on the control of epidemics.

To study cohort level disparities, we use an individual-based model where a synthetic representation of a real population is created at an individual level. The synthetic individuals are statistically equivalent to the Census population when aggregated to a block group level. Each synthetic individual is assigned an activity sequence which is derived from time use surveys. It includes the types of activities performed and the time of day they are performed at [5]. A location for each activity is also assigned and when individuals are co-located at these locations, they are assumed to be in contact with each other, resulting in the creation of a synthetic social contact network. This approach supports fully heterogeneous individuals and does not require assumptions about large-scale regularity of interactions, unlike compartmental models [3, 8].

There have been many articles that study the cost of influenza infections [6, 9, 26]. For example, there are studies focusing on the cost effectiveness of vaccination for children [19, 25, 29, 33, 35, 40], healthy working adults [10, 28], and people of age 65 years and above [37]. Research in [16] compares the cost effectiveness of vaccination using a dynamic model and a static model and prioritizes individuals by age and risk groups. These works focused on the measurement of direct and indirect impacts that take medical cost and productivity loss into account. Meltzer et al. [32] used a Monte Carlo simulation model to analyze the direct economic impact of vaccine-based interventions in pandemic influenza and proposed multiple criteria to generate vaccination priorities for subpopulations. Work by [17] examined herd immunity to further learn the direct and indirect effects of influenza vaccination. However, prior work has not examined health disparities that originate from differences in individual level attributes.

In this work, we build a computational framework to study the impact of heterogeneities in individuals’ demographic and socioeconomic attributes on health disparities during an influenza epidemic. We also design vaccination prioritization strategies by simulating epidemics where vaccine-based interventions are applied to targeted groups of people in the population to fulfill specific policy objectives. Our contributions include:

  • A framework to study health disparities in an influenza epidemic using a synthetic social contact network and an agent-based simulation model. This modular framework is highly generic and can be used to study other disease epidemics and intervention strategies.

  • Studied the impact of heterogeneities in individual level attributes on health and economic outcomes. Identified attributes that cause significant health disparities among cohorts.

  • Calculated direct and indirect impacts of vaccination-based interventions.

  • Designed cohort-based vaccination prioritization strategies using multiple objectives and measured their performance.

The rest of this paper is organized as follows. Section 2 describes the framework and the models used in this research. Section 3 shows simulation results and discusses health disparities with respect to age and income attributes. Various vaccination strategies are designed and discussed with respect to their case and economic outcomes. Section 4 concludes the paper. The Appendix provides details of the data used in the paper.

2 Framework and methods

In this section, we present the details of the framework along with the models and the methods used. Although this work focuses on influenza and vaccine-based intervention, the framework is generic and can be applied to other diseases and to other interventions such as antivirals and social distancing.

2.1 Framework

The framework consists of three major components: (1) real/synthesized data, (2) modeling and simulation, and (3) analysis (see Figure 1). The first component collects data from a variety of sources and synthesizes it [38]. This includes building the social contact network and collecting data on disease model parameters, case outcome distributions conditional on demographics, and costs conditional on clinical outcomes. The second component consists of a disease spread model, which determines the health state (susceptible, exposed, infectious, or removed) of each synthetic individual at the end of the simulation, and a labeling algorithm, which classifies each infected individual into one of four health outcomes: death, hospitalization, outpatient, and ill but not seeking medical care. The third component is the analytical engine of the framework; it analyzes health disparities based on individual level demographic and health attributes, calculates the impact of interventions, and builds intervention priorities targeted at specific economic and social objectives.

Figure 1

Framework for studying health disparities based on individuals, cohorts, and specific objectives. The real/synthesized data module collects data on costs, disease parameters, interventions, etc., and synthesizes data from multiple sources to create the synthetic social contact network. The modeling and simulation module consists of the disease model and the net return model, which labels health outcomes and associated costs. Its inputs are collected by the first module; its outputs are individual level infected cases, each with a specific outcome (i.e., death, hospitalization, outpatient, or ill but not seeking medical care), and vaccinated cases. The analysis module performs health disparity analysis and intervention analysis. Health disparity is analyzed with respect to age and household income. Vaccination prioritization strategies are designed and compared to minimize death rate/total death count, or to maximize net returns per dollar spent/total net returns

The following steps illustrate how the framework processes the data, runs the simulations, and analyzes the results.

  (a) Collect and synthesize data on the population and social contact network \(\rightarrow \) build influenza disease model \(\rightarrow \) run epidemic simulations \(\rightarrow \) label infected cases with clinical outcomes \(\rightarrow \) analyze health disparity with respect to individual level attributes.

    To simulate an epidemic, we need individual level data on the population, the contact network on which transmissions occur, and disease model parameters such as the transmission rate (i.e., the probability of transmission from an infectious person to a susceptible person per unit of contact time), incubation duration, and infectious duration. If a vaccination intervention is applied, then vaccine efficacy (the reduction in disease transmission probability) and compliance rate (the probability that a person will take the vaccine) are also required. The simulation output provides a health label (infected or not) and a vaccine label (vaccinated or not) for each person. These labels are joined with the person’s individual level attributes such as age, gender, and income.

    Further, each synthetic individual is assigned a risk level (high-risk or non-high-risk) based on age, regardless of health and vaccine labels [32]. Each infected case is then labeled with one clinical outcome among death, hospitalization, outpatient, and ill but not seeking medical care, based on the age and risk level of the individual. Each clinical outcome has an associated cost of treatment. The outcome distributions by age and risk level are given in [32], and the treatment costs for each outcome are given in [12]. We have included these data in the Appendix for completeness.

    We divide the whole population into subgroups based on a selected attribute, e.g., age, and compute the differences between the subgroups with respect to cumulative infection rate, health outcome, and economic cost. Statistical significance tests are used to determine whether health disparities exist with respect to the selected attribute.

  (b) Design vaccination strategies \(\rightarrow \) change intervention parameter settings \(\rightarrow \) run simulations \(\rightarrow \) analyze the effects of revised vaccination strategies \(\rightarrow \) find public health policy implications.

    We consider different vaccine distribution schemes assuming vaccines are in limited supply. A no-priority strategy is defined for comparative analysis where all vaccines are distributed randomly in the population. In other cases, a specific demographic is selected to divide the population into subgroups on which a vaccine prioritization is determined. In this study, vaccination priorities are designed based on a subgroup’s ability to achieve specific objectives such as minimum number of total infections or maximum net return per capita. Different vaccination priorities are considered and compared using simulation results.

    Our goal is to answer the following kinds of questions through this framework: (1) Can health disparities be explained by the demographics? If yes, what are the causal reasons behind them? (2) Is it possible to design vaccination prioritization strategies assuming vaccines are in limited supply based on this knowledge? (3) What is the level of improvement provided by these vaccination priorities with respect to various measures of performance?

2.2 Method

This section provides details of the data and methods used in the study.

2.2.1 Synthetic social contact network

A synthetic population and its social contact network are used to simulate the spread of the disease. In this study we use the synthetic social contact network of Montgomery County, Virginia [38]; its basic properties are summarized in Table 1. A methodology for constructing the synthetic population and contact network for any area in the US can be found in [3, 5, 8, 18, 38]. In what follows we briefly describe the procedures; interested readers can find details in the aforementioned references.

Table 1 Social contact network of Montgomery County, VA

First, a statistical representation of each individual in the population is built using the US Census data. This synthetic population is statistically equivalent to the real population as given in the US Census, when aggregated up to a census block group level. Individuals in the synthetic population are endowed with a complete range of demographic attributes as available in the Census [5, 7], including variables such as age and income level. The population synthesis process preserves the confidentiality of the individuals in the original data sets, yet produces realistic attributes and demographics for the synthetic individuals. Joint demographic distributions are reconstructed from the marginal distributions available in typical census data together with joint distributions in Public Use Microdata Samples (PUMS) using an iterative proportional fitting technique. This technique guarantees that a census of our synthetic population is statistically indistinguishable from the input census when aggregated to block groups.
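The iterative proportional fitting step mentioned above can be illustrated with a minimal two-attribute sketch. This is not the population synthesizer used in the cited work; the function name `ipf`, the seed table, and the marginal totals are all hypothetical and chosen only to show how a sample-based joint table is rescaled until it matches census-style marginals.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Fit a 2-D table to given row and column totals by alternate rescaling."""
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Scale each row so that it sums to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Scale each column so that it sums to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows agree as well.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


# Hypothetical inputs: a PUMS-like joint sample (age band x income band)
# and block-group marginal counts. All numbers are made up.
seed = [[10.0, 5.0], [4.0, 8.0], [2.0, 6.0]]
age_totals = [300.0, 500.0, 200.0]
income_totals = [450.0, 550.0]
fitted = ipf(seed, age_totals, income_totals)
```

The fitted table keeps the association pattern of the seed while reproducing both sets of marginal totals, which is the property the synthesis step relies on.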

Next, a set of activity templates for households is determined, based on American time-use surveys [11] and the National Household Travel Survey. These activity templates provide a daily set of activities for individuals and the time of day they are performed. Each synthetic household is then matched with one of the survey households, using a decision tree based on demographics such as the size of the household, number of workers in the household, number of children, etc. The synthetic household members are then assigned the activity templates of their matching survey household members, giving each synthetic member a daily sequence of activities. The activities can be of type home, work, shop, school and “other”. For each activity, and for each individual, a geographic location is identified based on land-use patterns, the transportation network, and data from commercially available databases such as Dun & Bradstreet.

A social network is formed when individuals are simultaneously present at a location. The co-location based social network is dynamic and changes as people visit different locations and come in contact with individuals at these locations. The contact network G(V,E,w) is an edge-weighted network. Nodes correspond to individuals, edges represent the contact between two end nodes, and edge weights represent contact durations. Edge (u,v) with weight w(u,v) represents that node u is in contact with node v for duration w(u,v), during which the disease may transmit from node u to node v with probability p(w(u,v)) if u is infected and v is susceptible.
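As a rough illustration of how co-location yields the weighted network G(V,E,w), the sketch below derives edge weights from visit records. The record layout `(person, location, start_min, end_min)` and the function name are assumptions for this example, and edges are stored as unordered pairs on the assumption that contact is symmetric.

```python
from collections import defaultdict
from itertools import combinations


def build_contact_network(visits):
    """Return {(u, v): total minutes of co-location}, with u < v."""
    by_location = defaultdict(list)
    for person, location, start, end in visits:
        by_location[location].append((person, start, end))

    weights = defaultdict(int)
    for people in by_location.values():
        for (u, s1, e1), (v, s2, e2) in combinations(people, 2):
            if u == v:
                continue  # same person visiting the location twice
            overlap = min(e1, e2) - max(s1, s2)
            if overlap > 0:
                weights[tuple(sorted((u, v)))] += overlap
    return dict(weights)


# Hypothetical one-day schedule, times in minutes after midnight.
visits = [
    ("a", "school", 480, 900),
    ("b", "school", 540, 960),
    ("c", "work", 480, 1020),
    ("a", "home", 960, 1440),
    ("b", "home", 1000, 1440),
]
network = build_contact_network(visits)
```

Here persons a and b overlap for 360 minutes at school and 440 minutes at home, so their edge weight is 800 minutes, while c has no contacts.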

2.2.2 Disease model

In this study, a standard SEIR model is used for modeling the spread of influenza. SEIR models are widely used in the mathematical epidemiology literature [2, 27] and have been extensively developed and applied to model contagious disease outbreaks [30, 31]. Each person is in one of the following four health states at any time: susceptible, exposed, infectious, and removed. A person v is in the susceptible state until he becomes exposed. If v becomes exposed, he remains so for \(\Delta t_{E}(v)\) days, which is called the incubation period, during which he is not infectious. Then he becomes infectious and remains so for \(\Delta t_{I}(v)\) days, which is called the infectious period. Finally he becomes removed (or recovered) and remains so permanently.

The health states of person v during the epidemic period, denoted by (v), can be equivalently represented by \(\tau(v) = (t_{SE}(v), t_{EI}(v), t_{IR}(v))\), where \(t_{SE}(v)\) is the day when v becomes exposed, \(t_{EI}(v) = t_{SE}(v) + \Delta t_{E}(v)\) is the day when v becomes infectious, and \(t_{IR}(v) = t_{EI}(v) + \Delta t_{I}(v)\) is the day when v becomes removed. The model of our framework assumes that \(\Delta t_{E}(v)\) and \(\Delta t_{I}(v)\) for each person v are known at the beginning of the simulation. It also assumes the health state transitions are not reversible and they are the only possible state transitions [14, 30, 31].
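The transition days above can be captured in a small data structure. The sketch below is illustrative only: the class name `Timeline` and the convention that a person is already in the new state on the transition day itself are assumptions, not details taken from the simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeline:
    t_se: int  # day the person becomes exposed
    dt_e: int  # incubation period in days
    dt_i: int  # infectious period in days

    @property
    def t_ei(self):
        # Day the person becomes infectious: t_SE + dt_E.
        return self.t_se + self.dt_e

    @property
    def t_ir(self):
        # Day the person becomes removed: t_EI + dt_I.
        return self.t_ei + self.dt_i

    def state(self, day):
        """Health state on a given day (assumed boundary convention)."""
        if day < self.t_se:
            return "S"
        if day < self.t_ei:
            return "E"
        if day < self.t_ir:
            return "I"
        return "R"


# Hypothetical person: exposed on day 3, 2-day incubation, 4 infectious days.
person = Timeline(t_se=3, dt_e=2, dt_i=4)
```

Because the transitions are irreversible and the two durations are fixed in advance, the three transition days fully determine the state on every day.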

With the SEIR model, the disease spreads in a population in the following way. It can only be transmitted from an infectious node to a susceptible node. On any day, if node u is infectious and v is susceptible, disease transmission from u to v occurs with probability \(p(w(u,v)) = 1 - (1 - r)^{w(u,v)}\), where r is the probability of disease transmission for a contact of one unit of time. Thus the disease propagates probabilistically along the edges of the contact network.
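The transmission rule, p(w) = 1 − (1 − r)^w for a contact of w time units, can be sketched as follows. This is a simplified single-day step on an undirected edge list, not the EpiFast algorithm; the function names and the tiny example network are hypothetical, while r = 0.00007 per minute is the value quoted in Section 3.1.

```python
import random


def transmission_probability(r, w):
    """Probability of transmission over a contact lasting w time units."""
    return 1.0 - (1.0 - r) ** w


def infect_step(infectious, susceptible, weights, r, rng):
    """Return the set of susceptible nodes newly exposed during one day."""
    newly_exposed = set()
    for (u, v), w in weights.items():
        # Treat each edge as usable in both directions.
        for src, dst in ((u, v), (v, u)):
            if src in infectious and dst in susceptible and dst not in newly_exposed:
                if rng.random() < transmission_probability(r, w):
                    newly_exposed.add(dst)
    return newly_exposed


# An 8-hour (480-minute) contact at r = 0.00007 per minute gives p of about 0.033.
p_day = transmission_probability(0.00007, 480)

# Hypothetical network: a is infectious, b and c are susceptible.
rng = random.Random(0)
exposed = infect_step({"a"}, {"b", "c"}, {("a", "b"): 480, ("b", "c"): 60}, 0.00007, rng)
```

Note that the per-contact probability grows with duration but stays below one, so long contacts are more likely, not certain, to transmit.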

To simulate the spread of influenza in the synthetic contact network, we use EpiFast [8], a high-performance agent-based simulation model. It follows the standard SEIR disease model to capture the within-host and between-host disease progressions, and uses a distributed algorithm to compute stochastic disease propagation in the synthetic network. EpiFast provides individual level details while maintaining heterogeneity among individuals.

2.2.3 Label infected people with clinical outcomes

Each infected person can have one of four clinical outcomes attributed to influenza infection: death (D), hospitalization (H), outpatient (O), and ill but not seeking medical care (I). These outcomes depend on a person’s underlying risk condition and age. One’s risk condition (high risk/non-high risk) arising from a pre-existing medical condition often depends on age too. Economic cost is assigned to each infected person according to the outcome. The economic cost includes medical costs related to the treatment of infected cases (direct costs) and productivity loss of infected people (indirect costs). Once individual level outcomes and costs are known, we can calculate the total costs for any subgroup in the population.

For an infected individual whose age is known, the risk level and clinical outcomes are determined by the distributions given in [32], and the treatment costs for each outcome under each age and risk level are determined by the data given in [12]. More details about these distributions are shown in Appendix Tables 19, 20, and 21.
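The labeling step amounts to two conditional draws: a risk level given age, then a clinical outcome given age and risk. The sketch below shows one way to do this. The probability tables in it are placeholders invented for illustration and are not the distributions from [32] or the Appendix; the function names are likewise hypothetical.

```python
import random

OUTCOMES = ("D", "H", "O", "I")  # death, hospitalization, outpatient, ill without care


def assign_risk(age_group, high_risk_prob, rng):
    """Draw a risk level from P(high risk | age group)."""
    return "high" if rng.random() < high_risk_prob[age_group] else "non-high"


def label_outcome(age_group, risk, outcome_probs, rng):
    """Draw one clinical outcome from P(outcome | age group, risk level)."""
    weights = outcome_probs[(age_group, risk)]
    return rng.choices(OUTCOMES, weights=weights, k=1)[0]


# PLACEHOLDER tables for a single age group; the real values come from [32].
high_risk_prob = {"senior": 0.40}
outcome_probs = {
    ("senior", "high"): (0.010, 0.050, 0.600, 0.340),
    ("senior", "non-high"): (0.001, 0.010, 0.450, 0.539),
}

rng = random.Random(42)
risk = assign_risk("senior", high_risk_prob, rng)
outcome = label_outcome("senior", risk, outcome_probs, rng)
```

Once each infected person carries an outcome label, the cost lookup is a plain table join on (age group, risk level, outcome).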

2.2.4 Interventions

Two scenarios regarding interventions are considered in this work: the base case, where no intervention is applied to contain the epidemic; and the intervention case, where an intervention is applied to a subset of the population. The intervention may be targeted towards a subpopulation chosen by individuals’ specific attributes, or be completely random, where each person is equally likely to be chosen. In our simulation, we focus on mass vaccination, which is applied at the beginning of the epidemic. Each person chosen to be vaccinated complies with a specified probability, called the compliance rate. We assume that all persons within a cohort have the same compliance rate; compliance rates may differ across cohorts.
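A minimal sketch of cohort-level compliance follows, assuming each targeted person decides independently with the cohort's compliance rate. The function name `vaccinate` and the example population are hypothetical.

```python
import random


def vaccinate(cohort_of, target_cohorts, compliance, rng):
    """Return the set of people who are offered the vaccine and comply.

    cohort_of: {person_id: cohort label}
    compliance: {cohort label: probability of accepting the vaccine}
    """
    vaccinated = set()
    for person, cohort in cohort_of.items():
        if cohort in target_cohorts and rng.random() < compliance[cohort]:
            vaccinated.add(person)
    return vaccinated


# Hypothetical population of six people in two cohorts.
cohort_of = {1: "school", 2: "school", 3: "adult", 4: "adult", 5: "adult", 6: "senior"}
rng = random.Random(7)
chosen = vaccinate(cohort_of, {"school", "adult"}, {"school": 0.5, "adult": 0.25}, rng)
```

Random (untargeted) vaccination is the special case where every cohort is targeted with the same rate.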

2.2.5 Net return model

The total economic costs under base case include direct and indirect costs. The total costs under intervention case include direct, indirect, and intervention costs. Direct costs are the medical costs associated with the clinical outcome. Indirect costs are measured by the loss of productivity due to illness. Net return (NR) is defined as the reduction in total cost due to intervention. It provides a measure of economic efficiency.

By aggregating individual costs up to a cohort level or population level, and comparing the base case and the intervention case, we can compute the net return of an intervention as:

$$ NR = \sum\limits_{i\in S_{base}} C_{i} - \sum\limits_{j\in S_{int}} C_{j} - \sum\limits_{k\in S_{vax}} C_{k}^{vax}, $$
(1)

where \(S_{base}\) and \(S_{int}\) are the sets of infected individuals under the base case and the intervention case, respectively, and \(S_{vax}\) is the set of vaccinated individuals. \(C_{i}\) represents the economic cost of individual i who gets infected. \(C_{k}^{vax}\) is the cost of vaccination for individual k who gets vaccinated. We assume that for any k, \(C_{k}^{vax}\) is US$30.53 [32]. \(C_{i}\) is a predefined variable that varies based on age, risk factors, and clinical outcomes. The details on cost are provided in the Appendix, Table 21. We can compute total net return, direct net return, and indirect net return by using the corresponding \(C_{i}\) of total cost, direct cost, and indirect cost, respectively, in Table 21. All costs are adjusted to 2016 USD.

To account for difference in group sizes we calculate net return per capita (NRPC), net return per vaccinated person (NRPV), and net return per dollar spent (NRPD) as:

$$ NRPC = \frac{NR}{N_{c}}, NRPV = \frac{NR}{N_{v}}, NRPD = \frac{NR}{N_{d}}, $$
(2)

where \(N_{c}\), \(N_{v}\), and \(N_{d}\) represent the group size, the number of vaccinated people within the group, and the total dollars spent on vaccinating the group, respectively.
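Equations (1) and (2) reduce to a few sums and ratios. The sketch below assumes the flat vaccination cost of US$30.53 per person stated above, so that the dollars spent equal the number vaccinated times that cost; the function names and the example cost figures are hypothetical.

```python
VAX_COST = 30.53  # cost per vaccinated person, as assumed in the text


def net_return(base_costs, int_costs, n_vaccinated, vax_cost=VAX_COST):
    """Equation (1): cost avoided by the intervention minus vaccination cost."""
    return sum(base_costs) - sum(int_costs) - n_vaccinated * vax_cost


def normalized_returns(nr, group_size, n_vaccinated, vax_cost=VAX_COST):
    """Equation (2): net return per capita, per vaccinated person, per dollar."""
    dollars_spent = n_vaccinated * vax_cost
    return {
        "NRPC": nr / group_size,
        "NRPV": nr / n_vaccinated,
        "NRPD": nr / dollars_spent,
    }


# Made-up costs of infected individuals in a group of 10 people, 4 vaccinated.
base_costs = [500.0, 1200.0, 300.0, 800.0]  # infected in the base case
int_costs = [500.0]                         # infected in the intervention case
nr = net_return(base_costs, int_costs, n_vaccinated=4)
measures = normalized_returns(nr, group_size=10, n_vaccinated=4)
```

Passing only direct or only indirect costs in the two cost lists gives the direct or indirect net return in the same way.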

Note that our individual based framework allows us to calculate net return for any subpopulation and for any health outcome. More details are shown in Section 3.

2.2.6 Subpopulations

To discover health disparities with respect to various individual level attributes, we split the population into cohorts by the selected attributes. In this work, we use age and household income to build the cohorts. We create: (i) four age groups that are distinct with respect to economic activity and health care related costs: 0-4 year olds (preschool); 5-19 year olds (school); 20-64 year olds (adult); and people who are 65 and above years old (senior); (ii) four quartiles by annual household income: $0-18400 (Q1); $18400-41620 (Q2); $41620-75000 (Q3); and above $75000 (Q4).
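The cohort definitions above translate directly into threshold lookups. In this sketch the age cut points follow the four groups listed in the text; for income, the quoted ranges share endpoints, so the code assumes that a boundary value belongs to the lower quartile. That tie-breaking choice and the function names are assumptions.

```python
from bisect import bisect_left, bisect_right

AGE_BOUNDS = [5, 20, 65]  # first age of school, adult, and senior groups
AGE_LABELS = ["preschool", "school", "adult", "senior"]

INCOME_BOUNDS = [18400, 41620, 75000]  # upper ends of Q1, Q2, Q3 in dollars
INCOME_LABELS = ["Q1", "Q2", "Q3", "Q4"]


def age_group(age):
    """0-4 preschool, 5-19 school, 20-64 adult, 65+ senior."""
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age)]


def income_quartile(income):
    """Quartile by annual household income; boundary values go to the lower one."""
    return INCOME_LABELS[bisect_left(INCOME_BOUNDS, income)]
```

For example, `age_group(19)` is `"school"`, `age_group(20)` is `"adult"`, and `income_quartile(75000)` is `"Q3"`, consistent with Q4 being described as above $75000.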

2.3 Applications

This framework can be applied to undertake various studies using alternative interventions, disease models, and datasets. Three major tasks that become easier within this framework are: (a) studying attribute-based health disparities; (b) determining which attributes better explain health disparities; and (c) designing effective intervention strategies and building priorities for distributing limited health care resources.

We list all the notations defined in Sections 2 and 3 in Table 2.

Table 2 Notations and their meanings

3 Simulation results and analysis

This section describes the simulation settings and results. Vaccination is the only intervention considered here. First, health disparities and economic disparities are discussed separately with respect to age and household income. For this analysis we assume that vaccines can fully cover the whole population and that the compliance rates of all cohorts are the same. The sensitivity of the disparity results is examined with respect to the disease transmission rate and the compliance rate for vaccination. Second, vaccination prioritization strategies are designed assuming vaccines are in limited supply. In that setting, compliance with vaccination can be uniform or vary by age.

3.1 Simulation settings

Table 3 lists the parameter values and their sources. We seed the epidemic by randomly infecting 10 individuals at the start of the simulation. A base attack rate of 40% is assumed when no interventions are applied. The corresponding disease transmission rate for this network is 0.00007 per minute of contact time. The transmission rate is kept the same across the base case and the intervention case. In the intervention case, vaccination is applied randomly with a compliance rate of 50% and the vaccine efficacy is assumed to be 90%. A 90% vaccine efficacy means that the vaccine reduces the probability of disease transmission by 90%. Note that these parameter values do not correspond to any specific epidemic, as they vary from season to season and differ between populations. For each scenario, we run 30 replicates for 300 days and report the average results.

Table 3 List of parameter values and their sources

3.2 Health disparity

Here we study disparities in attack rate between different age groups (or income groups) when no intervention is applied (base case). The disease model determines the health state of each individual at the end of simulation. We identify all infected cases and split them into age (or income) groups to calculate the attack rates for each group.
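Computing group attack rates from the simulation output is a simple count-and-divide. The sketch below assumes the output has been reduced to a mapping from person to cohort and a set of infected person IDs; these structures and the function name are hypothetical.

```python
from collections import Counter


def attack_rates(cohort_of, infected):
    """Fraction of each cohort that was infected by the end of the run."""
    sizes = Counter(cohort_of.values())
    cases = Counter(cohort_of[person] for person in infected)
    return {cohort: cases[cohort] / n for cohort, n in sizes.items()}


# Hypothetical six-person population with two infections.
cohort_of = {1: "school", 2: "school", 3: "adult", 4: "adult", 5: "adult", 6: "adult"}
rates = attack_rates(cohort_of, infected={1, 3})
```

In this toy example the school cohort has an attack rate of 0.5 and the adult cohort 0.25; averaging such values over replicates gives the per-group means reported in the tables.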

Age-based groups

The epidemic curves, which show the number of new infections each day, for each age group are shown in Figure 2. We find that the epidemic in the school age group starts earlier and peaks higher. To check for significant differences in attack rates (cumulative infection rates) across age groups, two-sample t-tests are performed. Table 4 shows the resulting p-values. The average attack rate (\(\overline {AR}\)) for each group is reported in its respective row. The p-values show that there is a significant difference in \(\overline {AR}\) between any two age groups, and the school age group has the greatest attack rate at \(\overline {AR}= 70\%\), making it the group most vulnerable to influenza infection.
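For readers who want to reproduce this kind of comparison, the sketch below computes a two-sample t statistic from per-replicate attack rates. The text does not say whether a pooled-variance or unequal-variance test was used; this example assumes Welch's unequal-variance form, and the replicate values are invented. Converting the statistic to a p-value needs a t-distribution CDF, which a statistics library such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` provides.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented per-replicate attack rates for two age groups.
school = [0.71, 0.69, 0.70, 0.72, 0.68]
adult = [0.36, 0.35, 0.37, 0.34, 0.36]
t_stat, dof = welch_t(school, adult)
```

A large absolute t statistic relative to the spread across replicates is what drives the small p-values reported for the age groups.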

Figure 2

The epidemic curves of age groups along days. X-axis denotes days, Y-axis denotes normalized number of cases (as a fraction of the size of the group)

Table 4 Disparity in attack rate among age groups: p-value (base-case)

Table 5 shows that there is also a significant difference in average death rate (\(\overline {DR}\)) between any two age groups, and senior age group has the greatest death rate at \(\overline {DR}= 0.296\%\).

Table 5 Disparity analysis of death rate among age groups: p-value (base-case)

We also observe significant difference in average hospitalization rate (\(\overline {HR}\)), average outpatient rate (\(\overline {OR}\)), average ill rate (\(\overline {IR}\)) with respect to age. Results are omitted for the sake of brevity.

Income-based groups

Figure 3 shows the epidemic curves of income groups. We observe that households with high income tend to have an earlier and higher epidemic peak. Table 6 shows p-values of 2-sample t-tests where attack rates between any two income groups are tested. Here too, we find a significant difference in \(\overline {AR}\) between any two income groups, and \(\overline {AR}\) increases as household income increases. The higher quartile income groups encounter higher infection rates. This is likely due to the fact that higher income households have larger families, as shown in Table 7. In larger families, once one person is infected, the risk of secondary infection can go up by 38% [44].

Figure 3

The epidemic curves of income groups along days. X-axis denotes days, Y-axis denotes normalized number of cases

Table 6 Disparity analysis of attack rate among income groups: p-value (base-case)
Table 7 Average household size of income groups

Unlike for age groups, we observe no significant difference in \(\overline {DR}\) among income groups in Table 8. This is because the elderly population, which is at a higher risk of death, is spread fairly evenly across the income groups, whereas under the age-based partition it is concentrated in a single group.

Table 8 Disparity analysis of death rate among income groups: p-value (base-case)

3.3 Economic disparity

Next we examine the economic impact of intervention by age and income groups. For each group we compute the direct, indirect, and total net returns. To do comparative analysis across groups we measure net return per capita (NRPC), net return per vaccinated person (NRPV), and net return per dollar spent (NRPD) using (1) and (2). All returns are measured in 2016 USD.

Age-based groups

We show direct, indirect, and total net return for each age group in Figure 4. For all age groups the indirect return is the dominant component of total net return. This is because the indirect cost, i.e. loss in wages, from death is much greater than the direct (medical) cost, for all individuals regardless of age and risk level, as shown in Table 21. School-aged and senior groups have significantly higher net return compared to the other two age groups because vaccine intervention reduces the attack rate in school-aged group and the death rate in senior group more. This observation is reflected in all per capita calculations as shown in Table 9. Vaccinations to school-aged and senior groups are economically more beneficial than to preschoolers and adults. However, this does not mean that vaccination based intervention is always more effective in school-aged and senior groups in reducing the attack rate. We will explain this later in the sensitivity analysis section.

Figure 4

Net return per capita (NRPC), per dollar spent (NRPD), and per vaccinated person (NRPV). The bar height represents value of net return within each one. Each total net return (blue + red) is the sum of indirect net return (upper blue bar) and direct net return (lower red bar). Indirect net returns are significantly larger than direct net returns across all age groups

Table 9 Disparity analysis of the net return per dollar spent among age groups: p-value (intervention case)

Figure 5 shows distributions of death, hospitalization, outpatient, and ill not seeking medical care among the infected cases, in each age group, and Figure 6 shows the corresponding cost distributions. Note that for each age group, although the percentage of death count (Figure 5) is the smallest, the cost of death (in Figure 6) is the largest. This is because the economic cost of death includes lifetime productivity and hence is higher than the cost for any other clinical outcome (see Table 21).

Figure 5

Count distribution of death (first from left in red), hospitalization (second from left in green), outpatient (third from left in blue), and ill but not seeking medical care (fourth from left in purple), among the infected cases in each age group. For each age group, the percentage of death count is the smallest

Figure 6

Cost distribution of death (first from left in red), hospitalization (second from left in green), outpatient (third from left in blue), and ill but not seeking medical care (fourth from left in purple) in each age group. For each age group, the cost of death is the largest

Income-based groups

Unlike for age groups, the disparity in net return per dollar spent among income groups is mostly not significant (see Table 10). The only exceptions are the first income quartile versus the third and fourth income quartiles.

Table 10 Disparity analysis of net return per dollar among income groups: p-value (intervention case)

3.4 Sensitivity analysis

Next we perform a sensitivity analysis on compliance rate and attack rate (by adjusting the disease transmission rate) to see whether our findings are robust to these changes.

3.4.1 Compliance rate

The simulation settings remain the same as described in Section 3.1 except that compliance to vaccination is reduced from 50% to 25%. At a 50% compliance rate, the economic returns of both the school-aged group and the senior group were significantly higher than those of preschoolers and adults, but at the lower compliance level the senior group provides the highest benefits (see Table 11). The reason is that the attack rate drops by 66% in the senior group and only by 51% among school-aged children, as shown in Table 12. The lower impact on attack rate in the school-aged group is probably because lower compliance to vaccination prevents this group from gaining herd immunity, so the indirect gains from vaccination are disproportionately reduced. The economic returns are not found to be significantly different across preschoolers, school-aged children, and adults under 25% compliance.

Table 11 Disparity analysis of the net return per dollar among age groups: p-value (CMPL= 25%)
Table 12 Reduction in attack rate (%)

3.4.2 Attack rate

Next we change the attack rate from 40% to 60%, corresponding to a disease transmission rate of 0.00011. All other simulation parameters remain the same as before. We consider both 50% and 25% compliance rates for vaccination. We find that the results are qualitatively the same as those for the 40% attack rate; they are omitted here to avoid duplication.

The sensitivity analysis results show that the following findings are robust to simulation settings: 1) disparities in health and clinical outcomes exist among age and income groups, but disparities in economic return are only significant with respect to age; 2) the school-aged group is much more vulnerable to influenza infection than other age groups due to its higher contact rates; 3) in random allocation, vaccination of the school-aged group is the most effective in containing the disease, except when compliance with the intervention is low; 4) deaths have the biggest impact on net return.

3.5 Vaccination prioritization strategy

If vaccines are limited it is important to prioritize their use based on specific objectives. Here we consider a simple heuristic for determining priorities based on various objectives, which include: minimizing the total number of deaths, minimizing death rate, maximizing net return per dollar spent, maximizing total net returns, and minimizing attack rate. Our heuristic assigns higher priority to a subpopulation that has a higher value of the measure we want to minimize (maximize). For example, if the objective is to minimize death rate, then we prioritize age groups according to their death rates, i.e. the age group with a higher death rate is given a higher priority. Given that there is no significant difference in death rate and net return among income-based groups, we only consider age-based groups for deciding vaccination priorities. We use the simulation results based on the setting given in Section 3.1 to determine the vaccination priorities for age-based groups. The results are shown in Table 13.
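As an illustration, the heuristic amounts to sorting the groups by the chosen measure in descending order. The sketch below assumes the group numbering 1 = preschoolers, 2 = school-aged children, 3 = adults, 4 = seniors; the function names and the death-rate values are hypothetical placeholders, not results from the simulations.

```python
from typing import Dict, List


def priority_order(metric_by_group: Dict[int, float]) -> List[int]:
    """Order groups from highest to lowest value of the chosen measure."""
    return sorted(metric_by_group, key=lambda g: metric_by_group[g], reverse=True)


def strategy_label(order: List[int]) -> str:
    """Build a label such as 'S4321' from a priority ordering."""
    return "S" + "".join(str(g) for g in order)


# Placeholder death rates (illustrative only, NOT simulation output).
# Assumed numbering: 1 preschoolers, 2 school-aged, 3 adults, 4 seniors.
death_rate = {1: 0.001, 2: 0.002, 3: 0.003, 4: 0.004}

order = priority_order(death_rate)
print(strategy_label(order))
```

The same function serves every objective: for a measure to be minimized (such as death rate) or maximized (such as net return per dollar spent), the group with the larger value is placed first.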

Table 13 Priorities for age-based groups

3.5.1 Fixed compliance rate across age groups

We first assume that the vaccination coverage is 50% and the compliance rate of each cohort is 1. The number of vaccines is limited to 50% of the population size, which is 77,820 × 50% = 38,910 for Montgomery County. Table 14 shows the prioritization strategies and the actual vaccinated fractions under different objectives. The strategies S4321, S3421, and S2134 correspond to minimizing \(\overline {DR}\), \(\overline {DC}\), and \(\overline {AR}\), respectively, and the strategies S4231 and S3241 correspond to maximizing \(\overline {NPRD}\) and \(\overline {TNR}\), respectively. The subscript digits of each strategy give the priority order of the four age groups (1 = preschoolers, 2 = school-aged children, 3 = adults, 4 = seniors). The column values in Table 14 under each strategy are the actual vaccinated fractions of the corresponding age groups under that strategy. For example, the S4321 column, which corresponds to minimizing death rate, shows that vaccines are prioritized in the following order: seniors, adults, school-aged children, and preschoolers. Seniors are vaccinated first, which uses 7,335 vaccines to cover 100% of the senior group. The remaining 31,575 vaccines are distributed to adults and cover only 60% of them. None of the children receive any vaccines in this case. The strategy with no priority (Snp) is introduced as a baseline in which 50% of each age group is selected uniformly at random for vaccination.
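The allocation described for S4321 can be sketched as a greedy pass over the groups in priority order. In the sketch below, the senior group size (7,335), the total population (77,820), and the supply (38,910) come from the text; the adult size is back-calculated from the rounded 60% figure, and the preschool and school-aged sizes are placeholders chosen only so that the sizes sum to the total. The function name is hypothetical.

```python
from typing import Dict, List


def allocate_by_priority(order: List[int],
                         group_sizes: Dict[int, int],
                         supply: int) -> Dict[int, float]:
    """Give vaccines to groups in priority order until the supply runs out.

    Returns the vaccinated fraction of each group (compliance assumed to be 1).
    """
    fractions = {}
    remaining = supply
    for group in order:
        doses = min(remaining, group_sizes[group])
        fractions[group] = doses / group_sizes[group]
        remaining -= doses
    return fractions


# 1 preschoolers, 2 school-aged, 3 adults, 4 seniors.
# Only the senior size is taken from the text; the adult size is inferred
# from the rounded 60% figure and sizes 1 and 2 are PLACEHOLDERS.
group_sizes = {1: 5_860, 2: 12_000, 3: 52_625, 4: 7_335}
supply = 77_820 // 2  # 38,910 doses

fractions = allocate_by_priority([4, 3, 2, 1], group_sizes, supply)
for group, fraction in fractions.items():
    print(group, f"{fraction:.0%}")
```

With these inputs, seniors are fully covered, the remaining 31,575 doses go to adults, and no doses are left for the two child groups, matching the S4321 description.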

Table 14 Vaccinated fraction of each age group under different vaccination strategies

To compare the performance of the strategies, we compute death rate, death count, net return per dollar spent, total net return, and attack rate over the whole population. Note that death rate, net return per dollar spent, and attack rate are normalized values and hence are directly comparable. The results are shown in Table 15. S2134 turns out to be the optimal strategy across all objectives, followed by S4231 and Snp, whose results are also competitive. These three strategies are significantly better than the others. A comparison of S2134, S4231, and Snp with the other strategies shows that they vaccinate a higher fraction of the school-aged group.

Table 15 Performance of different vaccination strategies

We conduct the same analysis under other settings to determine optimal vaccination priorities: attack rate = 40%, compliance = 0.25; attack rate = 60%, compliance = 0.5; and attack rate = 60%, compliance = 0.25. We observe that the optimal strategy always vaccinates the highest number of school-aged children.

Snp in Table 15 performs very close to the optimal strategy. Similar observations were made in the other settings, except when attack rate = 60% and compliance = 0.25: Snp performs poorly when the disease transmission rate is high and the compliance rate is low. This indicates that vaccination priority is particularly important when vaccines are in short supply and the virus strain is highly infective.

3.5.2 Age based compliance rates

In the previous section, we assumed that the compliance rate is the same across all age groups. However, this is unlikely to be the case, so we estimate compliance rates based on age (shown in Table 16). The compliance rates are extracted from a survey conducted by Gfk.com under National Institutes of Health grant no. 1R01GM109718. The survey collects data on the demographics of the respondents and their preventive health behaviors during a hypothetical influenza outbreak. We categorize survey respondents into four age groups corresponding to the age groups used in this paper and estimate their compliance with vaccination.

Table 16 Compliance rates based on age

The new average compliance rate is 52%. If we still assumed a vaccination coverage of 50%, the vaccine supply would be enough to meet the needs of almost everyone who complies. To observe the effect of different vaccine prioritizations, we therefore assume that the vaccination coverage is 30%, i.e. only 43,288 vaccines are available. The vaccinated fractions (analogous to Table 14) are shown in Table 17, which accounts for both the vaccination coverage and the compliance rate of each age group. In this setting, S3421 is identical to S3241.
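One way to combine coverage and compliance, sketched below, is to cap each group's allocation at the number of compliant individuals in that group. This capping rule is an assumption, as are the function name, group sizes, compliance rates, and supply, all of which are placeholders rather than values from Table 16 or Table 17. The example also shows how two orderings such as S3421 and S3241 can coincide: if the supply left after the first group covers the compliant members of both the second and third groups, their relative order no longer matters.

```python
from typing import Dict, List


def allocate_with_compliance(order: List[int],
                             group_sizes: Dict[int, int],
                             compliance: Dict[int, float],
                             supply: float) -> Dict[int, float]:
    """Greedy allocation where each group accepts at most size * compliance doses."""
    fractions = {}
    remaining = supply
    for group in order:
        willing = group_sizes[group] * compliance[group]  # assumed cap
        doses = min(remaining, willing)
        fractions[group] = doses / group_sizes[group]
        remaining -= doses
    return fractions


# PLACEHOLDER inputs (1 preschoolers, 2 school-aged, 3 adults, 4 seniors).
sizes = {1: 100, 2: 200, 3: 600, 4: 100}
compliance = {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.75}
supply = 475

s3421 = allocate_with_compliance([3, 4, 2, 1], sizes, compliance, supply)
s3241 = allocate_with_compliance([3, 2, 4, 1], sizes, compliance, supply)
s2134 = allocate_with_compliance([2, 1, 3, 4], sizes, compliance, supply)
print(s3421 == s3241, s3421 == s2134)
```

In this toy setting the first two strategies give every group the same vaccinated fraction, while the third differs because the supply is exhausted before seniors are reached.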

Table 17 Vaccinated fraction of each age group under different vaccination strategies (based on age based compliance)

The results are listed in Table 18. The death rate, net return per dollar spent, and attack rate are directly comparable. The optimal strategy is still S2134 in terms of all measures, followed by S4231 and Snp. This observation is consistent with the earlier results, where the compliance rate was assumed to be uniform.

Table 18 Performance of different vaccination strategies

3.6 Public health policy implications

Public health authorities often publish intervention recommendations for the population. When resources are limited, e.g. when there are not enough vaccines to satisfy the needs of the whole population, some subpopulations are given higher priority in order to make the interventions as efficient as possible with respect to specific objectives for containing the epidemic.

From the simulation results, we observe that the school-aged group is much more critical than the other age groups, and that a vaccination strategy targeting the school-aged group first can maximize net returns and improve herd immunity. School-aged children tend to have longer contact times with each other and form dense subnetworks at school locations; thus the epidemic spreads faster and more easily within the school-aged group. If the school-aged group receives few or no vaccines, the highly connected subnetworks of students continue to spread the disease among students, preventing the intervention from effectively reducing overall infections and costs. On the other hand, if the school-aged group is well vaccinated, the epidemic is effectively contained not only in this group but across all groups.

3.7 Limitations

This computational framework makes several assumptions about parameter settings. For example, the health and economic disparity study assumes that individuals within a cohort follow the same compliance rate and that this rate is the same across cohorts. In the vaccination strategy study, we relax this assumption and let the compliance rate follow a realistic distribution based on age. However, these parameter settings may not correspond to any specific real-world scenario. Our goal is to provide a framework that can be used to study hypothetical scenarios to support planning, response, and decision-making during epidemic outbreaks.

4 Conclusions

In this paper, we build a computational framework that provides a generic methodology for finding health and economic disparities among cohorts based on individual-level features. We apply this framework to study disparities with respect to age and income during an influenza epidemic in Montgomery County in Southwest Virginia. In one scenario, no interventions are applied to control the epidemic; in the other scenarios, a vaccine-based intervention is applied to cohorts that are split by age and income.

We use an agent-based disease propagation model to study the transmission dynamics of influenza on the social network. The infected cases are then assigned clinical outcomes and treatment costs that depend on the severity of the outcome. Various vaccine allocation strategies are designed and simulated in this study.

Our results show significant health disparities across age groups and income groups, and economic disparities among age groups. The metrics for measuring the outcome disparities include attack rate, death rate, and death count. The metrics for measuring economic disparities include net return per capita, net return per vaccinated, and net return per dollar spent. We also find that in a severe flu season with limited vaccination coverage, if vaccines are assigned randomly without priorities, the intervention is least effective for the school-aged cohort. Given the high connectivity of school-aged children in the social contact network, they will be at a disadvantage if the vaccine assignment is completely random. Having measured the performance of vaccination prioritization strategies under both simulated and realistic scenarios, we find that the strategy that prioritizes school-aged children maximizes the normalized net returns and minimizes the attack rate and death rate.