Abstract
In health services research, endogenous healthcare utilization refers to the notion that the choice of utilizing health services is endogenous due to its correlation with the intensity of utilization outcomes, such as the number of emergency room visits. Greene in (Empir Econ 36:133–173, 2009) extended four conventional two-part models for zero-abundant count utilization to the two-part models that account for endogenous utilization. However, statistical inference on the (average) marginal and incremental effects in these models has not been carefully studied. The present article provides the estimation formulations for (average) marginal and incremental effects of the four two-part models: zero-inflated Poisson and negative binomial models, and hurdle Poisson and negative binomial models with correlated errors that characterize endogenous healthcare utilization. The variance estimation derived from the delta method is provided to facilitate the statistical inference of these effects. We then perform simulation studies to numerically justify our methodology. An empirical study is presented to investigate the average effects of household income and health insurance status on healthcare utilization with the German Scocioeconomic Panel data. The four models give consistent results regarding interpretations for moral hazards and adverse selection in the study.
Similar content being viewed by others
References
Basu, A., Rathouz, P.J.: Estimating marginal and incremental effects on health outcomes using flexible link and variance function models. Biostatistics 6, 93–109 (2005)
Bratti, M., Miranda, A.: Endogenous treatment effects for count data models with endogenous participation or sample selection. Health Econ. 20, 1090–1109 (2011)
Deb, P., Norton, E.: Modeling healthcare expenditures and use. Annu. Rev. Public Health 39, 489–505 (2018)
Engle, R.F., Hendry, D.F., Richard, J.F.: Exogeneity. Econometrica 51, 277–304 (1983)
Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G.: Longitudinal Data Analysis. Chapman & Hall/CRC Press, Boca Raton (2008)
Frick, J.R.: A General Introduction to the German Socio-Economic Panel Study (SOEP)-Design, Contents and Data Structure (Waves A-V, 1984–2005). Deutsches Institut für Wirtschaftsfor-schung, Berlin (2006)
Greene, W.H.: Accounting for excess zeroes and sample selection in Poisson and negative binomial regression models. NYU Working Paper No. EC-94-10: Department of Economics, New York University. 1994. Available at SSRN: https://ssrn.com/abstract=1293115
Greene, W.H.: Econometric Analysis, 5th edn. Prentice Hall, New Jersey (2002)
Greene, W.H.: Econometric Analysis, 8th edn. Pearson, New York (2018)
Greene, W.: Models for count data with endogenous utilization. Empir. Econ. 36, 133–173 (2009)
Hur, K., Hedeker, D., Henderson, W., Khuri, S., Delay, J.: Modeling clustered count data with excess zeros in health care outcomes research. Health Serv. Outcomes Res. Method. 3, 5–20 (2002)
Kavand, H., Voia, M.: Estimation of healthcare demand and its implication on income effects of individuals. In: Greene W., Khalaf L., Makdissi P., Sickles R., Veall M., Voia MC. (eds) Productivity and Inequality. In: NAPW 2016. Springer Proceedings in Business and Economics. 2018. Springer, Cham (2018)
Krinsky, I., Robb, A.L.: On approximating the statistical properties of elasticities: a correction. Rev. Econ. Stat. 72, 189–190 (1990)
Lambert, D.: Zero-inflated Poisson regression with an application to defects in manufacturing. Technometrics 34, 1–4 (1992)
Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O.: SAS for Mixed Model, 2nd edn. SAS Institute Inc., Cary (2006)
Liu, L., Strawderman, R.L., Cowen, M.E., Shih, Y.T.: A flexible two-part random effects model for correlated medical costs. J. Health Econ. 29, 110–123 (2010)
Liu, X.Y., Zhang, B., Tang, L., Zhang, Z.W., Allison, J.J., Sirivastava, D.K., Zhang, H.: Are marginalized two-part models superior to non-marginalized two-part models for count data with excess zeroes? Estimation of marginal effects, model misspecification, and model selection. Stat. Methods Med. Res. 18, 175–214 (2018)
Mihaylova, B., Briggs, A., O’Hagan, A., Thompson, S.: Review of statistical methods for analysing healthcare resources and costs. Health Econ. 20, 897–916 (2011)
Mullahy, J.: Specification and testing of some modified count data models. J. Econ. 33, 341–365 (1986)
Riphahn, R.T., Wambach, A., Million, A.: Incentive effects in the demand for healthcare: a bivariate panel count data estimation. J. Appl. Econ. 18, 387–405 (2003)
SAS/STAT 14.1 User’s Guide, the NLMIXED Procedure. Cary, NC: SAS Institute Inc., (2015)
Terza, J.V.: Estimating count data models with endogenous switching: sample selection and endogenous treatment effects. J. Econ. 84, 129–154 (1998)
Wagner, G.G., Frick, J.R., Schupp, J.: The German Socio-Economic Panel Study (SOEP)—Scope. Evolut. Enhanc. SSRN Electr. J. 127, 139–169 (2007)
Winkelmann, R.: Healthcare reform and the number of doctor visits-an econometric analysis. J. Appl. Econ. 19, 455–472 (2004)
Winkelmann, R.: Econometric Analysis of Count Data, 5th edn. Springer, Berlin (2008)
Wooldridge, J.: Econometric Analysis of Cross Section and Panel Data. The MIT Press, New York (2010)
Funding
H.Z. receives supports from the Northwestern Brain Tumor SPORE (NCI Grant #P50CA221747), Mesulam Center for Cognitive Neurology and Alzheimer’s Disease (NIA Grant #P30AG013854), and Robert H. Lurie Comprehensive Cancer Center (NCI Grant #P30CA060553), all of which are affiliated with Northwestern University Feinberg School of Medicine in Chicago, IL.
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
Authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent is not applicable, as the article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Density functions and log-likelihood functions
1.1 Zero-inflated Poisson and zero-inflated negative binomial models with endogenous utilization
In the framework of zero-inflated models with endogenous utilization, for the \({\hbox {ZIP}} _{\mathrm{eu}}\) model, the marginal probability function is given by
where \(\frac{\mathrm{var}(y_i|x_i, z_i, \varepsilon _i)}{\mathrm{E}(y_i|x_i, z_i, \varepsilon _i)} =1+\varPhi (-A_i)e^{x_i'\beta +\sigma \varepsilon _i}\), which demonstrates overdispersion of the outcomes. If we denote the parameter by \(\theta =(\beta ', \gamma ', \rho ,\sigma )\), and the log-likelihood function is
for the \(\hbox {ZINB}_{\mathrm{eu}}\) model, the marginal probability function is as below
where \(\frac{\mathrm{var}(y_i|x_i, z_i, \varepsilon _i)}{\mathrm{E}(y_i|x_i, z_i, \varepsilon _i)} =1+\varPhi (-A_i)e^{x_i'\beta +\sigma \varepsilon _i} +\frac{e^{x_i'\beta +\sigma \varepsilon _i} }{\alpha }\). We can see that the overdispersion in the \(\hbox {ZINB}_{\mathrm{eu}}\) models is more intense than that in the regular NB models. Let’s denote the parameter by \(\theta =(\beta ', \gamma ', \rho ,\sigma ,\alpha )\). Then, the log-likelihood function is
1.2 Hurdle Poisson and hurdle negative binomial models with endogenous utilization
For the \(\hbox {HP}_{\mathrm{eu}}\) model, the log-likelihood function for the model is
For the \(\hbox {HNB}_{\mathrm{eu}}\) model, the log-likelihood function is
Appendix 2: Gradients of marginal and incremental effects
The calculation of variance estimates shown in (18), (19), (22), (23) involve the gradients of the marginal and incremental effects. The theoretical formulas of the gradients for our four models are focused on in this section. The corresponding estimates are obtained by substituting the estimate \({\hat{\theta }}\) for the parameter \(\theta\).
1.1 Zero-inflated Poisson and zero-inflated negative binomial models with endogenous utilization
Recall that the observed marginal mean formulas for \({\hbox {ZIP}} _{\mathrm{eu}}\) and \(\hbox {ZINB}_{\mathrm{eu}}\) models in (5) sare identical: \(\mathrm{E}(y_i|x_i, z_i) = \int _{-\infty }^\infty e^{x_i'\beta +\sigma \varepsilon _i}\varPhi (A_i)\phi (\varepsilon _i)\,d\varepsilon _i\), and so are their marginal effects and incremental effect formulas. The difference is that the parameters in \({\hbox {ZIP}} _{\mathrm{eu}}\) models are \(\theta =(\beta ',\gamma ',\rho ,\sigma )\), and those in the \(\hbox {ZINB}_{\mathrm{eu}}\) models have an additional component \(\alpha\): \(\theta =(\beta ',\gamma ',\rho ,\sigma ,\alpha )\). The following discussions are the same for both \({\hbox {ZIP}} _{\mathrm{eu}}\) and \(\hbox {ZINB}_{\mathrm{eu}}\) models and will not be emphasized again unless indicated otherwise.
For the purpose of notation simplicity, we introduce a pair of smooth functions \(p(t)=e^t\) and \(q(t)=\varPhi (t)\). Then, \({\dot{p}}(t)=\ddot{p}(t)=e^t\) and \({\dot{q}}(t)= \phi (t)\), \(\ddot{q}(t)= -t\phi (t)\), for any \(t\in \mathbb {R}\). The similar p, q notations were used in our previous publication (Liu et al. 2018).
Suppose \(x_{ij}=z_{ij}\) is a continuous covariate, then we use the short-hand notations \(p_i, {\dot{p}}_i,\ddot{p}_i, q_i, {\dot{q}}_i,\ddot{q}_i\) when they are evaluated at \(x_i'\beta +\sigma \varepsilon _i\) and \(A_i=\left( {z_i'\gamma +\rho \varepsilon _i}\right) /{\sqrt{1-\rho ^2}}\), respectively. Then, the marginal mean of count outcomes \(y_i\) is \(\mathrm{E}(y_i|x_i, z_i) = \int _{-\infty }^\infty p_iq_i\phi (\varepsilon _i)\,d\varepsilon _i\); the marginal effect with respect to \(x_{ij}\) is
where \(\frac{\partial A_i}{\partial z_{ij}}={\gamma _{j}}/{\sqrt{1-\rho ^2}}\).
If the case is that \(x_{ij}=z_{ij}\) is categorical, changing from level \(l_1\) to \(l_2\), then we use the notations \(\{p_{2i}, {\dot{p}}_{2i},\ddot{p}_{2i}\}, \{p_{1i}, {\dot{p}}_{1i},\ddot{p}_{1i}\}, \{q_{2i}, {\dot{q}}_{2i},\ddot{q}_{2i}\}, \{q_{1i}, {\dot{q}}_{1i},\ddot{q}_{1i}\}\) for p, q and their derivatives evaluated at \({\tilde{x}}_{2i}'\beta +\sigma \varepsilon _i\), \({\tilde{x}}_{1i}'\beta +\sigma \varepsilon _i\), \(A_{2i}=\left( {{\tilde{z}}_{2i}'\gamma +\rho \varepsilon _i}\right) /{\sqrt{1-\rho ^2}}\) and \(A_{1i}=\left( {{\tilde{z}}_{1i}'\gamma +\rho \varepsilon _i}\right) /{\sqrt{1-\rho ^2}}\), respectively, where \({\tilde{x}}_{2i}\), \({\tilde{x}}_{1i}\), \({\tilde{z}}_{2i}\), \({\tilde{z}}_{1i}\) are the covariate vectors \(x_i\) and \(z_i\) with \(x_{ij}=z_{ij}\) equal to \(l_2\) and \(l_1\), respectively. Thus, the incremental effect of marginal mean with respect to \(x_{ij}=z_{ij}\) is
Then, we can find the derivatives of \(\eta _j\) with respect to the parameters \(\theta\):
where \(u_{(j+1)}\) is the unit vector of length \(J_1\) with 1 in the \((j+1)\)-st component, \(\frac{\partial A_i}{\partial \gamma }={z_{i}}/{\sqrt{1-\rho ^2}}\), \(\frac{\partial ^2 A_i}{\partial z_{ij}\partial \gamma }=v_{(j+1)}/\sqrt{1-\rho ^2}\) (length of the unit vector \(v_{(j+1)}\) is \(J_2\) and its \((j+1)\)-st component is 1), \(\frac{\partial A_i}{\partial \rho }=(\varepsilon _i+z_i'\gamma \rho )/\sqrt{(1-\rho ^2)^3}\), \(\frac{\partial ^2 A_i}{\partial z_{ij}\partial \rho }=\gamma _{j}\rho /\sqrt{(1-\rho ^2)^3}\). The derivatives of \(\pi _j\) with respect to \(\theta\) are
where \(\frac{\partial A_{2i}}{\partial \gamma }={{\tilde{z}}_{2i}}/{\sqrt{1-\rho ^2}}\), \(\frac{\partial A_{1i}}{\partial \gamma }={{\tilde{z}}_{1i}}/{\sqrt{1-\rho ^2}}\), \(\frac{\partial A_{2i}}{\partial \rho }=(\varepsilon _i+{\tilde{z}}_{2i}'\gamma \rho )/\sqrt{(1-\rho ^2)^3}\), \(\frac{\partial A_{1i}}{\partial \rho }=(\varepsilon _i+{\tilde{z}}_{1i}'\gamma \rho )/\sqrt{(1-\rho ^2)^3}\). For \(\hbox {ZINB}_{\mathrm{eu}}\) models, \(\nabla _\alpha \,\eta _j(x_i,z_i,\theta )=\nabla _\alpha \,\pi _j(x_{i(-j)},z_{i(-j)},\theta ) =0\).
1.2 Hurdle Poisson models with endogenous utilization
For the convenience of computation and notations, we introduce smooth functions \(p^{\mathrm{HP}}(t) =\displaystyle \frac{e^{t+e^t}}{e^{e^t}-1}\) and the same \(q(t)=\varPhi (t)\) for \(\hbox {HP}_{\mathrm{eu}}\) models as used for the \({\hbox {ZIP}} _{\mathrm{eu}}\) and \(\hbox {ZINB}_{\mathrm{eu}}\) models. Then
where \(\kappa (t)=\frac{e^t}{e^{e^t}-1}\), \({\dot{\kappa }}(t)=\kappa (t)\{1-e^t-\kappa (t)\}\), \(\ddot{\kappa }(t)={\dot{\kappa }}(t)\{1-e^t-2\kappa (t)\} -e^t\kappa (t)\). Then, for \(p^\mathrm{HP}\) and q, we use the similar notations \(p^\mathrm{HP}_i,q_i,p^\mathrm{HP}_{2i}, q_{2i}, p^\mathrm{HP}_{1i}, q_{1i}\) and their derivatives when evaluated at \(x_i'\beta +\sigma \varepsilon _i\) and \(A_i=\left( {z_i'\gamma +\rho \varepsilon _i}\right) /{\sqrt{1-\rho ^2}}\) and when \(x_{ij}\) takes the values \(l_2\) or \(l_1\) for the i-th observation, as we have done for the \({\hbox {ZIP}} _{\mathrm{eu}}\) and \(\hbox {ZINB}_{\mathrm{eu}}\) models. Therefore, the marginal mean of \(\hbox {HP}_{\mathrm{eu}}\) model in (10) can be rewritten as \(\mathrm{E}(y_i|x_i,z_i)= \int _{-\infty }^\infty p_i^\mathrm{HP}q_i\,\phi (\varepsilon _i)\,d\varepsilon _i\). Consequently, the marginal effect is
and the incremental effect with respect to categorical \(x_{ij}=z_{ij}\) from \(l_1\) to \(l_2\) is
where \(\theta =(\beta ',\gamma ',\rho , \sigma )\). The remaining lines for the derivatives of \(\eta _j\) and \(\pi _j\) with respect to \(\theta\) have the same forms as (28), (29) in the \({\hbox {ZIP}} _{\mathrm{eu}}\) and \(\hbox {ZINB}_{\mathrm{eu}}\) models, with p and its derivatives replaced with \(p^{\mathrm{HP}}\) and its derivatives.
1.3 Hurdle negative binomial models with endogenous utilization
For \(\hbox {HNB}_{\mathrm{eu}}\) models, our parameter vector is \(\theta = (\beta ',\gamma ,\rho ,\sigma ,\alpha )\), and our auxiliary functions are \(p^\mathrm{HNB}(t,\alpha ) =\displaystyle \frac{e^t}{1-\delta (t,\alpha )}\), and \(q(t)=\varPhi (t)\), where \(\delta (t,\alpha ) = \tau ^\alpha (t,\alpha )\), \(\tau (t,\alpha ) =\displaystyle \frac{\alpha }{\alpha + e^t}\), and \(\alpha >0\).
It is tedious but not hard to get the partial derivatives of \(p^\mathrm{HNB}\) with respect to t and \(\alpha\). For simplicity of notations, let’s drop the function arguments of \(p^\mathrm{HNB}, \delta ,\tau\). Then,
where we used \({\dot{\tau }}_t(t,\alpha ) = \tau (\tau -1)\), \(\ddot{\tau }_{tt}(t,\alpha ) = \tau (\tau -1)(2\tau -1)\), \({\dot{\tau }}_\alpha (t,\alpha ) = {e^t}/{(\alpha + e^t)^2}\), \({\dot{\delta }}_t(t,\alpha ) = \alpha \delta (\tau -1) = -e^t\delta \tau\), \(\ddot{\delta }_{tt}(t,\alpha ) = \alpha \delta (\tau -1)\{(\alpha + 1)\tau -\alpha \} = \delta \tau ^2e^t(e^t-1)\), \({\dot{\delta }}_\alpha (t,\alpha ) = \delta (\ln \tau + 1-\tau )\).
At the i-th observation, the function \(p^\mathrm{HNB}\) and its partial derivatives evaluated at \((x_i'\beta +\sigma \varepsilon _i,\alpha )\) are denoted by \(p_i^\mathrm{HNB}, {\dot{p}}_{ti}^\mathrm{HNB}, \ddot{p}_{tti}^\mathrm{HNB}, {\dot{p}}^\mathrm{HNB}_{\alpha i}, \ddot{p}^\mathrm{HNB}_{t\alpha i}\). When the covariate \(x_{ij}\) is fixed at \(l_1\) or \(l_2\), the values of \(p^\mathrm{HNB}, {\dot{p}}_t^\mathrm{HNB}, {\dot{p}}^\mathrm{HNB}_{\alpha }\) are denoted by \(p^\mathrm{HNB}_{2i}, {\dot{p}}_{2ti}^\mathrm{HNB}, {\dot{p}}^\mathrm{HNB}_{2\alpha i}\), and \(p^\mathrm{HNB}_{1i}, {\dot{p}}_{1ti}^\mathrm{HNB}, {\dot{p}}^\mathrm{HNB}_{1\alpha i}\), respectively. Notations \(q_i,{\dot{q}}_i,\ddot{q}_i,q_{2i}, q_{1i},{\dot{q}}_{2i}, {\dot{q}}_{1i}\) for the q function are the same as those in the zero-inflated models and hurdle Poisson models.
Hence, the marginal mean of \(\hbox {HNB}_{\mathrm{eu}}\) model in (12) can be rewritten as \(\mathrm{E}(y_i|x_i,z_i)= \int _{-\infty }^\infty p_i^\mathrm{HNB}q_i\,\phi (\varepsilon _i)\,d\varepsilon _i\), yielding the marginal effect with respect to continuous \(x_{ij}=z_{ij}\), i.e.,
and the incremental effect with respect to \(x_{ij}=z_{ij}\) from \(l_1\) to \(l_2\), i.e.,
Then, the derivatives of \(\eta _j\) with respect to the parameter \(\theta\) are
where \(u_{(j+1)}\) is the unit vector of length \(J_1\). The derivatives of \(\pi _j\) with respect to \(\theta\) are
where \({\tilde{x}}_{2i},{\tilde{x}}_{1i}\), \({\tilde{z}}_{2i},{\tilde{z}}_{1i}\) are the covariate vectors \(x_i\) and \(z_i\) with \(x_{ij}=z_{ij}\) being equal to \(l_2\) and \(l_1\), respectively. Formulas of derivatives of \(A_i,A_{2i},A_{1i}\) are the same as those for the \({\hbox {ZIP}} _{\mathrm{eu}}\) and \(\hbox {ZINB}_{\mathrm{eu}}\) models.
Appendix 3: Box plots of the parameter estimates in the simulation study
The box plots of parameter estimates of the four models with endogenous utilization are given in Fig. 1.
Appendix 4: Parameter estimates of the empirical study
This section presents the three tables that summarize the parameter estimates of the two-part models with endogenous utilization in Table (4) and without endogeneity in Table (5) and the estimates of average marginal effect and average incremental effect for the latter (Table 6) for the GSOEP data.
Rights and permissions
About this article
Cite this article
Liu, X., Chen, W., Chen, T. et al. Marginal effects and incremental effects in two-part models for endogenous healthcare utilization in health services research. Health Serv Outcomes Res Method 20, 111–139 (2020). https://doi.org/10.1007/s10742-020-00211-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10742-020-00211-x