Original Research
The use of sampling weights in Bayesian hierarchical models for small area estimation

https://doi.org/10.1016/j.sste.2014.07.002

Highlights

  • We describe a hierarchical model to acknowledge design weights in a small area estimation context.

  • We carry out a simulation study and compare our method with a number of alternative hierarchical models and direct estimation.

  • The simulation illustrates how incorporating the weights can reduce bias and hierarchical modeling can reduce variance.

  • We illustrate the method using data from the Behavioral Risk Factor Surveillance System (BRFSS).

Abstract

Hierarchical modeling has been used extensively for small area estimation. However, design weights that are required to reflect complex surveys are rarely considered in these models. We develop computationally efficient Bayesian spatial smoothing models that acknowledge the design weights. Computation is carried out using the integrated nested Laplace approximation (INLA), which is fast. An extensive simulation study is presented that considers the effects of non-response and non-random selection of individuals, allowing examination of the impact of ignoring the design weights and of the benefits of spatial smoothing. The results show that, compared with standard approaches, mean squared error can be greatly reduced with the proposed methods. Bias reduction occurs through the inclusion of the design weights, and variance reduction is achieved through hierarchical smoothing. We analyze data from the Washington State 2006 Behavioral Risk Factor Surveillance System. The models are easily and quickly fitted within the R environment using existing packages.

Introduction

In this paper we consider the application of spatial models for small area estimation (SAE). SAE is an important endeavor since many agencies require estimates of health, education and environmental measures in order to plan and allocate resources and target interventions. The data upon which SAE is based are often gathered via complex designs.

Standard model-based approaches to the analysis often ignore the sampling mechanism and are therefore subject to potentially large biases. Adjusting for the sampling scheme by including in the model the design variables upon which sampling was based (and which are associated with the outcome of interest) is often not possible, because the required variables are unavailable or the required model would be overly complex (Gelman, 2007). In this paper we consider the situation in which it is not possible to model the sampling scheme. Weighted design-based approaches are a common technique for removing bias, but the resulting estimators can be highly variable for areas in which only small samples are collected. Hierarchical models provide a method to reduce the variance, with Fay and Herriot (1979) providing an early example that, notably, acknowledges the sampling scheme. Since this influential paper, many hierarchical modeling approaches have been suggested; see Rao (2003) for a comprehensive summary of the literature and Pfeffermann (2013) for a more recent account.
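For orientation, the area-level model of Fay and Herriot (1979) can be written as follows. The notation here is generic and is not necessarily the paper's: $\hat\theta_i$ is the direct design-based estimate for area $i$, $\psi_i$ is its sampling variance (treated as known), $\mathbf{x}_i$ are area-level covariates and $u_i$ is an area random effect.

$$
\hat\theta_i = \theta_i + e_i,\quad e_i \sim N(0,\psi_i), \qquad \theta_i = \mathbf{x}_i^{\top}\boldsymbol\beta + u_i,\quad u_i \sim N(0,\sigma_u^2),
$$

$$
\tilde\theta_i = \gamma_i\,\hat\theta_i + (1-\gamma_i)\,\mathbf{x}_i^{\top}\boldsymbol\beta, \qquad \gamma_i = \frac{\sigma_u^2}{\sigma_u^2 + \psi_i}.
$$

The predictor $\tilde\theta_i$ shrinks the direct estimate toward the regression fit, and areas with a large sampling variance $\psi_i$ (typically those with small samples) are shrunk the most; this is the variance reduction that hierarchical models provide.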

In terms of spatial smoothing techniques, several authors allow for spatial correlation between areas (see, for example, Singh et al., 2005; Pratesi and Salvati, 2008; Pereira and Coelho, 2010). These models are subject to bias, however, since they do not adjust for the sampling scheme. Pseudo-likelihood (Skinner, 1989; Pfeffermann et al., 1998) has been used within a hierarchical modeling framework, where the scaling of the weights is a major issue (Potthoff et al., 1992; Longford, 1996; Asparouhov, 2006; Rabe-Hesketh and Skrondal, 2006). Congdon and Lloyd (2010) used such an approach and introduced residual spatial random effects. In this paper we describe a range of models that acknowledge the sampling scheme and allow spatial smoothing. We describe a new approach based on the concepts of an “effective sample size” and an “effective number of cases”. A related Bayesian model has recently been suggested by Ghitza and Gelman (2013), while a quite different approach, based on a penalized spline model, is described by Zheng and Little (2003, 2005). A key feature of the models we describe is that computation is fast and can be carried out using existing packages within the R computing environment.
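The snippets here do not spell out how the effective sample size and effective number of cases are computed, so the following Python sketch assumes one natural construction: given a weighted proportion $\hat p_i$ and its design-based variance $\hat V_i$, take $m_i^* = \hat p_i(1-\hat p_i)/\hat V_i$ (the binomial sample size with the same variance) and $y_i^* = m_i^* \hat p_i$. The variance is approximated by a single-stage, with-replacement Taylor linearization with no strata or clusters, which is a simplifying assumption; a real BRFSS analysis would use the full design. Kish's (1995) weight-only approximation is included for comparison. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_proportion(y: Sequence[int], w: Sequence[float]) -> float:
    """Design-weighted (Hajek) proportion: sum(w * y) / sum(w)."""
    return sum(wk * yk for wk, yk in zip(w, y)) / sum(w)


def linearized_variance(y: Sequence[int], w: Sequence[float]) -> float:
    """Approximate variance of the weighted proportion by Taylor linearization,
    treating the area sample as a single-stage, with-replacement draw
    (assumption: no strata, clusters or finite-population correction)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations to estimate a variance")
    p_hat = weighted_proportion(y, w)
    total_w = sum(w)
    # Linearized residuals; they sum to zero, so no centering is needed.
    z = [wk * (yk - p_hat) / total_w for wk, yk in zip(w, y)]
    return n / (n - 1) * sum(zk * zk for zk in z)


def effective_counts(y: Sequence[int], w: Sequence[float]) -> Tuple[float, float]:
    """Return (effective number of cases y*, effective sample size m*),
    assuming m* = p(1 - p) / V and y* = m* p."""
    p_hat = weighted_proportion(y, w)
    v_hat = linearized_variance(y, w)
    if v_hat == 0.0:
        raise ValueError("zero variance estimate: effective sample size undefined")
    m_star = p_hat * (1.0 - p_hat) / v_hat
    return m_star * p_hat, m_star


def kish_effective_size(w: Sequence[float]) -> float:
    """Kish's approximation (sum w)^2 / sum(w^2), which ignores the outcome."""
    return sum(w) ** 2 / sum(wk * wk for wk in w)


if __name__ == "__main__":
    y = [1, 1, 1, 0]          # toy sample in which cases are over-represented
    w = [1.0, 1.0, 1.0, 3.0]  # the single non-case carries a larger design weight
    print(sum(y) / len(y))            # unweighted proportion
    print(weighted_proportion(y, w))  # weighted proportion
    print(effective_counts(y, w))     # (y*, m*)
    print(kish_effective_size(w))
```

In this toy sample the unweighted proportion is 0.75 while the weighted one is 0.5, and $m^* = 2.25$ is below the nominal sample size of 4, reflecting the variance inflation caused by unequal weights. With equal weights the $n/(n-1)$ factor gives $m^* = n - 1$, and an area with a single observation (or with all responses identical) has no usable variance estimate, so $m^*$ is undefined and must be handled separately.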

The outline of this paper is as follows. We begin with a motivating example that concerns diabetes prevalence in the Behavioral Risk Factor Surveillance System (BRFSS) in Section 2. In Section 3 we describe hierarchical spatial and non-spatial models, which we then compare with various approaches in Section 4, via an extensive simulation study. We return to the BRFSS data in Section 5 and conclude the paper with a discussion in Section 6. The Supplementary Materials contain more technical details and additional supporting information.


Motivating example

The BRFSS is an annual telephone health survey, conducted by the Centers for Disease Control and Prevention (CDC), that has tracked health conditions and risk behaviors in the United States and its territories since 1984. In the BRFSS survey, interviewees (who must be 18 years or older) are asked a series of questions on their health behaviors and provide general demographic information, such as age, race, gender and the zip code in which they live. Here we focus on the survey conducted in Washington

Sample weighted Bayesian hierarchical models

Hierarchical models have been used extensively for SAE. In this section, we first provide a heuristic derivation of a “working” likelihood that uses the information in the weights. We then review some commonly used three-stage hierarchical models, including a spatial model, without considering the sampling weights. We refer to the resultant estimators as unadjusted; these estimators can be seriously biased in the event of non-random selection of individuals or non-response. Subsequently, we

Simulation study

In this section we report the results of simulation studies, under a variety of scenarios, to evaluate the performance of:

  • Direct estimates: using either the observed counts $y_i$ and sample sizes $m_i$ (Unadjusted) or the design-based estimator defined in (2), along with the appropriate variance estimate $\hat{V}_i$ (Adjusted).

  • Independent normal random effects: The hierarchical model with independent normal random effects given by (7), along with a binomial first-stage model based on $(y_i, m_i)$ (Unadjusted) or ($y_i$

Motivating example revisited

We apply the unadjusted and adjusted Bayesian hierarchical models developed in Section 3 to the Washington State 2006 BRFSS data introduced in Section 2. Sampling weights are taken to be the final weight used in the BRFSS survey, as summarized in Eq. (1). We emphasize that the design variables are unavailable, so the weights are the only available means of adjusting for selection bias. For those nine areas with only $m_i = 1$ observation, the effective sample size and effective number of

Conclusion

In this paper we have described a pragmatic approach to SAE that allows spatial smoothing and incorporates sample weights to acknowledge the design. We have assumed that the design variables are unavailable, so direct modeling of the sampling mechanism is not possible. By using the sample weights to adjust the data before estimation, we separate the design-based survey computations from the model-based Bayesian shrinkage, allowing either component to be modified as the situation requires.
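The separation of design-based computation and model-based shrinkage can be illustrated with a deliberately simplified Python sketch. It is not the authors' model: instead of the spatial binomial hierarchical models fitted with INLA in R, it uses a Fay–Herriot-style normal approximation on the logit scale, with no spatial term and a fixed, user-supplied between-area variance `sigma2_u` (a hypothetical parameter; in a full Bayesian analysis it would be estimated). Stage 1 takes per-area weighted proportions and their design-based variances, as produced by any survey routine; stage 2 shrinks them toward a precision-weighted common mean.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shrink_area_estimates(
    p_hat: Sequence[float], v_hat: Sequence[float], sigma2_u: float
) -> List[float]:
    """Two-stage sketch: design-based inputs (p_hat, v_hat) per area, then
    normal-normal shrinkage on the logit scale toward a common mean.
    sigma2_u (between-area variance) is treated as known, an assumption."""
    if sigma2_u <= 0:
        raise ValueError("sigma2_u must be positive")
    if any(not 0.0 < p < 1.0 for p in p_hat):
        raise ValueError("p_hat must lie strictly between 0 and 1")
    # Stage 1: logit-scale direct estimates with delta-method variances.
    theta = [logit(p) for p in p_hat]
    psi = [v / (p * (1.0 - p)) ** 2 for p, v in zip(p_hat, v_hat)]
    # Stage 2: precision-weighted common mean (GLS estimate for known sigma2_u).
    prec = [1.0 / (sigma2_u + s) for s in psi]
    mu = sum(pr * t for pr, t in zip(prec, theta)) / sum(prec)
    # Shrink each area; noisier areas (larger psi) move further toward mu.
    out = []
    for t, s in zip(theta, psi):
        gamma = sigma2_u / (sigma2_u + s)
        out.append(expit(gamma * t + (1.0 - gamma) * mu))
    return out


if __name__ == "__main__":
    # Hypothetical area-level inputs: weighted proportions and design variances.
    print(shrink_area_estimates([0.10, 0.30, 0.60], [0.0004, 0.01, 0.05], sigma2_u=0.1))
```

Because stage 2 only sees $(\hat p_i, \hat V_i)$, either stage can be replaced independently, for example a different design-based variance estimator in stage 1 or a spatial random-effect model in stage 2. Areas with $\hat p_i$ equal to 0 or 1 (such as those with a single observation) are rejected here and would need separate treatment.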

Acknowledgments

This work was supported by the National Institutes of Health Grant R01 AI029168. The authors are grateful to Laina Mercer for helpful discussions and aid in the computation of the variance adjustment.

References (36)

  • T. Asparouhov. General multi-level modeling with sampling weights. Commun Stat Theory Methods (2006).
  • J. Besag et al. Bayesian image restoration with two applications in spatial statistics. Ann Inst Stat Math (1991).
  • S. Chowdhury, M. Khare, K. Wolter. Weight trimming in the national immunization survey. In: ASA Proceedings of the Joint...
  • P. Congdon et al. Estimating small area diabetes prevalence in the US using the behavioral risk factor surveillance system. J Data Sci (2010).
  • R. Fay et al. Estimates of income for small places: an application of James–Stein procedure to census data. J Am Stat Assoc (1979).
  • Y. Fong et al. Bayesian inference for generalized linear mixed models. Biostatistics (2010).
  • A. Gelman. Struggles with survey weighting and regression modeling. Stat Sci (2007).
  • Y. Ghitza et al. Deep interactions with MRP: election turnout and voting patterns among small electoral subgroups. Am J Polit Sci (2013).
  • L. Kish. Methods for design effects. J Official Stat (1995).
  • E. Korn et al. Analysis of health surveys (1999).
  • R. Little et al. Assessment of weighting methodology of the national comorbidity survey. Am J Epidemiol (1997).
  • N. Longford. Model-based variance estimation in surveys with stratified clustered design. Aust J Stat (1996).
  • T. Lumley. Complex surveys: a guide to analysis using R (2010).
  • B. MacGibbon et al. Small area estimates of proportions via empirical Bayes techniques. Survey Methodol (1989).
  • D. Malec et al. Small area inference for binary variables in the National Health Interview Survey. J Am Stat Assoc (1997).
  • M. Paul et al. Bayesian bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximations. Stat Med (2010).
  • L.N. Pereira et al. Small area estimation of mean price of habitation transaction using time-series and cross-sectional area-level models. J Appl Stat (2010).
  • D. Pfeffermann. New important developments in small area estimation. Stat Sci (2013).