This paper assesses concordance and inconsistency among three small area estimation methods that currently provide county-level health indicators in the United States. The three methods, all proposed since 2010, are multilevel logistic regression, spatial logistic regression, and spatial Poisson regression. Diabetes prevalence is estimated for each county in the continental United States from the 2012 sample of the Behavioral Risk Factor Surveillance System. The mapping results show that all three methods displayed elevated diabetes prevalence in the South. While the Pearson correlation coefficients among the three model-based estimates were all above 0.60, the highest was 0.80, between the multilevel and spatial logistic methods. Although point estimates visibly differ among the three small area estimation methods, the top and bottom quintiles of their distributions are fairly consistent based on Bangdiwala’s B-statistic, suggesting that outputs from each method would support consistent policy making in terms of identifying the highest- and lowest-prevalence counties.

Small area estimation (SAE) methods have been routinely used to generate poverty, employment, and other economic indicators at the county and census tract levels in the United States (US). Official data providers in the US, such as the Census Bureau and the Bureau of Labor Statistics, approach SAE in stages: 1) proposing an appropriate SAE method, 2) evaluating and validating the proposed method, and 3) deploying the recommended method for specific SAE applications or data releases. Certainly, stages 1) and 2) often iterate, and an initially proposed method may not proceed to stage 3. This approach has produced more than a dozen SAE methods for various small area applications (

Model-based methods, however, can vary widely in model specification, auxiliary information selection, and estimation approach (e.g., frequentist or Bayesian). In the last few years, there have been at least three applications of model-based SAE methods at the county level, all based on Behavioral Risk Factor Surveillance System (BRFSS) data. A Bayesian unit-level model is currently used by the Centers for Disease Control and Prevention (CDC) to monitor changes in diabetes prevalence from 2004 onward (

SAE methodological adoptions for major nationwide projects present several challenges, especially when the same data are used. First, by not going through the methodological cycle from proposal to validation, one risks applying a single method as a one-size-fits-all solution without proper validation. The only exception in this regard is the multilevel logistic regression method, which was later validated (

The current study compares the three model-based methods to assess their concordance and inconsistency. A previous SAE study compared a synthetic method, spatial data smoothing, and model-based regression analysis, and found that model-based regression was superior to the other two methods (

The study area was limited to 3,109 counties in the 48 contiguous states and the District of Columbia, with a sample of 455,406 respondents aged ≥ 18 years. BRFSS 2012, the most recent year for which the CDC released detailed county identifiers, was selected for the SAEs. Respondents were regarded as having diabetes if they answered “yes” to the question: “Has a doctor, nurse, or other health professional ever told you that you have diabetes?” Figure

Geographic distribution of the number of diagnosed diabetes cases in the Behavioral Risk Factor Surveillance System among 3,109 U.S. counties in 2012. Colors are categorized by quartiles of the number of diagnosed diabetes cases.

Demographic controls included age group (18–44, 45–64, 65+ years), race (non-Hispanic White, non-Hispanic Black, Hispanic, and others), and sex (male and female), all of which were used in the three methods. A total of 8,582 respondents with missing values for any of the three personal characteristics, residential location (state or county), or diabetes diagnosis were removed. Table

Summary table of diabetes in the U.S.

| Characteristic | No diagnosed diabetes | Diagnosed diabetes | P-value* |
|---|---|---|---|
| **Age** | | | <.0001 |
| 18–44 | 105,446 (96.65%) | 3,650 (3.35%) | |
| 45–64 | 140,475 (86.91%) | 21,159 (13.09%) | |
| 65+ | 105,983 (80.03%) | 26,440 (19.97%) | |
| **Sex** | | | <.0001 |
| Male | 140,905 (86.67%) | 21,673 (13.33%) | |
| Female | 214,109 (87.72%) | 29,962 (12.28%) | |
| **Race** | | | <.0001 |
| Non-Hispanic White | 282,189 (88.27%) | 37,485 (11.73%) | |
| Non-Hispanic Black | 29,536 (80.34%) | 7,227 (19.66%) | |
| Hispanic | 22,269 (86.33%) | 3,525 (13.67%) | |
| Others | 16,710 (86.55%) | 2,596 (13.45%) | |

* Chi-square test; 7,519 respondents with missing values were excluded.
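The exclusion of respondents with incomplete records described above can be sketched with a small hypothetical data frame; the column names here are illustrative, not the actual BRFSS variable names:

```python
import pandas as pd

# Hypothetical subset of BRFSS-like records; column names are illustrative.
df = pd.DataFrame({
    "age_group":   ["18-44", None, "65+", "45-64"],
    "race":        ["NH White", "Hispanic", None, "NH Black"],
    "sex":         ["F", "M", "F", "M"],
    "county_fips": ["01001", "01003", "01005", None],
    "diabetes":    ["no", "no", "yes", "yes"],
})

# Drop respondents missing any demographic, location, or the outcome,
# mirroring the removal of incomplete records in the study.
complete = df.dropna(subset=["age_group", "race", "sex", "county_fips", "diabetes"])
print(len(complete))  # → 1
```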

County poverty rate, defined as the percentage of people living below 100% of the federal poverty line, was included as a county-level auxiliary variable. It was based on the 5-year estimates of the American Community Survey 2012 (

In the model specification process, we made sure that all models were specified as closely as possible to the original SAE models. The multilevel logistic regression model, labeled Model 1, was specified identically to the original model using the original SAS code provided by the author (

where p_{isc} is the probability that respondent i in state s and county c has diagnosed diabetes; β_{1}, β_{2}, β_{3}, and β_{4} are the fixed-effect coefficients for age, sex, race, and county poverty rate; and u_{s} and u_{c} are the state- and county-level random effects.

In the absence of a time effect, Model 2 can be specified by dropping the time effect from the space-time logistic model in Dwyer-Lindgren et al. (

which has a model construct similar to Model 1 but without the state random effect. In particular, the county effect was partitioned into a spatially uncorrelated random effect u_{c} and a spatially structured effect f_{spat} defined over neighboring counties.
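Spatially structured county effects of this kind are commonly given an intrinsic conditional autoregressive (ICAR) prior over the county adjacency graph. The following is a minimal sketch of the implied precision matrix, using a hypothetical four-county adjacency rather than the actual US county graph:

```python
import numpy as np

# Hypothetical adjacency among four counties (1 = shares a border).
W = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# ICAR precision matrix Q = D - W, where D carries each county's
# neighbor count on the diagonal; under this prior each county's
# effect is shrunk toward the mean of its neighbors' effects.
D = np.diag(W.sum(axis=1))
Q = D - W

# Q is singular (every row sums to zero), which is why ICAR models
# add a sum-to-zero constraint on the county effects.
print(Q.sum(axis=1))  # → [0 0 0 0]
```

The spatially uncorrelated term u_{c} is then an ordinary exchangeable random effect added on top of this structured component.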

Model 3 is a Bayesian Poisson model, which assumes that the survey data are sampled from the complete population data (n_{ijkc} respondents sampled out of a population of N_{ijkc} in age group i, race group j, sex group k, and county c).

where α_{1i}, α_{2j}, and α_{3k} are the age, race, and sex effects, respectively. The spatial function f_{spat} captures spatially structured variation across counties and is shared by all demographic cells ijkc within a county.

To generate county-level diabetes prevalence from Model 1, we obtained estimated coefficients for the four fixed effects (age, sex, race, and poverty) and 3,157 random effects (48 states plus 3,109 counties). In particular, counties without samples had their county-level random effect (u_{c}) set to zero, the mean of its distribution.

The number of diabetes cases in each county can then be summed over age, race, and sex groups. Dividing by the state-county specific population, we obtain the SAE of diabetes prevalence straightforwardly:
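The summation and division step above amounts to poststratification. A sketch for a single hypothetical county, with made-up cell probabilities and population counts (not values from the paper):

```python
import numpy as np

# Hypothetical demographic cells for one county: model-predicted
# probability of diagnosed diabetes, and census population counts.
p_hat = np.array([0.03, 0.12, 0.20, 0.04, 0.15, 0.22])  # predicted probabilities
pop   = np.array([5000, 4000, 3000, 5200, 4100, 2900])  # cell populations

# Expected cases per age/race/sex cell, summed, then divided by the
# county population (poststratification).
cases = p_hat * pop
prevalence = cases.sum() / pop.sum()
print(round(prevalence, 4))  # → 0.1112
```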

To generate county-level diabetes prevalence from Model 2, we first calculated the probability of a person having diagnosed diabetes, similar to Eq. (5), by replacing the state random effect with the county-level spatial effects.

We then used the probability to calculate the SAE of diabetes prevalence:

The approach to generating SAEs for Model 3 differs from that for Models 1 and 2. Note that we defined y_{ijkc} as the observed number of diabetes cases among the n_{ijkc} sampled respondents in age group i, race group j, sex group k, and county c. To move from the sample to the population, the estimated rate is applied to the corresponding population count N_{ijkc}:

where λ_{ijkc} is the estimated diabetes rate, n_{ijkc} is the sample size, and N_{ijkc} is the population count in cell ijk of county c.

Because the parameter λ_{ijkc} was estimated within a Bayesian framework, we used its posterior mean to compute the expected number of cases and, hence, the county-level prevalence.

We first used county maps to provide visual descriptions of the diabetes estimates from the three methods. The intention is to see whether the three methods provide consistent geographic patterns regardless of specific county prevalence rates. Moreover, we categorized the SAEs from each model into quintiles (top 20%, upper-middle 20%, middle 20%, lower-middle 20%, and bottom 20% of observations), and calculated Bangdiwala’s B-statistic, based on the observer agreement chart, to evaluate the proportion of counties falling in the same quintile categories (
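Bangdiwala’s B-statistic can be computed directly from the 5 × 5 quintile cross-tabulation: the sum of squared diagonal counts divided by the sum of products of the matching row and column totals. A sketch with a hypothetical cross-tabulation (not the study’s actual counts):

```python
import numpy as np

def bangdiwala_b(table):
    """Bangdiwala's B: sum of squared diagonal counts divided by the
    sum of products of matching row and column totals.  B = 1 means
    perfect agreement; B = 0 means no diagonal agreement."""
    table = np.asarray(table, dtype=float)
    diag = np.diag(table)
    rows = table.sum(axis=1)
    cols = table.sum(axis=0)
    return (diag ** 2).sum() / (rows * cols).sum()

# Hypothetical 5x5 cross-tabulation of county quintiles from two models.
tab = np.array([
    [500,  80,  20,   0,   0],
    [ 90, 400, 100,  10,   0],
    [ 20, 110, 350, 110,  10],
    [  0,  10, 120, 390,  80],
    [  0,   0,  10,  90, 510],
])
print(round(bangdiwala_b(tab), 3))  # → 0.521
```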

Model 1 was fitted with the PROC GLIMMIX procedure in SAS v9.1.3. Models 2 and 3 were fitted with the

Results from SAEs of diabetes prevalence are shown in Figure

Comparing three SAEs of diabetes prevalence at the county level using quintiles.

Scatter plots of small area estimates among Models 1, 2 and 3.

The weighted diabetes prevalence in the current study sample was 10.3%. Model 2 provided the closest average (0.1042) across the 3,109 counties (Table

Descriptive statistics of diabetes prevalence estimates from the three SAE models.

| Model | Mean | SD | Minimum | Q1 | Median | Q3 | Maximum | F | P-value^{†} |
|---|---|---|---|---|---|---|---|---|---|
| **All counties (N = 3,109)** | | | | | | | | | |
| 1 | 0.1152 | 0.0237 | 0.0508 | 0.0986 | 0.1121 | 0.1282 | 0.2402 | 910.22 | <.0001 |
| 2 | 0.1042 | 0.0238 | 0.0335 | 0.0870 | 0.1024 | 0.1193 | 0.2171 | | |
| 3 | 0.0902 | 0.0220 | 0.0342 | 0.0373 | 0.0855 | 0.1023 | 0.1789 | | |
| **Counties with county identifiers (N = 2,225)** | | | | | | | | | |
| 1 | 0.1146 | 0.0222 | 0.0536 | 0.0994 | 0.1124 | 0.1272 | 0.2210 | 377.93 | <.0001 |
| 2 | 0.1034 | 0.0233 | 0.0405 | 0.0868 | 0.1024 | 0.1187 | 0.1940 | | |
| 3 | 0.0960 | 0.0226 | 0.0351 | 0.0800 | 0.0931 | 0.1097 | 0.1789 | | |
| **Counties without county identifiers (N = 884)** | | | | | | | | | |
| 1 | 0.1167 | 0.0271 | 0.0510 | 0.0969 | 0.1105 | 0.1311 | 0.2402 | 831.05 | <.0001 |
| 2 | 0.1063 | 0.0250 | 0.0335 | 0.0875 | 0.1023 | 0.1208 | 0.2171 | | |
| 3 | 0.0755 | 0.0105 | 0.0342 | 0.0687 | 0.0736 | 0.0806 | 0.1337 | | |

Abbreviation: SD = Standard deviation; Q1 = The first quartile; Q3 = The third quartile

† The p-values were calculated from the analysis of variance.

Table

Mean difference comparison in the SAE of diabetes prevalence among Models 1, 2 and 3.

| Comparison | Difference | 95% CI |
|---|---|---|
| **All counties (N = 3,109)** | | |
| Model 1 vs. Model 2 | 0.0110 | (0.0096, 0.0124) |
| Model 1 vs. Model 3 | 0.0250 | (0.0237, 0.0264) |
| Model 2 vs. Model 3 | 0.0140 | (0.0127, 0.0154) |
| **Counties with county identifiers (N = 2,225)** | | |
| Model 1 vs. Model 2 | 0.0112 | (0.0096, 0.0128) |
| Model 1 vs. Model 3 | 0.0186 | (0.0170, 0.0202) |
| Model 2 vs. Model 3 | 0.0073 | (0.0057, 0.0089) |
| **Counties without county identifiers (N = 884)** | | |
| Model 1 vs. Model 2 | 0.0104 | (0.0079, 0.0129) |
| Model 1 vs. Model 3 | 0.0413 | (0.0388, 0.0438) |
| Model 2 vs. Model 3 | 0.0309 | (0.0284, 0.0333) |
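Pairwise comparisons of this kind can be sketched as paired mean differences with normal-approximation confidence intervals (reasonable given the large number of counties); the input arrays below are simulated stand-ins, not the study’s estimates:

```python
import numpy as np

def mean_diff_ci(a, b, z=1.96):
    """Paired mean difference between two sets of county estimates,
    with a normal-approximation 95% confidence interval."""
    d = np.asarray(a) - np.asarray(b)
    mean = d.mean()
    se = d.std(ddof=1) / np.sqrt(d.size)
    return mean, (mean - z * se, mean + z * se)

# Simulated stand-ins for two models' county prevalence estimates.
rng = np.random.default_rng(0)
m1 = rng.normal(0.115, 0.024, size=3109)
m2 = m1 - rng.normal(0.011, 0.005, size=3109)

diff, (lo, hi) = mean_diff_ci(m1, m2)
print(f"{diff:.4f} ({lo:.4f}, {hi:.4f})")
```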

The concordance analysis examines similarity on or near the diagonal of a 5 × 5 contingency table of quintiles. Concordance was greater in both tails than in the middle quintiles: the SAEs among the three models had a smaller proportion of concordance in the second, third, and fourth quintiles (see Figure

The observer agreement charts of categorized small area estimates among Models 1, 2 and 3.

Currently, various SAE methods are used by federal agencies and research institutions, and it is difficult to gauge their estimation performance. In this paper, we have compared SAEs from three model-based methods that are all actively producing SAEs at the county level or smaller area units. Our comparisons focused on spatial patterns and relative measures (e.g., quintiles) generated by each method, rather than point estimates. Overall, the three methods all pointed to the elevated diabetes prevalence observed in southern states. In addition, the top and bottom quintile categories had the highest concordance; the discordance in the middle quintiles is of less concern, because the top and bottom quintiles matter most for policy making. In other words, when a county is identified as high in diabetes prevalence relative to the national average, that identification is unlikely to change if another SAE method is used.

Partly due to data limitations in the public-use BRFSS file, 884 counties in our SAEs lacked county identifiers. Since the spatial Poisson regression in Model 3 depends on samples in spatial units, missing county samples cause more problems for its SAEs. Indeed, separate comparisons, both in map displays and post-hoc analyses, showed substantial discrepancies in counties with missing identifiers; in general, those counties were underestimated relative to the Model 1 and 2 estimates. Even though requesting unsuppressed BRFSS data with all county identifiers is possible through each state (

Model-based SAE methods often include small area auxiliary information, which tends to improve model predictability. However, different ways of using small area auxiliary variables present challenges for model comparisons. They can also produce inconsistent model parameter estimates in space-time models, as both the importance of and the reliance on auxiliary information change over the time dimension. Since socioeconomic status (SES) relates to many health outcomes, it is expected to be included in model-based SAEs in some form. In the original articles, only Model 1 used poverty and Model 2 used education level, while Model 3 did not use any SES variables. In the current study, we included the

Nationwide SAEs of BRFSS at the county level can be calibrated to reflect state-level estimates. In the absence of the full geographic sample, and to stay true to the three original SAE models, we opted not to calibrate their estimates. Perhaps for this reason, the three SAEs produced quite different national averages of diabetes prevalence, with Model 1 the highest and Model 3 the lowest. That is also why we placed less emphasis on comparing point estimates across the three methods. In real-world practice, it might be preferable to calibrate the sample for each state so that each state average from the SAEs matches the overall state prevalence estimated without SAE (

To some degree, all three methods considered spatial effects. The multilevel method uses the state-county hierarchy to account for some spatial effects, while Models 2 and 3 use Markov random fields that work through geographically connected boundaries among counties. While Model 1 does not consider local variation, Models 2 and 3 were unable to incorporate geographic entities that are completely separate or lack local connectivity to other entities; for this reason, we were unable to include Alaska and Hawaii. In addition, none of the three SAE methods considered spatial clustering. However, we know that when data are fairly complete (e.g., births and deaths), model-based estimates can be substantially biased when spatial clustering or spatial association effects are not removed (
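Spatial clustering of the kind discussed here is often screened with Moran’s I. A minimal sketch under a hypothetical five-county chain with a smooth prevalence gradient (positive I indicates that neighboring counties have similar values):

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x under binary spatial weights W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()          # deviations from the overall mean
    n = x.size
    s0 = W.sum()              # total weight
    return (n / s0) * (z @ W @ z) / (z @ z)

# Hypothetical chain of five counties with a smooth prevalence gradient.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # neighbors on a line
x = np.array([0.14, 0.13, 0.11, 0.09, 0.08])          # prevalence values
print(round(morans_i(x, W), 3))  # → 0.577
```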

All three methods were able to display elevated county-level diabetes prevalence in the South. While their point estimates were highly correlated, the highest correlation was between the multilevel and spatial logistic methods (r = 0.86), suggesting much higher consistency than with the spatial Poisson regression method. While there are apparent differences in point estimates among the three SAE methods, their top and bottom 20 percent distributions are fairly consistent. The outputs of each method would support consistent policy making in terms of identifying top and bottom percent counties for diabetes prevalence.

We thank the Institute for Health Metrics and Evaluation at the University of Washington for its consulting support.

The authors have no competing interests to declare.

The views expressed on statistical issues in this paper are those of the authors and not necessarily those of Economic Research Service, U.S. Department of Agriculture.