Elsevier

Value in Health

Volume 9, Issue 6, November–December 2006, Pages 377-385
Too Much Ado about Propensity Score Models? Comparing Methods of Propensity Score Matching

https://doi.org/10.1111/j.1524-4733.2006.00130.x

Abstract

Objective

Many techniques are available for conducting matching procedures, but coherent guidelines for selecting the most appropriate one do not yet exist. In this article, we evaluate several matching techniques and suggest a guideline for selecting the best technique.

Methods

The main purpose of a matching procedure is to reduce selection bias by increasing the balance between the treatment and control groups. The following approach, consisting of five quantifiable steps, is proposed to check for balance: 1) using two-sample t-statistics to compare the means of the treatment and control groups for each explanatory variable; 2) comparing the mean difference as a percentage of the average standard deviation; 3) comparing the percent reduction of bias in the means of the explanatory variables before and after matching; 4) comparing treatment and control density estimates for the explanatory variables; and 5) comparing the density estimates of the propensity scores of the control units with those of the treated units. We investigated seven matching techniques and how they performed with regard to the five proposed steps. Moreover, we estimated the average treatment effect with multivariate analysis and compared the results with the estimates from the propensity score matching techniques. The Medstat MarketScan Data Base provided data for empirical examples of the utility of several matching methods. We conducted nearest neighbor matching (NNM) analyses in seven ways: replacement, 2 to 1 matching, Mahalanobis matching (MM), MM with caliper, kernel matching, radius matching, and the stratification method.
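
The first three balance checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy age values are hypothetical, the t-statistic is the unequal-variance (Welch) form, and the standardized difference uses the square root of the average of the two group variances as its denominator, which is one common reading of "average standard deviation".

```python
import math
import statistics


def two_sample_t(treated, control):
    """Step 1: two-sample t-statistic (Welch form, unequal variances assumed)."""
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return (statistics.mean(treated) - statistics.mean(control)) / se


def standardized_difference(treated, control):
    """Step 2: mean difference as a percentage of the average standard deviation.

    Assumption: the denominator is sqrt((var_treated + var_control) / 2).
    """
    avg_var = (statistics.variance(treated) + statistics.variance(control)) / 2
    return 100 * (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(avg_var)


def percent_bias_reduction(treated, control_before, control_after):
    """Step 3: percent reduction in the absolute mean difference after matching."""
    bias_before = abs(statistics.mean(treated) - statistics.mean(control_before))
    bias_after = abs(statistics.mean(treated) - statistics.mean(control_after))
    return 100 * (1 - bias_after / bias_before)


# Hypothetical ages for one explanatory variable (illustrative values only).
age_treated = [50, 60, 70]
age_control_unmatched = [30, 40, 50]
age_control_matched = [48, 60, 69]

print("t before matching:", two_sample_t(age_treated, age_control_unmatched))
print("t after matching:", two_sample_t(age_treated, age_control_matched))
print("std. difference before (%):", standardized_difference(age_treated, age_control_unmatched))
print("std. difference after (%):", standardized_difference(age_treated, age_control_matched))
print("bias reduction (%):", percent_bias_reduction(age_treated, age_control_unmatched, age_control_matched))
```

In practice these checks would be repeated for every explanatory variable, and steps 4 and 5 would compare density estimates rather than single summary numbers.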

Results

Comparing techniques according to the above criteria revealed that the choice of matching technique has significant effects on outcomes. Patients with asthma were compared with patients without asthma, and the cost of illness ranged from $2040 to $4463 depending on the type of matching. After matching, we looked for insignificant differences or larger P-values in the mean values (criterion 1); low mean differences as a percentage of the average standard deviation (criterion 2); 100% reduction of bias in the means of the explanatory variables (criterion 3); and insignificant differences when comparing the density estimates of the treatment and control groups (criteria 4 and 5). Mahalanobis matching with caliper yielded the best results according to all five criteria (Mean = $4463, SD = $3252). We also applied multivariate analysis to the matched sample. This decreased the deviation in cost of illness estimates more than threefold (Mean = $4456, SD = $996).
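
To make the role of the caliper concrete, the sketch below shows greedy 1-to-1 nearest neighbor matching without replacement on a scalar propensity score, followed by a paired cost difference. It is a simplified stand-in, not the Mahalanobis matching with caliper reported above: the distance is the absolute difference in propensity scores rather than the Mahalanobis distance, and the function name, scores, caliper width, and cost figures are all hypothetical values chosen for illustration.

```python
import statistics


def match_with_caliper(treated_scores, control_scores, caliper):
    """Greedy 1-to-1 nearest neighbor matching without replacement.

    Each treated unit takes the closest unused control unit, but only if the
    score distance is within the caliper. Returns (treated_idx, control_idx) pairs.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        best = min(available, key=lambda c: (abs(control_scores[c] - t_score), c))
        if abs(control_scores[best] - t_score) <= caliper:
            pairs.append((t_idx, best))
            available.remove(best)
    return pairs


# Hypothetical propensity scores and annual costs (illustrative values only).
treated_scores = [0.30, 0.55, 0.90]
control_scores = [0.28, 0.33, 0.52, 0.60]
treated_costs = [5000, 6200, 9000]
control_costs = [1000, 1500, 2100, 2500]

pairs = match_with_caliper(treated_scores, control_scores, caliper=0.05)

# Mean cost difference over the matched pairs only.
cost_difference = statistics.mean(
    treated_costs[t] - control_costs[c] for t, c in pairs
)
print("matched pairs:", pairs)
print("mean cost difference:", cost_difference)
```

The third treated unit has no control within the caliper and is dropped, which illustrates the trade-off a caliper introduces: closer matches at the price of a smaller matched sample.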

Conclusion

Sensitivity analysis of the matching techniques is especially important because none of the proposed methods in the literature is a priori superior to the others. The suggested joint consideration of propensity score matching and multivariate analysis offers an approach to assessing the robustness of the estimates.

Keywords

propensity score matching
randomization
selection bias
