Stata: Bias-Corrected Propensity Score Matching and PSM in Practice

Connor · Huobi Exchange · 2022-10-04


1. Command overview

The bias-corrected matching estimator is implemented by the nnmatch command.

Syntax:

nnmatch depvar treatvar varlist_nnmatch [if exp] [in range] [pw] [, tc(ate|att|atc) m(#) metric(maha|matname) exact(varlist_ex) biasadj(bias|varlist_adj) robust(#_v) population level(#) keep(filename) replace]

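Details:

depvar: the outcome variable

treatvar: the treatment variable

varlist_nnmatch: the matching variables

tc(ate|att|atc) specifies which treatment effect is to be estimated:

ate: the average treatment effect,

att: the average treatment effect for the treated, or

atc: the average treatment effect for the controls.

metric(maha|matname): metric(maha) requests the Mahalanobis distance, that is, the weighting matrix is the inverse of the sample covariance matrix of the matching variables

m(#): match each observation to its # nearest neighbors; the default is #=1

robust(#_v): requests heteroskedasticity-robust standard errors, where #_v is the number of matches used when estimating the variance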
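To make the metric() choices concrete, the distance used to pick nearest neighbors can be sketched as follows. The notation is introduced here for illustration and is not taken from the text: x and z are the covariate vectors of two observations, V is the weighting matrix, \hat\sigma_k^2 is the sample variance of the k-th matching variable, and S is the sample covariance matrix of the matching variables.

```latex
\[
  \lVert x - z \rVert_V \;=\; \sqrt{(x - z)^{\top}\, V \,(x - z)}
\]
\[
  V_{\text{default}} \;=\; \operatorname{diag}\!\left(\hat\sigma_1^{-2}, \dots, \hat\sigma_K^{-2}\right),
  \qquad
  V_{\text{maha}} \;=\; S^{-1}
\]
```

With the default matrix each matching variable is simply rescaled by its own variance, while metric(maha) also accounts for correlation between the matching variables.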


nnmatch y t x1, m(3)

nnmatch y t x1 x2, tc(att)

nnmatch y t x1 x2, tc(atc) met(maha) bias(bias) robust(4)

nnmatch y t x1 x2, met(matname) bias(x1 x3) keep(artdata) replace

nnmatch y t x1 x2 [w=w], met(matname) bias(x1 x3) exact(x4) pop

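2. Applying the bias-corrected matching estimator

This example uses the same dataset as the propensity score matching example. Its variables are described below: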

. desc

Contains data from E:\2022年8月Stata课程2022.08.13--2022.08.15\data\ldw_exper.dta
  obs:           445
 vars:            12                          30 Jan 2013 12:47
 size:        12,015

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------
t               byte    %8.0g                 participation in job training program
age             byte    %8.0g                 age
educ            byte    %8.0g                 years of education
black           byte    %8.0g                 indicator for African-American
hisp            byte    %8.0g                 indicator for Hispanic
married         byte    %8.0g                 indicator for married
nodegree        byte    %8.0g                 indicator for more than grade school but
                                                less than high-school education
re74            float   %9.0g                 real earnings in 1974 (in thousands of
                                                1978 $)
re75            float   %9.0g                 real earnings in 1975 (in thousands of
                                                1978 $)
re78            float   %9.0g                 real earnings in 1978 (in thousands of
                                                1978 $)
u74             float   %9.0g                 indicator for unemployed in 1974
u75             float   %9.0g                 indicator for unemployed in 1975
-------------------------------------------------------------------------------------
Sorted by:

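First, run one-to-one matching without bias correction, but with robust standard errors:

nnmatch re78 t age edu black his married re74 re75 u74 u75, tc(atc) m(1) robust(1)

Result:

. nnmatch re78 t age edu black his married re74 re75 u74 u75, tc(atc) m(1) robust(1)

Matching estimator: Average Treatment Effect for the Controls

Weighting matrix: inverse variance       Number of obs             =       445
                                         Number of matches  (m)    =         1
                                         Number of matches,
                                           robust std. err. (h)    =         1

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        SATC |   2.262412   .9681856     2.34   0.019     .3648029    4.160021
------------------------------------------------------------------------------
Matching variables: age educ black hisp married re74 re75 u74 u75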

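The output shows that the default weighting matrix is used, namely the inverse of the diagonal matrix whose diagonal elements are the sample variances of the matching variables. Because tc(atc) was specified, the reported estimate is the ATC (labelled SATC), not the ATT: it equals 2.2624 and is significant at the 5% level.

Next, apply the bias correction: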

nnmatch re78 t age edu black his married re74 re75 u74 u75, tc(atc) m(1) robust(1) bias(bias)

Result:

. nnmatch re78 t age edu black his married re74 re75 u74 u75, tc(atc) m(1) robust(1)
> bias(bias)

Matching estimator: Average Treatment Effect for the Controls

Weighting matrix: inverse variance       Number of obs             =       445
                                         Number of matches  (m)    =         1
                                         Number of matches,
                                           robust std. err. (h)    =         1

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        SATC |   2.160295   .9681856     2.23   0.026      .262686    4.057904
------------------------------------------------------------------------------
Matching variables: age educ black hisp married re74 re75 u74 u75
Bias-adj variables: age educ black hisp married re74 re75 u74 u75

With bias correction the ATC estimate falls to 2.1603 and remains significant at the 5% level.
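What bias(bias) does for the ATC can be sketched as follows. The notation follows the usual Abadie and Imbens presentation and is an assumption introduced here, not something defined in the text: W_i is the treatment indicator, N_0 the number of controls, J_M(i) the set of M treated matches of control unit i, and \hat\mu_1(\cdot) a linear regression of the outcome on the bias-adjustment variables fitted on treated units.

```latex
\[
  \hat Y_i(1) \;=\; \frac{1}{M} \sum_{j \in J_M(i)}
      \Bigl( Y_j + \hat\mu_1(X_i) - \hat\mu_1(X_j) \Bigr),
  \qquad W_i = 0
\]
\[
  \widehat{\text{ATC}} \;=\; \frac{1}{N_0} \sum_{i:\, W_i = 0}
      \Bigl( \hat Y_i(1) - Y_i \Bigr)
\]
```

Without bias adjustment the two \hat\mu_1 terms are dropped, so the imputed outcome is just the average outcome of the matches. The adjustment corrects for the remaining covariate gap between each control and its matches, which is consistent with the point estimate changing while the matched sets stay the same.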

Next, use the inverse of the sample covariance matrix as the weighting matrix; metric(maha) requests the Mahalanobis distance:

nnmatch re78 t age edu black his married re74 re75 u74 u75, tc(atc) m(1) robust(1) bias(bias) metric(maha)

Result:

. nnmatch re78 t age edu black his married re74 re75 u74 u75, tc(atc) m(1) robust(1)
> bias(bias) metric(maha)

Matching estimator: Average Treatment Effect for the Controls

Weighting matrix: Mahalanobis            Number of obs             =       445
                                         Number of matches  (m)    =         1
                                         Number of matches,
                                           robust std. err. (h)    =         1

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        SATC |    2.24162   .9538869     2.35   0.019     .3720359    4.111204
------------------------------------------------------------------------------
Matching variables: age educ black hisp married re74 re75 u74 u75
Bias-adj variables: age educ black hisp married re74 re75 u74 u75
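For readers on Stata 13 or later, a similar analysis can be sketched with the official teffects nnmatch command instead of the user-written nnmatch. This is a minimal sketch, not the article's method: it assumes ldw_exper.dta is in the current working directory, and the choice of bias-adjustment variables (the four continuous covariates) is an illustrative assumption.

```stata
* Hedged sketch using official teffects nnmatch (Stata 13+), not the user-written nnmatch.
* Assumption: ldw_exper.dta sits in the current working directory.
use ldw_exper.dta, clear

* One-to-one matching, inverse-variance metric, robust SEs based on 1 match
teffects nnmatch (re78 age educ black hisp married re74 re75 u74 u75) (t), ///
    atet nneighbor(1) metric(ivariance) vce(robust, nn(1))

* Same design with Mahalanobis distance and bias adjustment
* (biasadj() variables below are an illustrative choice)
teffects nnmatch (re78 age educ black hisp married re74 re75 u74 u75) (t), ///
    atet nneighbor(1) metric(mahalanobis) biasadj(age educ re74 re75)   ///
    vce(robust, nn(1))
```

teffects nnmatch reports the ATE or the ATET rather than the ATC, so its estimates are not expected to reproduce the SATC values shown above.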

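Related reading: Understanding Propensity Score Matching (PSM), with an Example and Stata Implementation (Part 1)

1. A PSM application: the effect of training on wages

Policy background: the National Supported Work (NSW) demonstration program.

Research objective: to test how participating in the program (training), compared with not participating, affects wages. The basic idea is to compare, for the trained group (the treatment group), wages with training against wages without training. However, only the outcome under training is observed for the treatment group; what would have happened to the same people without training cannot be observed. This unobserved state is called the counterfactual.

Matching is a way to deal with this unobservable outcome. In propensity score matching (PSM), the sample is split by the treatment indicator into two groups: the treatment group, here those who received training after NSW was implemented, and the comparison group, here those who did not. The basic idea of PSM is that, once treated and comparison observations have been matched so that they are comparable in all other respects, the difference in wages between the trained group and the untrained group can be used to judge the causal effect of training on wages.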
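The counterfactual problem described above can be written compactly. The symbols are introduced here for illustration and do not appear in the text: D is the training indicator, Y_1 and Y_0 are wages with and without training, X are the covariates, and p(X) is the propensity score.

```latex
\[
  \text{ATT} \;=\; E[\,Y_1 - Y_0 \mid D = 1\,]
            \;=\; \underbrace{E[\,Y_1 \mid D = 1\,]}_{\text{observed}}
            \;-\; \underbrace{E[\,Y_0 \mid D = 1\,]}_{\text{counterfactual}}
\]
\[
  p(X) = \Pr(D = 1 \mid X),
  \qquad
  E[\,Y_0 \mid D = 1,\, p(X)\,] \;=\; E[\,Y_0 \mid D = 0,\, p(X)\,]
\]
```

The second line is the identifying assumption that matching relies on: among units with the same propensity score, the untrained group stands in for what the trained group would have earned without training.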

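Note: this example is excerpted from Cameron & Trivedi, Microeconometrics: Methods and Applications (Chinese translation, Shanghai University of Finance and Economics Press, 2010), pp. 794-800. All data and programs come from the book's companion website.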



Descriptive analysis:

tabulate t, summarize(re78) means standard

Result:

. tabulate t, summarize(re78) means standard

 participati |    Summary of real
   on in job | earnings in 1978 (in
    training | thousands of 1978 $)
     program |        Mean   Std. Dev.
-------------+------------------------
           0 |   4.5548023   5.4838368
           1 |   6.3491454   7.8674047
-------------+------------------------
       Total |   5.3007651   6.6314934

3. Running propensity score matching

Data: the data used by Lalonde (1986). We are interested in the possible effect of participation in a job training program on individuals' earnings in 1978. This dataset has been used by many authors (Abadie et al., 2004; Becker and Ichino, 2002; Dehejia and Wahba, 1999).

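gen u = runiform()

sort u    // sort the observations by u

These commands generate pseudo-random numbers that are uniformly distributed on (0,1) and sort the observations by them, so that the sort order of the data is random before matching. (order u would not do this: order only changes the position of a variable in the dataset, not the order of the observations.)

local v1 "t"

local v2 "age edu black hisp married re74 re75 u74 u75"

global x "`v1' `v2' "

psmatch2 $x, out(re78) neighbor(1) ate ties logit common    // 1:1 matching

The $ prefix expands a global macro, so the command above is equivalent to

psmatch2 t age edu black hisp married re74 re75 u74 u75, out(re78) neighbor(1) ate ties logit common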
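After psmatch2 has run, it leaves several variables in the dataset that can be inspected directly. The sketch below assumes the psmatch2 command above has just been executed; _pscore, _treated, _support and _re78 are variables created by psmatch2, while att_diff is a hypothetical name introduced here.

```stata
* Hedged sketch: inspect what psmatch2 leaves behind after the run above.
* _pscore, _treated, _support and _re78 are created by psmatch2.
summarize _pscore if _treated == 1
summarize _pscore if _treated == 0
tabulate _treated _support

* Recompute the ATT by hand: treated outcome minus matched-control outcome.
* att_diff is a hypothetical variable name used only for this illustration.
generate double att_diff = re78 - _re78 if _treated == 1 & _support == 1
summarize att_diff
```

The mean of att_diff should correspond to the ATT row that psmatch2 prints, which is a quick way to confirm how the estimate is built from the matched pairs.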

Next, use pstest to check whether matching balanced the covariates well, and psgraph to plot the propensity score distribution by treatment status:

pstest age edu black hisp married re74 re75 u74 u75, both graph

psgraph

[Figure: psgraph output]

Full output:

. set seed 20180105

. gen u = runiform()

. sort u

. local v1 "t"

. local v2 "age edu black hisp married re74 re75 u74 u75"

. global x "`v1' `v2' "

. psmatch2 $x, out(re78) neighbor(1) ate ties logit common

Logistic regression                             Number of obs     =        445
                                                LR chi2(9)        =      11.70
                                                Prob > chi2       =     0.2308
Log likelihood = -296.25026                     Pseudo R2         =     0.0194

------------------------------------------------------------------------------
           t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0142619   .0142116     1.00   0.316    -.0135923    .0421162
        educ |   .0499776   .0564116     0.89   0.376     -.060587    .1605423
       black |   -.347664   .3606532    -0.96   0.335    -1.054531    .3592032
        hisp |   -.928485     .50661    -1.83   0.067    -1.921422    .0644523
     married |   .1760431   .2748817     0.64   0.522    -.3627151    .7148012
        re74 |  -.0339278   .0292559    -1.16   0.246    -.0912683    .0234127
        re75 |     .01221   .0471351     0.26   0.796    -.0801731    .1045932
         u74 |  -.1516037   .3716369    -0.41   0.683    -.8799987    .5767913
         u75 |  -.3719486    .317728    -1.17   0.242    -.9946841    .2507869
       _cons |  -.4736308   .8244205    -0.57   0.566    -2.089465    1.142204
------------------------------------------------------------------------------

There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.

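----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
            re78  Unmatched | 6.34914538   4.55480228   1.79434311   .632853552     2.84
                        ATT | 6.40495818   4.99436488    1.4105933   .839875971     1.68
                        ATU | 4.52683013   6.15618973    1.6293596            .        .
                        ATE |                           1.53668776            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.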

 psmatch2: |   psmatch2: Common
 Treatment |        support
assignment | Off suppo  On suppor |     Total
-----------+----------------------+----------
 Untreated |        11        249 |       260
   Treated |         2        183 |       185
-----------+----------------------+----------
     Total |        13        432 |       445

. pstest age edu black hisp married re74 re75 u74 u75, both graph

-------------------------------------------------------------------------------------------
                Unmatched |        Mean                %reduct |     t-test     |  V(T)/
Variable          Matched | Treated  Control    %bias   |bias| |      t   p>|t| |  V(C)
--------------------------+------------------------------------+----------------+----------
age                    U  |  25.816   25.054     10.7          |   1.12   0.265 |  1.03
                       M  |  25.781   25.383      5.6     47.7 |   0.52   0.604 |  0.91
educ                   U  |  10.346   10.088     14.1          |   1.50   0.135 |  1.55*
                       M  |  10.322   10.415     -5.1     63.9 |  -0.49   0.627 |  1.52*
black                  U  |  .84324   .82692      4.4          |   0.45   0.649 |     .
                       M  |  .85246   .86339     -2.9     33.0 |  -0.30   0.765 |     .
hisp                   U  |  .05946   .10769    -17.5          |  -1.78   0.076 |     .
                       M  |  .06011   .04372      5.9     66.0 |   0.71   0.481 |     .
married                U  |  .18919   .15385      9.4          |   0.98   0.327 |     .
                       M  |  .18579   .19126     -1.4     84.5 |  -0.13   0.894 |     .
re74                   U  |  2.0956    2.107     -0.2          |  -0.02   0.982 |  0.74*
                       M  |  2.0672   1.9222      2.7  -1166.6 |   0.27   0.784 |  0.88
re75                   U  |  1.5321   1.2669      8.4          |   0.87   0.382 |  1.08
                       M  |  1.5299   1.6446     -3.6     56.7 |  -0.32   0.748 |  0.82
u74                    U  |  .70811      .75     -9.4          |  -0.98   0.326 |     .
                       M  |  .71038   .75956    -11.1    -17.4 |  -1.06   0.288 |     .
u75                    U  |      .6   .68462    -17.7          |  -1.85   0.065 |     .
                       M  |  .60656   .63388     -5.7     67.7 |  -0.54   0.591 |     .
-------------------------------------------------------------------------------------------
* if variance ratio outside [0.75; 1.34] for U and [0.75; 1.34] for M

-----------------------------------------------------------------------------------
 Sample    | Ps R2   LR chi2   p>chi2   MeanBias   MedBias      B        R     %Var
-----------+-----------------------------------------------------------------------
 Unmatched | 0.019     11.75    0.227       10.2       9.4     33.1*   0.82      50
 Matched   | 0.008      3.87    0.920        4.9       5.1     20.6    1.09      25
-----------------------------------------------------------------------------------
* if B>25%, R outside [0.5; 2]
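For reading the %bias and %reduct |bias| columns of the pstest table, the standardized bias is conventionally defined as below. This is a sketch of the usual Rosenbaum and Rubin style definition and I am assuming pstest follows it; the symbols \bar x_T, \bar x_C, s_T^2, s_C^2 (group means and variances of a covariate) are notation introduced here.

```latex
\[
  \%\text{bias} \;=\; 100 \cdot
    \frac{\bar x_T - \bar x_C}{\sqrt{\left(s_T^2 + s_C^2\right)/2}},
  \qquad
  \%\text{reduct}\,|\text{bias}| \;=\; 100 \cdot
    \left( 1 - \frac{|\text{bias}_M|}{|\text{bias}_U|} \right)
\]
\[
  \text{age:}\quad 100 \cdot \left( 1 - \frac{5.6}{10.7} \right) \approx 47.7
\]
```

The worked line uses the rounded age entries from the table (10.7 before matching, 5.6 after) and recovers the reported reduction of 47.7. The same formula explains the large negative value for re74: its unmatched bias is close to zero, so even a small matched bias produces a very large relative increase.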

5. Overview of PSM commands

Stata does not have a built-in command for propensity score matching, a non-experimental method of sampling that produces a control group whose distribution of covariates is similar to that of the treated group. (This was true of older releases; since Stata 13 the official teffects psmatch and teffects nnmatch commands are available.) However, there are several user-written modules for this method. The following modules are among the most popular:

psmatch2.ado

pscore.ado

nnmatch.ado

psmatch2.ado was developed by Leuven and Sianesi (2003) and pscore.ado by Becker and Ichino (2002). More recently, Abadie, Drukker, Herr, and Imbens (2004) introduced nnmatch.ado. All three modules support pair-matching as well as subclassification.

You can find these modules using the net command as follows:

net search psmatch2

net search pscore

net search nnmatch

You can install these modules using the ssc or net command, for example:

ssc install psmatch2, replace

After installation, read the help files to find the correct usage, for example:

help psmatch2

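The above explains how to obtain the PSM-related commands. In summary, psmatch2 is currently the preferred choice among them. Help for the PSM-related commands:

help psmatch2

help nnmatch

help psmatch

help pscore

To keep up with the latest PSM information and programs:

findit propensity score

findit matching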

psmatch2 is being continuously improved and developed. Make sure to keep your version up to date as follows:

ssc install psmatch2, replace

You can check your installed version as follows:

which psmatch2

Syntax

help psmatch2

psmatch2 depvar [indepvars] [if exp] [in range] [, outcome(varlist)
    pscore(varname) neighbor(integer) radius caliper(real)
    mahalanobis(varlist) ai(integer) population altvariance
    kernel llr kerneltype(type) bwidth(real) spline
    nknots(integer) common trim(real) noreplacement
    descending odds index logit ties quietly w(matrix) ate]

where indepvars and mahalanobis(varlist) may contain factor variables; see fvvarlist.

For example:

psmatch2 D x1 x2 x3, outcome(y)
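For comparison with the user-written commands in this section, the same one-to-one logit matching can be sketched with the official teffects psmatch command. This is a hedged sketch under two assumptions: Stata 13 or later is installed (tebalance needs Stata 14 or later), and ldw_exper.dta is in the current working directory.

```stata
* Hedged sketch using official commands, not psmatch2.
* Assumption: ldw_exper.dta sits in the current working directory.
use ldw_exper.dta, clear

* Propensity score from a logit model, 1 nearest neighbor, effect on the treated
teffects psmatch (re78) (t age educ black hisp married re74 re75 u74 u75, logit), ///
    atet nneighbor(1)

* Covariate balance before and after matching (Stata 14+)
tebalance summarize
```

Unlike the psmatch2 output, whose note says the standard error ignores that the propensity score is estimated, teffects psmatch reports standard errors that account for the estimated score, so the two commands can give different inference for similar point estimates.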

Kernel matching

Other matching methods:

Coarsened Exact Matching (CEM) || help cem

Local linear regression matching

Spline matching

Mahalanobis matching
