Two-Step Confidence Sets
and the Trouble with the First Stage F Statistic
Isaiah Andrews
Harvard Society of Fellows
September 29, 2014
Introduction
Two-step confidence sets
When weak identification is a concern, researchers typically proceed in two steps:
1. First, calculate a statistic intended to measure identification strength, for example the first-stage F-statistic
2. Then:
    If the model seems well identified, proceed as usual and report non-robust confidence sets
    If weak identification seems to be an issue, calculate robust confidence sets, seek a new specification, or don't report results
I focus on the case where we always report some confidence set
    Other cases can lead to terrible properties for the reported confidence sets
Introduction
Two-step confidence sets
Represent the outcome of the first step by the identification statistic $\phi_{ICS} \in \{0, 1\}$
    $\phi_{ICS} = 1$ indicates evidence of weak identification
In the second step we'll use
    $CS_{NR}$ if identification seems strong
    $CS_R$ if identification seems weak
We can write two-step confidence sets as
$$CS_{2S} = \begin{cases} CS_{NR} & \text{if } \phi_{ICS} = 0 \\ CS_R & \text{if } \phi_{ICS} = 1 \end{cases}$$
Introduction
Example: Linear IV
We could let
    $CS_{NR}$ be the t-statistic confidence set based on 2SLS or LIML
    $CS_R$ be an Anderson-Rubin confidence set
For $F$ the first-stage F-statistic, the commonly used Staiger and Stock (1997) rule of thumb corresponds to
$$\phi_{ICS} = 1\{F \le 10\}$$
For these choices, $CS_{2S}$
    reports the usual t-statistic CS when $F > 10$
    reports the Anderson-Rubin CS when $F \le 10$
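The selection rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the two intervals are hypothetical, and the confidence sets are assumed to have been computed elsewhere.

```python
def rule_of_thumb_ics(first_stage_f, cutoff=10.0):
    """Return phi_ICS = 1{F <= cutoff}; 1 flags possible weak identification."""
    return 1 if first_stage_f <= cutoff else 0


def two_step_confidence_set(phi_ics, cs_nr, cs_r):
    """Return CS_R when phi_ICS = 1 and CS_NR when phi_ICS = 0."""
    if phi_ics not in (0, 1):
        raise ValueError("phi_ics must be 0 or 1")
    return cs_r if phi_ics == 1 else cs_nr


# Hypothetical intervals (lower, upper), for illustration only.
t_stat_cs = (-0.34, -0.02)
anderson_rubin_cs = (-0.90, 0.45)

# With a first-stage F above the cutoff, the non-robust set is reported.
reported = two_step_confidence_set(rule_of_thumb_ics(26.5), t_stat_cs, anderson_rubin_cs)
```

Because the rule uses a weak inequality, a first-stage F of exactly 10 selects the Anderson-Rubin set.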
Introduction
Coverage distortions
I'll be interested in the coverage of the two-step confidence set
$$\Pr_{\beta_0}\{\beta_0 \in CS_{2S}\}$$
Throughout I'll assume that
    $CS_{NR}$ has coverage at least $1 - \alpha$ under strong identification
    $CS_R$ has coverage at least $1 - \alpha$ under both strong and weak identification
I'll be interested in the maximal coverage distortion for $CS_{2S}$, defined as the smallest $\gamma$ such that
$$\Pr_{\beta_0}\{\beta_0 \in CS_{2S}\} \ge 1 - \alpha - \gamma$$
Introduction
Coverage distortions
Unsurprisingly, $CS_{2S}$ can exhibit large distortions if we choose $\phi_{ICS}$ poorly
Perhaps more surprisingly, Stock and Yogo (2005) show that pretests based on the first-stage F-statistic can bound coverage distortions in linear IV with homoskedastic errors
    For example, Stock and Yogo give a test for the null that the nominal 5% t-test has true size exceeding 10%
    For 2SLS, this requires critical values larger than the rule-of-thumb value $c = 10$
Problem: we frequently think economic data are heteroskedastic, serially correlated, clustered, etc.
Introduction
This project
1. I demonstrate numerically that the first-stage F statistic is not a reliable measure of identification strength in IV with heteroskedastic errors
2. I propose a general approach to constructing robust two-step confidence sets that
    1. is applicable to general GMM models
    2. controls coverage up to an arbitrarily small, user-selected level of distortion $\gamma$ when identification is weak
    3. indicates strong identification with probability tending to one under standard asymptotics
Introduction
Main Idea
Under strong identification, many different confidence sets are asymptotically equivalent
    In particular, we can construct robust confidence sets with coverage $1 - \alpha - \gamma$ which are contained in $CS_{NR}$ with probability tending to one under strong identification
Thus, to assess the quality of the usual asymptotic approximations, we can compare robust and non-robust confidence sets
By choosing $\phi_{ICS}$ in this way, we can ensure that $CS_{2S}$ has coverage at least $1 - \alpha - \gamma$
Introduction
Why study two-step procedures?
Two-step confidence sets are often unappealing from a theoretical perspective
Nonetheless, pretests for identification are extremely common in empirical work
Commonly used pretests are unreliable, so alternatives are needed
Robust confidence sets that are efficient under strong ID give a natural, reliable way to assess ID strength
    Further argument for computing and reporting such confidence sets
Outline
1. Introduction
2. Problems with the First-Stage F-statistic
3. A Simple Two-Step Confidence Set
4. Simulation Performance
5. Reporting Results
6. Conclusion
Problems with the First-Stage F-statistic
Linear IV model
Consider the linear IV model
$$Y_t = X_t \beta_0 + V_{1,t}$$
$$X_t = Z_t' \pi + V_{2,t}$$
with $Z_t$ a $k \times 1$ vector of instruments, $E[Z_t V_{1,t}] = E[Z_t V_{2,t}] = 0$
Simulations set $\beta_0 = 0$
Calibrations with moderate and high endogeneity
Problems with the First-Stage F-statistic
Simulation design
I report results for two-step confidence sets with
    $CS_{NR}$ the nominal 95% Wald confidence set based on:
        2SLS
        LIML
    $CS_R$ the Anderson-Rubin confidence set
    $\phi_{ICS} = 1\{F \le c\}$ for:
        the rule-of-thumb cutoff $c = 10$ (RoT)
        the Stock and Yogo (2005) cutoffs (SY)
SY cutoffs guarantee coverage at least 85% in the homoskedastic case
For heteroskedastic simulations, I use a heteroskedasticity-robust form of the first-stage F
Problems with the First-Stage F-statistic
Minimal coverage: homoskedastic case
                    Moderate Endogeneity          High Endogeneity
Minimal Coverage    k = 5    k = 10   k = 20      k = 5    k = 10   k = 20
RoT LIML            92.9%    93.1%    93.5%       90.4%    91.5%    91.4%
RoT 2SLS            90.4%    89.1%    89.2%       82%      76.1%    64.2%
SY LIML             91.6%    90.6%    89.4%       88.4%    89.2%    89.7%
SY 2SLS             92.6%    92.4%    93.6%       87.5%    87.7%    87.7%
Table: Minimal coverage for confidence sets in homoskedastic IV simulations with 10,000 observations.
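To connect the table to the distortion $\gamma$ defined earlier, the short Python sketch below converts a minimal coverage figure into the implied distortion for the nominal 95% sets used here. The function name is hypothetical, and flooring the distortion at zero is an assumption made for the illustration; the two inputs are the k = 20, high endogeneity entries for RoT 2SLS and SY 2SLS.

```python
def coverage_distortion(min_coverage, alpha=0.05):
    """Smallest gamma >= 0 such that min_coverage >= 1 - alpha - gamma."""
    return max(0.0, (1.0 - alpha) - min_coverage)


# Minimal coverage values copied from the homoskedastic table
# (k = 20, high endogeneity), expressed as fractions.
rot_2sls = coverage_distortion(0.642)  # rule-of-thumb pretest with 2SLS
sy_2sls = coverage_distortion(0.877)   # Stock-Yogo pretest with 2SLS
```

Under these inputs the rule-of-thumb pretest implies a distortion of roughly 31 percentage points, while the Stock and Yogo cutoff stays within the 10 points it is designed to allow.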
Problems with the First-Stage F-statistic
Minimal coverage: heteroskedastic case
                    Moderate Endogeneity          High Endogeneity
Minimal Coverage    k = 5    k = 10   k = 20      k = 5    k = 10   k = 20
RoT LIML            63.2%    44.8%    31.9%       21.8%    9.5%     1%
RoT 2SLS            55.1%    30.8%    41.8%       0%       0%       0%
RoT CUGMM           45.4%    18.4%    37.3%       71.5%    54.7%    13.1%
RoT 2SGMM           35.4%    13%      34.6%       1.9%     0%       0%
SY LIML             61.2%    43%      29.1%       21.8%    9.5%     1%
SY 2SLS             63.6%    40.2%    56.1%       0%       0%       0%
SY CUGMM            39.5%    15.3%    34.1%       63.2%    54.7%    3.1%
SY 2SGMM            46.7%    19.8%    49.2%       2.8%     0%       0%
Table: Minimal coverage for confidence sets in heteroskedastic IV simulations with 10,000 observations.
Problems with the First-Stage F-statistic
Minimal coverage: heteroskedastic case
As we can see, all of the two-step procedures studied can exhibit large coverage distortions
The problem: when the data are heteroskedastic, the first-stage F-statistic doesn't give a reliable guide to identification strength
    In the $k = 10$ moderate endogeneity calibration, substantial coverage distortions persist even for $E[F] = 500$
    High endogeneity calibration much more extreme: significant distortions even for $E[F] = 100{,}000$
A Simple Two-Step Confidence Set
Main Idea
The idea: compare robust and non-robust confidence sets.
In general GMM models, for all the commonly used non-robust confidence sets $CS_{NR}$ and any $\gamma > 0$, we can construct a preliminary robust confidence set $CS_{R,P}$ with
1. $\Pr_{\beta_0}\{\beta_0 \in CS_{R,P}\} \ge 1 - \alpha - \gamma$ regardless of identification strength
2. $\Pr\{CS_{R,P} \subseteq CS_{NR}\} \to 1$ under strong identification
3. $CS_{R,P} \subseteq CS_R$
A Simple Two-Step Confidence Set
Main Idea
If we take
$$\phi_{ICS} = 1\{CS_{R,P} \not\subseteq CS_{NR}\}$$
then $\phi_{ICS} \to_p 0$ under strong identification. Moreover $CS_{R,P} \subseteq CS_{2S}$, so
$$\Pr_{\beta_0}\{\beta_0 \in CS_{2S}\} \ge 1 - \alpha - \gamma$$
regardless of identification strength
GMM Model
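Consider a potentially weakly identified GMM model with identifying assumption
$$E_{\beta_0}[g_T(\beta_0)] = 0$$
    Linear IV: $g_T(\beta) = \frac{1}{T} \sum_t Z_t (Y_t - X_t \beta)$
For convenience, continue to assume $\dim(\beta) = 1$
Let
$$\lim_{T \to \infty} \mathrm{Var}\left( \begin{pmatrix} \sqrt{T}\, g_T(\beta) \\ \sqrt{T}\, \frac{\partial}{\partial \beta} g_T(\beta) \end{pmatrix} \right) = \begin{pmatrix} \Sigma(\beta) & \Sigma_{\beta}(\beta) \\ \Sigma_{\beta}(\beta) & \Sigma_{\beta\beta}(\beta) \end{pmatrix}$$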
A Simple Two-Step Confidence Set
Robust test statistics
Define the $S$ statistic of Stock and Wright (2000) as
$$S(\beta) = T \cdot g_T(\beta)' \hat\Sigma(\beta)^{-1} g_T(\beta).$$
    $S(\beta_0) \to_d \chi^2_k$ regardless of identification strength
Following Kleibergen (2005), let
$$D_T(\beta) = \frac{\partial}{\partial \beta} g_T(\beta) - \hat\Sigma_{\beta}(\beta) \hat\Sigma(\beta)^{-1} g_T(\beta)$$
and define the $K$ statistic
$$K(\beta) = T \cdot g_T(\beta)' \hat\Sigma(\beta)^{-1/2} P_{\hat\Sigma(\beta)^{-1/2} D_T(\beta)} \hat\Sigma(\beta)^{-1/2} g_T(\beta)$$
    $K(\beta_0) \to_d \chi^2_1$ regardless of identification strength
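As a concrete instance of the $S$ statistic for the linear IV moments $g_T(\beta) = \frac{1}{T} \sum_t Z_t (Y_t - X_t \beta)$, here is a minimal pure-Python sketch. It assumes that $\hat\Sigma(\beta)$ is the centered sample covariance of the moment contributions $Z_t (Y_t - X_t \beta)$, which is one possible estimator and not necessarily the one used in the talk; the helper `solve` and the function name `s_statistic` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def s_statistic(beta, Y, X, Z):
    """S(beta) = T * g' Sigma_hat^{-1} g for linear IV moments.

    Y, X: lists of length T; Z: list of T instrument rows of length k.
    Sigma_hat is the centered sample covariance of Z_t (Y_t - X_t beta),
    an assumption of this sketch.
    """
    T, k = len(Y), len(Z[0])
    m = [[Z[t][j] * (Y[t] - X[t] * beta) for j in range(k)] for t in range(T)]
    g = [sum(m[t][j] for t in range(T)) / T for j in range(k)]
    sigma = [[sum((m[t][i] - g[i]) * (m[t][j] - g[j]) for t in range(T)) / T
              for j in range(k)] for i in range(k)]
    w = solve(sigma, g)  # Sigma_hat^{-1} g without forming the inverse
    return T * sum(gi * wi for gi, wi in zip(g, w))


# Tiny hand-checkable case with one instrument: moments are 1, 2, 3, 4,
# so g = 2.5, Sigma_hat = 1.25 and S = 4 * 2.5**2 / 1.25.
s_value = s_statistic(0.0, [1.0, 2.0, 3.0, 4.0], [0.0] * 4, [[1.0]] * 4)
```

Solving the linear system instead of inverting $\hat\Sigma(\beta)$ keeps the sketch short; a numerical library would normally be used for anything beyond a toy example.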
A Simple Two-Step Confidence Set
Local asymptotic equivalence
Under standard assumptions, for $W(\beta)$ a Wald statistic based on an efficient estimator $\hat\beta$,
$$\sup_{\sqrt{T} \|\beta - \beta_0\| \le C} \|W(\beta) - K(\beta)\| = o_p(1)$$
under strong identification
Thus, the K-statistic confidence set
$$\{\beta : K(\beta) \le \chi^2_{1,1-\alpha}\}$$
is asymptotically equivalent to the usual Wald confidence set on $\sqrt{T}$-neighborhoods of $\beta_0$
    More general statement given in paper
A Simple Two-Step Confidence Set
Non-local non-equivalence
Unfortunately, the equivalence of the K and Wald confidence sets holds only locally, not globally
K confidence sets are often inconsistent, in the sense that even in strongly identified models they fail to shrink towards the true parameter value as the sample grows
A Simple Two-Step Confidence Set
A consistent robust confidence set
To obtain a consistent confidence set, for $a > 0$ consider
$$CS_{R,P} = \{\beta : K(\beta) + a \cdot S(\beta) \le \chi^2_{1,1-\alpha}\}.$$
    $K(\beta) + a \cdot S(\beta)$ is a linear combination statistic, as in Andrews (2013)
This confidence set has coverage
$$1 - \alpha - \gamma(a) = \Pr\{(1 + a) \cdot \chi^2_1 + a \cdot \chi^2_{k-1} \le \chi^2_{1,1-\alpha}\}$$
regardless of identification strength
    $\gamma \to 0$ as $a \to 0$, and we can choose $a$ to obtain any desired level of $\gamma$
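The coverage formula for the linear combination set can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the author's procedure: it draws independent $\chi^2_1$ and $\chi^2_{k-1}$ variables as sums of squared standard normals and estimates $\gamma(a)$ by Monte Carlo. The function names, the number of draws, and the seed are all hypothetical choices.

```python
import random
from statistics import NormalDist


def chi2_1_quantile(p):
    """Quantile of chi-squared(1): the square of a standard normal quantile."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def gamma_of_a(a, k, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo estimate of gamma(a) from
    1 - alpha - gamma(a) = Pr{(1 + a) chi2_1 + a chi2_{k-1} <= chi2_{1,1-alpha}}.
    """
    rng = random.Random(seed)
    crit = chi2_1_quantile(1.0 - alpha)
    hits = 0
    for _ in range(draws):
        chi2_1 = rng.gauss(0.0, 1.0) ** 2
        chi2_km1 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k - 1))
        if (1.0 + a) * chi2_1 + a * chi2_km1 <= crit:
            hits += 1
    return (1.0 - alpha) - hits / draws


# gamma is close to zero at a = 0 and grows as a increases.
g0 = gamma_of_a(0.0, k=5)
g1 = gamma_of_a(0.1, k=5)
```

To hit a target distortion such as $\gamma = 10\%$, one could search over $a$ (for example by bisection) using a function like `gamma_of_a`, since the estimate is nondecreasing in $a$ for a fixed seed.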
A Simple Two-Step Confidence Set
Detecting weak identification
For $CS_{NR}$ the Wald confidence set, under strong identification
$$\Pr\{CS_{R,P} \subseteq CS_{NR}\} \to 1$$
To assess whether the usual strong-identification approximations are reasonable, we can check whether $CS_{R,P} \subseteq CS_{NR}$
    Motivates choice $\phi_{ICS} = 1\{CS_{R,P} \not\subseteq CS_{NR}\}$
    Gives $\Pr\{\beta_0 \in CS_{2S}\} \ge 1 - \alpha - \gamma$
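In practice the containment check has to be carried out numerically. A minimal sketch, assuming the statistics $K(\beta)$ and $S(\beta)$ have already been evaluated on a finite grid of candidate values and that $CS_{NR}$ is a Wald interval: the grid approximation, the function names, and the toy numbers are assumptions of this illustration.

```python
def preliminary_robust_set(grid, k_vals, s_vals, a, crit):
    """Grid points beta with K(beta) + a * S(beta) <= crit."""
    return [b for b, kv, sv in zip(grid, k_vals, s_vals) if kv + a * sv <= crit]


def phi_ics(cs_rp, wald_lo, wald_hi):
    """Return 1 if some point of CS_{R,P} lies outside the Wald interval, else 0."""
    return int(any(b < wald_lo or b > wald_hi for b in cs_rp))


# Toy values for illustration only (not from the talk).
grid = [-1.0, 0.0, 1.0]
k_vals = [3.0, 0.0, 5.0]
s_vals = [10.0, 1.0, 20.0]
crit = 3.84  # approximate 95% critical value of chi-squared(1)

cs_rp = preliminary_robust_set(grid, k_vals, s_vals, a=0.1, crit=crit)
flag = phi_ics(cs_rp, wald_lo=-0.5, wald_hi=0.5)
```

An empty grid approximation of $CS_{R,P}$ is trivially contained in $CS_{NR}$, so this sketch returns 0 in that case; a finer grid reduces the chance of missing points that lie outside the Wald interval.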
Simulation Performance
Linear IV
For comparability with my earlier simulation results, I set $\gamma = 10\%$, so $CS_{R,P}$ has coverage 85%
    Thus, $CS_{2S}$ has coverage at least 85% as well
I again start by considering the homoskedastic case
Simulation Performance
Homoskedastic Linear IV
                    Moderate Endogeneity          High Endogeneity
Minimal Coverage    k = 5    k = 10   k = 20      k = 5    k = 10   k = 20
CS2S LIML           92.6%    92.4%    90.4%       86%      87%      85%
CS2S 2SLS           92.8%    92.8%    92.9%       86%      87%      85%
Table: Minimal coverage for different confidence sets in homoskedastic IV simulations with 10,000 observations.
Simulation Performance
Homoskedastic Linear IV
We see that $CS_{2S}$ controls coverage distortions in the homoskedastic case
Since procedures based on the first-stage F statistic also work here, we can compare their performance
    In particular, we might worry that my $CS_{2S}$ controls coverage distortions by setting $\phi_{ICS} = 1$ more often than procedures based on the first-stage F
    This isn't the case
Simulation Performance
Pretest comparison
[Figure: vertical axis E[φICS], from 0 to 1; horizontal axis mean of first-stage F-statistic, from 0 to 60; series SY LIML, ICS LIML, SY 2SLS, ICS 2SLS]
Figure: $E[\phi_{ICS}] = \Pr\{\phi_{ICS} = 1\}$ plotted against the mean of the first-stage F-statistic in the moderate endogeneity homoskedastic linear IV calibration with $k = 10$.
Simulation Performance
Heteroskedastic Linear IV
                    Moderate Endogeneity          High Endogeneity
Minimal Coverage    k = 5    k = 10   k = 20      k = 5    k = 10   k = 20
CS2S LIML           94.1%    92.4%    93%         86.8%    86.8%    85.2%
CS2S 2SLS           93.7%    94%      94.3%       86.7%    86.6%    85.2%
CS2S CUGMM          95%      94.1%    93.4%       86.8%    92.6%    88.8%
CS2S 2SGMM          94.5%    94%      93.9%       86.8%    92.8%    88.8%
Table: Minimal coverage for different confidence sets in heteroskedastic IV simulations with 10,000 observations.
Reporting Results
Picking $\gamma$
1. My discussion so far assumes a choice of the maximal distortion $\gamma$
    What if you don't like my $\gamma$?
2. When used at all in empirical work, robust confidence sets typically supplement, rather than replace, non-robust ones
Reporting Results
Pick some $\gamma_{min} \ge 0$. For $\gamma \ge \gamma_{min}$, consider the family of robust confidence sets
$$CS_{R,P}(\gamma) = \{\beta : K(\beta) + a(\gamma) \cdot S(\beta) \le \chi^2_{1,1-\alpha}\}$$
where $CS_{R,P}(\gamma)$ has coverage $1 - \alpha - \gamma$
Reporting Results
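Note that $CS_{R,P}(\gamma)$ is decreasing in $\gamma$: $\gamma \le \gamma' \Rightarrow CS_{R,P}(\gamma') \subseteq CS_{R,P}(\gamma)$
Consider $CS_R$ such that $CS_{R,P}(\gamma_{min}) \subseteq CS_R$
Define the maximal distortion cutoff
$$\hat\gamma = \min\{\gamma \ge \gamma_{min} : CS_{R,P}(\gamma) \subseteq CS_{NR}\}$$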
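The cutoff $\hat\gamma$ can be approximated by scanning a finite set of candidate distortion levels from smallest to largest. The sketch below is a rough illustration under several assumptions: $K$ and $S$ are precomputed on a grid, $CS_{NR}$ is an interval, and the mapping from $\gamma$ to $a(\gamma)$ is supplied by the user as a dictionary (the values shown are placeholders, not calibrated). Returning `None` when no candidate works is a choice made for this sketch.

```python
def gamma_hat(grid, k_vals, s_vals, gamma_to_a, wald_lo, wald_hi, crit):
    """Smallest candidate gamma whose set CS_{R,P}(gamma) lies inside the
    Wald interval [wald_lo, wald_hi], or None if no candidate qualifies.

    gamma_to_a maps each candidate gamma (all >= gamma_min) to a(gamma).
    """
    for gamma in sorted(gamma_to_a):
        a = gamma_to_a[gamma]
        contained = all(
            wald_lo <= b <= wald_hi
            for b, kv, sv in zip(grid, k_vals, s_vals)
            if kv + a * sv <= crit
        )
        if contained:
            return gamma
    return None


# Toy values for illustration only; the gamma -> a mapping is a placeholder.
grid = [-1.0, 0.0, 1.0]
k_vals = [3.0, 0.0, 5.0]
s_vals = [10.0, 1.0, 20.0]
gamma_to_a = {0.0: 0.0, 0.05: 0.05, 0.10: 0.1}

g_hat = gamma_hat(grid, k_vals, s_vals, gamma_to_a, -0.5, 0.5, crit=3.84)
```

Scanning in increasing order is enough because the sets $CS_{R,P}(\gamma)$ shrink as $\gamma$ grows, so the first candidate that passes the containment check is the minimum over the candidate list.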
Reporting Results
We can report $CS_{NR}$, $CS_R$, and $\hat\gamma$
    If I adopt the rule that I will use $CS_{NR}$ if $\hat\gamma \le \gamma^*$, and will use $CS_R$ otherwise, my confidence set has coverage at least $1 - \alpha - \gamma^*$
Reporting Results
Income and Democracy
Acemoglu, Johnson, Robinson, and Yared (2008) study the relationship between per-capita income and measures of political democracy
    They argue that once one controls for country fixed effects, there is no significant relationship
    They consider a number of specifications, including one which instruments income with the trade-weighted income of a country's trading partners
Cervellati, Jung, Sunde and Vischer (2014) argue that this zero effect finding masks heterogeneity across countries, and in particular that the relationship between income and democracy is negative in former European colonies
I re-examine the identifying power of the trade-weighted income instrument
Reporting Results
Income and Democracy
Specification         I                II               III
2SLS                  -0.120 (0.11)    -1.37 (23.0)     -0.16 (0.07)
First Stage F         26.5             0.0              57.6
γ̂                     20.2%            0%               57.9%
S CS                  [-1,1]           [-1,1]           [-1,1]
Sample Restriction    None             Non-colonies     Former Colonies
Observations          895              218              718
Table: Results based on Acemoglu et al. (2008) data
Conclusion
I show that two-step confidence sets based on the first-stage F-statistic can be highly unreliable in non-homoskedastic data
I propose an alternative approach to detecting weak identification and constructing two-step confidence sets which
    Controls coverage distortions under weak identification and is applicable to general GMM models
    Indicates strong identification with probability tending to one under standard asymptotics
While I've framed my discussion using linear combination confidence sets, many other options also work
    Close connection between efficient robust confidence sets and reliable assessments of identification strength
The End
Thank you!
Formal Results
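Weak identification assumptions
Assumption
For $J_T(\beta) = E_T\left[\frac{\partial}{\partial \beta'} g_T(\beta)\right]$,
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \begin{pmatrix} g_t(\beta_0) \\ \mathrm{vec}\left(\frac{\partial}{\partial \beta'} g_t(\beta_0) - J_T(\beta_0)\right) \end{pmatrix} \to_d N\left(0, \begin{pmatrix} \Sigma_g & \Sigma_{g\theta} \\ \Sigma_{\theta g} & \Sigma_{\theta} \end{pmatrix}\right)$$
where $\Sigma_g$ is positive definite and
$$\begin{pmatrix} \Sigma_g & \Sigma_{g\beta} \\ \Sigma_{\beta g} & \Sigma_{\beta\beta} \end{pmatrix} = \lim_{T \to \infty} \mathrm{Var}_{T,\xi}\left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \begin{pmatrix} g_t(\beta_0) \\ \mathrm{vec}\left(\frac{\partial}{\partial \beta'} g_t(\beta_0)\right) \end{pmatrix}\right)$$
is consistently estimable.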
Formal Results
Weak identification assumptions
Assumption
There exists a sequence of full-rank $O(\sqrt{T})$ normalizing matrices $\Lambda_{1,T}$ such that $D_T(\beta_0) \Lambda_{1,T} \to_d D$ for a (possibly degenerate) Gaussian random matrix $D$ which is full rank almost surely
Formal Results
Identification-robust tests
Theorem
Under these assumptions,
$$(K(\beta_0), S(\beta_0) - K(\beta_0)) \to_d (\chi^2_p, \chi^2_{k-p})$$
and $K(\beta_0)$ and $S(\beta_0) - K(\beta_0)$ are asymptotically independent
Formal Results
Strong identification assumptions
Assumption
1. $g_T(\beta) \to_p \lim_{T \to \infty} E_T[g_t(\beta)]$ uniformly, and $\lim_{T \to \infty} E_T[g_t(\beta)]$ is continuous in $\beta$.
2. $\hat\Sigma_g(\beta) \to_p \Sigma_g(\beta)$ uniformly for $\Sigma_g(\beta)$ continuous in $\beta$ and everywhere positive definite with a uniformly bounded maximal eigenvalue and a minimal eigenvalue bounded away from zero
3. For all $\varepsilon > 0$ there exists $\delta > 0$ such that
$$\lim_{T \to \infty} E_T[g_t(\beta)]' \, \Omega(\beta) \lim_{T \to \infty} E_T[g_t(\beta)] < \delta$$
only if $\|\theta - \theta_0\| < \varepsilon$.
Formal Results
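Strong identification assumptions
Assumption
1. $\beta_0$ belongs to the interior of its parameter space
2. $g_T(\beta)$ and $\hat\Sigma_g(\beta)$ are almost surely continuously differentiable on some open ball $B(\beta_0)$ around $\beta_0$
3. For $J(\beta) = \lim_{T \to \infty} E_T\left[\frac{\partial}{\partial \beta'} g_T(\beta)\right]$, $J(\beta)$ is continuous at $\beta_0$, $G_T(\beta) = \frac{\partial}{\partial \beta'} g_T(\beta) \to_p J(\beta)$ uniformly on $B(\beta_0)$, and $J(\beta_0)$ is full-rank
4. $\sup_{\beta \in B(\beta_0)} \frac{\partial}{\partial \beta'} \mathrm{vec}(\hat\Sigma_g(\beta)) = O_p(1)$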
Formal Results
Identification-robust tests
Theorem
Under these assumptions we have that for all $C \ge 0$
$$\sup_{\sqrt{T} \|\beta - \beta_0\| \le C} \|W(\beta) - K(\beta)\| = o_p(1).$$