Two-Step Confidence Sets. Isaiah Andrews, September 29, 2014
Two-Step Confidence Sets and the Trouble with the First Stage F Statistic
Isaiah Andrews, Harvard Society of Fellows
September 29, 2014

Introduction: Two-step confidence sets
When weak identification is a concern, researchers typically proceed in two steps:
1. First calculate a statistic intended to measure identification strength, for example the first-stage F.
2. Then:
   If the model seems well-identified, proceed as usual and report non-robust confidence sets.
   If weak identification seems an issue, calculate robust confidence sets, seek a new specification, or don't report results.
I focus on the case where we always report some confidence set.
Other cases can lead to terrible properties for reported confidence sets.

Introduction: Two-step confidence sets
Represent the outcome of the first step by the statistic $\phi_{ICS} \in \{0, 1\}$.
$\phi_{ICS} = 1$ indicates evidence of weak identification.
In the second step we'll use
   $CS_{NR}$ if identification seems strong
   $CS_R$ if identification seems weak
We can write two-step confidence sets as
$$CS_{2S} = \begin{cases} CS_{NR} & \text{if } \phi_{ICS} = 0 \\ CS_R & \text{if } \phi_{ICS} = 1 \end{cases}$$

Introduction: Example: Linear IV
We could let
   $CS_{NR}$ be the t-statistic confidence set based on 2SLS or LIML
   $CS_R$ be an Anderson-Rubin confidence set
For $F$ the first-stage F-statistic, the commonly-used Staiger and Stock (1997) rule of thumb corresponds to $\phi_{ICS} = 1\{F \le 10\}$.
For these choices, $CS_{2S}$
   reports the usual t-statistic CS when $F > 10$
   reports the Anderson-Rubin CS when $F \le 10$

Introduction: Coverage distortions
I'll be interested in the coverage of the two-step confidence set,
$$\Pr_{\beta_0}\{\beta_0 \in CS_{2S}\}$$
Throughout I'll assume that
   $CS_{NR}$ has coverage at least $1 - \alpha$ under strong identification
   $CS_R$ has coverage at least $1 - \alpha$ under both strong and weak identification
I'll be interested in the maximal coverage distortion for $CS_{2S}$, defined as the smallest $\gamma$ such that
$$\Pr_{\beta_0}\{\beta_0 \in CS_{2S}\} \ge 1 - \alpha - \gamma$$

Introduction: Coverage distortions
Unsurprisingly, $CS_{2S}$ can exhibit large distortions if we choose $\phi_{ICS}$ poorly.
Perhaps more surprisingly, Stock and Yogo (2005) show that pretests based
on the first-stage F-statistic can bound coverage distortions in linear IV with homoskedastic errors.
For example, Stock and Yogo give a test for the null that the nominal 5% t-test has true size exceeding 10%.
For 2SLS, the critical values are larger than the rule-of-thumb value $c = 10$.
Problem: we frequently think economic data are heteroskedastic, serially correlated, clustered, etc.

Introduction: This project
1. I demonstrate numerically that the first-stage F statistic is not a reliable measure of identification strength in IV with heteroskedastic errors.
2. I propose a general approach to constructing robust two-step confidence sets that
   1. is applicable to general GMM models
   2. controls coverage up to an arbitrarily small, user-selected level of distortion $\gamma$ when identification is weak
   3. indicates strong identification with probability tending to one under standard asymptotics

Introduction: Main Idea
Under strong identification, many different confidence sets are asymptotically equivalent.
In particular, we can construct robust confidence sets with coverage $1 - \alpha - \gamma$ which are contained in $CS_{NR}$ with probability tending to one under strong identification.
Thus, to assess the quality of the usual asymptotic approximations, we can compare robust and non-robust confidence sets.
By choosing $\phi_{ICS}$ in this way, we can ensure that $CS_{2S}$ has coverage at least $1 - \alpha - \gamma$.

Introduction: Why study two-step procedures?
Two-step confidence sets are often unappealing from a theoretical perspective.
Nonetheless, pretests for identification are extremely common in empirical work.
Commonly used pretests are unreliable, so alternatives are needed.
Robust confidence sets that are efficient under strong identification give a natural, reliable way to assess identification strength.
This is a further argument for computing and reporting such confidence sets.

Outline
1. Introduction
2. Problems with the First-Stage F-statistic
3. A Simple Two-Step Confidence Set
4. Simulation Performance
5. Reporting Results
6. Conclusion

Problems with the First-Stage F-statistic: Linear IV model
Consider the linear IV model
$$Y_t = X_t \beta_0 + V_{1,t}$$
$$X_t = Z_t' \pi + V_{2,t}$$
with $Z_t$ a $k \times 1$ vector of instruments, $E[Z_t V_{1,t}] = E[Z_t V_{2,t}] = 0$.
Simulations set $\beta_0 = 0$.
Calibrations with moderate and high endogeneity.

Problems with the First-Stage F-statistic: Simulation design
I report results for two-step confidence sets with
   $CS_{NR}$ the nominal 95% Wald confidence set based on 2SLS or LIML
   $CS_R$ the Anderson-Rubin confidence set
   $\phi_{ICS} = 1\{F \le c\}$ for the rule-of-thumb cutoff $c = 10$ (RoT) or the Stock and Yogo (2005) cutoffs (SY)
SY cutoffs guarantee coverage at least 85% in the homoskedastic case.
For heteroskedastic simulations, I use a heteroskedasticity-robust form of the first-stage F.

Problems with the First-Stage F-statistic: Minimal coverage: homoskedastic case
Columns give the endogeneity calibration (Moderate or High) and the number of instruments k.

| Minimal coverage | Moderate, k = 5 | Moderate, k = 10 | Moderate, k = 20 | High, k = 5 | High, k = 10 | High, k = 20 |
|---|---|---|---|---|---|---|
| RoT LIML | 92.9% | 93.1% | 93.5% | 90.4% | 91.5% | 91.4% |
| RoT 2SLS | 90.4% | 89.1% | 89.2% | 82% | 76.1% | 64.2% |
| SY LIML | 91.6% | 90.6% | 89.4% | 88.4% | 89.2% | 89.7% |
| SY 2SLS | 92.6% | 92.4% | 93.6% | 87.5% | 87.7% | 87.7% |

Table: Minimal coverage for confidence sets in homoskedastic IV simulations with 10,000 observations.
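The two-step rule evaluated in these simulations can be illustrated with a short Monte Carlo sketch. This is not the simulation design behind the table: it assumes a single instrument (k = 1), standard normal instruments and errors, a hypothetical endogeneity parameter `rho`, and an Anderson-Rubin statistic that uses the simple variance estimate under the null. The function names `two_step_covers` and `coverage` are placeholders. With one instrument, 2SLS reduces to the simple IV estimator.

```python
import random
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # two-sided 5% normal critical value


def two_step_covers(rng, T, pi, rho, beta0=0.0, cutoff=10.0):
    """Simulate one sample; return (covered, phi_ics) for the two-step set."""
    z = [rng.gauss(0.0, 1.0) for _ in range(T)]
    v2 = [rng.gauss(0.0, 1.0) for _ in range(T)]
    # V1 has unit variance and correlation rho with V2 (assumed design).
    v1 = [rho * e + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for e in v2]
    x = [pi * zi + e for zi, e in zip(z, v2)]
    y = [beta0 * xi + u for xi, u in zip(x, v1)]

    szz = sum(zi * zi for zi in z)
    szx = sum(zi * xi for zi, xi in zip(z, x))
    szy = sum(zi * yi for zi, yi in zip(z, y))

    # First-stage F with one instrument: squared t-statistic on pi_hat.
    pi_hat = szx / szz
    s2_v = sum((xi - pi_hat * zi) ** 2 for zi, xi in zip(z, x)) / (T - 1)
    f_stat = pi_hat ** 2 * szz / s2_v

    phi_ics = 1 if f_stat <= cutoff else 0
    if phi_ics == 0:
        # CS_NR: t-statistic set based on the IV (2SLS with k = 1) estimator.
        beta_hat = szy / szx
        s2_u = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / T
        se = (s2_u * szz) ** 0.5 / abs(szx)
        covered = abs(beta_hat - beta0) <= Z975 * se
    else:
        # CS_R: Anderson-Rubin set; beta0 belongs to it if AR(beta0) is small.
        s2_0 = sum((yi - beta0 * xi) ** 2 for xi, yi in zip(x, y)) / T
        szu = szy - beta0 * szx
        covered = szu ** 2 / (s2_0 * szz) <= Z975 ** 2
    return covered, phi_ics


def coverage(T, pi, rho, n_rep=400, seed=0):
    """Monte Carlo coverage of CS_2S and frequency of phi_ICS = 1."""
    rng = random.Random(seed)
    results = [two_step_covers(rng, T, pi, rho) for _ in range(n_rep)]
    cover_rate = sum(c for c, _ in results) / n_rep
    weak_rate = sum(p for _, p in results) / n_rep
    return cover_rate, weak_rate
```

Calling `coverage(T=200, pi=1.0, rho=0.5)` gives a strongly identified design in which the pretest almost never flags weak identification, while a small `pi` such as 0.05 makes the rule fall back on the Anderson-Rubin set in most samples. Reproducing the minimal coverage figures in the table would require searching over the first-stage strength and using the calibrations from the simulations, which this sketch does not attempt.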
Problems with the First-Stage F-statistic: Minimal coverage: heteroskedastic case
Columns give the endogeneity calibration (Moderate or High) and the number of instruments k.

| Minimal coverage | Moderate, k = 5 | Moderate, k = 10 | Moderate, k = 20 | High, k = 5 | High, k = 10 | High, k = 20 |
|---|---|---|---|---|---|---|
| RoT LIML | 63.2% | 44.8% | 31.9% | 21.8% | 9.5% | 1% |
| RoT 2SLS | 55.1% | 30.8% | 41.8% | 0% | 0% | 0% |
| RoT CUGMM | 45.4% | 18.4% | 37.3% | 71.5% | 54.7% | 13.1% |
| RoT 2SGMM | 35.4% | 13% | 34.6% | 1.9% | 0% | 0% |
| SY LIML | 61.2% | 43% | 29.1% | 21.8% | 9.5% | 1% |
| SY 2SLS | 63.6% | 40.2% | 56.1% | 0% | 0% | 0% |
| SY CUGMM | 39.5% | 15.3% | 34.1% | 63.2% | 54.7% | 3.1% |
| SY 2SGMM | 46.7% | 19.8% | 49.2% | 2.8% | 0% | 0% |

Table: Minimal coverage for confidence sets in heteroskedastic IV simulations with 10,000 observations.

Problems with the First-Stage F-statistic: Minimal coverage: heteroskedastic case
As we can see, all of the two-step procedures studied can exhibit large coverage distortions.
The problem: when the data are heteroskedastic, the first-stage F-statistic doesn't give a reliable guide to identification strength.
In the $k = 10$ moderate endogeneity calibration, substantial coverage distortions persist even for $E[F] = 500$.
In the high endogeneity calibration the problem is much more extreme: significant distortions even for $E[F] = 100{,}000$.

A Simple Two-Step Confidence Set: Main Idea
The idea: compare robust and non-robust confidence sets.
In general GMM models, for all the commonly-used non-robust confidence sets $CS_{NR}$ and any $\gamma > 0$, we can construct a preliminary robust confidence set $CS_{R,P}$ with
1. $\Pr_{\beta_0}\{\beta_0 \in CS_{R,P}\} \ge 1 - \alpha - \gamma$ regardless of identification strength
2. $\Pr\{CS_{R,P} \subseteq CS_{NR}\} \to 1$ under strong identification
3. $CS_{R,P} \subseteq CS_R$

A Simple Two-Step Confidence Set: Main Idea
If we take
$$\phi_{ICS} = 1\{CS_{R,P} \not\subseteq CS_{NR}\}$$
then $CS_{R,P} \subseteq CS_{2S}$, and $\phi_{ICS} \to_p 0$ under strong identification.
Moreover, $\Pr_{\beta_0}\{\beta_0 \in CS_{2S}\} \ge 1 - \alpha - \gamma$ regardless of identification strength.

GMM Model
Consider a potentially weakly identified GMM model with identifying assumption
$$E_{\beta_0}[g_T(\beta_0)] = 0$$
Linear IV: $g_T(\beta) = \frac{1}{T} \sum_t Z_t (Y_t - X_t \beta)$
For convenience, continue to assume $\dim(\beta) = 1$.
Let
$$\lim_{T \to \infty} \mathrm{Var}\begin{pmatrix} \sqrt{T}\, g_T(\beta) \\ \sqrt{T}\, \frac{\partial}{\partial\beta} g_T(\beta) \end{pmatrix} = \begin{pmatrix} \Sigma(\beta) & \Sigma_\beta(\beta) \\ \Sigma_\beta(\beta) & \Sigma_{\beta\beta}(\beta) \end{pmatrix}$$

A Simple Two-Step Confidence Set: Robust test statistics
Define the S statistic of Stock and Wright (2000) as
$$S(\beta) = T \cdot g_T(\beta)' \hat\Sigma(\beta)^{-1} g_T(\beta)$$
$S(\beta_0) \to_d \chi^2_k$ regardless of identification strength.
Following Kleibergen (2005), let
$$D_T(\beta) = \frac{\partial}{\partial\beta} g_T(\beta) - \hat\Sigma_\beta(\beta) \hat\Sigma(\beta)^{-1} g_T(\beta)$$
and define the K statistic
$$K(\beta) = T \cdot g_T(\beta)' \hat\Sigma(\beta)^{-1/2} P_{\hat\Sigma(\beta)^{-1/2} D_T(\beta)} \hat\Sigma(\beta)^{-1/2} g_T(\beta)$$
$K(\beta_0) \to_d \chi^2_1$ regardless of identification strength.

A Simple Two-Step Confidence Set: Local asymptotic equivalence
Under standard assumptions, for $W(\beta)$ a Wald statistic based on an efficient estimator $\hat\beta$,
$$\sup_{\sqrt{T}\|\beta - \beta_0\| \le C} \|W(\beta) - K(\beta)\| = o_p(1)$$
under strong identification.
Thus, the K-statistic confidence set $\{\beta : K(\beta) \le \chi^2_{1,1-\alpha}\}$ is asymptotically equivalent to the usual Wald confidence set on $\sqrt{T}$-neighborhoods of $\beta_0$, that is, on sets of the form $\sqrt{T}\|\beta - \beta_0\| \le C$.
A more general statement is given in the paper.

A Simple Two-Step Confidence Set: Non-local non-equivalence
Unfortunately, the equivalence of the K and Wald confidence sets holds only locally, not globally.
K confidence sets are often inconsistent, in the sense that even in strongly identified models they fail to shrink towards the true parameter value as the sample grows.

A Simple Two-Step Confidence Set: A consistent robust confidence set
To obtain a consistent confidence set, for $a > 0$ consider
$$CS_{R,P} = \{\beta : K(\beta) + a \cdot S(\beta) \le \chi^2_{1,1-\alpha}\}.$$
$K(\beta) + a \cdot S(\beta)$ is a linear combination statistic, as in Andrews (2013).
This confidence set has coverage
$$1 - \alpha - \gamma(a) = \Pr\{(1 + a) \cdot \chi^2_1 + a \cdot \chi^2_{k-1} \le \chi^2_{1,1-\alpha}\}$$
regardless of identification strength.
$\gamma \to 0$ as $a \to 0$, and we can choose $a$ to obtain any desired level of $\gamma$.

A Simple Two-Step Confidence Set: Detecting weak identification
For $CS_{NR}$ the Wald confidence set, under strong identification
$$\Pr\{CS_{R,P} \subseteq CS_{NR}\} \to 1$$
To assess whether the usual strong-identification approximations are reasonable, we can check whether $CS_{R,P} \subseteq CS_{NR}$.
This motivates the choice $\phi_{ICS} = 1\{CS_{R,P} \not\subseteq CS_{NR}\}$.
It gives $\Pr\{\beta_0 \in CS_{2S}\} \ge 1 - \alpha - \gamma$.

Simulation Performance: Linear IV
For comparability with my earlier simulation results, I set $\gamma = 10\%$, so $CS_{R,P}$ has coverage 85%.
Thus, $CS_{2S}$ has coverage at least 85% as well.
I again start by considering the homoskedastic case.

Simulation Performance: Homoskedastic Linear IV
Columns give the endogeneity calibration (Moderate or High) and the number of instruments k.

| Minimal coverage | Moderate, k = 5 | Moderate, k = 10 | Moderate, k = 20 | High, k = 5 | High, k = 10 | High, k = 20 |
|---|---|---|---|---|---|---|
| $CS_{2S}$ LIML | 92.6% | 92.4% | 90.4% | 86% | 87% | 85% |
| $CS_{2S}$ 2SLS | 92.8% | 92.8% | 92.9% | 86% | 87% | 85% |

Table: Minimal coverage for different confidence sets in homoskedastic IV simulations with 10,000 observations.
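The coverage formula for the linear combination statistic can be evaluated numerically. Writing S = K + (S - K), with K and S - K asymptotically independent chi-square variables with 1 and k - 1 degrees of freedom, gives K + a·S = (1 + a)·K + a·(S - K). The sketch below estimates γ(a) by Monte Carlo and bisects for the weight a that delivers a target γ. It is an illustration only: the function names, the number of draws, and the bisection tolerance are assumptions, not taken from the slides.

```python
import random
from statistics import NormalDist


def chi2_1_quantile(p):
    """Quantile of chi-square(1), via the standard normal quantile."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def chi2_draws(k, n_draws=20000, seed=0):
    """Independent (chi2_1, chi2_{k-1}) pairs, standing in for (K, S - K)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        c1 = rng.gauss(0.0, 1.0) ** 2
        c_rest = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k - 1))
        draws.append((c1, c_rest))
    return draws


def distortion(a, draws, alpha=0.05):
    """Monte Carlo estimate of gamma(a) from the coverage formula."""
    crit = chi2_1_quantile(1.0 - alpha)
    covered = sum((1.0 + a) * c1 + a * c_rest <= crit for c1, c_rest in draws)
    return (1.0 - alpha) - covered / len(draws)


def solve_a(gamma, draws, alpha=0.05, tol=1e-4):
    """Bisect for the weight a whose estimated distortion is about gamma."""
    if not 0.0 < gamma < 1.0 - alpha:
        raise ValueError("gamma must lie strictly between 0 and 1 - alpha")
    lo, hi = 0.0, 1.0
    while distortion(hi, draws, alpha) < gamma:
        hi *= 2.0  # expand the bracket until the distortion exceeds gamma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if distortion(mid, draws, alpha) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `solve_a(0.10, chi2_draws(k=5))` returns a weight for which the estimated coverage is about 85% when α = 5%. Reusing the same draws for every value of a keeps the estimated distortion monotone in a, which is what makes the bisection well defined.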
Simulation Performance: Homoskedastic Linear IV
We see that $CS_{2S}$ controls coverage distortions in the homoskedastic case.
Since procedures based on the first-stage F statistic also work here, we can compare their performance.
In particular, we might worry that my $CS_{2S}$ controls coverage distortions by setting $\phi_{ICS} = 1$ more often than procedures based on the first-stage F.
This isn't the case.

Simulation Performance: Pretest comparison
[Figure: $E[\phi_{ICS}] = \Pr\{\phi_{ICS} = 1\}$ (vertical axis, 0 to 1) plotted against the mean of the first-stage F-statistic (horizontal axis, 0 to 60) in the moderate endogeneity homoskedastic linear IV calibration with k = 10. Series: SY LIML, ICS LIML, SY 2SLS, ICS 2SLS.]

Simulation Performance: Heteroskedastic Linear IV
Columns give the endogeneity calibration (Moderate or High) and the number of instruments k.

| Minimal coverage | Moderate, k = 5 | Moderate, k = 10 | Moderate, k = 20 | High, k = 5 | High, k = 10 | High, k = 20 |
|---|---|---|---|---|---|---|
| $CS_{2S}$ LIML | 94.1% | 92.4% | 93% | 86.8% | 86.8% | 85.2% |
| $CS_{2S}$ 2SLS | 93.7% | 94% | 94.3% | 86.7% | 86.6% | 85.2% |
| $CS_{2S}$ CUGMM | 95% | 94.1% | 93.4% | 86.8% | 92.6% | 88.8% |
| $CS_{2S}$ 2SGMM | 94.5% | 94% | 93.9% | 86.8% | 92.8% | 88.8% |

Table: Minimal coverage for different confidence sets in heteroskedastic IV simulations with 10,000 observations.

Reporting Results: Picking $\gamma$
1. My discussion so far assumes a choice of the maximal distortion $\gamma$. What if you don't like my $\gamma$?
2. When used at all in empirical work, robust confidence sets typically supplement, rather than replace, non-robust ones.

Reporting Results
Pick some $\gamma_{min} \ge 0$.
For $\gamma \ge \gamma_{min}$, consider the family of robust confidence sets
$$CS_{R,P}(\gamma) = \{\beta : K(\beta) + a(\gamma) \cdot S(\beta) \le \chi^2_{1,1-\alpha}\}$$
where $CS_{R,P}(\gamma)$ has coverage $1 - \alpha - \gamma$.

Reporting Results
Note that $CS_{R,P}(\gamma)$ is decreasing in $\gamma$: $\gamma \le \gamma' \Rightarrow CS_{R,P}(\gamma') \subseteq CS_{R,P}(\gamma)$.
Consider $CS_R$ such that $CS_{R,P}(\gamma_{min}) \subseteq CS_R$.
Define the maximal distortion cutoff
$$\hat\gamma = \min\{\gamma \ge \gamma_{min} : CS_{R,P}(\gamma) \subseteq CS_{NR}\}$$

Reporting Results
We can report $CS_{NR}$, $CS_R$, and $\hat\gamma$.
If I adopt the rule that I will use $CS_{NR}$ if $\hat\gamma \le \gamma^*$, and will use $CS_R$ otherwise, my confidence set has coverage at least $1 - \alpha - \gamma^*$.

Reporting Results: Income and Democracy
Acemoglu, Johnson, Robinson, and Yared (2008) study the relationship between per-capita income and measures of political democracy.
They argue that once one controls for country fixed effects, there is no significant relationship.
They consider a number of specifications, including one which instruments income with the trade-weighted income of a country's trading partners.
Cervellati, Jung, Sunde and Vischer (2014) argue that this zero-effect finding masks heterogeneity across countries, and in particular that the relationship between income and democracy is negative in former European colonies.
I re-examine the identifying power of the trade-weighted income instrument.

Reporting Results: Income and Democracy

| Specification | I | II | III |
|---|---|---|---|
| 2SLS | -0.120 (0.11) | -1.37 (23.0) | -0.16 (0.07) |
| First Stage F | 26.5 | 0.0 | 57.6 |
| $\hat\gamma$ | 20.2% | 0% | 57.9% |
| S CS | [-1,1] | [-1,1] | [-1,1] |
| Sample Restriction | None | Non-colonies | Former Colonies |
| Observations | 895 | 218 | 718 |

Table: Results based on Acemoglu et al.
(2008) data.

Conclusion
I show that two-step confidence sets based on the first-stage F-statistic can be highly unreliable in non-homoskedastic data.
I propose an alternative approach to detecting weak identification and constructing two-step confidence sets which
   controls coverage distortions under weak identification and is applicable to general GMM models
   indicates strong identification with probability tending to one under standard asymptotics
While I've framed my discussion using linear combination confidence sets, many other options also work.
There is a close connection between efficient robust confidence sets and reliable assessments of identification strength.

The End
Thank you!

Formal Results: Weak identification assumptions
Assumption
For $J_T(\beta) = E_T\left[\frac{\partial}{\partial\beta'} g_T(\beta)\right]$,
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \begin{pmatrix} g_t(\beta_0) \\ \mathrm{vec}\left(\frac{\partial}{\partial\beta'} g_t(\beta_0) - J_T(\beta_0)\right) \end{pmatrix} \to_d N\left(0, \begin{pmatrix} \Sigma_g & \Sigma_{g\beta} \\ \Sigma_{\beta g} & \Sigma_{\beta\beta} \end{pmatrix}\right)$$
where $\Sigma_g$ is positive definite and
$$\begin{pmatrix} \Sigma_g & \Sigma_{g\beta} \\ \Sigma_{\beta g} & \Sigma_{\beta\beta} \end{pmatrix} = \lim_{T \to \infty} \mathrm{Var}_{T,\xi}\left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \begin{pmatrix} g_t(\beta_0) \\ \mathrm{vec}\left(\frac{\partial}{\partial\beta'} g_t(\beta_0)\right) \end{pmatrix}\right)$$
is consistently estimable.

Formal Results: Weak identification assumptions
Assumption
There exists a sequence of full-rank $O(\sqrt{T})$ normalizing matrices $\Lambda_{1,T}$ such that $D_T(\beta_0) \Lambda_{1,T} \to_d D$ for a (possibly degenerate) Gaussian random matrix $D$ which is full rank almost surely.

Formal Results: Identification-robust tests
Theorem
Under these assumptions,
$$(K(\beta_0), S(\beta_0) - K(\beta_0)) \to_d (\chi^2_p, \chi^2_{k-p})$$
and $K(\beta_0)$ and $S(\beta_0) - K(\beta_0)$ are asymptotically independent.

Formal Results: Strong identification assumptions
Assumption
1. $g_T(\beta) \to_p \lim_{T \to \infty} E_T[g_t(\beta)]$ uniformly, and $\lim_{T \to \infty} E_T[g_t(\beta)]$ is continuous in $\beta$.
2. $\hat\Sigma_g(\beta) \to_p \Sigma_g(\beta)$ uniformly, for $\Sigma_g(\beta)$ continuous in $\beta$ and everywhere positive definite with a uniformly bounded maximal eigenvalue and a minimal eigenvalue bounded away from zero.
3. For all $\varepsilon > 0$ there exists $\delta > 0$ such that
$$\lim_{T \to \infty} E_T[g_t(\beta)]' \, \Omega(\beta) \, \lim_{T \to \infty} E_T[g_t(\beta)] < \delta$$
only if $\|\beta - \beta_0\| < \varepsilon$.
Formal Results: Strong identification assumptions
Assumption
1. $\beta_0$ belongs to the interior of its parameter space.
2. $g_T(\beta)$ and $\hat\Sigma_g(\beta)$ are almost surely continuously differentiable on some open ball $B(\beta_0)$ around $\beta_0$.
3. For $J(\beta) = \lim_{T \to \infty} E_T\left[\frac{\partial}{\partial\beta'} g_T(\beta)\right]$, $J(\beta)$ is continuous at $\beta_0$, $G_T(\beta) = \frac{\partial}{\partial\beta'} g_T(\beta) \to_p J(\beta)$ uniformly on $B(\beta_0)$, and $J(\beta_0)$ is full-rank.
4. $\sup_{\beta \in B(\beta_0)} \frac{\partial\, \mathrm{vec}(\hat\Sigma_g(\beta))}{\partial\beta'} = O_p(1)$

Formal Results: Identification-robust tests
Theorem
Under these assumptions we have that for all $C \ge 0$
$$\sup_{\sqrt{T}\|\beta - \beta_0\| \le C} \|W(\beta) - K(\beta)\| = o_p(1).$$