Path, Phat, and State Dependence in Observation-driven Markov Models

by user

on 15 September 2016

Robert W. Walker∗
Department of Political Science
Program in Applied Statistics and Computation
Washington University in Saint Louis
E-mail: [email protected]
July 18, 2007
Abstract
Many social science theories posit dynamics that depend in important ways on the present state
and focus on a reasonably small number of states. Despite the importance of theoretical notions
of path dependence, empirical models, with a few exceptions (Przeworski, Alvarez, Cheibub
and Limongi (2000); Beck, Epstein, Jackman and O’Halloran (N.d.); Epstein, O’Halloran, Bates,
Goldstone and Kristensen (2006)), have paid little attention to the implications of state dependence for empirical studies. This despite the fact that there are many possible ways in which
history might matter – we focus on the categorization given by Page (2006) – and these different ways that history might matter manifest themselves in sets of models that can be tested
and compared. This paper considers the basic properties of observation-driven Markov chains
[stationarity/time homogeneity, communication, transience, periodicity, irreducibility, and ergodicity] and the issues that arise in their implementation as likelihood estimators to provide
a window into methods for the study of path dependence. Application of these concepts to
longitudinal data on human rights abuses and exchange rate regime transitions provides evidence that history may also not exert uniform effects. The empirical examples highlight the
subtle substantive assumptions that manifest in different modeling choices. The human rights
example calls for an important qualification in the widely studied relationship between democracy and human rights abuses. The exchange rate regime example highlights the usefulness of
Markov models for multinomial processes.
∗ I thank Andrew Martin for encouraging me to write this down. All the usual disclaimers apply.
1 Introduction
Path dependence and the mantra that “history matters” are frequently invoked as explanations for
the evolution of political, social, and economic institutions. For example, Pierson (2000, 2004) is
primarily focused on describing a reasonable range for the claim that history matters and much of
the comparative historical-institutionalist school in political science, in one way or another, invokes
path dependence as an important causal force. Political methodology has primarily focused on
appropriate modeling practices for “temporal dependence” and the argument of Pierson (2000,
2004) is that this pays insufficient attention to the interaction of temporal sequencing and specific
inputs of interest.1 Our goal is to place these arguments in a broader framework for understanding
the important role of dynamics in statistical/econometric models.
We begin by briefly reviewing the range of arguments regarding the broad claim that “history
matters”. We first explore these arguments and the barriers that they present for the political
methodologist. Borrowing the typology of Page (2006) we then lay out the models that correspond
to different classes of such arguments and briefly comment on methods for distinguishing among
them paying central attention to the importance of the Ergodic Theorem. We then turn to the
development and characterization of one such model that facilitates a comparison of methods
and the substantive implications of modeling choices. We conclude with directions for continuing
this inquiry.
2 Path and State Dependence
A broad literature has invoked the general notion of path dependence as key to understanding
the temporal evolution of important political and economic phenomena. For example, economists
have long considered the path dependence of technology (e.g. BetaMax versus VHS, the QWERTY
keyboard). We begin with a brief bit of terminology that should make the discussion clearer.
Suppose that we observe some set of units (countries, individuals, firms) ($i = 1, 2, \ldots, N$) at
multiple discrete points in time ($t = 1, 2, \ldots, T$) and that at any period in time, these individuals must
occupy one of a finite number of discrete states (denoted $s_j$, $\forall j \in J$). As a result, if individual $i$ at
time $t$ resides in state $j$, we can write $y_{it} = j$. Our principal interest is in the temporal evolution of
some outcome defined on a set of discrete states.
1 The term “temporal dependence” is often used to describe a correction or the inclusion of a nuisance parameter
rather than rendering the temporal process itself an estimand of interest, particularly when divergences in nuisance
have important substantive consequences.
Page (2006) clarifies the properties of three distinct forms of what are generally referred to as
“path dependent” processes – path dependence, state dependence, and “phat-dependence”. As
Page shows, there are fundamental differences in these three understandings of the temporal evolution of some process of interest. Though Page’s (2006) argument is primarily concerned with
a theoretical demonstration that the observation of Pierson (2000, 2004) and others that positive
externalities are an integral part of path dependence is largely mistaken, he provides an extraordinarily useful taxonomy for understanding and applying the mantra that “history matters” to
quantitative analyses. The key differences, according to Page, are whether or not sequencing matters and how much of the sequence may be relevant.2 Let us explore these processes in greater
detail before turning to empirical models that represent these processes. Two key themes will
emerge – the set of states in which the process has resided and their relative frequency and the
ordering of the “visits”.
2 In the language of Markov models, the order of the chain describes how much of the sequence is relevant to the
question at hand.
First, combining the set of states that we have visited and the ordering of such visits, a
path-dependent process suggests that the present period realization of some variable of interest, call it
$y^{(t)}$, depends on the precise evolution of $y$ prior to the present period. Calling all prior realizations
of $y$ a history and denoting said history (up to and including the present) $h^{(t)}$, a path dependent
process is formalized as
$$y^{(t+1)} = G(h^{(t)}).$$
For a path-dependent process, both the set of states in which the process has previously resided
and the order in which we resided in these states are of critical importance. While this is certainly
the most complete rendering of the way in which history matters, the mathematical burden of such
understandings can become quite substantial. To see why this must be true, a stylized example
should suffice. Suppose that the process of interest has been observed for $t - 1$ periods. It should
be straightforward to capture the $t - 1$ histories and appropriately model $y^{(t)}$ as a function of the
history $h^{(t-1)}$. Now let us consider $y^{(t+1)}$, which is now a function of $h^{(t)}$. With a discrete state
space, the cardinality of the support of $h^{(t)}$ is bounded below by the cardinality of the support of
$h^{(t-1)}$ – strictly so if the distribution of $y^{(t+1)} \mid h^{(t)}$ is non-atomic. This implies that any attempt to
empirically capture state dependence in an econometric setting must not only confront the fact
that the number of parameters is increasing in $t$, but that, for a fixed number of units, degrees
of freedom are a nonincreasing function of $t$. Such a form of historical determinism will often
imply that further observations of a process make the problem harder and harder to confront.
For example, supposing that we observe $T$ realizations of the process and that these realizations
can be naturally classified into $J$ states, there are $J^T$ potential histories. Though this may not
seem daunting, if one thought, for example, that democratization is a path-dependent process
and one extends the model of Epstein et al. (2006) to accommodate path-dependence, their
three-category measure of autocracy/partial democracy/democracy may result in a whopping
$1.2 \times 10^{19}$ potential histories. With 200 countries and 40 observations per country, the ratio of potential
parameters to data exceeds $1 \times 10^{15}$.3 As an empirical model, path dependence is likely to be
elusive except when there is remarkable coincidence in the paths.4 In this respect, Pierson (2004,
p. 78) is correct that “Historically oriented empirical work on sequences can build on a formidable
intellectual tradition. Taken as a whole, this literature quite effectively undercuts the claim that
the social significance of historical processes can be easily incorporated in the ‘values’ of particular
‘variables’ at a moment in time.” One of the simplest ways that this claim is true is that we
cannot observe the same variable if history truly matters because, though the history might be the
variable of interest, each present period history must be distinct from each prior period history by
the addition of a new set of realizations of the process. If this claim is indeed true, the statistical
analysis of path-dependent processes presents a formidable, dare we say unconquerable,
task.
3 Of course, the actual number of histories is strictly bounded above by the size of the total sample so that this ratio
can never exceed one in practice (if every data point arises from a unique history, the ratio is one). To see this, note that
in the first period there are $J$ potential initial conditions and that the number of unique histories at any subsequent time
point can at most be equal to the number of units. For 200 countries, period 5 is the first period in which the number of
histories can equal the number of units observed (in the fourth period, there are $3^4$ or 81 potential unique histories and
in period 5, this number jumps to 243).
A phat dependent process, by contrast, emphasizes the set of states in which the process has
resided without attention to their order. More formally, Page (2006, p. 97) writes that “a process
is phat dependent if the outcome in any period depends on the set of outcomes and opportunities
that arose in a history but not upon their order” such that $y^{(t+1)} = G(\{h^{(t)}\})$ with $\{h^{(t)}\}$ uniquely
identifying the distribution of states in prior realizations of $y^{(t+1)}$. Three points merit consideration.
First, if one were to write down the function generating path dependence, phat dependence would
be a simple restriction on this model because the function mapping path into phat dependence is
either one-to-one or many-to-one.5 Second, phat dependence, like path dependence, results in a
time-varying number of parameters. Suppose that we observe a process with 3 states in period
1. In the second period, the permissible paths will be {11,22,33,12,21,13,31,23,32} with
accompanying phats equal to {11,22,33,12,13,23}. In the third period, there are $3^3$ possible paths with 10
possible phats.6 In very general terms, because the frequency distribution of outcomes will be
time-varying, general phat dependence will have many of the same problems that result from full
path dependence. From an econometric perspective, we suggest a reduced form of phat dependence
that may prove valuable for some interesting political problems; we will label this form of
dependence phat support dependence.
4 In the literature on fixed effects estimation, consistent estimators are often difficult to isolate for discrete state spaces
and the asymptotics that need be applied in this case are interesting in their own right. Because the number of histories
is not independent of the number of elapsed time periods, the asymptotic arguments necessary would (generally) be
in the number of units rather than the number of time points. “Generally” is used because coincident histories provide
some information about the probability of next period outcomes when outcomes diverge given the same history. This
result arises from work on conditional (fixed effects) logit estimation (see Wooldridge (2002, p. 491) for a discussion).
5 Page (2006, p. 97) writes, “Testing for phat dependence requires a different econometric model than testing for path
dependence.” While this is trivially true, phat dependence is nested in path dependence by the function governing the
translation of permutations into combinations. Phat dependence is a restricted form of path dependence where order
does not matter.
6 There are six permutations that result in each state having been visited once (the same phat); 18 permutations where
one state appears twice and some other state appears once (with six accompanying phats), and three paths where the
same state appears in every period with three accompanying phats.
The general idea underlying phat support dependence is that it is the set of states a process
has visited and not the amount of time spent in any given state that is relevant. Page’s phat
dependence requires an implicit belief in some form of reinforcement dynamic while phat support
dependence suggests that this reinforcement dynamic is either/or and thus does not vary
depending upon the duration (number of periods) in a given state.
Because our primary interest is in the estimation of discrete state Markov processes, this process
is justified on the grounds that it reduces the number of parameters in a way that does not seem
inconsistent with the nature of phat dependence. At the same time, it may also be that the range of
relevant applications for such a process is limited if the importance of recent history dominates
the importance of long past events. Of course, it is possible to use techniques for the testing of
nonnested models to discriminate among hypotheses about how history matters once the estimation
problem becomes manageable. Revisiting again the example provided by Epstein et al.
(2006), there are seven possible unordered histories for a three-category characterization of
autocracy(A)/partial democracy(P)/democracy(D): {A,P,D,AP,AD,PD,APD}. More generally, if we
define the state space as having cardinality $J$, the number of potential phat support dependencies
induced is $\sum_{k=1}^{J} C_k^J = \sum_{k=1}^{J} \binom{J}{k} = \sum_{k=1}^{J} \frac{J!}{k!(J-k)!} = 2^J - 1$. Underlying such a characterization is the
belief that having resided in a particular state is the relevant feature of the dependence. From the
standpoint of the range of relevant possibilities, this is a considerably easier problem. At the same
time, the process is not “dynamic” in any interesting way. A pure phat support dependent process
takes no explicit account of the actual transitions that have taken place but only the states in which
the process has previously resided. It may matter whether we previously went from autocracy to
democracy and back to partial democracy instead of slowly moving from autocracy to partial
democracy to full-fledged democracy if democracy is self-reinforcing (positive externalities in the
language of Pierson) and if this particular pathway also creates negative externalities for
regressions to autocracy/partial democracy. Let us write a “transition” matrix of phat dependence to
illustrate the underlying process,


$$
P(\text{phat}) =
\begin{pmatrix}
\pi_{A,A} & \pi_{A,P} & \pi_{A,D} \\
\pi_{P,A} & \pi_{P,P} & \pi_{P,D} \\
\pi_{D,A} & \pi_{D,P} & \pi_{D,D} \\
\pi_{AP,A} & \pi_{AP,P} & \pi_{AP,D} \\
\pi_{AD,A} & \pi_{AD,P} & \pi_{AD,D} \\
\pi_{PD,A} & \pi_{PD,P} & \pi_{PD,D} \\
\pi_{APD,A} & \pi_{APD,P} & \pi_{APD,D}
\end{pmatrix}
\tag{1}
$$
where the rows are defined by the states that have been visited in some period prior to $t$ and the
columns are the probabilities of residing in a particular state $s_j$ given the set of states that have
been previously visited, $\Pr(s_j = j \mid \{h_{t-1}\})$. In effect, this becomes the equivalent of a fixed-effects
estimator where the fixed effects are the set of states previously visited.7
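The counts quoted in this section can be checked by brute force. The following is a minimal sketch rather than anything from the original paper: the function names are hypothetical, states are coded 0, 1, 2 for convenience, and the panel dimensions (200 units, 40 periods) are the ones used in the text. It enumerates paths (ordered histories), phats (histories with order discarded), and phat supports (the set of states visited).

```python
from itertools import product

def paths(J, T):
    """All ordered histories of length T over J states."""
    return list(product(range(J), repeat=T))

def phats(J, T):
    """Histories with order discarded: the multiset of states visited."""
    return {tuple(sorted(p)) for p in paths(J, T)}

def phat_supports(J, T):
    """Only the set of states visited, ignoring both order and duration."""
    return {frozenset(p) for p in paths(J, T)}

J = 3
print(len(paths(J, 2)), len(phats(J, 2)))   # paths and phats after two periods
print(len(paths(J, 3)), len(phats(J, 3)))   # paths and phats after three periods
print(len(phat_supports(J, 3)), 2**J - 1)   # supports once T >= J, against 2^J - 1

# Full path dependence: potential histories relative to the number of data points.
n_histories = J**40
ratio = n_histories / (200 * 40)
print(f"{n_histories:.2e}", ratio > 1e15)
```

Enumeration is only feasible for small J and T, which is itself the point of the passage: the number of paths grows as J^T, the number of phats grows polynomially in T, and the number of phat supports is fixed once every state could have been visited.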
As Page (2006) notes, trivially, path dependent and phat dependent relationships will also be
state dependent. To accomplish this, define either the path or the phat as the state. Where this
observation takes force is in the realization that the key to statistical/econometric applications
is in properly characterizing the precise way in which history matters and that the appropriate
model is one of state dependence; the holy grail is defining the appropriate range of states that
matter. In widespread practice, a state dependent process emphasizes only the state of current
residence and neither the range of previous states in which the process has resided nor the order
in which the process resided in such states. In the language of Page (2006, p. 95), “a process
is state dependent if the outcome in any period depends only upon the state of the process at
that time” or equivalently that $y^{(t+1)} = G(s_t)$ with $s \in S$.8 State dependence is the most often
examined form of dependence in the analysis of discrete temporal processes, albeit frequently in
a fairly limited form. These limitations in part stem from data limitations. For example, the study
of Epstein et al. (2006) models transitions among autocracies/partial democracies/democracies
as a process with state dependence and state-dependent effects of covariates. Masson (2001) and
Masson and Ruge-Murcia (2005) consider the transition among exchange rate regimes as defined
by a state dependent process. The overriding reason for reliance on state-dependent processes
involves infinite regress; if we do not constrain how far back “history matters” then we are left
with a saturated econometric study that models each realization in terms of its own unique history.
Of course, drawing inferences from such processes – an observation common to qualitative
and quantitative researchers alike – would be strengthened by multiple identical histories to isolate
the influence of inputs on outcomes, but few politically interesting processes allow for such
randomization/experimentation and we must make do with what we and others have observed.
This places limitations (for reasons of degrees of freedom) on the degree to which history can
matter. In the language of Markov models, short-order state dependent processes are necessary in
a finite sample world.
7 There is a lot more to say about this because fixed effects in an ordered setting can be identified in the interior,
though invariant histories that do not transition contribute nothing to the likelihood. For the multinomial case, the
results are less clear cut. The discussion of fixed and random effects estimators in Wooldridge (2002) is particularly
useful.
8 In his discussion, Page (2006) goes further to emphasize the distinction between initial outcome and recent and
early path-dependence to denote dependence on the first outcome, recent realizations prior to the immediate state, and
earlier histories up to some (long?) past point in time.
In broad summary, there are three basic and interrelated types. Path dependence emphasizes
both experience and ordering of the entire process while phat dependence emphasizes only the
experiences that arise with residence in a particular state without regard to the timing of such
residence. In both such cases, the number of potentially relevant parameters is increasing in time
and this is likely to pose insurmountable problems for empirical practice. Phat-support dependence and state dependence are less prone to this profusion of parameters and can be thought of
as path dependence where only the previous steps we have taken or the/some immediately prior
step(s) matter(s). With these ideas about the general types of temporal dependence that may be
of interest, we turn to statistical/econometric methods for studying discrete temporal processes
with a view toward characterizing their relevant properties and empirically identifying the forms
of state dependence in time series of qualitative variables via transition processes.
3 Markov Processes
In this exposition, the primary variable of interest $y_t$ is a discrete variable observed at multiple
time points, $t \in T$.9 For the sake of simplicity in demonstration, we assume a three category
ordered process such that $y_{t-1}$ is equal to one of $y_{t-1} = 1 \therefore \vec{y}^{(1)}_{t-1} = \{1, 0, 0\}$,
$y_{t-1} = 2 \therefore \vec{y}^{(2)}_{t-1} = \{0, 1, 0\}$, or $y_{t-1} = 3 \therefore \vec{y}^{(3)}_{t-1} = \{0, 0, 1\}$.
The relevant Markov matrix $P$ can be specified as,
$$
P =
\begin{pmatrix}
\pi_{11} & \pi_{12} & \pi_{13} \\
\pi_{21} & \pi_{22} & \pi_{23} \\
\pi_{31} & \pi_{32} & \pi_{33}
\end{pmatrix}
\tag{2}
$$
where rows are defined by $y_{t-1}$ and columns are defined by $y_t$. This requires that
$\Pr(y_t = m \mid y_{t-1} = l) = \pi_{lm}$. With this information and given some previous state $\vec{y}$, $p(S_j)_t = \vec{y}P$. For example, were
$y_{t-1} = 3$, $p(S_j)_t = \{\pi_{31}, \pi_{32}, \pi_{33}\}$. If one wished to know the probabilities at time $t + 1$ given state
$j$ at time $t - 1$, one could write $p(S_j)_{t+1} = (\vec{y}^{(j)}_{t-1} P)P$. With this in mind, we can define some of the
properties of Markov models [and the associated Markov processes].10 The first two properties
to define will be of central relevance for the substantive problems to be examined (Definitions 3.1
and 3.2), while the conditions arising from the remainder will be useful in defining what is often
the central quantity of interest – the invariant distribution.
9 A note on notation is in order. When discussing properties of the transition matrix, subscripts will define states
while superscripts will denote subjects and/or time where necessary for clarity. When discussing variables and their
realizations, we will comply with standard practice and utilize subscripts to index units $i$ and time $t$ when necessary
for clarity.
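The matrix algebra above can be made concrete with a small numerical sketch. The transition probabilities below are hypothetical values chosen only for illustration, and the helper names `indicator` and `step_ahead` are not from the original text; the sketch assumes NumPy is available.

```python
import numpy as np

# Hypothetical transition matrix; rows index y_{t-1}, columns index y_t.
P = np.array([
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
])

def indicator(state, J=3):
    """Row vector with a one in the position of `state` (states coded 1..J)."""
    y = np.zeros(J)
    y[state - 1] = 1.0
    return y

def step_ahead(y_prev, P, steps=1):
    """Distribution over states `steps` transitions after the state encoded in y_prev."""
    return y_prev @ np.linalg.matrix_power(P, steps)

one_step = step_ahead(indicator(3), P)      # picks out the third row of P
two_step = step_ahead(indicator(3), P, 2)   # the same as (y P) P
print(one_step)
print(two_step)
```

Multiplying the indicator vector by P selects the row for the previous state, and each further multiplication by P moves the forecast one period ahead.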
Definition 3.1. A Markov chain will be said to have stationary transition probabilities, or to be time
homogeneous, if for all states $y_t = m$ and all previous states $y_{t-1} = l$
$$\pi_{lm} \perp t$$
Definition 3.2. A Markov chain will be said to have homogeneous transition probabilities, or to be
unit homogeneous, if for all states $y_t = m$ and all previous states $y_{t-1} = l$
$$\pi_{lm} \perp i$$
10 The following definitions are adapted from a combination of Bartoszyński and Niewiadomska-Bugaj (1996),
Bhattacharya and Waymire (1990), and Amemiya (1985). Lindsey (2004, ch. 5) provides a particularly intuitive approach to
estimating very basic Markov chains.
Underlying the above conditions is that a unique stochastic Markov matrix that does not explicitly depend on a function of the nominal units or time describes the transition among states
in S.11 As we will illustrate, the precise form of unit homogeneity is such that allowing the probabilities to depend on observed covariates will not present particular problems, so long as the
mappings between covariates and probabilities do not depend on the unit for which the mapping
is posited.12 The same arguments apply to time homogeneity. The existence or uniqueness of
the invariant distribution does not critically depend on the presence or absence of time-varying
covariates so long as the mappings are not time-variant.13
Definition 3.3. A set $S$ of states with cardinality $J$ is closed if for every $s_i$ with $i \in J$, we have
$$\sum_{s_j \in J} \pi_{ij} = 1$$
Intuitively, this implies that every (one-step) transition results in a move among discrete states
in S and the logical extension of this condition, by induction, implies that it is impossible to leave
a closed set of states. Even though it is impossible to leave a closed set of states, we should
also be concerned with the nature of transitions among states within the closed set. Two relevant
properties, in this regard, are the communication among states and the reducibility of the Markov
chain (taken as a whole).
Definition 3.4. A Markov chain is said to be irreducible if the only closed set of states is the set
of all states $S$. Moreover, states $s_i$ and $s_j$ are said to communicate if $\pi_{ij}^{(m)} > 0$ and $\pi_{ji}^{(n)} > 0$ where
$i, j \in J$ and $m, n \in \mathbb{Z}$.14
11 Kelton and Kelton (1984) describe tests based on aggregate data on the population residing in given states at discrete
times to test hypotheses of stationarity and homogeneity.
12 Even this does not pose particular problems, but the range of inferences would be limited to a forecast for each unit
drawn from their unique parameters.
13 Though the stationarity of the counterfactual also matters. We will be more precise about this later.
14 This property might also be labeled joint accessibility in the sense that state $s_j$ is accessible from state $s_i$ in some
number of steps, $m$, and state $s_i$ is accessible from state $s_j$ in some number of steps, $n$, where $m$ and $n$ need not be the
same.
Communication, within a closed set of states, describes the range of permissible transitions
among states. For our purposes, a Markov chain is irreducible if and only if all states communicate.
In effect, this simply means that any state can be reached from any other state in some finite
number of moves (or that the invariant distribution has all nonzero probabilities). We have a final
property to define before we state the main theorem from which the analysis proceeds.
Definition 3.5. State $s_i$ is periodic if there exists some $d > 1$ such that $p_{ii}^{(n)} > 0$ implies that
$n/d \in \mathbb{Z}$. If no such $d > 1$ exists, a state is said to be aperiodic.
A combination of the aforementioned definitions yields an important result regarding the ultimate
estimand of interest for many Markov models – the invariant distribution. Following a
statement of the theorem and a brief discussion of estimation and testing procedures, we will
demonstrate that the substantive invariant distribution based on an estimated Markov matrix that
may itself be heterogeneous and nonstationary can yield a stationary and homogeneous invariant
distribution for counterfactual inference.
Theorem 3.1. Let $P = [p_{ij}]$, $i, j = 1, \ldots, J$ be the transition probability matrix of an aperiodic and
irreducible stationary Markov chain with a finite number of states $J$. The limits
$$\lim_{n \to \infty} p_{ij}^{(n)} = u_j$$
exist for every $j = 1, \ldots, J$ and are independent of the initial state $s_i(t_0)$. Furthermore, the vector $u$ – the
invariant distribution – satisfies the linear system of equations
$$u_k = \sum_{j} u_j p_{jk}, \qquad k = 1, \ldots, J$$
where $\sum_{j \in J} u_j = 1$.
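The conditions of the theorem, and the invariant distribution it delivers, can be checked numerically for any given transition matrix. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the example matrix is made up for illustration, and the period is found by searching return times up to 3J steps, a bound chosen for this sketch that is sufficient when the chain is irreducible. It assumes NumPy is available.

```python
from math import gcd
import numpy as np

def is_irreducible(P):
    """True if every state can be reached from every other state in finitely many steps."""
    J = P.shape[0]
    reach = np.eye(J, dtype=bool) | (P > 0)
    for _ in range(J - 1):
        # Squaring the reachability matrix doubles the number of steps covered.
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    return bool(reach.all())

def period(P, i):
    """gcd of the return times n with p_ii^(n) > 0, searched up to 3J steps."""
    J = P.shape[0]
    A = (P > 0).astype(int)
    An = np.eye(J, dtype=int)
    d = 0
    for n in range(1, 3 * J + 1):
        An = ((An @ A) > 0).astype(int)   # which states are reachable in exactly n steps
        if An[i, i]:
            d = gcd(d, n)
    return d

def invariant_distribution(P):
    """Solve u = uP subject to sum(u) = 1 by least squares."""
    J = P.shape[0]
    A = np.vstack([P.T - np.eye(J), np.ones(J)])
    b = np.append(np.zeros(J), 1.0)
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u

# Hypothetical three-state chain
P = np.array([
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
])
u = invariant_distribution(P)
print(is_irreducible(P), period(P, 0), u.round(4))
print(np.linalg.matrix_power(P, 200).round(4))   # every row approaches u
```

For an aperiodic and irreducible chain, every row of a high power of P approaches the same vector u, which is the sense in which the limit is independent of the initial state.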
The proof is given in any number of probability/stochastic processes texts including Bartoszyński and Niewiadomska-Bugaj (1996) and Bhattacharya and Waymire (1990). We shall sketch
the principal inputs – aperiodicity and irreducibility – before turning to our principal interest in
the estimation of Markov models for studying dynamic discrete processes.15 First, aperiodicity is
required if we pay particular attention to the role of the initial state. In simple terms, periodicity
leads to divergence rather than convergence. For example, take the above definition
of periodicity with $d = 2$ and suppose that we begin parallel chains in the periodic state at time
$t$ and time $t + 1$. At time $t + 2$, the probability that the first chain resides in the periodic state is,
of necessity, greater than zero while the probability that the second identical chain resides in the
periodic state is zero. In the limit, the first chain will reside in the periodic state only when $t$ is
even while the second chain can only reside in the periodic state when $t$ is odd. Irreducibility
plays a key role in preventing two parallel chains from never overlapping. To be concrete, suppose
we start parallel chains in two states that do not communicate. With two (or more) closed sets in
the state space, the probability of the two chains reaching the same point is zero. It may be that
within each closed set of states, there is a unique limiting distribution, but these limiting distributions will themselves apply to separate closed sets within the state space and there can be no
unique invariant distribution that is independent of the starting point of the process.
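Both failures are easy to see in small numerical examples. The matrices below are hypothetical and chosen only to illustrate the argument; the sketch assumes NumPy is available.

```python
import numpy as np

# Periodic chain (d = 2): the process alternates deterministically between two states.
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
even = np.linalg.matrix_power(flip, 10)   # after an even number of steps
odd = np.linalg.matrix_power(flip, 11)    # after an odd number of steps

# Reducible chain: two closed sets, {1, 2} and {3, 4}, that never communicate.
blocks = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.2, 0.8],
])
limit = np.linalg.matrix_power(blocks, 200)

print(even, odd, limit.round(3), sep="\n")
```

The powers of the periodic matrix never settle down, so the n-step probabilities have no limit, while the rows of the reducible chain converge to different distributions depending on the closed set in which the chain starts.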
15 The need for stationarity should be obvious, as the transition matrix otherwise depends on the index with which
limits are taken. The method of intuitive illustration follows the concept of coupling and the conditions are illustrated
in terms of decoupling parallel chains.
The issue, such as it is, in application of the ergodic theorem to the study of time-series
cross-section data primarily rests on the parameters, not the data. For example, Amemiya (1985)
describes two studies, Boskin and Nold’s (1975) study of welfare and Toikka’s (1976) study modeling
the probability of transitions in labor force participation. In the former case, the model is
heterogeneous and stationary while the latter examines labor force participation as a function
of time-varying covariates as a homogeneous and nonstationary Markov model. However, neither
model embodies parameters that render the model nonstationary or heterogeneous; the covariates
are the source of these dynamics. As a result of this fact, if we are simply interested in forecasting
the invariant distribution at two different values of any given covariate, taking the mean of other
covariates or some other relevant scenario as given, the ergodic theorem may be applied and the
invariant distribution $u_j$ may be summarized $\forall j \in J$. Put differently, taking the covariates as
inputs in the absence of time or unit varying parameters, the resultant Markov matrix to be utilized
for prediction of the invariant distribution is often both time and unit homogeneous.16 As a result,
the invariant distribution exists and is independent of the initial state when forecasted sufficiently
far into the future. As we will see, sufficiently far is often quite a small number of time periods
into the future. Furthermore, there is a fundamental distinction between covariates that exercise
an influence on a particular transition probability and on their resultant impact on the invariant
distribution. In applied practice, these two arguments are often conflated because analysts fail to
recognize the distinction between a stochastic process governed by a stochastic transition matrix and
the long-run distribution implied by such an effect.
4 Constructing the Markov Model
To construct a Markov model for ordered data, we must first define an ordered dependent
variable.17 Let this dependent variable be $y$ arising from realizations of a set of states $S$ that has
cardinality $J$ with discrete realizations $j \in J$. Amemiya (1985, p. 292: Definition 9.3.1) defines the
ordered model as,
Definition 4.1. The ordered model is defined by
$$\Pr(y = j \mid x, \theta) = p(S_j)$$
for some probability measure $p$ depending on $x$ and $\theta$ and a finite sequence of successive intervals
$\{S_j\}$ depending on $x$ and $\theta$ such that $\bigcup_j S_j = \mathbb{R}$, the real line.
16 This is not always the case in the presence of deterministic relationships among covariates, but this can often be
corrected with an appropriate counterfactual. For example, suppose that both GDP per capita and the rate of change in
GDP per capita are posited determinants of some set of transitions. If the rate of change in GDP per capita is not set to zero,
the step-ahead transition matrix must account for the fact that some nonzero growth rate is posited for one of the key
inputs. In such a circumstance, the transition matrix is then nonstationary because assumptions about the growth rate
imply changes in the base covariate that should manifest in a changing transition matrix.
17 To make things conformable everywhere, let us assume that data (denoted by Latin letters) arrive in row vectors
and parameters (denoted by Greek letters) arrive in column vectors.
Define $y^*$ as an unobserved latent variable, $x$ as a $k$-vector of fixed and predetermined covariates, and $\theta$ as a set of parameters to be estimated. The unobserved latent variable $y^*$ is composed of a systematic and a stochastic component. Define the systematic component as $x\beta$ and the stochastic component as $e$, so that $y^* = x\beta + e$ and $\beta$ is folded into $\theta$ in Definition 4.1. We must now define a mechanism for differentiating the categories. Define $J + 1$ threshold parameters, $\vec{\tau} = \{\tau_0, \tau_1, \ldots, \tau_J\}$, where the elements in $\vec{\tau}$ are a strict order ($\forall j \in J$, $\tau_j > \tau_{j-1}$) with $\tau_0 = -\infty$ and $\tau_J = \infty$.18 We can now write the probability that $y = j$ given some well behaved density $f$ as,
$$ \Pr(y = j \mid x\beta) = \pi_j = \int_{\tau_{j-1}}^{\tau_j} f(y^* \mid x\beta)\, dy^*. \tag{3} $$
Define
$$ y_j = \begin{cases} 1 & \text{if } y = j \\ 0 & \text{otherwise,} \end{cases} \tag{4} $$
and $\vec{y}$ as the concatenation of $y_j$ for all $j \in J$.19 Because $f$ is a proper density, $\pi_J = 1 - \pi_1 - \ldots - \pi_{J-1}$, and we can write the likelihood $L$ as,
$$ L(\beta \mid y, x) = \prod_{i=1}^{J} \pi_1^{y_1} \cdot \pi_2^{y_2} \cdot \ldots \cdot \Bigl(1 - \sum_{m=1}^{J-1} \pi_m\Bigr)^{y_J}. \tag{5} $$
18 Strict inequality is required to ensure that $p$ is not degenerate. If $\beta$ were to contain a constant, $\vec{\tau}$ would be a $J - 2$ vector and each element would be equal to the appropriate equivalent element in $\vec{\tau}$ minus the estimated constant. Our notation is similar to that of Pratt (1981) but the threshold parameters are a reversed strict order; Pratt’s $\tau_0 = \infty$.
19 When we refer to $\vec{y}$ for a specific $j$, we will denote it $\vec{y}(j)$.
The log likelihood can be formed from (5) and insertion of the link function and thresholds generating
$$ \ln L(\beta, \tau \mid y, x) = \sum_{j=1}^{J} \sum_{y_i = j} \ln\bigl[F(\tau_j - x\beta) - F(\tau_{j-1} - x\beta)\bigr] \tag{6} $$
which is globally concave if f is positive and ln f is concave (Pratt 1981).20 As a result, assuming
that each ordered category is observed, concavity of the log likelihood is assured for the logistic
and normal distributions, and thus the ordered logit and probit models. With the basics of a
regression model for ordered data in mind, we can turn to the definition of a Markov process.
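The log likelihood in (6) can be sketched in a few lines of code. What follows is a minimal illustration rather than the estimator used in this paper: the function names (`category_probs`, `ordered_loglik`) and the toy inputs are assumptions introduced here, the cutpoints hold only the J − 1 finite thresholds, and no optimizer is included.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, used as the probit link F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Standard logistic CDF, used as the logit link F."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(xb, cutpoints, cdf=normal_cdf):
    """Return [pi_1, ..., pi_J] for a linear predictor xb.

    cutpoints holds the J - 1 finite thresholds tau_1 < ... < tau_{J-1};
    tau_0 = -inf and tau_J = +inf are handled explicitly.
    """
    cum = [0.0]                      # F(tau_0 - xb) with tau_0 = -inf
    for tau in cutpoints:
        cum.append(cdf(tau - xb))    # F(tau_j - xb)
    cum.append(1.0)                  # F(tau_J - xb) with tau_J = +inf
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def ordered_loglik(beta, cutpoints, X, y, cdf=normal_cdf):
    """Sum of ln[F(tau_j - x beta) - F(tau_{j-1} - x beta)] over observations.

    X is a list of covariate rows, y a list of categories coded 1..J.
    """
    total = 0.0
    for row, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, row))
        total += math.log(category_probs(xb, cutpoints, cdf)[yi - 1])
    return total
```

Passing `cdf=logistic_cdf` gives the ordered logit version; the negative of `ordered_loglik` is the quantity one would hand to a general-purpose optimizer.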
4.1 The Simplest of Ordered Markov Models
Maddala (1983, p. 57) writes down a simple Markov model such as
$$ y^*_{it} = x_{it}\beta + \vec{y}_{i,t-1}\vec{\alpha} + e_{it} \tag{7} $$
that appreciates the dependence of observed yit on previous states ~yi,t−1 .21 Having written the
model, we can explore the implicit assumptions.
This model yields a single equation estimator with k + J − 1 unknowns yielding the following
20 To gain an intuitive sense, note that the first category reduces to $\ln F(\tau_1 - x\beta)$ and the $J$th category reduces to $\ln[1 - F(\tau_{J-1} - x\beta)]$.
21 A model of this sort is estimated by Hafner-Burton (2005b). It would be inconsistent with an ordered level of
measurement to map yi,t−1 into yit using a scalar multiplier because the very definition of order requires that we not
know the size of the intervals, only their implicit order (Bartoszyński and Niewiadomska-Bugaj 1996, p. 464–65). As a
result, we must utilize the J − 1 dimensional vector ~yi,t−1 . Identification then requires that the dimension of ~α be J − 1
and that the elements correspond to those in ~yi,t−1 .
conclusions: (i) the effects of x are independent of the prior state (~yi,t−1 ) and (ii) the latent scale
measure of S j for all J does not depend on the prior state. The first point should be obvious as β is
a scalar, assuming that k = 1, or a k-vector such that there is a single linear mapping from x into
y∗ . The second point will be explored later, but it is useful to note that Diggle, Heagerty, Liang and Zeger (2002, p. 201) write that a saturated model of the transition matrix can be obtained by separately fitting an ordered response model conditional on each of the prior states. They continue (on p. 203) to argue that the equation can be rewritten as a single, although somewhat complicated, regression equation using the binary decomposition of the lagged dependent variable (and possible interactions between the prior state and x). What remains to be seen is whether this is universally valid: avoiding the separate fitting of each equation (with possible constraints on parameters) pools the cutpoint estimates, and whether such pooling is justified seems to be an empirical question. That said, it is straightforward to estimate each equation separately
and avoid this potential complication. With these ideas in mind, let us turn to the specification of
state-dependent effects of x.
4.2 State Dependent Effects in Ordered Markov Models
Amemiya (1985, p. 422) builds on (7) to write the equivalent of
$$ y^*_{it} = x_{it}(\beta + \gamma_j) + \vec{y}_{i,t-1}\vec{\alpha} + e_{it} \tag{8} $$
that appreciates the dependence of observed $y_{it}$ on previous states $\vec{y}_{i,t-1}$ and the potential that $x$ affects $y^*$ at time $t$ differently depending on the prior state. To illustrate, let us consider a three
state model of dyadic relations consisting of peace, disputes, and armed conflict. A considerable
literature in international politics says that democracy need not necessarily inhibit the transition
from peace to dispute, but that democratic dyads are quite unlikely to transition from disputes to
war. In this case, then, the effect of joint democracy is likely to be zero when considering the transition from peace to disputes, could be zero or negative in the transition from disputes to peace,
but should be large and negative when influencing the probability that disputes transition into
wars. Practically speaking, there are two obvious methods for estimating such a model. Diggle
et al. (2002, p. 202–3) suggest a parameterization based on classifying the state vector at time t − 1
in a cumulative fashion and utilizing these as the basis for capturing the intercepts for the prior
states and the possibility that covariate effects depend on the prior state.22 Under this method, zero null hypothesis tests of the interacted coefficients are referenced to the adjacent lower category, and whether or not categories can be combined can be tested by examining whether or not the α parameters attached to the elements of $\vec{y}^C$ are statistically differentiable from zero.23 The parameterization above employing $\gamma_j$, which involves only $J - 1$ prior states and their interactions, allows zero null hypothesis tests against the omitted prior state.
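The two parameterizations of the prior state can be made concrete with a short sketch. The helper names below are hypothetical, and the choice of state 1 as the omitted reference category in the indicator coding is an assumption made for illustration; the cumulative coding follows the definition given in footnote 22.

```python
def indicator_coding(prev_state, n_states):
    """J - 1 dummies for the prior state; state 1 is the omitted reference (assumed)."""
    return [1 if prev_state == j else 0 for j in range(2, n_states + 1)]

def cumulative_coding(prev_state, n_states):
    """Cumulative coding: element j equals 1 iff prev_state <= j, for j = 1..J-1."""
    return [1 if prev_state <= j else 0 for j in range(1, n_states)]

def design_row(x, prev_state, n_states, coding=indicator_coding):
    """Build one regressor row: covariates, prior-state terms, and interactions.

    The interaction block carries the gamma_j terms, that is, each covariate
    multiplied by each element of the coded prior state.
    """
    d = coding(prev_state, n_states)
    interactions = [xv * dj for dj in d for xv in x]
    return list(x) + d + interactions
```

With the indicator coding, each interaction coefficient is a contrast against the omitted prior state; with the cumulative coding, each is a contrast against the adjacent category.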
4.3 A Trivial Extension to Nominal Scales (or Formalizing the Likelihood)
It is straightforward to generalize (5) to the case of a multinomial likelihood as follows.24 Define the likelihood $L$ as a function of the probability of transition $p^i_{jk}(t)$ from state $j$ to state $k$ for individual $i$ at time $t$ as
$$ L = \prod_t \prod_i \prod_j \prod_k p^i_{jk}(t)^{\,y^i_j(t-1)\, y^i_k(t)} \cdot \prod_i \prod_j p^i_j(t_0)^{\,y^i_j(t_0)} \tag{9} $$
22 To be precise, for a $J$ category ordered variable, define (suppressing $i$ and $t$) $\vec{y}^C = \{y^C_1, \ldots, y^C_{J-1}\}$ such that $y^C_1 = 1$ if and only if $y_{t-1} \le 1$ and so on until $y^C_{J-1} = 1$ if $y_{t-1} \le J - 1$. The superscript $C$ is used to denote the cumulative prior states.
23 Epstein et al. (2006) utilize these tests to defend the collapsing of partial-autocracies and partial-democracies. We
suspect, but without their data cannot prove, that the assumption that the cutpoints can be pooled may weigh heavily
on these determinations. It remains for future research to sort this out.
24 This seems unnecessary because the ordered likelihood is the same, our presentation above has made this necessary,
but this should be synthesized for brevity.
which can be factored into two important parts – the conditional likelihood (to the left of ·) and the
initial conditions of the state vector (to the right of the ·). To be clear, for the stationary transition
matrix, the likelihood can be rewritten as
$$ L = \prod_t \prod_i \prod_j \prod_k \bigl(p^i_{jk}\bigr)^{\,y^i_j(t-1)\, y^i_k(t)} \cdot \prod_i \prod_j \bigl(p^i_j\bigr)^{\,y^i_j(t_0)} \tag{10} $$
and the stationary and homogeneous transition matrix can be similarly written (omitting superscript i for the probabilities) as
$$ L = \prod_t \prod_i \prod_j \prod_k p_{jk}^{\,y^i_j(t-1)\, y^i_k(t)} \cdot \prod_i \prod_j p_j^{\,y^i_j(t_0)} \tag{11} $$
Methods of estimation have been widely studied for such models, though the most widely employed are effectively maximum likelihood techniques under the general rubric of generalized
linear models and GEE techniques.25 The former retain all of their optimal properties if indeed
the likelihood is properly specified while the latter retain optimal properties in the presence of
random intercepts employing population-averaged GEE techniques.26 Bayesian analogs relying
on Markov Chain Monte Carlo techniques are also straightforward by analogous reasoning leading to a form of MCMCMC27 estimation. The extension of state-dependent covariates follows the
same interactive logic previously presented.
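For the stationary and homogeneous case in (11) without covariates, the conditional part of the likelihood depends on the data only through the transition counts, and its maximizer is the matrix of row proportions. The sketch below illustrates this under stated assumptions: the panel is a made-up toy example, the function names are hypothetical, and the initial-conditions term is ignored (that is, we condition on the first observation).

```python
import math

def transition_counts(panel, n_states):
    """Count j -> k transitions across all units; states are coded 1..J."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in panel:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev - 1][cur - 1] += 1
    return counts

def mle_transition_matrix(counts):
    """Row-normalize the counts; rows with no observations are left as zeros."""
    P = []
    for row in counts:
        n = sum(row)
        P.append([c / n if n else 0.0 for c in row])
    return P

def conditional_loglik(counts, P):
    """Log of the conditional part of (11): the sum of n_jk * ln p_jk."""
    return sum(
        n * math.log(p)
        for row_counts, row_probs in zip(counts, P)
        for n, p in zip(row_counts, row_probs)
        if n > 0
    )

# Hypothetical toy panel: three units observed for three periods, two states.
toy_panel = [[1, 1, 2], [2, 2, 1], [1, 2, 2]]
toy_counts = transition_counts(toy_panel, 2)
toy_P = mle_transition_matrix(toy_counts)
```

Evaluating `conditional_loglik` at `toy_P` and at any other row-stochastic matrix shows that the row proportions attain the larger value.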
25 See Diggle et al. (2002, p. 192–204) for a detailed discussion. Technically, because we are conditioning on the
initial conditions, these are conditional maximum likelihood estimates but the conditional likelihood is identical (when
controlling for initial conditions) to the functions optimized by widely available statistical software.
26 Consistent with the reasoning in Freedman (2006), Diggle et al. (2002, p. 200) suggest that the closeness of robust
and asymptotic covariance matrix estimates provide evidence regarding the robustness of the Markovian assumption.
27 Technically, the appropriate description is Markov Chain Monte Carlo estimation of a Markov Chain.
4.4 Testing Markov Models
Leon and Tsai (1998) present formal asymptotic results and evidence of the adequacy of test statistics for quasi-likelihood estimators of Markov regression models in the spirit of Zeger and Qaqish
(1988). Though their simulation results are limited to four relatively small samples (60, 75, 100,
and 150) and do not contain an explicit panel structure, the evidence suggests that for all but the
smallest of samples, quasi-score, quasi-Wald, and quasi-likelihood-ratio tests (defined in the traditional ways using the quasi-likelihood instead of the likelihood) have similar power and are
generally of appropriate size when referred to the standard χ2 distribution with degrees of freedom equal to
the number of imposed restrictions.28 The important point for applied purposes is that the standard array of test statistics can be utilized in ways that are similar to their use for other classes of
models. With models constructed and tests defined, we can turn to applications of these models
to ordered and multinomial time series.
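As a sketch of how such a test is carried out in practice, the code below computes a likelihood-ratio statistic and refers it to the χ2 distribution with degrees of freedom equal to the number of restrictions. The function names are hypothetical; the tail probability uses the standard finite-sum expressions for integer degrees of freedom so that no external library is assumed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: two-sided normal tail plus a finite correction sum.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)   # chi^(2r-1) / (1 * 3 * ... * (2r-1))
        total += term
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * phi * total

def lr_test(loglik_restricted, loglik_unrestricted, n_restrictions):
    """Return the LR statistic 2 * (lnL_u - lnL_r) and its chi-square p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf(stat, n_restrictions)
```

For example, `lr_test(ll_pooled, ll_unpooled, 37)` would apply to a comparison with 37 restrictions, where `ll_pooled` and `ll_unpooled` stand for maximized log likelihoods supplied by the user.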
5 Application: Human Rights Abuses
Previous studies of the dynamics of human rights abuses have focused on the importance of
democracy in limiting the level of human rights abuses. A burgeoning literature on the determinants of human rights abuses29 is fundamentally concerned with an ordered dependent variable
– the Political Terror Scale. Indeed, the relationship between democracy and human rights abuses is arguably so robust that Davenport (2007) has written of a domestic democratic peace analogous to the democratic peace that suggests the absence of warmaking among democratic societies. Our interest is in characterizing the conditional impact of democracy in a Markovian framework.30 Here we simply restate and extend the results to a discussion of the invariant distribution and compare multiple specifications.
28 In the smallest sample (60), the quasi-score test seems to have the best properties.
29 A non-exhaustive list would include works by Apodaca (2001), Boswell and Dixon (1990), Bueno de Mesquita, Downs, Smith, and Cherif (2005), Bueno de Mesquita, Morrow, Siverson and Smith (2003), Cingranelli and Richards (1999), Davenport (1995, 1996a, 1996b), Davenport and Armstrong (2004), Fein (1995), Gartner and Regan (1996), Hafner-Burton (2005a, 2005b), Hafner-Burton and Tsutsui (2005), Hathaway (2002), Henderson (1991), Keith (1999, 2002), McCormick and Mitchell (1997), McKinlay and Cohan (1975), McKinlay and Cohan (1976), Mitchell and McCormick (1988), Meyer (1996), Neumayer (2005), Park (1987), Poe, Carey and Vazquez (2001), Poe, Milner and Leblang (1999), Poe and Tate (1994), Poe, Tate and Keith (1999), Richards, Gelleny and Sacko (2001), and Zanger (2000), among others.
5.1 Defining Human Rights
Many existing studies of human rights and democracy, including Poe and Tate (1994), Poe, Tate
and Keith (1999), Davenport and Armstrong (2004), and Bueno de Mesquita, Downs, Smith and
Cherif (2005), utilize the Purdue University “Political Terror Scale” (PTS), a five-category ordinal
scale that measures human rights abuses from lowest to highest, according to increasing levels of
imprisonment, torture, execution, disappearance, and more general forms of political terror. State
Department and Amnesty International Reports are encoded according to five basic criteria that
are reported in Table 1.
Level 1: “Countries . . . under a secure rule of law, people are not imprisoned for their views, and torture is rare or exceptional . . ., political murders are rare.”
Level 2: “There is a limited amount of imprisonment for nonviolent political activity. However, few persons are affected, torture and beating are exceptional . . . political murder is rare.”
Level 3: “There is extensive political imprisonment, or a recent history of such imprisonment. Execution or other political murders and brutality may be common. Unlimited detention, with or without trial, for political views is accepted . . .”
Level 4: “The practices of Level 3 are expanded to larger numbers. Murders, disappearance are a common part of life. . . In spite of its generality, on this level terror affects primarily those who interest themselves in politics or ideas.”
Level 5: “The terrors of Level 4 have been expanded to the whole population. . . The leaders of these societies place no limits on the means or thoroughness with which they pursue personal or ideological goals. . .”a
a Source: Poe and Tate (1994: 867); Gastil (1980), in original.
Table 1: The Political Terror Scale

While a number of distinct policy choices by governments are included in the evaluation, the resulting scale is ordered on a single dimension. At Level 1 and Level 2, a fundamental respect for life remains, though liberties may be circumscribed. By contrast, Level 3 is increasing in brutality but still largely influences the political sphere;31 Level 4 expands the numbers and the deprivation of life is widespread.32 Level 5 represents widespread and arbitrary societal violence.33 Though numerous mechanisms are commonly argued to describe the relationship between democracy and human rights abuses, it is not at all clear how the simple alteration of political institutions in societies with extreme levels of repression should change much of anything because it is difficult to envision how such institutions can .
30 For a detailed development of the theory that justifies this effort, see Walker (N.d.).
31 Countries receiving this rating at least 12 times during the sample period from one or both of the State Department and Amnesty International include Albania, Bahrain, Bulgaria, Bangladesh, Chile, China, Ecuador, Egypt, Haiti, Honduras, Jordan, Mexico, Morocco, Paraguay and Syria.
32 Countries receiving this rating at least 12 times during the sample period from one or both of the State Department and Amnesty International include Brazil, Guatemala, India, Indonesia, Iran, Pakistan, Peru, the Philippines, South Africa, Sri Lanka, Turkey, and Uganda.
33 Countries receiving this rating at least 8 times during the sample period from one or both of the State Department and Amnesty International include Algeria, Angola, Colombia, Guatemala, Iran, Rwanda, Sri Lanka, and the Sudan.
Though the particular language varies among
studies, there are four primary pathways: (i) democratic institutions increase the costs of repressive actions by providing elections as mechanisms to sanction repressive leaders; (ii) democracy
is supported by a complex web of values that are challenged by the use of repressive instruments;
(iii) democracies provide alternative dispute resolution mechanisms that weaken the justification
for resorting to force for redressing grievances; (iv) international norms and the universal jurisdiction principle of international law allow third parties to prosecute human rights violations and
democracies are more likely to adopt international laws governing at least some human rights
(Hawkins 2003, Landman 2005). In each of these pathways, the relationship is argued to be unconditional, save possibly the second. Certainly, arguments that democratic institutions increase
the costs of repression, provide alternative dispute resolution mechanisms (and presumably make
the use of such mechanisms more attractive), and the principle of universal jurisdiction in international law should not be altered by the immediate prior status quo level of repression. Only the
second – democracy is supported by a complex web of values that are challenged by the employment of repressive tools – would induce variation in the effect of democracy on repression because
the prior employment of repression would seemingly undermine the establishment of such values
or provide evidence that such values have insufficiently taken hold. In either case, at least three
of the four mechanisms proposed in the extant literature would be undermined by the finding
that the influence of democracy on human rights abuses depends on past history. With this observation in mind, we turn to constructing a dataset for evaluating the claim that the influence of
democracy depends on past history.
5.2 Data
We employ the Polity IV indicator of democracy ranging from zero to ten (Marshall and Jaggers
2002). Polity measures authority patterns and the Polity IV Democracy measure is a composite
of variables measuring the openness and competitiveness of executive recruitment, the degree of
constraint on the exercise of executive authority, and regulations on political participation. While
other measures are available, this measure is chosen because it is widely used in the literature,
facilitating appropriate comparisons with past research, and because it is available for the broadest
array of countries through time.34
To fully specify the model, we have scoured prior literature for additional controls. In general, the literature suggests that economic development, economic growth, the size of a country’s
population, the change in the population, and involvement in civil and international wars are
34 The companion paper shows similar patterns for numerous operationalizations involving Polity and indicators
from Freedom House, Vanhanen, Cheibub and Gandhi.
important determinants of human rights abuses. Relying on the International Monetary Fund’s
International Financial Statistics, we are able to measure economic development and growth and
the size of a country’s population (from which it is straightforward to calculate the change in population). Furthermore, the PRIO/Uppsala project has measured civil wars, internationalized civil
wars, interstate wars, and other forms of violent conflict (Gleditsch, Wallensteen, Eriksson, Sollenberg and Strand 2002). To follow previous literature, we employ their measures of interstate, civil,
and internationalized civil war. The resulting model is remarkably similar to that estimated by
Davenport and Armstrong (2004). The resulting dataset contains over 3300 observations between
1976 and 2003 on 180 countries.
The tests that we conduct will utilize two basic models and will incorporate three primary
forms of tests. The first model is a Markov ordered probit that is a single equation estimator
including, as regressors, measures of the prior state and interactions between the prior state and
democracy – the argued source of state-dependent effects. The second set of models relies on independent
ordered probit models applied to each of the prior states and a constrained variant of this model
that collapses coefficients that do not vary with the prior state. As we shall see, characteristics of
the invariant distribution depend, at least somewhat, on this modeling assumption. Finally, we
utilize a Bayesian Markov Chain Monte Carlo technique to estimate the single equation ordered
probit model. The advantage of the MCMC techniques (because of the ergodic theorem) is that we
can draw from the invariant distribution of the regression parameters of interest; because the key
issue in assessing state-dependent effects involves sums of random variables, it is extraordinarily
valuable to be able to draw random samples from the relevant posterior and simply characterize
their sum.35 Before turning to the estimates, we report the Markov matrix to be modeled in Table
2.
35 Though we do not report them here, it also allows one to characterize the inherent uncertainty in estimates of the
transition matrix in a much more natural way.
Amnesty International               Amnesty International Political Terror Scale (t)
Political Terror Scale (t−1)        1         2         3         4         5     Total
1                                 559       133        19         1         0       712
                                78.51     18.68      2.67      0.14      0.00    100.00
2                                 116       706       218        28         3     1,071
                                10.83     65.92     20.35      2.61      0.28    100.00
3                                   7       226       493       112        11       849
                                 0.82     26.62     58.07     13.19      1.30    100.00
4                                   1        21       126       238        55       441
                                 0.23      4.76     28.57     53.97     12.47    100.00
5                                   0         3         6        61        81       151
                                 0.00      1.99      3.97     40.40     53.64    100.00
Total                             683     1,089       862       440       150     3,224
                                21.18     33.78     26.74     13.65      4.65    100.00
Table 2: The Baseline Markov Matrix – (Row Percentages below raw cell frequencies)
Table 2 makes clear that there is substantial first-order state dependence in measures of the Political Terror Scale and that the degree of persistence depends on the prior state. Over 78% of countries that receive the lowest score continue to do so in the next period.36 By contrast, 66% of countries that receive the second lowest score in the previous period continue at that level in the present period. The third state is persistent at just under 60% while the fourth and fifth prior values self-replicate at a rate of about 54%.
5.3 Estimates: Part I
The far left column of Table 5 displays a model with full pooling of the coefficients and the cutpoints and we relax these constraints across the columns of Table 5. We briefly discuss the
results and their implications. Overall, the model in the first column appears to fit well; the model
36 Indeed, the Netherlands, New Zealand, Australia, Sweden, Norway, and ??? never receive any score other than the
lowest throughout the sample period. It may be that the presence of these countries in the sample generates a form of
pooling bias, though an investigation of this question lies beyond our present scope.
χ2 statistic is statistically differentiable from zero to the level of computer precision; the cutpoints
form a strict order and the variables capturing the prior states are also a strict order implying a
strong form of state dependence. Given our substantive interest, we first examine the top of the
first column.
There is clear evidence that democracy manifests state dependent effects. To uncover the precise magnitude of these effects for each prior state, it is necessary to pay attention to the additive
sum of the effect of democracy and its interaction with the prior state. For example, the estimate of -0.23 implies that a one-unit change in the democracy score decreases the latent scale by 0.23 given prior residence in the first state (the lowest possible level of prior human rights abuses). Prior residence in the second state yields a total effect of -0.228 + 0.139 = -0.089, and a Wald test shows that this linear combination is statistically differentiable from zero to the level of computer precision (χ2(1) = 20). Democracy decreases human rights abuses conditional on having been at the second
lowest level in the prior period, but the effect is considerably mitigated. The effect is further mitigated when conditioned on residence in the third state in the prior period but is still differentiable
from zero at conventional levels (χ2(1) = 5.96). Turning to estimates conditioned on previous residence in state 4 or 5, the absolute value of the interaction term exceeds the baseline effect. As a
result, we should expect democracy to have no effect on human rights abuses, conditional on a
high past level of abuse. Indeed, Wald tests confirm this fact (χ2(1) = 1.52 and χ2(1) = 0.76) leading
to evidence that democracy only reduces the level of human rights abuses at low levels of prior
abuse. Substantively, in societies that are not characterized by high levels of human rights abuses,
democracy can improve the human rights record, but in those most in need of “democratic pacification” – those with the highest levels of past abuse – democracy exercises no pacifying effect.
To scholars normatively committed to democratization as a mechanism for reducing repression,
this evidence casts doubt on the basic presumption that institutional democracy will improve the
human rights record without regard to the immediate history.
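The Wald tests used above concern the sum of two estimated coefficients, so the variance of the sum must include the covariance between them. The sketch below shows the calculation. The point estimates are those reported in the text for the second prior state; the variance and covariance values are hypothetical placeholders chosen only to make the example run, and the function name is introduced here for illustration.

```python
import math

def wald_sum_test(b, g, var_b, var_g, cov_bg):
    """Wald test of H0: b + g = 0 for two estimated coefficients.

    Returns (estimate of the sum, chi-square statistic with 1 d.f., p-value).
    """
    est = b + g
    var = var_b + var_g + 2.0 * cov_bg           # variance of a sum
    stat = est * est / var
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail
    return est, stat, p_value

# Point estimates from the text; variance terms are hypothetical placeholders.
est, stat, p = wald_sum_test(-0.228, 0.139, var_b=0.0009, var_g=0.0012, cov_bg=-0.0008)
```

A negative covariance between the baseline and interaction coefficients, which is common when the two are estimated from overlapping information, makes the sum more precisely estimated than either term alone.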
To render the conditional effect of democracy on past history more clear, we reestimate the
model using Markov Chain Monte Carlo estimation of an ordered probit model, using MCMCpack by Martin and Quinn (2006), so that we can plot the posterior densities of the state dependent effects. The densities of these estimates are depicted in Figure 1. The baseline effect (not shown) is estimated to be -0.14 (not surprisingly, given the probit link it is 1.6 times smaller in magnitude than the estimate in the Table) and the top two panels are also clearly different from zero and negative. This
is consistent with the claim that democracy reduces human rights abuses. The bottom two panels
tell a very different story. The median of both densities lies to the right of zero indicating weak
evidence that democracy may worsen a nation’s human rights record. Certainly, it is hard to argue
for democratic pacification in the presence of an immediate past history of widespread repression.
Given prior literature and the almost universal finding that democracy decreases human rights
abuses, this result provides an important counterweight and suggests that many of the extant
mechanisms linking human rights and democracy may be flawed.37 With this evidence in mind,
we turn to a further exploration of Table 5.
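Characterizing the posterior of a state dependent effect amounts to adding the baseline and interaction draws iteration by iteration and summarizing the result. A minimal sketch follows; the draws are simulated stand-ins generated for illustration and are not the posterior draws behind Figure 1, and the function name is hypothetical.

```python
import random
import statistics

def summarize_sum(beta_draws, gamma_draws):
    """Summarize the posterior of beta + gamma_j from paired MCMC draws.

    Pairing draws from the same iteration preserves any posterior
    correlation between the two parameters.
    """
    total = sorted(b + g for b, g in zip(beta_draws, gamma_draws))
    n = len(total)
    return {
        "median": statistics.median(total),
        "lower": total[int(0.025 * n)],          # approximate 2.5% quantile
        "upper": total[int(0.975 * n) - 1],      # approximate 97.5% quantile
        "share_negative": sum(t < 0 for t in total) / n,
    }

# Simulated stand-ins for MCMC output; NOT the paper's posterior draws.
rng = random.Random(1)
beta = [rng.gauss(-0.14, 0.02) for _ in range(1000)]
gamma4 = [rng.gauss(0.15, 0.03) for _ in range(1000)]
summary = summarize_sum(beta, gamma4)
```

The `share_negative` entry is the posterior probability, under the draws supplied, that the total effect is below zero.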
Concluding the discussion of this column, the remaining covariates perform in ways consistent with prior literature. Countries with higher levels of per capita GDP tend to be less repressive;
countries with larger populations tend to be more repressive. GDP Growth tends to reduce human rights abuses (though the substantive size of the effect is rather small) and Population Change
exhibits no clear influence. Not surprisingly, Civil Wars significantly worsen the human rights
records of states, while Internationalized Civil Wars also worsen human rights records, but with
less than half the magnitude. International Wars exhibit no clear statistical impact. With these results in mind, we can first examine the robustness of this specification, before turning to estimates
of the long-run distribution of outcomes given our description of their existence according to the
37 Elsewhere, I assess the robustness of this claim to different operationalizations of the effect and different measures
of both democracy and repression; the story does not change.
Figure 1: MCMC Ordered Probit – State Dependent Effects of Democracy on Human Rights
[Figure: four posterior density panels, for β + γ2, β + γ3, β + γ4, and β + γ5; vertical axis: Density; N = 1000 draws per panel.]
ergodic theorem.
The second through sixth columns display estimates without assuming any form of pooling
and all covariates are allowed to have different effects for the model representing each prior state.
Though most of the effects are consistent with respect to sign, there are a few notable discrepancies. For example, the effect of GDP per capita is negative and statistically differentiable from zero
for low levels of prior repression, but becomes statistically differentiable from zero and positive
at the highest two levels of prior abuse. An omnibus comparison of this model with the model that has total pooling yields a difference in the log-likelihoods of 81, and hence a likelihood-ratio statistic of 162 with 37 degrees of freedom against the nested total pooling model. This result
is statistically different from zero to the level of computer precision. A bit of further investigation yields the result that GDP per capita is primarily responsible for this result because the sign
changes for different comparisons. At the same time, the resulting model is a bit puzzling because
the prior states are, generally, no longer differentiable from zero and they also no longer form a
strict order.38 Following from a comparison of the pooled and partially pooled models, we turn to
a characterization of the relevant transition matrices to highlight the importance of pooling versus
partial pooling.
NB: Discuss the key differences that appear in Table 4. Though the differences are subtle, most
manifest themselves in the higher present values for the partially-pooled transition matrix. This
makes considerable sense when we recall Table 2. The majority of support in the distribution of
outcomes lies at low levels of political terror; pooling tends to pull the cutpoints toward the mass
of the data making transitions toward worsening human rights conditions generally less likely.
For the broader study of ordered Markov models, columns seven through eleven reestimate
the model with partial pooling – the covariate effects are constrained to be equal across equations with the exception of the democracy variable; democracy is allowed to map to human rights
38 Unpacking this puzzle is a subject for a different paper.
abuses in a way that depends on the prior state.39 The model improves on the fit of a model with total pooling, as evidenced by a statistically significant likelihood-ratio test (χ2 = 31, 10 d.f.).40 This result is somewhat surprising given the claims of Diggle et al. (2002) and Epstein et al. (2006) that the saturated single-equation model (with controls for the prior state and interaction terms between the relevant covariates and the prior states) is equivalent to separate estimation conditional on each prior state. Deeper reflection makes
clear why this may not be the case. Pooling the regression equations and fixing the variance of
the errors, the effect of any covariate, including the prior state, is to shift the latent distribution
around holding the cutpoints fixed. However, if their innate probability measure depends on the
prior state, the pure pooling estimator will recover a convex combination of the cutpoints from the
partial pooling estimator. Moreover, the measure of the maximum probability is fairly straightforward to derive because the symmetry and unimodality of the normal and logistic distributions
implies that the maximum category probability would be obtained with an expectation that
bisects two cutpoints.41
Given the fact that none of the parameters in the model is unit or time-varying, we can characterize the invariant or steady-state distribution. Before we do so, a comment is in order. The lack
of time or unit-varying parameters is not a complete characterization of the relevant boundaries
for the invariant distribution to be of interest. More generally, what is required is a “stationary
counterfactual”. What we mean is that the counterfactual should be reasonable in a dynamic
sense. For example, where economic variables that are subject to trends or are best characterized
by random walks (with or without drift), it is unlikely that a characterization of the invariant distribution is meaningful because the underlying assumption that all else is equal is almost certain
not to hold. However, the existence of time-varying or unit-varying Markov matrices need not, in
39 This was accomplished in R (R Development Core Team 2004) using optim. The function is reported in Table 6.
40 The downside of estimating this model is that forms of multiple equation constrained ordered probit/logit models do not exist in common statistical software. That said, the ordered regression likelihood given before can be easily programmed and optimized.
41 For the polar categories, any scenario can be replicated as the parameters run to infinity in absolute value.
itself, invalidate characterization of the invariant distribution for defensible scenarios. However,
these scenarios are rendered dynamic in attempts to characterize the invariant distribution; thus,
an extra burden is placed on the researcher to defend the set of assumptions that are made to
define the scenario in which the invariant distribution is characterized. Turning to the estimands
of interest, all we have statistically shown is that a part of the transition matrix does not respond to changes in the level of democracy. It remains to be shown whether or not democracy
may yet result in an improvement in human rights conditions because of the positive probability,
by random chance alone, that highly repressive states transition to better outcomes where higher
levels of democracy should then improve the human rights situation even further. We display this
evidence in Figure 2.
Figure 2 displays the invariant distribution obtained from the far left column of Table 4. The
counterfactual being evaluated sets all numeric regressors at their means and the binary indicators to zero. The red lines arise from setting the Polity indicator to 10 and the blue lines arise from
setting the Polity indicator to 0. The solid lines reflect the probability of the lowest level of human
rights abuses and, as we can see, a world full of the most democratic of states would consist of
about 37% at the lowest level of human rights abuse while a world full of the least democratic
states would consist of about 5% at the lowest level of repression. The short-dashed lines demonstrate that the invariant distribution of nations at Level 2 would not differ dramatically between
all democracies (41%) and all non-democracies (39%). The dotted lines reflect the proportion of
nations at Level 3 and there are far fewer of these in a world of maximal democracies (17%) than in
a world of nondemocracies (41%). The dot-dash lines display the proportion at the second highest
level of human rights abuses; this condition is far more likely for nondemocratic nations (14%)
than democracies (4%). Finally, the highest level of repression seldom occurs under either scenario, though the invariant distribution makes this more likely under nondemocracies (1.6%) than
under democracies (1%). One key finding is that this counterfactual suggests the highest levels of
repression to be quite unlikely independent of the impact of democracy on human rights abuses,
though democracy does seem, to some extent, to preclude abuse in the long-run.
Figure 2: The Invariant Distribution for Table 2. [Plot titled "Toward the Invariant Distribution"; vertical axis: Pr(y=j) for j = 1, ..., 5, from 0.0 to 0.8; horizontal axis: T, from 0 to 25; Democracy=10 in red, Democracy=0 in blue.]
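The convergence toward the invariant distribution described above can be sketched in a few lines. This is a minimal illustration, not the code used for Figure 2: it assumes the "All Cutpoints Estimated" panels of Table 4 are the relevant Markov matrices, renormalizes the rounded rows so that each sums to one, and uses hypothetical helper names (`normalize_rows`, `step`, `iterate`, `invariant`).

```python
# Transition matrices transcribed from the "All Cutpoints Estimated" panels of Table 4.
# Rows are y_{t-1} = 1..5 and columns are y_t = 1..5. The printed values are rounded,
# so each row is renormalized to sum to one (an assumption of this sketch).
P_DEMOC_0 = [
    [0.347, 0.519, 0.127, 0.007, 0.0],
    [0.064, 0.634, 0.264, 0.035, 0.004],
    [0.007, 0.259, 0.606, 0.117, 0.0104],
    [0.003, 0.063, 0.364, 0.504, 0.065],
    [0.0, 0.038, 0.076, 0.556, 0.330],
]
P_DEMOC_10 = [
    [0.819, 0.163, 0.017, 0.001, 0.0],
    [0.146, 0.706, 0.132, 0.014, 0.0015],
    [0.013, 0.383, 0.529, 0.069, 0.006],
    [0.003, 0.063, 0.364, 0.504, 0.065],
    [0.0, 0.038, 0.076, 0.556, 0.330],
]


def normalize_rows(matrix):
    """Rescale every row so that it sums to exactly one."""
    return [[p / sum(row) for p in row] for row in matrix]


def step(dist, matrix):
    """One transition: the row vector dist multiplied by the matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def iterate(start, matrix, periods):
    """Distribution over states after the given number of periods."""
    dist = list(start)
    for _ in range(periods):
        dist = step(dist, matrix)
    return dist


def invariant(matrix, tol=1e-12, max_iter=100000):
    """Approximate the invariant distribution by iterating to a fixed point."""
    dist = [1.0 / len(matrix)] * len(matrix)
    for _ in range(max_iter):
        new = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


pi_0 = invariant(normalize_rows(P_DEMOC_0))
pi_10 = invariant(normalize_rows(P_DEMOC_10))
print([round(p, 3) for p in pi_0])
print([round(p, 3) for p in pi_10])
```

Calling `iterate([1, 0, 0, 0, 0], normalize_rows(P_DEMOC_10), 25)` traces a path of the kind plotted in Figure 2; because the inputs are rounded, the results should be read as approximations to the percentages quoted in the text.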
5.4 Testing Some Extensions
In brief, we consider extended models for describing state dependence in the Political Terror
Scale. We consider a second-order Markov model and briefly describe an extension that may
have broader application when the number of data points is insufficient to examine higher-order
Markov models. We forego the presentation of the second-order Markov model and instead utilize Markov Chain Monte Carlo techniques to graphically display the intuition for why the model
offers an improvement. In general, a formal likelihood ratio test of a first-order specification
nested in a second-order specification yields a χ2 statistic of 244.12 with 18 degrees of freedom;
this statistic is more than eight times the .05 level critical value of roughly 28.87. Graphically, Figure 3 plots the first-order (black dashed) and second-order estimates of the parameters describing state dependence.
The purple solid lines represent the second lag equal to one and the red solid lines capture a second lag equal to two. The solid blue densities represent a second lag of three while green and
brown solid lines represent the densities for second lags of four and five, respectively. The dashed black
line represents the density of estimates from a first order model and the identity of the first order model is given in the title above each panel. Beginning with the first lag equal to one, we
see a black dashed density very similar to the solid purple density with a slight shift toward the
second lag equal to two. The lag two equal to three effect is quite distinct, but there is little support in the data to estimate this parameter. Turning to the first lag equal to two, the first order
density is almost identical to the second-order effect of having resided in state two in both periods with two preceded by one slightly to the left and the remaining densities to the right. Even
from an examination of these two states alone, we can see evidence that first order dependence
does not capture all of the relevant state dependence. Turning to level three in the prior period,
the estimated first-order effect is a convex combination of second-order lag one to the left, almost
identical to second-order lag two, with second-order lag three just to the right and the remaining
densities some distance away. Level four in the prior period is marked by a somewhat clear distinction between second-order effects of categories one and two that are detectably to the right.
The estimated first order effect of level four is very similar to the reported effects of second-order
levels of three and four with a considerable amount of the density of the second-order effect of
lag five clearly distinguishable from the others. Finally, for lag five in the prior period, there is
considerable clumping of the second order effects for all but category five which is again distinct
and to the right of the others. In short, while the first-order estimates are interior to the more refined second-order effects, there is also evidence that the first-order transition model masks some
important forms of heterogeneity. However, if we recall the likelihood ratio test cited earlier, we
have consumed an additional eighteen degrees of freedom. To conclude the analysis of the political terror scale, we consider one possible reduction technique and briefly discuss but do not
estimate another.
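The likelihood ratio statistic reported above can be checked against its χ2 reference distribution with a short sketch. The function name `chi2_sf_even_df` is ours; it relies on the closed form for the upper tail of a χ2 variate with an even number of degrees of freedom, and the .05 critical value of 28.869 for 18 degrees of freedom is taken from standard tables rather than from the paper.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with even df, using
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # the k = 0 term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


LR_STAT = 244.12  # likelihood ratio statistic reported in the text
DF = 18           # degrees of freedom reported in the text
CRIT_05 = 28.869  # tabulated .05 critical value for 18 df (not from the paper)

p_value = chi2_sf_even_df(LR_STAT, DF)
print(p_value)                        # tail probability of the reported statistic
print(chi2_sf_even_df(CRIT_05, DF))   # should be close to .05
print(LR_STAT / CRIT_05)              # ratio of the statistic to the critical value
```

The closed form avoids any dependence on a statistics library, at the cost of handling only even degrees of freedom, which suffices for the 18 used here.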
One possible alternative that may improve the fit and parsimony of the model is to recognize
that having witnessed the highest level of political terror in either of the two most recent periods
appears to lead to different dynamics than residence in the remaining states. Because this observation does not depend on when a given nation experiences society-wide terror, we examine and
compare phat support dependence as a plausible alternative. Of course, as we already pointed
out, phat dependence, and in a significantly reduced way, phat support dependence suffer from
the same problem of an increasing number of parameters as histories become longer, so some
way of shortening the conditioning set is needed. Here, we nest a second-order phat dependent
specification in the full second-order Markov model and examine the restriction. The associated
χ2 statistic is approximately 800 with 13 degrees of freedom so we fail to justify the restriction.
Indeed, it appears that the trajectory matters as comparing the phat-support dependence specification to the first-order specification yields AIC/BIC evidence in favor of the first-order Markov
specification. The summary from these results is that, for political terror, there is clear evidence of
state dependence. Across all of these models, there is also evidence that the central theoretical
claim of interest, that the effect of democracy depends on the prior state, is robust to all of these
various specification twists. Nonetheless, further investigation is required to precisely pin down the order of the
transition model and the appropriate mix of history and parsimony in the modeling enterprise.
Figure 3: MCMC Ordered Probit – Comparing the Influence of the Prior States for First- and Second-Order Markov Models. Black dashed lines are estimates from the first-order specification. The solid colored densities represent the influence of the prior states obtained from estimates of the second-order Markov model. [Five density panels, titled Lag 1 = 1 through Lag 1 = 5; legend: Lag 2 = 1, Lag 2 = 2, Lag 2 = 3, Lag 2 = 4, Lag 2 = 5, Pooled.]
Consistent with the value of parsimony, one approach suggested to reduce the number of parameters while retaining second order dependence is the mixture-transition distribution approach
of Raftery (1985) and Berchtold and Raftery (2002). The mixture transition distribution model essentially eliminates the cross-lags in the various histories and instead estimates a linear combination of the separate inputs of each lag (which are constrained to have the same effect without
regard to the order). In effect, we are pooling over lag order using this model. As a result, in some
ways, it approximates a low-order phat dependent process because having resided in states s_l and
s_m has a joint impact that is the linear combination of the effects of having been in state s_l and
state s_m within the number of periods determined by the order, but whether one precedes the other or
not is irrelevant insofar as the model is concerned. Formally, the model can be described by
a probability model such that, for L relevant lags,
\[
\Pr\left(y_t = j \mid h_t^{(L)}\right) = \sum_{l=1}^{L} \eta_l \Pr(y_t = j \mid y_{t-l} = s_0) = \sum_{l=1}^{L} \eta_l \pi_{s_0 s_j} \qquad (12)
\]
where $\sum_{l=1}^{L} \eta_l = 1$. The simplest intuition for the model is that the model pools over lags from
different orders of the process with the same transitions. The parameters η define a set of weights
(that sum to one), that allow the longer past to be more or less heavily weighted than more recent observations with the number of parameters in η depending on L (minus 1). More generally,
the model requires only one additional parameter for each additional lag, offering promise in confronting the quickly growing number of parameters when deep histories are under consideration.
Indeed, Berchtold and Raftery (1999, p. 4) point to this as one of the most significant advantages
of applying the mixture transition distribution (MTD) to Markov chains. For a fully specified
Markov Chain of order L with a state space of cardinality J, the number of free parameters is
bounded above by $J^L(J-1)$ because there are $J^L$ combinations of prior states given L-order dependence and each row of the Markov matrix contains $J-1$ free probabilities because estimation
of the first $J-1$ establishes the value of the Jth probability – valid probabilities must sum to one
in a closed set. By contrast, the marginal cost of an additional order of dependence in the MTD
is simply a single new parameter. For example, a first-order MTD has no lags other than one to
pool over and the η is given as one by the summation constraint. Thus, there are $J(J-1)$ free
probabilities. A second-order MTD would contain one free η (because the other is determined by
the summation constraint) while relying on the same number of free probabilities to yield a model
with $J(J-1)+1$ parameters. With these characteristics of the estimation of ordered Markov models in mind, we turn to a brief exposition of something that should be obvious – all of these claims
readily and quite simply extend to the multinomial case, also.
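A small sketch may make the mixture transition distribution and its parameter savings concrete. It follows our reading of the model, in which the state occupied at each lag selects a row of one shared first-order transition matrix and the rows are averaged with the lag weights η. The function names, the 3-state matrix `PI`, and the weights `ETA` are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
def mtd_probability(history, weights, pi):
    """Pr(y_t = j | last L states) for every j under the MTD model.

    history: tuple of 0-based states (y_{t-1}, ..., y_{t-L})
    weights: lag weights eta_1..eta_L, which must sum to one
    pi:      J x J first-order transition matrix shared across lags
    """
    if len(history) != len(weights):
        raise ValueError("need one weight per lag")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("lag weights must sum to one")
    n_states = len(pi)
    return [sum(eta * pi[s][j] for eta, s in zip(weights, history))
            for j in range(n_states)]


def full_markov_params(n_states, order):
    """Upper bound on free parameters of a saturated order-L chain: J^L (J - 1)."""
    return n_states ** order * (n_states - 1)


def mtd_params(n_states, order):
    """Free parameters of the MTD model: J (J - 1) + (L - 1)."""
    return n_states * (n_states - 1) + (order - 1)


# Hypothetical 3-state transition matrix and lag weights, for illustration only.
PI = [[0.8, 0.15, 0.05],
      [0.2, 0.6, 0.2],
      [0.1, 0.3, 0.6]]
ETA = [0.7, 0.3]

probs = mtd_probability((0, 2), ETA, PI)
print(probs)
print(full_markov_params(5, 2), mtd_params(5, 2))
```

For a five-category outcome such as the Political Terror Scale, the last line contrasts the parameter bound for a saturated second-order chain with the count for a second-order MTD.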
6 A Brief Discussion of Exchange Rate Regimes
There is a small literature among economists examining exchange rate transitions. For example,
Masson (2001) subjects the “two poles” hypothesis to empirical scrutiny, finding little evidence
that intermediate exchange rate regimes are disappearing. Building on his prior work, Masson
and Ruge-Murcia (2005) utilize measures of inflation, openness, growth, and reserves as determinants of transition probabilities. We expand their analysis further applying the same basic methods of inquiry while expanding the study of exchange rate regime transitions to monthly data.
The single most fundamental argument that we advance is that there is no obvious reason to
assume that exchange rate regimes can be easily ordered on any single underlying dimension and
such arguments then necessitate the examination of models for multinomial choices.
A flurry of recent research has considered the proper measurement of exchange rate regime
choice. More than two decades of research on exchange rate regimes was based on data culled
from the International Monetary Fund’s Annual Report on Exchange Arrangements and Exchange
Restrictions, but these reports are based upon official exchange rates and declarations of exchange
rate regimes that may or may not comport with actual practice. In response to overreliance on de
jure exchange rate regimes, Ghosh, Gulde and Wolf (2003), Levy-Yeyati and Sturzenegger (1999),
Reinhart and Rogoff (2003), and Shambaugh (2004) have produced de facto measures of exchange
rate regimes. And all the available evidence suggests that the determinants of de jure and de facto
exchange rate regimes differ substantially (Simmons and Hainmueller N.d., von Hagen and Zhou
N.d.).
Exploring a detailed model of over 77,000 monthly observations is a sizable task, in part because few covariates are available spanning the 40-year period covered by the exchange rate
regime series. Nonetheless, these data are useful for examining at least two interesting questions.
First, is there evidence that the transition matrix is influenced by the end of the Bretton Woods
period?42 Second, and of primary interest for those studying the empirical determinants of
exchange rate regimes, simple models of the dynamics of exchange rate regimes provide some
information about the usefulness of attempts to treat exchange rate regimes as fundamentally ordered.
The nature of the transition matrix, reported in Table 7, reveals minor difficulties for the analysis owing to extreme sparseness in the transition matrix. A related difficulty arises from how
to treat specific categories. Reinhart and Rogoff’s (2003) data were originally classified into 14
categories, though they have combined classes to generate a “gross” coding with six categories.
42 Of course, it would be preferable not to be forced to specify, a priori, the time of the split.
However, two of these categories seem fundamentally different from the others. Category 6 represents exchange rate regimes that are centered around the existence of parallel exchange rates
implying that the market for foreign exchange is not unified. Because, in some sense, foreign exchange is necessarily rationed under parallel markets, it is not clear how to treat them. Similarly,
Code 5 belongs to countries that are “freely falling”. Because it is likely that such regimes result from pathological combinations of macroeconomic policies, transitions into and out of freely
falling may have less to do with exchange rate policy than with the reconciliation of fundamentally contradictory macroeconomic objectives pursued in violation of Tinbergen’s Law. Whatever
the reason, there are compelling reasons to ignore transitions into and out of these two regimes.
Much less defensibly, we have also chosen to remove free floats from the sample because all three
permissible transitions occur less than twice (two occur only once). Thus, we model the submatrix
made up of the first three rows and columns of Table 7.
Three categories make up the matrix for analysis. Countries with no separate legal tender, pegs and currency boards, horizontal bands of less than 2%, and de facto pegs
constitute what we label as Fix/Peg Regimes. Crawling pegs and crawling bands, particularly
pre-announced policies to limit volatility within small (2%) bands describe the range of Crawling
Pegs that we examine. Finally, Bands/Floats combine de facto bands of less than 5%, moving
bands, and managed floats.
In order to answer questions about the impact of the Smithsonian Agreements on the behavior
of de facto exchange rate regimes, we begin a simple intervention in January of 1972 and allow it
to run through the completion of the time period.43 This intervention will allow us to differentiate
both the Markov matrix and the invariant distribution of this Markov chain in the pre- and post-1972 periods. With these ideas in mind, we turn to the model estimates reported in Table 3.
43 In the conclusions, we mention work that we have recently begun to utilize MCMC structural changepoint techniques to isolate a probability distribution over the precise timing of the change.
Variable                         Coefficient    (Std. Err.)
Equation: Crawling Pegs
  Fix/Peg                          -7.840∗∗       (0.277)
  Crawling Pegs                     5.106∗∗       (0.268)
  Bands/Floats                     -0.693†        (0.369)
  Fix/Peg (Post-1972)               1.766∗∗       (0.321)
  Crawling Pegs (Post-1972)         1.078∗∗       (0.340)
  Bands/Floats (Post-1972)          1.863∗∗       (0.531)
Equation: Bands/Floats
  Fix/Peg                          -6.577∗∗       (0.148)
  Crawling Pegs                    -1.540∗        (0.636)
  Bands/Floats                      5.493∗∗       (0.214)
  Fix/Peg (Post-1972)              -0.245         (0.278)
  Crawling Pegs (Post-1972)         1.931∗∗       (0.691)
  Bands/Floats (Post-1972)          1.443∗∗       (0.396)
N                                   77880
Log-likelihood                      -1867.768
χ2(12)                              167384.315
Significance levels: † : 10%   ∗ : 5%   ∗∗ : 1%
Table 3: Multinomial Markov Model Estimates: Fix/Peg is the omitted category.
Recalling that every regressor is simply a realization of the prior state and noting that the
reference category is fixed/pegged rates, a fix in the prior period makes a crawling peg strongly
less likely, though this effect is, to some degree, mitigated in the post-1972 period. Crawling pegs in
the immediate past make crawling pegs in the present more likely and this effect is augmented for
the post-1972 period. Finally, bands/floats in the previous period have a weak negative effect on
the comparison between crawling pegs and fixed/pegged regimes that becomes weakly positive
after 1972. For the portion of the Table covering bands/floats, Fix/Pegs in the immediate past
make present period bands/floats quite unlikely and this effect is unchanged after 1972. Crawling
pegs in the prior period make fix/pegs more likely and bands/floats less likely before 1972, all
other things equal, but have no effect after 1972. Finally, bands/floats in the immediate past
make bands/floats in the present period much more likely and this effect is strengthened in the
post-1972 period. To offer a simplistic depiction of the difference between the pre- and post-1972
periods, we can summarize the difference in the two Markov matrices. Subtracting the pre-1972
matrix from the post-1972 matrix, we find the entire first column to be negative. Moreover, the
difference in the probability of a band/float conditional on a fix/peg is negative; all the remaining differences
are positive. This implies that fix/pegs are decreasing in likelihood (albeit by small magnitudes)
while more flexible arrangements are becoming more likely.
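The comparison of the two Markov matrices can be reproduced from the coefficients in Table 3 with a short sketch. It rests on assumptions that we state plainly: each regressor is a prior-state indicator with no separate intercept, the post-1972 terms add to the baseline coefficients, and Fix/Peg is the omitted outcome with its linear predictor fixed at zero. The names `BASE`, `POST_1972_SHIFT`, and `transition_matrix` are ours. The difference computed is the post-1972 matrix minus the pre-1972 matrix.

```python
import math

STATES = ["Fix/Peg", "Crawling Pegs", "Bands/Floats"]

# Coefficients from Table 3, keyed by outcome equation and then by prior state.
# Fix/Peg is the omitted outcome, so its linear predictor is fixed at zero.
BASE = {
    "Crawling Pegs": {"Fix/Peg": -7.840, "Crawling Pegs": 5.106, "Bands/Floats": -0.693},
    "Bands/Floats": {"Fix/Peg": -6.577, "Crawling Pegs": -1.540, "Bands/Floats": 5.493},
}
POST_1972_SHIFT = {
    "Crawling Pegs": {"Fix/Peg": 1.766, "Crawling Pegs": 1.078, "Bands/Floats": 1.863},
    "Bands/Floats": {"Fix/Peg": -0.245, "Crawling Pegs": 1.931, "Bands/Floats": 1.443},
}


def transition_matrix(post_1972):
    """Rows index the prior state, columns the current state (order of STATES)."""
    matrix = []
    for prior in STATES:
        scores = [0.0]  # omitted category (Fix/Peg)
        for outcome in STATES[1:]:
            xb = BASE[outcome][prior]
            if post_1972:
                xb += POST_1972_SHIFT[outcome][prior]
            scores.append(xb)
        denom = sum(math.exp(s) for s in scores)
        matrix.append([math.exp(s) / denom for s in scores])
    return matrix


pre = transition_matrix(post_1972=False)
post = transition_matrix(post_1972=True)
# Post-1972 minus pre-1972, element by element.
diff = [[post[i][j] - pre[i][j] for j in range(3)] for i in range(3)]
for name, row in zip(STATES, diff):
    print(name, [round(d, 5) for d in row])
```

Under these assumptions, every entry in the Fix/Peg column of the difference is negative, which matches the reading that fixed and pegged arrangements became slightly less likely after 1972.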
The second area of interest is in the convergence to the invariant distribution. Figure 4 displays
convergence toward the invariant distribution from a starting point in each state with probability
one for the transition matrix prior to 1972. There is so little difference in the invariant distribution
that we forego a presentation of both.44 In particular, the solid lines report the probability of a
fix/peg; the dashed lines represent the probability of a crawling peg and the dotted lines represent the probability of bands/floats. The colors signify the state in which the process began (red
- fix/peg, blue - crawling pegs, and green - bands/floats). As we see, the process takes a considerable amount of time to converge, but no matter the starting state, we converge on the same
invariant distribution as would be expected of an ergodic Markov chain.
44 This difference between statistical and substantive significance is not uncommon in very large samples.
A final comment on convergence, reminiscent of MCMC estimation, is in order. A simple
glance at Table 7 reveals extraordinarily high levels of persistence and very few transitions. The
more dependent the process, the longer it takes to arrive at the invariant distribution and this is
clearly the case here. Comparing the invariant distribution plots between the human rights and
exchange rate regimes examples shows that the human rights process is quite close to the limit
in about 30 iterations. The exchange rate regimes example is still quite far from the invariant
distribution even after 200 iterations.
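One standard way to make the link between persistence and slow convergence precise is offered here as a sketch, not as part of the original analysis. Assume an ergodic chain with a diagonalizable transition matrix P whose eigenvalues are ordered so that 1 = \lambda_1 > |\lambda_2| \ge \dots; the symbols \mu_0 (starting distribution), \pi (invariant distribution), C (a constant depending on \mu_0 and P), and \varepsilon (a tolerance) are our notation.

```latex
\left\lVert \mu_0 P^{n} - \pi \right\rVert \le C \, \lvert \lambda_2 \rvert^{n},
\qquad
n_{\varepsilon} \approx \frac{\log(\varepsilon / C)}{\log \lvert \lambda_2 \rvert}
```

Here n_\varepsilon is the approximate number of iterations needed to come within \varepsilon of the invariant distribution. Highly persistent chains have |\lambda_2| close to one, so n_\varepsilon is large, which is consistent with the contrast between the human rights and exchange rate regime examples.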
7 Concluding Remarks
This paper has highlighted theoretical notions of state dependence and methods for examining
state dependence in the general framework of Markov transition models. Following a brief discussion of some classes of dependence, we have focused on econometric estimators for rendering state
dependent processes estimable. Paying particular attention to the properties of ergodic Markov
chains, we have highlighted the importance of the invariant distribution as an estimand of interest for assessing the long-run outcomes of dynamic discrete processes and showcased methods
of measuring and testing central hypotheses arising from theories of dependent processes in an
application of Markov models to the relationship between democracy and human rights abuses.
Though these models have promise, considerable work remains.
First, the centrality of stationarity in facilitating an examination of long-run outcome distributions merits deep scrutiny. One plausibly valuable application of work by Chib and others
regarding changepoint processes is to utilize these techniques to examine classes of time-invariant
Markov chains. Though we have shown an example of what appears to be two distinct (or at
least statistically differentiable) transition matrices before and after 1972 (the conclusion of the
Smithsonian Agreements), considerable further work remains. Though this and neighboring time
periods are plausible, there is value in a less ad hoc understanding of the precise timing of
moves between the two stationary transition matrices. Maybe more important, when covariates
are introduced, it is the stationarity (in levels, not trends or otherwise) that is key to the existence
of an invariant distribution. To mix terms, a stationary counterfactual given the model results in
an ability to characterize the invariant distribution, while the introduction of covariates that are
both statistically significant and likely subject to temporal dynamics is likely to undercut the
ability to estimate this theoretically interesting quantity.
Figure 4: Multinomial Estimates Convergence on the Invariant Distribution. [Plot for the pre-1972 transition matrix; vertical axis: Invariant Distribution, 0.0 to 1.0; horizontal axis: 0 to 1000.]
Second, there is the looming question of distinguishing heterogeneity from state dependence.
Trivially, if there is some unobserved unit specific factor that makes residence in a particular state
more likely, this heterogeneity is particularly difficult to uncover for highly persistent processes.
At the same time, when posing the initial question of whether or not history matters, this distinction comes to the fore. If we cannot differentiate heterogeneity from state dependence, we cannot answer
questions about whether or not every historical path is idiosyncratic or whether there is something about history, independent of the idiosyncrasies of the units, that is driving the evolution of
qualitative time series.
Of utmost importance, we hope to have established a reasonable set of boundaries and classes
of candidate models for exploring the general claim that history matters. While estimation may
often, as a practical matter, restrict the richness of history that can be captured in a statistical
model, taking temporal dynamics seriously requires further steps toward treating dynamics as a
process of interest rather than relegating them to the status of nuisance parameters. Indeed, we
would argue it is not farfetched to suggest that dynamics are significantly underexplored and our
hope is to expand the interest in studying time series of qualitative variables by making clear the
simplicity with which they can be implemented.
(Rows: prior state y_{t−1}; columns within each panel: current state 1 through 5.)

POLITY IV DEMOCRACY = 0
               All Cutpoints Estimated              |  Pooled Cutpoints
y_{t−1} = 1    .347   .519   .127   .007   0        |  .317    .616    .062   .004   .0003
y_{t−1} = 2    .064   .634   .264   .035   .004     |  .069    .622    .281   .027   .002
y_{t−1} = 3    .007   .259   .606   .117   .0104    |  .013    .269    .575   .134   .01
y_{t−1} = 4    .003   .063   .364   .504   .065     |  .002    .060    .441   .441   .056
y_{t−1} = 5    0      .038   .076   .556   .330     |  .0003   .009    .116   .581   .294

POLITY IV DEMOCRACY = 10
               All Cutpoints Estimated              |  Pooled Cutpoints
y_{t−1} = 1    .819   .163   .017   .001   0        |  .819    .173    .007   .00045 .00003
y_{t−1} = 2    .146   .706   .132   .014   .0015    |  .153    .692    .143   .0112  .0007
y_{t−1} = 3    .013   .383   .529   .069   .006     |  .022    .383    .507   .0824  .0057
y_{t−1} = 4    .003   .063   .364   .504   .065     |  .002    .0474   .391   .489   .071
y_{t−1} = 5    0      .038   .076   .556   .33      |  .0003   .0075   .098   .559   .335

Table 4: Comparing Markov Matrices – Cutpoint Pooling
References
Amemiya, T. 1985. Advanced Econometrics. Cambridge, MA: Harvard University Press.
Apodaca, Clair. 2001. “Global Economic Patterns and Personal Integrity Rights after the Cold
War.” International Studies Quarterly 45:587–602.
Bartoszyński, Robert and Magdalena Niewiadomska-Bugaj. 1996. Probability and Statistical Inference. New York, NY: John Wiley & Sons [Wiley Series in Probability and Statistics].
Beck, Nathaniel, David Epstein, Simon Jackman and Sharyn O’Halloran. N.d. “Alternative Models of Dynamics in Binary Time-Series-Cross-Section Models: The Example of State Failure.”
Paper presented at the 2001 Annual Meeting of the Society for Political Methodology, Emory
University (Draft: July 12, 2002).
Berchtold, Andre and Adrian E. Raftery. 1999. The Mixture Transition Distribution Model for
High-Order Markov Chains and Non-Gaussian Time Series. Technical Report 360 Department of Statistics, University of Washington.
Berchtold, Andre and Adrian E. Raftery. 2002. “The Mixture Transition Distribution Model for
High-Order Markov Chains and Non-Gaussian Time Series.” Statistical Science 17(3):328–56.
Bhattacharya, Rabi N. and Edward C. Waymire. 1990. Stochastic Processes with Applications. New
York, NY: John Wiley & Sons [Wiley Series in Probability and Mathematical Statistics].
Boskin, M. J. and F. C. Nold. 1975. “A Markov Model of Turnover in Aid to Families with Dependent Children.” Journal of Human Resources 10:476–81.
Boswell, T. and W. Dixon. 1990. “Dependency and Rebellion: A Cross-National Analysis.” American Sociological Review 55:549–559.
Bueno de Mesquita, B., G. W. Downs, A. M. Smith and F. M. Cherif. 2005. “Thinking Inside the Box:
A Closer Look at Democracy and Human Rights.” International Studies Quarterly 49:439–57.
Bueno de Mesquita, Bruce, James D. Morrow, Randolph M. Siverson and Alastair M. Smith. 2003.
The Logic of Political Survival. Cambridge, MA: MIT Press.
Cingranelli, D. L. and D. L. Richards. 1999. “Measuring the Level, Pattern, and Sequence of Government Respect for Physical Integrity Rights.” International Studies Quarterly 43(2):407–417.
Davenport, C. 1995. “Multidimensional Threat Perception and State Repression: An Inquiry Into
Why States Apply Negative Sanctions.” American Journal of Political Science 39:683–713.
Davenport, C. 1996a. “Constitutional Promises and Repressive Reality: A Cross-National Time
Series Investigation of Why Political and Civil Liberties are Suppressed.” Journal of Politics
58:627–54.
Davenport, C. 1996b. “The Weight of the Past: Exploring the Lagged Determinants of Political
Repression.” Political Research Quarterly 49:377–405.
Davenport, C. and D. A. Armstrong. 2004. “Democracy and the Violation of Human Rights: A
Statistical Analysis from 1976-1996.” American Journal of Political Science 48(3):Forthcoming.
Davenport, Christian. 2007. “State Repression and Political Order.” Annual Review of Political Science 10:1–23.
Diggle, Peter J., Patrick Heagerty, Kung-Yee Liang and Scott L. Zeger. 2002. Analysis of Longitudinal
Data. Second ed. Oxford, UK: Oxford University Press.
Epstein, David, Sharyn O’Halloran, Robert Bates, Jack Goldstone and Ida Kristensen. 2006.
“Democratic Transitions.” American Journal of Political Science 50(3):551–69.
Fein, H. 1995. “More Murder in the Middle: Life Integrity Violations and Democracy in the World,
1987.” Human Rights Quarterly 17:170–91.
Freedman, David A. 2006. “On the So-Called “Huber Sandwich Estimator” and “Robust Standard
Errors”.” The American Statistician 60(4):299–302.
Gartner, S. S. and P. M. Regan. 1996. “Threat and Repression: The Non-Linear Relationship between Government and Opposition Violence.” Journal of Peace Research 33(3):273–87.
Ghosh, Atish, Anne-Marie Gulde and Holger Wolf. 2003. Exchange Rate Regimes: Choices and Consequences. Cambridge, MA: MIT Press.
Gleditsch, Nils Petter, Peter Wallensteen, Mikael Erikson, Margareta Sollenberg and Haavard
Strand. 2002. “Armed Conflict 1946–2001: A New Dataset.” Journal of Peace Research 39(5):615–
37.
Hafner-Burton, Emilie M. 2005a. “Right or Robust? The Sensitive Nature of Political Repression
in an Era of Globalization.” Journal of Peace Research 42(6):679–98.
Hafner-Burton, Emilie M. 2005b. “Trading Human Rights: How Preferential Trade Agreements
Influence Government Repression.” International Organization 59(3):593–629.
Hafner-Burton, Emilie M. and Kiyoteru Tsutsui. 2005. “Human Rights in a Globalizing World:
The Paradox of Empty Promises.” American Journal of Sociology 110(5):1373–1411.
Hathaway, Oona. 2002. “Do Human Rights Treaties Make a Difference?” Yale Law Journal
111:1935–2042.
Hawkins, Darren G. 2003. “Universal Jurisdiction for Human Rights: From Legal Principle to
Limited Reality.” Global Governance 9(3):347–65.
Henderson, C. 1991. “Conditions Affecting the Use of Political Repression.” Journal of Conflict
Resolution 35:120–42.
Keith, Linda Camp. 1999. “The United Nations Covenant on Civil and Political Rights: Does it
Make a Difference in Human Rights Behavior?” Journal of Peace Research 36(1):95–118.
Keith, Linda Camp. 2002. “Constitutional Provisions for Individual Human Rights 1966-1977: Are
They More than Mere Window Dressing.” Political Research Quarterly 55:111–43.
Kelton, W. David and Christina M. L. Kelton. 1984. “Hypothesis Tests for Markov Process Models Estimated from Aggregate Frequency Data.” Journal of the American Statistical Association
79(388):922–28.
Landman, Todd. 2005. Protecting Human Rights: A Comparative Study. Washington DC: Georgetown University Press.
Leon, Larry F. and Chih-Ling Tsai. 1998. “Assessment of Model Adequacy for Markov Regression
Time Series Models.” Biometrics 54(3):1165–75.
Levy-Yeyati, Eduardo and Frederico Sturzenegger. 1999. “Classifying Exchange Rate Regimes:
Deeds vs. Words.” Business School, Universidad Torcuato Di Tella, December.
Lindsey, J. K. 2004. Statistical Analysis of Stochastic Processes in Time. Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge, UK:
Cambridge University Press.
Marshall, Monty G. and Keith Jaggers. 2002. Polity IV Project: Political Regime Characteristics and
Transitions, 1800-2002. Technical report University of Maryland, College Park and Colorado
State University.
Martin, Andrew D., and Kevin M. Quinn. 2006. MCMCpack: Markov chain Monte Carlo (MCMC)
Package. R package version 0.7-4.
URL: http://mcmcpack.wustl.edu
Masson, Paul R. 2001. “Exchange Rate Regime Transitions.” Journal of Development Economics
64(2):571–86.
Masson, Paul R. and Francisco J. Ruge-Murcia. 2005. “Explaining the Transition between Exchange
Rate Regimes.” Scandinavian Journal of Economics 107(2):261–78.
McCormick, J. M. and N. J. Mitchell. 1997. “Human Rights, Umbrella Concepts, and Empirical
Analysis.” World Politics 49(4):510–25.
McKinlay, R. D. and A. S. Cohan. 1975. “A Comparative Analysis of the Political and Economic
Performance of Military and Civilian Regimes.” Comparative Politics 7(3):1–30.
McKinlay, R. D. and A. S. Cohan. 1976. “Performance and Stability in Military and Nonmilitary
Regimes.” American Political Science Review 70:850–64.
Meyer, W. 1996. “Human Rights and MNCs: Theory and Quantitative Evidence.” Human Rights
Quarterly 18:368–97.
Mitchell, N. J. and J. M. McCormick. 1988. “Economic and Political Explanations of Human Rights
Violations.” World Politics 40:476–98.
Neumayer, Eric. 2005. “Do International Human Rights Treaties Improve Respect for Human
Rights?” Journal of Conflict Resolution 49(6):925–53.
Page, Scott E. 2006. “Path Dependence.” Quarterly Journal of Political Science 1(1):87–115.
Park, H. 1987. “Correlates of Human Rights: Global Tendencies.” Comparative Politics 9:405–13.
Pierson, Paul. 2000. “Increasing Returns, Path Dependence, and the Study of Politics.” American
Political Science Review 94(2):251–67.
Pierson, Paul. 2004. Politics in Time: History, Institutions, and Social Analysis. Princeton, NJ: Princeton University Press.
Poe, S. C., S. C. Carey and T. C. Vazquez. 2001. “How are These Pictures Different? A Quantitative
Comparison of the US State Department and Amnesty International Human Rights Reports,
1976–1995.” Human Rights Quarterly 23(3):650–77.
Poe, Steven C. and C. Neal Tate. 1994. “Repression of Human Rights to Personal Integrity in the
1980s: A Global Analysis.” American Political Science Review 88(4):853–72.
Poe, Steven C., C. Neal Tate and Linda C. Keith. 1999. “Repression of the Human Right to Personal
Integrity Revisited: A Global Cross-National Study Covering the Years 1976.” International
Studies Quarterly 43(2):291–313.
Poe, Steven C., Wesley T. Milner and David A. Leblang. 1999. “Security Rights, Subsistence Rights,
and Liberties: A Theoretical Survey of the Empirical Landscape.” Human Rights Quarterly
21(2):403–43.
Pratt, John W. 1981. “Concavity of the Log Likelihood.” Journal of the American Statistical Association
76(373):103–106.
Przeworski, A., M. E. Alvarez, J. A. Cheibub and F. Limongi. 2000. Democracy and Development:
Political Institutions and Well-Being in the World, 1950–1990. New York, NY: Cambridge University Press.
R Development Core Team. 2004. R: A Language and Environment for Statistical Computing. Vienna,
Austria: R Foundation for Statistical Computing. ISBN 3-900051-00-3.
Raftery, Adrian E. 1985. “A Model for Higher-Order Markov Chains.” Journal of the Royal Statistical
Society, Series B 47:528–39.
Reinhart, Carmen and Kenneth Rogoff. 2003. “The Modern History of Exchange Rate Arrangements: A Reinterpretation.” Quarterly Journal of Economics 119(1):1–48.
Richards, D. L., R. D. Gelleny and D. H. Sacko. 2001. “Money with a Mean Streak? Foreign Economic Penetration and Government Respect for Human Rights.” International Studies Quarterly 45:219–39.
Shambaugh, Jay C. 2004. “The Effect of Fixed Exchange Rates on Monetary Policy.” Quarterly
Journal of Economics 119(1):301–52.
Simmons, B. A. and J. Hainmueller. N.d. “Can Domestic Institutions Explain Exchange Rate
Regime Choice? The Political Economy of Monetary Institutions Reconsidered.” Working
Paper, Weatherhead Center for International Affairs, Harvard University, 19 May 2005.
Toikka, R. S. 1976. “A Markovian Model of Labor Market Decisions by Workers.” American Economic Review 66:821–34.
von Hagen, Juergen and Jizhong Zhou. N.d. “Fear of Floating and Fear of Pegging: An Empirical
Analysis of De Facto Exchange Rate Regimes in Developing Countries.” Working Paper, ZEI,
Bonn, February 2004.
Walker, Robert W. N.d. “Democracy and Human Rights Abuse: Implications from a First-Order
Markov Model.” Working Paper, Department of Political Science, Washington University in
Saint Louis.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA:
MIT Press.
Zanger, S. 2000. “A Global Analysis of the Effect of Political Regime Changes on Life Integrity
Violations, 1977–93.” Journal of Peace Research 37(2):213–33.
Zeger, Scott L. and Bahjat Qaqish. 1988. “Markov Regression Models for Time Series: A Quasi-Likelihood Approach.” Biometrics 44(4):1019–31.
Table 5: Markov Models: Variants of Ordered Logit

                          Pooled      Lag=1       Lag=2       Lag=3       Lag=4       Lag=5
Polity 4 Democracy        -0.228***   -0.226***   -0.051*     -0.067**    -0.032      0.005
                          (0.027)     (0.037)     (0.021)     (0.022)     (0.029)     (0.055)
Polity*Lag AI=2           0.139***
                          (0.029)
Polity*Lag AI=3           0.174***
                          (0.032)
Polity*Lag AI=4           0.253***
                          (0.036)
Polity*Lag AI=5           0.247***
                          (0.055)
Civil Wars                1.377***                2.148       1.687***    1.899***    0.924*
                          (0.185)                 (1.688)     (0.416)     (0.279)     (0.36)
International Wars        0.603       0.581       0.056       1.086       1.495       -0.03
                          (0.325)     (0.682)     (0.861)     (0.657)     (0.79)      (0.867)
Int. Civil Wars           0.579**     0.636       -0.051      1.386**     1.036*      -0.19
                          (0.219)     (0.647)     (0.46)      (0.45)      (0.452)     (0.614)
Growth in GDP per capita  -0.011      0.032       -0.007      -0.013      -0.009      -0.036*
                          (0.007)     (0.018)     (0.013)     (0.011)     (0.02)      (0.018)
log(GDP per capita)       -0.161***   -0.403***   -0.342***   0.052       0.35***     0.42*
                          (0.031)     (0.081)     (0.05)      (0.063)     (0.1)       (0.176)
Chg. in Population        0.011       -0.204*     0.089       0.176**     -0.089      0.198
                          (0.03)      (0.083)     (0.054)     (0.063)     (0.055)     (0.149)
log(Population)           0.245***    0.228**     0.346***    0.383***    0.196**     -0.134
                          (0.027)     (0.076)     (0.053)     (0.049)     (0.067)     (0.129)
Lag AI=2                  1.839***
                          (0.236)
Lag AI=3                  3.578***
                          (0.245)
Lag AI=4                  5.352***
                          (0.277)
Lag AI=5                  7.307***
                          (0.348)
cut1 Constant             1.935***    -0.557      0.5         1.791       -0.561      -3.035
                          (0.513)     (1.138)     (0.885)     (1.021)     (1.677)     (2.635)
cut2 Constant             5.343***    2.043       4.155***    5.711***    2.615       -1.734
                          (0.524)     (1.154)     (0.895)     (0.964)     (1.37)      (2.596)
cut3 Constant             8.068***    5.143***    6.581***    8.757***    5.048***    1.118
                          (0.539)     (1.511)     (0.918)     (1)         (1.381)     (2.565)
cut4 Constant             10.885***               8.959***    11.415***   8.09***
                          (0.56)                  (1.07)      (1.047)     (1.415)
N                         3223        712         1071        848         441         151
chi2                      3813.15     122.206     127.446     104.622     76.963      21.382
bic                       5777.135    811.061     1993.763    1724.331    974.423     307.506

                          PP-Lag=1    PP-Lag=2    PP-Lag=3    PP-Lag=4    PP-Lag=5
Polity 4 Democracy        -0.212***
                          (0.027)
Polity*Lag AI=2                       0.12***
                                      (0.029)
Polity*Lag AI=3                                   0.154***
                                                  (0.032)
Polity*Lag AI=4                                               0.235***
                                                              (0.036)
Polity*Lag AI=5                                                           0.229***
                                                                          (0.055)
Civil Wars                1.409***
                          (0.189)
International Wars        0.612*
                          (0.326)
Int. Civil Wars           0.576**
                          (0.22)
Growth in GDP per capita  -0.011
                          (0.007)
log(GDP per capita)       -0.162***
                          (0.031)
Chg. in Population        0.015
                          (0.03)
log(Population)           0.249***
                          (0.027)
Cutpoints (highest first) 4.692       8.359       7.314       5.654       3.632
                          (0.549)     (0.754)     (0.595)     (0.554)     (0.561)
                          2.159       5.984       4.682       2.713       0.873
                          (0.516)     (0.517)     (0.518)     (0.53)      (0.629)
                                      3.598       1.745       0.306       -0.31
                                      (0.488)     (0.503)     (0.56)      (0.784)
                                      0.08        -2.13
                                      (0.491)     (0.623)

Standard errors in parentheses. * p<0.05, ** p<0.01, *** p<0.001
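The cutpoints in Table 5 translate into transition probabilities through the ordered logit relation Pr(y <= j) = plogis(cut_j - xb), the same form used in the likelihood of Table 6. The following is a minimal sketch only: the helper name ologit_probs and the linear-predictor value xb = 3 are hypothetical, chosen for illustration, while the cutpoints and the Polity coefficient are the Lag=2 entries as they appear in the table (read here as cut1 through cut4).

```r
# Category probabilities for an ordered logit: Pr(y <= j) = plogis(cut_j - xb).
# ologit_probs is a hypothetical helper, not a function from the paper.
ologit_probs <- function(xb, cuts) {
  cum <- c(plogis(cuts - xb), 1)  # cumulative probabilities; the last is 1
  diff(c(0, cum))                 # successive differences give Pr(y = j)
}

cuts_lag2 <- c(0.5, 4.155, 6.581, 8.959)  # Lag=2 cutpoints from Table 5
b_polity  <- -0.051                       # Lag=2 Polity coefficient from Table 5
xb        <- 3                            # hypothetical linear predictor value

p_base  <- ologit_probs(xb, cuts_lag2)
p_democ <- ologit_probs(xb + 10 * b_polity, cuts_lag2)  # Polity 10 points higher

# Rows are scenarios; columns are the destination categories 1 through 5.
print(round(rbind(baseline = p_base, more_democratic = p_democ), 3))
```

Because the Polity coefficient is negative, raising the Polity score shifts probability mass toward the lower (less abusive) categories; the size of the shift depends on the assumed value of xb.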
Table 6: The optim function
llik.ologit.verify <- function(par, X1, X2, Xchg, y, ylag) {
  # Negative log-likelihood of the first-order Markov ordered logit.
  # par[1:7]   covariate effects (the 7th, civil wars, is unused for lag=1)
  # par[8]     democracy effect for lag=1
  # par[9:12]  shifts in the democracy effect for lags 2 through 5
  # par[13:30] lag-specific cutpoints
  beta <- par
  Y <- as.matrix(y)
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  ylag <- as.vector(ylag)
  democ.var <- Xchg
  beta.fix1 <- beta[1:6] # No civil wars for lag=1
  beta.fix2 <- beta[1:7] # Including civil wars
  beta.d1 <- beta[[8]]
  # Construct (gamma + beta_democracy_lag)
  beta.d2 <- beta[[8]]+beta[[9]]
  beta.d3 <- beta[[8]]+beta[[10]]
  beta.d4 <- beta[[8]]+beta[[11]]
  beta.d5 <- beta[[8]]+beta[[12]]
  score.base1 <- as.vector(X1%*%beta.fix1)
  score.base2 <- as.vector(X2%*%beta.fix2)
  democ1 <- as.vector(democ.var*beta.d1)
  democ2 <- as.vector(democ.var*beta.d2)
  democ3 <- as.vector(democ.var*beta.d3)
  democ4 <- as.vector(democ.var*beta.d4)
  democ5 <- as.vector(democ.var*beta.d5)
  # Linear predictor for each lagged state
  score1 <- as.vector(score.base1 + democ1)
  score2 <- as.vector(score.base2 + democ2)
  score3 <- as.vector(score.base2 + democ3)
  score4 <- as.vector(score.base2 + democ4)
  score5 <- as.vector(score.base2 + democ5)
  # Log transition probabilities: pjk is the log probability of moving
  # from lagged state j to current state k.
  p11 <- plogis((beta[13] - score1), log.p=TRUE)
  p12 <- log(plogis(beta[14] - score1) - plogis(beta[13] - score1))
  p13 <- log(plogis(beta[15] - score1) - plogis(beta[14] - score1))
  p14 <- log(1 - plogis(beta[15] - score1))
  p21 <- plogis((beta[16] - score2), log.p=TRUE)
  p22 <- log(plogis(beta[17] - score2) - plogis(beta[16] - score2))
  p23 <- log(plogis(beta[18] - score2) - plogis(beta[17] - score2))
  p24 <- log(plogis(beta[19] - score2) - plogis(beta[18] - score2))
  p25 <- log(1 - plogis(beta[19] - score2))
  p31 <- plogis((beta[20] - score3), log.p=TRUE)
  p32 <- log(plogis(beta[21] - score3) - plogis(beta[20] - score3))
  p33 <- log(plogis(beta[22] - score3) - plogis(beta[21] - score3))
  p34 <- log(plogis(beta[23] - score3) - plogis(beta[22] - score3))
  p35 <- log(1 - plogis(beta[23] - score3))
  p41 <- plogis((beta[24] - score4), log.p=TRUE)
  p42 <- log(plogis(beta[25] - score4) - plogis(beta[24] - score4))
  p43 <- log(plogis(beta[26] - score4) - plogis(beta[25] - score4))
  p44 <- log(plogis(beta[27] - score4) - plogis(beta[26] - score4))
  p45 <- log(1 - plogis(beta[27] - score4))
  p52 <- plogis((beta[28] - score5), log.p=TRUE)
  p53 <- log(plogis(beta[29] - score5) - plogis(beta[28] - score5))
  p54 <- log(plogis(beta[30] - score5) - plogis(beta[29] - score5))
  p55 <- log(1 - plogis(beta[30] - score5))
  # Each observation contributes the log probability of its observed transition.
  phi <- (Y==1 & ylag==1)*p11 + (Y==2 & ylag==1)*p12 + (Y==3 & ylag==1)*p13 +
    (Y==4 & ylag==1)*p14 +
    (Y==1 & ylag==2)*p21 + (Y==2 & ylag==2)*p22 + (Y==3 & ylag==2)*p23 +
    (Y==4 & ylag==2)*p24 + (Y==5 & ylag==2)*p25 +
    (Y==1 & ylag==3)*p31 + (Y==2 & ylag==3)*p32 + (Y==3 & ylag==3)*p33 +
    (Y==4 & ylag==3)*p34 + (Y==5 & ylag==3)*p35 +
    (Y==1 & ylag==4)*p41 + (Y==2 & ylag==4)*p42 + (Y==3 & ylag==4)*p43 +
    (Y==4 & ylag==4)*p44 + (Y==5 & ylag==4)*p45 +
    (Y==2 & ylag==5)*p52 + (Y==3 & ylag==5)*p53 + (Y==4 & ylag==5)*p54 +
    (Y==5 & ylag==5)*p55
  return(-sum(phi))
}
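A function of this kind is passed to optim(), which minimizes the negative log-likelihood over the parameter vector. The sketch below illustrates the same idea on a deliberately smaller problem and is not the model estimated in the paper: it assumes three states, one covariate with a slope shared across lagged states, and simulated data. The function name nll_markov_ologit, the true parameter values, and the exp() reparameterization that keeps each row of cutpoints increasing are all choices made for this illustration.

```r
# Negative log-likelihood: shared slope, separate cutpoints per lagged state.
nll_markov_ologit <- function(par, x, y, ylag, K) {
  beta <- par[1]
  raw  <- matrix(par[-1], nrow = K, ncol = K - 1, byrow = TRUE)
  # Row s holds the cutpoints used when the lagged state is s. The first is
  # free; later ones add exp() increments so each row stays increasing.
  cuts  <- t(apply(raw, 1, function(r) cumsum(c(r[1], exp(r[-1])))))
  eta   <- beta * x
  idx   <- cbind(ylag, y)               # (lagged state, current state) pairs
  upper <- cbind(cuts, Inf)[idx]        # upper cutpoint of the observed category
  lower <- cbind(-Inf, cuts)[idx]       # lower cutpoint of the observed category
  p <- plogis(upper - eta) - plogis(lower - eta)
  -sum(log(pmax(p, 1e-12)))             # guard against log(0)
}

# Simulate one long chain from hypothetical true values.
set.seed(1)
n <- 2000; K <- 3
x <- rnorm(n)
true_beta <- 0.8
true_cuts <- rbind(c(1, 3), c(-1, 1.5), c(-3, -0.5))
y <- integer(n); ylag <- integer(n); state <- 1L
for (i in seq_len(n)) {
  ylag[i] <- state
  latent  <- true_beta * x[i] + rlogis(1)
  state   <- findInterval(latent, true_cuts[state, ]) + 1L
  y[i]    <- state
}

# Minimize from a neutral starting point and keep the Hessian for standard errors.
start <- c(0, rep(c(-1, 0), times = K))
fit <- optim(start, nll_markov_ologit, x = x, y = y, ylag = ylag, K = K,
             method = "BFGS", hessian = TRUE)
se <- sqrt(diag(solve(fit$hessian)))
print(c(beta_hat = fit$par[1], se = se[1]))
```

The reparameterization avoids taking the log of a negative probability difference when the optimizer tries cutpoints that are out of order; the function in Table 6 works with the cutpoints directly instead.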
           |                    Reinhart-Rogoff Coarse Codes
  Lag code |          1          2          3          4          5          6 |      Total
-----------+-------------------------------------------------------------------+-----------
         1 |     49,552         51         64          1         26         16 |     49,710
           |      99.68       0.10       0.13       0.00       0.05       0.03 |     100.00
-----------+-------------------------------------------------------------------+-----------
         2 |         37     13,466         37          4         15          2 |     13,561
           |       0.27      99.30       0.27       0.03       0.11       0.01 |     100.00
-----------+-------------------------------------------------------------------+-----------
         3 |         31         40     14,602          3         62          3 |     14,741
           |       0.21       0.27      99.06       0.02       0.42       0.02 |     100.00
-----------+-------------------------------------------------------------------+-----------
         4 |          2          1          1      1,853          7          0 |      1,864
           |       0.11       0.05       0.05      99.41       0.38       0.00 |     100.00
-----------+-------------------------------------------------------------------+-----------
         5 |         23         33         60         11      5,777         10 |      5,914
           |       0.39       0.56       1.01       0.19      97.68       0.17 |     100.00
-----------+-------------------------------------------------------------------+-----------
         6 |         20          3          6          1          8      2,692 |      2,730
           |       0.73       0.11       0.22       0.04       0.29      98.61 |     100.00
-----------+-------------------------------------------------------------------+-----------
     Total |     49,665     13,594     14,770      1,873      5,895      2,723 |     88,520
           |      56.11      15.36      16.69       2.12       6.66       3.08 |     100.00
Table 7: First-Order Markov Matrix: Monthly Reinhart-Rogoff Measures
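A matrix of this form, counts with row percentages, can be tabulated by lagging the regime code within each unit and cross-classifying the lagged and current codes. The following is a minimal sketch in R on a made-up two-unit panel; the data frame panel and its column names are hypothetical and are not the Reinhart-Rogoff data.

```r
# Hypothetical panel: unit identifier, time index, and a regime code.
panel <- data.frame(
  unit = rep(c("A", "B"), each = 6),
  time = rep(1:6, times = 2),
  code = c(1, 1, 2, 2, 2, 1,   3, 3, 3, 1, 1, 1)
)
panel <- panel[order(panel$unit, panel$time), ]

# Lag the code within each unit; the first period of every unit has no lag.
panel$lag_code <- ave(panel$code, panel$unit,
                      FUN = function(v) c(NA, head(v, -1)))

# Cross-tabulate lagged against current codes (rows with a missing lag drop out).
counts  <- table(lag = panel$lag_code, current = panel$code)
row_pct <- round(100 * prop.table(counts, margin = 1), 2)

print(addmargins(counts))  # transition counts with row and column totals
print(row_pct)             # each row sums to 100: the empirical transition matrix
```

Lagging within unit matters: lagging the stacked column directly would treat the last observation of one unit as the predecessor of the first observation of the next.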