Path, Phat, and State Dependence in Observation-driven Markov Models
Robert W. Walker∗
Department of Political Science
Program in Applied Statistics and Computation
Washington University in Saint Louis
E-mail: [email protected]
July 18, 2007

Abstract

Many social science theories posit dynamics that depend in important ways on the present state and focus on a reasonably small number of states. Despite the importance of theoretical notions of path dependence, empirical models, with a few exceptions (Przeworski, Alvarez, Cheibub and Limongi (2000); Beck, Epstein, Jackman and O’Halloran (N.d.); Epstein, O’Halloran, Bates, Goldstone and Kristensen (2006)), have paid little attention to the implications of state dependence for empirical studies. This is despite the fact that there are many possible ways in which history might matter (we focus on the categorization given by Page (2006)), and these different ways manifest themselves in sets of models that can be tested and compared. This paper considers the basic properties of observation-driven Markov chains [stationarity/time homogeneity, communication, transience, periodicity, irreducibility, and ergodicity] and the issues that arise in their implementation as likelihood estimators to provide a window into methods for the study of path dependence. Application of these concepts to longitudinal data on human rights abuses and exchange rate regime transitions provides evidence that history may not exert uniform effects. The empirical examples highlight the subtle substantive assumptions that manifest in different modeling choices. The human rights example calls for an important qualification in the widely studied relationship between democracy and human rights abuses. The exchange rate regime example highlights the usefulness of Markov models for multinomial processes.

∗I thank Andrew Martin for encouraging me to write this down. All the usual disclaimers apply.
1 Introduction

Path dependence and the mantra that “history matters” are frequently invoked as explanations for the evolution of political, social, and economic institutions. For example, Pierson (2000, 2004) is primarily focused on describing a reasonable range for the claim that history matters, and much of the comparative historical-institutionalist school in political science, in one way or another, invokes path dependence as an important causal force. Political methodology has primarily focused on appropriate modeling practices for “temporal dependence”, and the argument of Pierson (2000, 2004) is that this pays insufficient attention to the interaction of temporal sequencing and specific inputs of interest.1 Our goal is to place these arguments in a broader framework for understanding the important role of dynamics in statistical/econometric models. We begin by briefly reviewing the range of arguments regarding the broad claim that “history matters” and the barriers that they present for the political methodologist. Borrowing the typology of Page (2006), we then lay out the models that correspond to different classes of such arguments and briefly comment on methods for distinguishing among them, paying central attention to the importance of the Ergodic Theorem. We then turn to the development and characterization of one such model that facilitates a comparison of methods and the substantive implications of modeling choices. We conclude with directions for continuing this inquiry.

2 Path and State Dependence

A broad literature has invoked the general notion of path dependence as key to understanding the temporal evolution of important political and economic phenomena.
For example, economists have long considered the path dependence of technology (e.g., Betamax versus VHS, the QWERTY keyboard).

1 The term “temporal dependence” is often used to describe a correction or the inclusion of a nuisance parameter rather than rendering the temporal process itself an estimand of interest, particularly when divergences in nuisance have important substantive consequences.

We begin with a brief bit of terminology that should make the discussion clearer. Suppose that we observe some set of units (countries, individuals, firms), $i = 1, 2, \ldots, N$, at multiple discrete points in time, $t = 1, 2, \ldots, T$, and that at any period in time these individuals must occupy one of a finite number of discrete states (denoted $s_j$, $\forall j \in J$). As a result, if individual $i$ at time $t$ resides in state $j$, we can write $y_{it} = j$. Our principal interest is in the temporal evolution of some outcome defined on a set of discrete states. Page (2006) clarifies the properties of three distinct forms of what are generally referred to as “path dependent” processes: path dependence, state dependence, and “phat-dependence”. As Page shows, there are fundamental differences in these three understandings of the temporal evolution of some process of interest. Though Page’s (2006) argument is primarily concerned with a theoretical demonstration that the observation of Pierson (2000, 2004) and others that positive externalities are an integral part of path dependence is largely mistaken, he provides an extraordinarily useful taxonomy for understanding and applying the mantra that “history matters” to quantitative analyses. The key differences, according to Page, are whether or not sequencing matters and how much of the sequence may be relevant.2 Let us explore these processes in greater detail before turning to empirical models that represent these processes.
Two key themes will emerge: the set of states in which the process has resided, together with their relative frequency, and the ordering of the “visits”. First, combining the set of states that we have visited and the ordering of such visits, a path-dependent process suggests that the present period realization of some variable of interest, call it $y^{(t)}$, depends on the precise evolution of $y$ prior to the present period. Calling all prior realizations of $y$ a history and denoting said history (up to and including the present) $h^{(t)}$, a path dependent process is formalized as

\[ y^{(t+1)} = G(h^{(t)}). \]

2 In the language of Markov models, the order of the chain describes how much of the sequence is relevant to the question at hand.

For a path-dependent process, both the set of states in which the process has previously resided and the order in which we resided in these states are of critical importance. While this is certainly the most complete rendering of the way in which history matters, the mathematical burden of such understandings can become quite substantial. To see why this must be true, a stylized example should suffice. Suppose that the process of interest has been observed for $t - 1$ periods. It should be straightforward to capture the $t - 1$ histories and appropriately model $y^{(t)}$ as a function of the history $h^{(t-1)}$. Now let us consider $y^{(t+1)}$, which is now a function of $h^{(t)}$. With a discrete state space, the cardinality of the support of $h^{(t)}$ is bounded below by the cardinality of the support of $h^{(t-1)}$, strictly so if the distribution of $y^{(t+1)} \mid h^{(t)}$ is non-atomic. This implies that any attempt to empirically capture state dependence in an econometric setting must confront not only the fact that the number of parameters is increasing in $t$, but also that, for a fixed number of units, degrees of freedom are a nonincreasing function of $t$.
Such a form of historical determinism will often imply that further observations of a process make the problem harder and harder to confront. For example, supposing that we observe $T$ realizations of the process and that these realizations can be naturally classified into $J$ states, there are $J^T$ potential histories. Though this may not seem daunting, if one thought, for example, that democratization is a path-dependent process and one extends the model of Epstein et al. (2006) to accommodate path dependence, their three-category measure of autocracy/partial democracy/democracy may result in a whopping $1.2 \times 10^{19}$ potential histories. With 200 countries and 40 observations per country, the ratio of potential parameters to data exceeds $1 \times 10^{15}$.3 As an empirical model, path dependence is likely to be elusive except when there is remarkable coincidence in the paths.4 In this respect, Pierson (2004, p. 78) is correct that “Historically oriented empirical work on sequences can build on a formidable intellectual tradition. Taken as a whole, this literature quite effectively undercuts the claim that the social significance of historical processes can be easily incorporated in the ‘values’ of particular ‘variables’ at a moment in time.” One of the simplest ways that this claim is true is that we cannot observe the same variable if history truly matters because, though the history might be the variable of interest, each present period history must be distinct from each prior period history by the addition of a new set of realizations of the process. If this claim is indeed true, the statistical analysis of path-dependent processes presents a formidable, dare we say unconquerable, task.

3 Of course, the actual number of histories is strictly bounded above by the size of the total sample so that this ratio can never exceed one in practice (if every data point arises from a unique history, the ratio is one). To see this, note that in the first period there are $J$ potential initial conditions and that the number of unique histories at any subsequent time point can at most be equal to the number of units. For 200 countries, period 5 is the first period in which the number of histories can equal the number of units observed (in the fourth period, there are $3^4$ or 81 potential unique histories and in period 5, this number jumps to 243).

4 In the literature on fixed effects estimation, consistent estimators are often difficult to isolate for discrete state spaces and the asymptotics that need be applied in this case are interesting in their own right. Because the number of histories is not independent of the number of elapsed time periods, the asymptotic arguments necessary would (generally) be in the number of units rather than the number of time points. “Generally” is used because coincident histories provide some information about the probability of next period outcomes when outcomes diverge given the same history. This result arises from work on conditional (fixed effects) logit estimation (see Wooldridge (2002, p. 491) for a discussion).

A phat dependent process, by contrast, emphasizes the set of states in which the process has resided without attention to their order. More formally, Page (2006, p. 97) writes that “a process is phat dependent if the outcome in any period depends on the set of outcomes and opportunities that arose in a history but not upon their order” such that

\[ y^{(t+1)} = G(\{h^{(t)}\}) \]

with $\{h^{(t)}\}$ uniquely identifying the distribution of states in prior realizations of $y^{(t+1)}$. Three points merit consideration. First, if one were to write down the function generating path dependence, phat dependence would be a simple restriction on this model because the function mapping path into phat dependence is either one-to-one or many-to-one.5 Second, phat dependence, like path dependence, results in a time-varying number of parameters. Suppose that we observe a process with 3 states in period 1. In the second period, the permissible paths will be {11,22,33,12,21,13,31,23,32} with accompanying phats equal to {11,22,33,12,13,23}. In the third period, there are $3^3$ possible paths with 10 possible phats.6 In very general terms, because the frequency distribution of outcomes will be time-varying, general phat dependence will have many of the same problems that result from full path dependence.

5 Page (2006, p. 97) writes, “Testing for phat dependence requires a different econometric model than testing for path dependence.” While this is trivially true, phat dependence is nested in path dependence by the function governing the translation of permutations into combinations. Phat dependence is a restricted form of path dependence where order does not matter.

6 There are six permutations that result in each state having been visited once (the same phat); 18 permutations where one state appears twice and some other state appears once (with six accompanying phats); and three paths where the same state appears in every period, with three accompanying phats.

From an econometric perspective, we suggest a reduced form of phat dependence that may prove valuable for some interesting political problems; we will label this form of dependence phat support dependence. The general idea underlying phat support dependence is that it is the set of states a process has visited and not the amount of time spent in any given state that is relevant. Page’s phat dependence requires an implicit belief in some form of reinforcement dynamic, while phat support dependence suggests that this reinforcement dynamic is either/or and thus does not vary depending upon the duration (number of periods) in a given state. Because our primary interest is in the estimation of discrete state Markov processes, this process is justified because it reduces the number of parameters in a way that does not seem inconsistent with the nature of phat dependence. At the same time, it may also be that the range of relevant applications for such a process is limited if the importance of recent history dominates the importance of long past events. Of course, it is possible to use techniques for the testing of nonnested models to discriminate among hypotheses about how history matters once the estimation problem becomes manageable.

Revisiting again the example provided by Epstein et al. (2006), there are seven possible unordered histories for a three-category characterization of autocracy (A)/partial democracy (P)/democracy (D): {A, P, D, AP, AD, PD, APD}. More generally, if we define the state space as having cardinality $J$, the number of potential phat support dependencies induced is

\[ \sum_{k=1}^{J} C^{J}_{k} = \sum_{k=1}^{J} \binom{J}{k} = \sum_{k=1}^{J} \frac{J!}{k!\,(J-k)!}. \]

Underlying such a characterization is the belief that having resided in a particular state is the relevant feature of the dependence. From the standpoint of the range of relevant possibilities, this is a considerably easier problem. At the same time, the process is not “dynamic” in any interesting way. A pure phat support dependent process takes no explicit account of the actual transitions that have taken place but only of the states in which the process has previously resided. It may matter whether we previously went from autocracy to democracy and back to partial democracy instead of slowly moving from autocracy to partial democracy to full-fledged democracy if democracy is self-reinforcing (positive externalities in the language of Pierson) and if this particular pathway also creates negative externalities for regressions to autocracy/partial democracy.
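The counts quoted in this section can be checked by brute force. The following Python sketch is an illustration added here, not part of the original analysis; the function names (`count_paths`, `count_phats`, `phat_supports`) are hypothetical. It enumerates ordered histories (paths), unordered histories with multiplicity (phats), and sets of visited states (phat supports) for the three-state autocracy/partial democracy/democracy example.

```python
from itertools import combinations, product

STATES = ("A", "P", "D")  # autocracy, partial democracy, democracy


def count_paths(states, periods):
    """Ordered histories: every sequence of length `periods`."""
    return sum(1 for _ in product(states, repeat=periods))


def count_phats(states, periods):
    """Unordered histories: distinct multisets of visited states."""
    return len({tuple(sorted(h)) for h in product(states, repeat=periods)})


def phat_supports(states):
    """Non-empty sets of states that have been visited at least once."""
    return [set(c)
            for k in range(1, len(states) + 1)
            for c in combinations(states, k)]


# Paths grow as J**T; phats grow much more slowly; supports are fixed in T.
for t in (2, 3, 4):
    print(t, count_paths(STATES, t), count_phats(STATES, t))
print(len(phat_supports(STATES)))

# For long panels, use the closed form J**T rather than enumeration.
histories_40 = len(STATES) ** 40
ratio = histories_40 / (200 * 40)  # potential histories per observation
print(histories_40, ratio)
```

Under these assumptions the enumeration reproduces the 9 paths and 6 phats in the second period, the 27 paths and 10 phats in the third, and the seven phat supports; the sum of binomial coefficients over $k = 1, \ldots, J$ equals $2^J - 1$, so the number of supports does not grow with the length of the panel.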
Let us write a “transition” matrix of phat dependence to illustrate the underlying process,

\[
P(\text{phat}) =
\begin{pmatrix}
\pi_{A,A} & \pi_{A,P} & \pi_{A,D} \\
\pi_{P,A} & \pi_{P,P} & \pi_{P,D} \\
\pi_{D,A} & \pi_{D,P} & \pi_{D,D} \\
\pi_{AP,A} & \pi_{AP,P} & \pi_{AP,D} \\
\pi_{AD,A} & \pi_{AD,P} & \pi_{AD,D} \\
\pi_{PD,A} & \pi_{PD,P} & \pi_{PD,D} \\
\pi_{APD,A} & \pi_{APD,P} & \pi_{APD,D}
\end{pmatrix}
\tag{1}
\]

where the rows are defined by the states that have been visited in some period prior to $t$ and the columns are the probabilities of residing in a particular state $s_j$ given the set of states that have been previously visited, $\Pr(s_j = j \mid \{h_{t-1}\})$. In effect, this becomes the equivalent of a fixed-effects estimator where the fixed effects are the set of states previously visited.7

As Page (2006) notes, trivially, path dependent and phat dependent relationships will also be state dependent. To accomplish this, define either the path or the phat as the state. Where this observation takes force is in the realization that the key to statistical/econometric applications is in properly characterizing the precise way in which history matters and that the appropriate model is one of state dependence; the holy grail is defining the appropriate range of states that matter. In widespread practice, a state dependent process emphasizes only the state of current residence and neither the range of previous states in which the process has resided nor the order in which the process resided in such states. In the language of Page (2006, p. 95), “a process is state dependent if the outcome in any period depends only upon the state of the process at that time” or equivalently that $y^{(t+1)} = G(s_t)$ with $s \in S$.8 State dependence is the form of dependence most often examined in the analysis of discrete temporal processes, albeit frequently in a fairly limited form. These limitations in part stem from data limitations. For example, the study of Epstein et al. (2006) models transitions among autocracies/partial democracies/democracies as a process with state dependence and state-dependent effects of covariates.
Masson (2001) and Masson and Ruge-Murcia (2005) consider the transition among exchange rate regimes as defined by a state dependent process. The overriding reason for reliance on state-dependent processes involves infinite regress; if we do not constrain how far back “history matters” then we are left with a saturated econometric study that models each realization in terms of its own unique history. Of course, drawing inferences from such processes (an observation common to qualitative and quantitative researchers alike) would be strengthened by multiple identical histories to isolate the influence of inputs on outcomes, but few politically interesting processes allow for such randomization/experimentation and we must make do with what we and others have observed. This places limitations (for reasons of degrees of freedom) on the degree to which history can matter. In the language of Markov models, short-order state dependent processes are necessary in a finite sample world.

7 There is a lot more to say about this because fixed effects in an ordered setting can be identified in the interior, though invariant histories that do not transition contribute nothing to the likelihood. For the multinomial case, the results are less clear cut. The discussion of fixed and random effects estimators in Wooldridge (2002) is particularly useful.

8 In his discussion, Page (2006) goes further to emphasize the distinction between initial outcome and recent and early path-dependence to denote dependence on the first outcome, recent realizations prior to the immediate state, and earlier histories up to some (long?) past point in time.

In broad summary, there are three basic and interrelated types. Path dependence emphasizes both experience and ordering of the entire process while phat dependence emphasizes only the experiences that arise with residence in a particular state without regard to the timing of such residence.
In both such cases, the number of potentially relevant parameters is increasing in time and this is likely to pose insurmountable problems for empirical practice. Phat-support dependence and state dependence are less prone to this profusion of parameters and can be thought of as path dependence where only the previous steps we have taken or the/some immediately prior step(s) matter(s). With these ideas about the general types of temporal dependence that may be of interest, we turn to statistical/econometric methods for studying discrete temporal processes with a view toward characterizing their relevant properties and empirically identifying the forms of state dependence in time series of qualitative variables via transition processes.

3 Markov Processes

In this exposition, the primary variable of interest $y_t$ is a discrete variable observed at multiple time points, $t \in T$.9 For the sake of simplicity in demonstration, we assume a three category ordered process such that $y_{t-1}$ is equal to one of

\[
y^{(1)}_{t-1} = 1 \;\therefore\; \vec{y}_{t-1} = \{1, 0, 0\}, \qquad
y^{(2)}_{t-1} = 2 \;\therefore\; \vec{y}_{t-1} = \{0, 1, 0\}, \qquad \text{or} \qquad
y^{(3)}_{t-1} = 3 \;\therefore\; \vec{y}_{t-1} = \{0, 0, 1\}.
\]

9 A note on notation is in order. When discussing properties of the transition matrix, subscripts will define states while superscripts will denote subjects and/or time where necessary for clarity. When discussing variables and their realizations, we will comply with standard practice and utilize subscripts to index units $i$ and time $t$ when necessary for clarity.

The relevant Markov matrix $P$ can be specified as

\[
P =
\begin{pmatrix}
\pi_{11} & \pi_{12} & \pi_{13} \\
\pi_{21} & \pi_{22} & \pi_{23} \\
\pi_{31} & \pi_{32} & \pi_{33}
\end{pmatrix}
\tag{2}
\]

where rows are defined by $y_{t-1}$ and columns are defined by $y_t$. This requires that $\Pr(y_t = m \mid y_{t-1} = l) = \pi_{lm}$. With this information and given some previous state $\vec{y}$, $p(S_j)_t = \vec{y} P$. For example, were $y_{t-1} = 3$, $p(S_j)_t = \{\pi_{31}, \pi_{32}, \pi_{33}\}$. If one wished to know the probabilities at time $t + 1$ given state $j$ at time $t - 1$, one could write $p(S_j)_{t+1} = (\vec{y}^{(j)}_{t-1} P) P$.
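As a numerical illustration of these one-step and two-step calculations, consider the following Python sketch. The entries of `P` are hypothetical values chosen only so that each row sums to one; they are not estimates from any data discussed here, and the helper name `step` is likewise an assumption.

```python
def step(dist, P):
    """Row vector times matrix: next-period state probabilities."""
    n = len(P)
    return [sum(dist[l] * P[l][m] for l in range(n)) for m in range(n)]


# Hypothetical transition matrix: rows index y_{t-1}, columns index y_t.
P = [
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.25, 0.70],
]

y_prev = [0, 0, 1]     # indicator vector for y_{t-1} = 3
p_t = step(y_prev, P)  # one step ahead: picks out the third row of P
p_t1 = step(p_t, P)    # two steps ahead: (y P) P

print(p_t)
print(p_t1)
```

Multiplying the indicator vector by the matrix simply selects the row of the prior state, and each further multiplication carries the forecast one more period ahead.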
With this in mind, we can define some of the properties of Markov models [and the associated Markov processes].10 The first two properties to define will be of central relevance for the substantive problems to be examined (Definitions 3.1 and 3.2), while the conditions arising from the remainder will be useful in defining what is often the central quantity of interest: the invariant distribution.

10 The following definitions are adapted from a combination of Bartoszyński and Niewiadomska-Bugaj (1996), Bhattacharya and Waymire (1990), and Amemiya (1985). Lindsey (2004, ch. 5) provides a particularly intuitive approach to estimating very basic Markov chains.

Definition 3.1. A Markov chain will be said to have stationary transition probabilities, or to be time homogeneous, if for all states $y_t = m$ and all previous states $y_{t-1} = l$,

\[ \pi_{lm} \perp t. \]

Definition 3.2. A Markov chain will be said to have homogeneous transition probabilities, or to be unit homogeneous, if for all states $y_t = m$ and all previous states $y_{t-1} = l$,

\[ \pi_{lm} \perp i. \]

Underlying the above conditions is that a unique stochastic Markov matrix that does not explicitly depend on a function of the nominal units or time describes the transition among states in $S$.11 As we will illustrate, the precise form of unit homogeneity is such that allowing the probabilities to depend on observed covariates will not present particular problems, so long as the mappings between covariates and probabilities do not depend on the unit for which the mapping is posited.12 The same arguments apply to time homogeneity. The existence or uniqueness of the invariant distribution does not critically depend on the presence or absence of time-varying covariates so long as the mappings are not time-variant.13

11 Kelton and Kelton (1984) describe tests based on aggregate data on the population residing in given states at discrete times to test hypotheses of stationarity and homogeneity.

12 Even this does not pose particular problems, but the range of inferences would be limited to a forecast for each unit drawn from their unique parameters.

13 Though the stationarity of the counterfactual also matters. We will be more precise about this later.

Definition 3.3. A set $S$ of states with cardinality $J$ is closed if for every $s_i$ with $i \in J$, we have

\[ \sum_{j \in J} \pi_{ij} = 1. \]

Intuitively, this implies that every (one-step) transition results in a move among discrete states in $S$, and the logical extension of this condition, by induction, implies that it is impossible to leave a closed set of states. Even though it is impossible to leave a closed set of states, we should also be concerned with the nature of transitions among states within the closed set. Two relevant properties, in this regard, are the communication among states and the reducibility of the Markov chain (taken as a whole).

Definition 3.4. A Markov chain is said to be irreducible if the only closed set of states is the set of all states $S$. Moreover, states $s_i$ and $s_j$ are said to communicate if $\pi^{(m)}_{ij} > 0$ and $\pi^{(n)}_{ji} > 0$ where $i, j \in J$ and $m, n \in \mathbb{Z}$.14

14 This property might also be labeled joint accessibility in the sense that state $s_j$ is accessible from state $s_i$ in some number of steps, $m$, and state $s_i$ is accessible from state $s_j$ in some number of steps, $n$, where $m$ and $n$ need not be the same.

Communication, within a closed set of states, describes the range of permissible transitions among states. For our purposes, a Markov chain is irreducible if and only if all states communicate. In effect, this simply means that any state can be reached from any other state in some finite number of moves (or that the invariant distribution has all nonzero probabilities). We have a final property to define before we state the main theorem from which the analysis proceeds.

Definition 3.5. State $s_i$ is periodic if there exists some $d > 1$ such that $p^{(n)}_{ii} > 0$ implies that $n/d \in \mathbb{Z}$. If no such $d > 1$ exists, a state is said to be aperiodic.

A combination of the aforementioned definitions yields an important result regarding the ultimate estimand of interest for many Markov models: the invariant distribution. Following a statement of the theorem and a brief discussion of estimation and testing procedures, we will demonstrate that the substantive invariant distribution based on an estimated Markov matrix that may itself be heterogeneous and nonstationary can yield a stationary and homogeneous invariant distribution for counterfactual inference.

Theorem 3.1. Let $P = [p_{ij}]$, $i, j = 1, \ldots, J$, be the transition probability matrix of an aperiodic and irreducible stationary Markov chain with a finite number of states $J$. The limits

\[ \lim_{n \to \infty} p^{(n)}_{ij} = u_j \]

exist for every $j = 1, \ldots, J$ and are independent of the initial state $s_i(t_0)$. Furthermore, the vector $u$ (the invariant distribution) satisfies the linear system of equations

\[ u_k = \sum_{j} u_j p_{jk}, \qquad k = 1, \ldots, J, \]

where $\sum_{j \in J} u_j = 1$.

The proof is given in any number of probability/stochastic processes texts including Bartoszyński and Niewiadomska-Bugaj (1996) and Bhattacharya and Waymire (1990). We shall sketch the principal inputs, aperiodicity and irreducibility, before turning to our principal interest in the estimation of Markov models for studying dynamic discrete processes.15 First, aperiodicity is required if we pay particular attention to the role of the initial state. In simple terms, periodicity leads to divergence rather than convergence. For example, given the above definition of periodicity, suppose that $d = 2$ and that we begin parallel chains in the periodic state at time $t$ and at time $t + 1$.
At time $t + 2$, the probability that the first chain resides in the periodic state is, of necessity, greater than zero while the probability that the second identical chain resides in the periodic state is zero. In the limit, the first chain will reside in the periodic state only when $t$ is even while the second chain can only reside in the periodic state when $t$ is odd. Irreducibility plays a key role in preventing two parallel chains from never overlapping. To be concrete, suppose we start parallel chains in two states that do not communicate. With two (or more) closed sets in the state space, the probability of the two chains reaching the same point is zero. It may be that within each closed set of states there is a unique limiting distribution, but these limiting distributions will themselves apply to separate closed sets within the state space and there can be no unique invariant distribution that is independent of the starting point of the process.

15 Stationarity should be obvious, as the transition matrix would otherwise depend on the index with which limits are taken. The method of intuitive illustration follows the concept of coupling and the conditions are illustrated in terms of decoupling parallel chains.

The issue, such as it is, in application of the ergodic theorem to the study of time-series cross-section data primarily rests on the parameters, not the data. For example, Amemiya (1985) describes two studies, Boskin and Nold’s (1975) study of welfare and Toikka’s (1976) study modeling the probability of transitions in labor force participation. In the former case, the model is heterogeneous and stationary, while the latter examines labor force participation as a function of time-varying covariates as a homogeneous and nonstationary Markov model. However, neither model embodies parameters that render the model nonstationary or heterogeneous; the covariates are the source of these dynamics.
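The roles of aperiodicity and irreducibility in Theorem 3.1 can be seen numerically. The Python sketch below is an illustration under assumed values: `P_ergodic` is a hypothetical aperiodic and irreducible matrix, `P_periodic` is a two-state chain with period $d = 2$, and the helper names `step` and `iterate` are not from the original text.

```python
def step(dist, P):
    """Row vector times matrix: next-period state probabilities."""
    n = len(P)
    return [sum(dist[l] * P[l][m] for l in range(n)) for m in range(n)]


def iterate(dist, P, n_steps):
    """Push a starting distribution forward n_steps periods."""
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical aperiodic, irreducible chain (all entries positive).
P_ergodic = [
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.25, 0.70],
]

# Two-state chain with period 2: it alternates deterministically.
P_periodic = [
    [0.0, 1.0],
    [1.0, 0.0],
]

# Different initial states reach the same limit in the ergodic case.
from_1 = iterate([1.0, 0.0, 0.0], P_ergodic, 200)
from_3 = iterate([0.0, 0.0, 1.0], P_ergodic, 200)
print(from_1)
print(from_3)

# The limit u satisfies u = uP up to rounding error.
print(step(from_1, P_ergodic))

# The periodic chain never settles: its location depends on parity.
even = iterate([1.0, 0.0], P_periodic, 10)
odd = iterate([1.0, 0.0], P_periodic, 11)
print(even, odd)
```

In the ergodic case the two starting points become numerically indistinguishable well before 200 steps, whereas the periodic chain keeps the memory of its starting point forever, which is the divergence described in the text.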
As a result of this fact, if we are simply interested in forecasting the invariant distribution at two different values of any given covariate, taking the mean of the other covariates or some other relevant scenario as given, the ergodic theorem may be applied and the invariant distribution $u_j$ may be summarized $\forall j \in J$. Put differently, taking the covariates as inputs in the absence of time- or unit-varying parameters, the resultant Markov matrix to be utilized for prediction of the invariant distribution is often both time and unit homogeneous.[16] As a result, the invariant distribution exists and is independent of the initial state when forecasted sufficiently far into the future. As we will see, sufficiently far is often quite a small number of time periods. Furthermore, there is a fundamental distinction between the influence a covariate exercises on a particular transition probability and its resultant impact on the invariant distribution. In applied practice, these two are often conflated because researchers fail to recognize the distinction between a stochastic process governed by a stochastic transition matrix and the long-run distribution that such a matrix implies.

[16] This is not always the case in the presence of deterministic relationships among covariates, but this can often be corrected with an appropriate counterfactual. For example, suppose that both GDP per capita and the rate of change in GDP per capita are posited determinants of some set of transitions. If the rate of change in GDP per capita is not set to zero, the step-ahead transition matrix must account for the fact that some nonzero growth rate is posited for one of the key inputs. In such a circumstance, the transition matrix is nonstationary because assumptions about the growth rate imply changes in the base covariate that should manifest in a changing transition matrix.

4 Constructing the Markov Model

To construct a Markov model for ordered data, we must first define an ordered dependent variable.[17] Let this dependent variable be $y$, arising from realizations of a set of states $S$ that has cardinality $J$ with discrete realizations $j \in J$. Amemiya (1985, p. 292: Definition 9.3.1) defines the ordered model as follows.

[17] To make things conformable everywhere, let us assume that data (denoted by Latin letters) arrive in row vectors and parameters (denoted by Greek letters) arrive in column vectors.

Definition 4.1. The ordered model is defined by
$$\Pr(y = j \mid x, \theta) = p(S_j)$$
for some probability measure $p$ depending on $x$ and $\theta$ and a finite sequence of successive intervals $\{S_j\}$ depending on $x$ and $\theta$ such that $\bigcup_j S_j = \mathbb{R}$, the real line.

Define $y^*$ as an unobserved latent variable, $x$ as a $k$-vector of fixed and predetermined covariates, and $\theta$ as a set of parameters to be estimated. The unobserved latent variable $y^*$ is composed of a systematic and a stochastic component. Define the systematic component as $x\beta$ and the stochastic component as $e$, so that $y^* = x\beta + e$ and $\beta$ is folded into $\theta$ in Definition 4.1. We must now define a mechanism for differentiating the categories. Define $J + 1$ threshold parameters, $\vec{\tau} = \{\tau_0, \tau_1, \ldots, \tau_J\}$, where the elements in $\vec{\tau}$ are a strict order ($\forall j \in J$, $\tau_j > \tau_{j-1}$) with $\tau_0 = -\infty$ and $\tau_J = \infty$.[18] We can now write the probability that $y = j$ given some well behaved density $f$ as
$$\Pr(y = j \mid x\beta) = \pi_j = \int_{\tau_{j-1}}^{\tau_j} f(y^* \mid x\beta)\, dy^*. \tag{3}$$
Define
$$y_j = \begin{cases} 1 & \text{if } y = j \\ 0 & \text{otherwise,} \end{cases} \tag{4}$$
and $\vec{y}$ as the concatenation of $y_j$ for all $j \in J$.[19] Because $f$ is a proper density, $\pi_J = 1 - \pi_1 - \ldots - \pi_{J-1}$, and we can write the likelihood $L$ as
$$L(\beta \mid y, x) = \prod_{i} \pi_1^{y_1} \cdot \pi_2^{y_2} \cdots \left(1 - \sum_{m=1}^{J-1} \pi_m\right)^{y_J}, \tag{5}$$
where the product runs over observations $i$ and the observation index on $\pi$ and $y$ is suppressed. The log likelihood can be formed from (5) by inserting the link function and thresholds, generating
$$\ln L(\beta, \tau \mid y, x) = \sum_{j=1}^{J} \sum_{y_i = j} \ln\left[F(\tau_j - x\beta) - F(\tau_{j-1} - x\beta)\right] \tag{6}$$
which is globally concave if $f$ is positive and $\ln f$ is concave (Pratt 1981).[20] As a result, assuming that each ordered category is observed, concavity of the log likelihood is assured for the logistic and normal distributions, and thus for the ordered logit and probit models. With the basics of a regression model for ordered data in mind, we can turn to the definition of a Markov process.

[18] Strict inequality is required to ensure that $p$ is not degenerate. If $\beta$ were to contain a constant, $\vec{\tau}$ would be a $J - 2$ vector and each element would be equal to the appropriate equivalent element in $\vec{\tau}$ minus the estimated constant. Our notation is similar to that of Pratt (1981) but the threshold parameters are a reversed strict order; Pratt's $\tau_0 = \infty$.

[19] When we refer to $\vec{y}$ for a specific $j$, we will denote it $\vec{y}(j)$.

[20] To gain an intuitive sense, note that the first category reduces to $\ln F(\tau_1 - x\beta)$ and the $J$th category reduces to $\ln[1 - F(\tau_{J-1} - x\beta)]$.

4.1 The Simplest of Ordered Markov Models

Maddala (1983, p. 57) writes down a simple Markov model such as
$$y^*_{it} = x_{it}\beta + \vec{y}_{i,t-1}\vec{\alpha} + e_{it} \tag{7}$$
that appreciates the dependence of observed $y_{it}$ on previous states $\vec{y}_{i,t-1}$.[21] Having written the model, we can explore the implicit assumptions. This model yields a single equation estimator with $k + J - 1$ unknowns, yielding the following conclusions: (i) the effects of $x$ are independent of the prior state ($\vec{y}_{i,t-1}$) and (ii) the latent scale measure of $S_j$ for all $j \in J$ does not depend on the prior state. The first point should be obvious as $\beta$ is a scalar, assuming that $k = 1$, or a $k$-vector such that there is a single linear mapping from $x$ into $y^*$. The second point will be explored later, but it is useful to note that Diggle, Heagerty, Liang and Zeger (2002, p. 201) write that a saturated model of the transition matrix can be obtained by separately fitting an ordered response model conditional on each of the prior states. They continue (on p. 203) to argue that the equation can be rewritten as a single, although somewhat complicated, regression equation using the binary decomposition of the lagged dependent variable (and possible interactions between the prior state and $x$). What remains to be seen is whether this is universally valid: avoiding the separate fitting of each equation (with possible constraints on parameters) pools the cutpoint estimates, and whether such pooling is justified seems to be an empirical question. That said, it is straightforward to estimate each equation separately and avoid this potential complication. With these ideas in mind, let us turn to the specification of state-dependent effects of $x$.

[21] A model of this sort is estimated by Hafner-Burton (2005b). It would be inconsistent with an ordered level of measurement to map $y_{i,t-1}$ into $y_{it}$ using a scalar multiplier because the very definition of order requires that we not know the size of the intervals, only their implicit order (Bartoszyński and Niewiadomska-Bugaj 1996, p. 464–65). As a result, we must utilize the $J - 1$ dimensional vector $\vec{y}_{i,t-1}$. Identification then requires that the dimension of $\vec{\alpha}$ be $J - 1$ and that the elements correspond to those in $\vec{y}_{i,t-1}$.

4.2 State Dependent Effects in Ordered Markov Models

Amemiya (1985, p. 422) builds on (7) to write the equivalent of
$$y^*_{it} = x_{it}(\beta + \gamma_j) + \vec{y}_{i,t-1}\vec{\alpha} + e_{it} \tag{8}$$
that appreciates the dependence of observed $y_{it}$ on previous states $\vec{y}_{i,t-1}$ and the potential that $x$ affects $y^*$ at time $t$ differently depending on the prior state. To illustrate, let us consider a three state model of dyadic relations consisting of peace, disputes, and armed conflict. A considerable literature in international politics says that democracy need not necessarily inhibit the transition from peace to dispute, but that democratic dyads are quite unlikely to transition from disputes to war.
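As a numerical aside, the state-dependent index in (8) can be combined with the category probabilities in (3) to produce one row of the transition matrix for each prior state. The following is a minimal sketch, assuming a scalar covariate, a probit link, and made-up parameter values; the names `category_probs`, `BETA`, `GAMMA`, `ALPHA`, and `CUTPOINTS` are illustrative and do not come from the original analysis. The omitted prior state (coded 0, peace) has its $\gamma$ and $\alpha$ fixed at zero.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function F for the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(x, prior, beta, gamma, alpha, cutpoints):
    """Pr(y = j | x, prior state) for j = 0..J-1, following equations (3) and (8)."""
    # Latent index: the covariate effect (beta + gamma_j) and the intercept
    # shift alpha_j both depend on the prior state j.
    mu = x * (beta + gamma[prior]) + alpha[prior]
    # F evaluated at tau_0 = -inf, the interior cutpoints, and tau_J = +inf.
    cdf = [0.0] + [norm_cdf(tau - mu) for tau in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]

# Hypothetical values for three states: 0 = peace, 1 = dispute, 2 = war.
BETA = 0.0                  # effect of x when the prior state is peace
GAMMA = [0.0, -0.8, -0.3]   # state-specific departures from BETA
ALPHA = [0.0, 1.0, 2.0]     # prior-state intercept shifts
CUTPOINTS = [0.5, 1.8]      # interior thresholds tau_1 < tau_2

for prior in range(3):
    for x in (0.0, 1.0):
        probs = category_probs(x, prior, BETA, GAMMA, ALPHA, CUTPOINTS)
        print(prior, x, [round(p, 3) for p in probs])
```

With these hypothetical values, the covariate leaves the row of transition probabilities out of peace unchanged but lowers the probability of moving from dispute to war, which is the kind of pattern that a single pooled $\beta$ cannot represent.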
In this case, then, the effect of joint democracy is likely to be zero when considering the transition from peace to disputes, could be zero or negative in the transition from disputes to peace, but should be large and negative when influencing the probability that disputes transition into wars. Practically speaking, there are two obvious methods for estimating such a model. Diggle et al. (2002, p. 202–3) suggest a parameterization based on classifying the state vector at time $t - 1$ in a cumulative fashion and utilizing these as the basis for capturing the intercepts for the prior states and the possibility that covariate effects depend on the prior state.[22] Under this method, zero null hypothesis tests of the interacted coefficients are referenced to the adjacent lower category, and whether or not categories can be combined is tested by examining whether the $\alpha$ parameters attached to the elements of $\vec{y}^C$ are statistically differentiable from zero.[23] The parameterization above employing $\gamma_j$, which involves only $J - 1$ prior states and their interactions, allows zero null hypothesis tests against the omitted prior state.

[22] To be precise, for a $J$ category ordered variable, define (suppressing $i$ and $t$) $\vec{y}^C = \{y^C_1, \ldots, y^C_{J-1}\}$ such that $y^C_1 = 1$ if and only if $y_{t-1} \le 1$, and so on until $y^C_{J-1} = 1$ if $y_{t-1} \le J - 1$. The superscript $C$ is used to denote the cumulative prior states.

[23] Epstein et al. (2006) utilize these tests to defend the collapsing of partial-autocracies and partial-democracies. We suspect, but without their data cannot prove, that the assumption that the cutpoints can be pooled may weigh heavily on these determinations. It remains for future research to sort this out.

4.3 A Trivial Extension to Nominal Scales (or Formalizing the Likelihood)

It is straightforward to generalize (5) to the case of a multinomial likelihood as follows.[24] Define the likelihood $L$ as a function of the probability of transition $p^i_{jk}(t)$ from state $j$ to state $k$ for individual $i$ at time $t$ as
$$L = \prod_t \prod_i \prod_j \prod_k p^i_{jk}(t)^{\,y^i_j(t-1)\, y^i_k(t)} \cdot \prod_i \prod_j p^i_j(t_0)^{\,y^i_j(t_0)} \tag{9}$$
which can be factored into two important parts: the conditional likelihood (to the left of the $\cdot$) and the initial conditions of the state vector (to the right of the $\cdot$). To be clear, for the stationary transition matrix, the likelihood can be rewritten as
$$L = \prod_t \prod_i \prod_j \prod_k \left(p^i_{jk}\right)^{y^i_j(t-1)\, y^i_k(t)} \cdot \prod_i \prod_j \left(p^i_j\right)^{y^i_j(t_0)} \tag{10}$$
and the stationary and homogeneous transition matrix can be similarly written (omitting superscript $i$ for the probabilities) as
$$L = \prod_t \prod_i \prod_j \prod_k p_{jk}^{\,y^i_j(t-1)\, y^i_k(t)} \cdot \prod_i \prod_j p_j^{\,y^i_j(t_0)}. \tag{11}$$

[24] This seems unnecessary because the ordered likelihood is the same; our presentation above has made it necessary, but the two should be synthesized for brevity.

Methods of estimation have been widely studied for such models, though the most widely employed are effectively maximum likelihood techniques under the general rubric of generalized linear models and GEE techniques.[25] The former retain all of their optimal properties if indeed the likelihood is properly specified, while the latter retain optimal properties in the presence of random intercepts when population-averaged GEE techniques are employed.[26] Bayesian analogs relying on Markov Chain Monte Carlo techniques are also straightforward by analogous reasoning, leading to a form of MCMCMC[27] estimation. The extension to state-dependent covariates follows the same interactive logic previously presented.

[25] See Diggle et al. (2002, p. 192–204) for a detailed discussion. Technically, because we are conditioning on the initial conditions, these are conditional maximum likelihood estimates, but the conditional likelihood is identical (when controlling for initial conditions) to the functions optimized by widely available statistical software.

[26] Consistent with the reasoning in Freedman (2006), Diggle et al. (2002, p. 200) suggest that the closeness of robust and asymptotic covariance matrix estimates provides evidence regarding the robustness of the Markovian assumption.

[27] Technically, the appropriate description is Markov Chain Monte Carlo estimation of a Markov Chain.

4.4 Testing Markov Models

Leon and Tsai (1998) present formal asymptotic results and evidence of the adequacy of test statistics for quasi-likelihood estimators of Markov regression models in the spirit of Zeger and Qaqish (1988). Though their simulation results are limited to four relatively small samples (60, 75, 100, and 150) and do not contain an explicit panel structure, the evidence suggests that for all but the smallest of samples, quasi-score, quasi-Wald, and quasi-likelihood ratio statistics (defined in the traditional ways using the quasi-likelihood instead of the likelihood) have similar power and are generally of appropriate size under the standard χ2 distribution with degrees of freedom equal to the number of imposed restrictions.[28] The important point for applied purposes is that the standard array of test statistics can be utilized in ways that are similar to their use for other classes of models. With models constructed and tests defined, we can turn to applications of these models to ordered and multinomial time series.

[28] In the smallest sample (60), the quasi-score test seems to have the best properties.

5 Application: Human Rights Abuses

Previous studies of the dynamics of human rights abuses have focused on the importance of democracy in limiting the level of human rights abuses. A burgeoning literature on the determinants of human rights abuses[29] is fundamentally concerned with an ordered dependent variable – the Political Terror Scale. Indeed, the relationship between democracy and human rights abuses is arguably so robust that Davenport (2007) has written of a domestic democratic peace analogous to the democratic peace that suggests the absence of warmaking among democratic societies. Our interest is in characterizing the conditional impact of democracy in a Markovian framework.[30] Here we simply restate and extend the results to a discussion of the invariant distribution and compare multiple specifications.

[29] An incomplete list would include works by Apodaca (2001), Boswell and Dixon (1990), Bueno de Mesquita, Downs, Smith, and Cherif (2005), Bueno de Mesquita, Morrow, Siverson and Smith (2003), Cingranelli and Richards (1999), Davenport (1995, 1996a, 1996b), Davenport and Armstrong (2004), Fein (1995), Gartner and Regan (1996), Hafner-Burton (2005a, 2005b), Hafner-Burton and Tsutsui (2005), Hathaway (2002), Henderson (1991), Keith (1999, 2002), McCormick and Mitchell (1997), McKinlay and Cohan (1975, 1976), Mitchell and McCormick (1988), Meyer (1996), Neumayer (2005), Park (1987), Poe, Carey and Vazquez (2001), Poe, Milner and Leblang (1999), Poe and Tate (1994), Poe, Tate and Keith (1999), Richards, Gelleny and Sacko (2001), and Zanger (2000), among others.

5.1 Defining Human Rights

Many existing studies of human rights and democracy, including Poe and Tate (1994), Poe, Tate and Keith (1999), Davenport and Armstrong (2004), and Bueno de Mesquita, Downs, Smith and Cherif (2005), utilize the Purdue University “Political Terror Scale” (PTS), a five-category ordinal scale that measures human rights abuses from lowest to highest according to increasing levels of imprisonment, torture, execution, disappearance, and more general forms of political terror. State Department and Amnesty International Reports are encoded according to five basic criteria that are reported in Table 1. While a number of distinct policy choices by governments are included in the evaluation, the resulting scale is ordered on a single dimension. At Level 1 and Level 2, a fundamental respect for life remains, though liberties may be circumscribed.
By contrast, Level 3[31] is increasing in brutality but still largely influences the political sphere; Level 4[32] expands the numbers and the deprivation of life is widespread. Level 5[33] represents widespread and arbitrary societal violence.

[31] Countries receiving this rating at least 12 times during the sample period from one or both of the State Department and Amnesty International include Albania, Bahrain, Bulgaria, Bangladesh, Chile, China, Ecuador, Egypt, Haiti, Honduras, Jordan, Mexico, Morocco, Paraguay and Syria.

[32] Countries receiving this rating at least 12 times during the sample period from one or both of the State Department and Amnesty International include Brazil, Guatemala, India, Indonesia, Iran, Pakistan, Peru, the Philippines, South Africa, Sri Lanka, Turkey, and Uganda.

[33] Countries receiving this rating at least 8 times during the sample period from one or both of the State Department and Amnesty International include Algeria, Angola, Colombia, Guatemala, Iran, Rwanda, Sri Lanka, and the Sudan.

Table 1: The Political Terror Scale

Level 1: “Countries ... under a secure rule of law, people are not imprisoned for their views, and torture is rare or exceptional ..., political murders are rare.”
Level 2: “There is a limited amount of imprisonment for nonviolent political activity. However, few persons are affected, torture and beating are exceptional ... political murder is rare.”
Level 3: “There is extensive political imprisonment, or a recent history of such imprisonment. Execution or other political murders and brutality may be common. Unlimited detention, with or without trial, for political views is accepted ...”
Level 4: “The practices of Level 3 are expanded to larger numbers. Murders, disappearance are a common part of life ... In spite of its generality, on this level terror affects primarily those who interest themselves in politics or ideas.”
Level 5: “The terrors of Level 4 have been expanded to the whole population ... The leaders of these societies place no limits on the means or thoroughness with which they pursue personal or ideological goals ...”
Source: Poe and Tate (1994: 867); Gastil (1980), in original.

Though numerous mechanisms are commonly argued to describe the relationship between democracy and human rights abuses, it is not at all clear how the simple alteration of political institutions in societies with extreme levels of repression should change much of anything because it is difficult to envision how such institutions can. Though the particular language varies among studies, there are four primary pathways: (i) democratic institutions increase the costs of repressive actions by providing elections as mechanisms to sanction repressive leaders; (ii) democracy is supported by a complex web of values that are challenged by the use of repressive instruments; (iii) democracies provide alternative dispute resolution mechanisms that weaken the justification for resorting to force for redressing grievances; (iv) international norms and the universal jurisdiction principle of international law allow third parties to prosecute human rights violations, and democracies are more likely to adopt international laws governing at least some human rights (Hawkins 2003, Landman 2005). In each of these pathways, the relationship is argued to be unconditional, save possibly the second. Certainly, arguments that democratic institutions increase the costs of repression, provide alternative dispute resolution mechanisms (and presumably make the use of such mechanisms more attractive), and the principle of universal jurisdiction in international law should not be altered by the immediate prior status quo level of repression.

[30] For a detailed development of the theory that justifies this effort, see Walker (N.d.).
Only the second – democracy is supported by a complex web of values that are challenged by the employment of repressive tools – would induce variation in the effect of democracy on repression because the prior employment of repression would seemingly undermine the establishment of such values or provide evidence that such values have insufficiently taken hold. In either case, at least three of the four mechanisms proposed in the extant literature would be undermined by the finding that the influence of democracy on human rights abuses depends on past history. With this observation in mind, we turn to constructing a dataset for evaluating the claim that the influence of democracy depends on past history.

5.2 Data

We employ the Polity IV indicator of democracy ranging from zero to ten (Marshall and Jaggers 2002). Polity measures authority patterns and the Polity IV Democracy measure is a composite of variables measuring the openness and competitiveness of executive recruitment, the degree of constraint on the exercise of executive authority, and regulations on political participation. While other measures are available, this measure is chosen because it is widely used in the literature, facilitating appropriate comparisons with past research, and because it is available for the broadest array of countries through time.[34]

[34] The companion paper shows similar patterns for numerous operationalizations involving Polity and indicators from Freedom House, Vanhanen, Cheibub and Gandhi.

To fully specify the model, we have scoured prior literature for additional controls. In general, the literature suggests that economic development, economic growth, the size of a country’s population, the change in the population, and involvement in civil and international wars are important determinants of human rights abuses.
Relying on the International Monetary Fund’s International Financial Statistics, we are able to measure economic development and growth and the size of a country’s population (from which it is straightforward to calculate the change in population). Furthermore, the PRIO/Uppsala project has measured civil wars, internationalized civil wars, interstate wars, and other forms of violent conflict (Gleditsch, Wallensteen, Eriksson, Sollenberg and Strand 2002). To follow previous literature, we employ their measures of interstate, civil, and internationalized civil war. The resulting model is remarkably similar to that estimated by Davenport and Armstrong (2004). The resulting dataset contains over 3300 observations between 1976 and 2003 on 180 countries. The tests that we conduct will utilize two basic models and will incorporate three primary forms of tests. The first model is a Markov ordered probit: a single equation estimator including, as regressors, measures of the prior state and interactions between the prior state and democracy, the argued source of state-dependent effects. The second set of models relies on independent ordered probit models applied to each of the prior states, together with a constrained variant that collapses coefficients that do not vary with the prior state. As we shall see, characteristics of the invariant distribution depend, at least somewhat, on this modeling assumption. Finally, we utilize a Bayesian Markov Chain Monte Carlo technique to estimate the single equation ordered probit model.
The advantage of the MCMC techniques (because of the ergodic theorem) is that we can draw from the invariant distribution of the regression parameters of interest; because the key issue in assessing state-dependent effects involves sums of random variables, it is extraordinarily valuable to be able to draw random samples from the relevant posterior and simply characterize their sum.[35] Before turning to the estimates, we report the Markov matrix to be modeled in Table 2.

[35] Though we do not report them here, it also allows one to characterize the inherent uncertainty in estimates of the transition matrix in a much more natural way.

Table 2: The Baseline Markov Matrix (raw cell frequencies, with row percentages in parentheses). Rows give the Amnesty International Political Terror Scale at t−1; columns give the scale at t.

Scale at t−1    1               2               3               4               5               Total
1               559 (78.51)     133 (18.68)     19 (2.67)       1 (0.14)        0 (0.00)        712 (100.00)
2               116 (10.83)     706 (65.92)     218 (20.35)     28 (2.61)       3 (0.28)        1,071 (100.00)
3               7 (0.82)        226 (26.62)     493 (58.07)     112 (13.19)     11 (1.30)       849 (100.00)
4               1 (0.23)        21 (4.76)       126 (28.57)     238 (53.97)     55 (12.47)      441 (100.00)
5               0 (0.00)        3 (1.99)        6 (3.97)        61 (40.40)      81 (53.64)      151 (100.00)
Total           683 (21.18)     1,089 (33.78)   862 (26.74)     440 (13.65)     150 (4.65)      3,224 (100.00)

Table 2 makes clear that there is substantial first-order state dependence in measures of the Political Terror Scale and that the degree of persistence depends on the prior state. Over 78% of countries that receive the lowest score continue to do so in the next period.[36] By contrast, 66% of countries that received the second lowest score in the previous period continue at that level in the present period. The third state is persistent at just under 60%, while the fourth and fifth prior values self-replicate at a rate of about 54%.

5.3 Estimates: Part I

The far left column of Table 5 displays a model with full pooling of the coefficients and the cutpoints, and we relax these constraints across the columns of Table 5. We briefly discuss the results and their implications.
Overall, the model in the first column appears to fit well: the model χ2 statistic is statistically differentiable from zero to the level of computer precision; the cutpoints form a strict order; and the variables capturing the prior states also form a strict order, implying a strong form of state dependence.

[36] Indeed, the Netherlands, New Zealand, Australia, Sweden, Norway, and ??? never receive any score other than the lowest throughout the sample period. It may be that the presence of these countries in the sample generates a form of pooling bias, though an investigation of this question lies beyond our present scope.

Given our substantive interest, we first examine the top of the first column. There is clear evidence that democracy manifests state dependent effects. To uncover the precise magnitude of these effects for each prior state, it is necessary to sum the effect of democracy and its interaction with the prior state. For example, the estimate of -0.23 implies that a one-unit increase in the democracy score lowers the latent scale by 0.23 given prior residence in the first state (the lowest possible level of prior human rights abuses). Prior residence in the second state yields a total effect of -0.228 + 0.139 = -0.089, and a Wald test shows that this linear combination is statistically differentiable from zero (χ2(1) = 20). Democracy decreases human rights abuses conditional on having been at the second lowest level in the prior period, but the effect is considerably mitigated. The effect is further mitigated when conditioned on residence in the third state in the prior period but is still differentiable from zero at conventional levels (χ2(1) = 5.96). Turning to estimates conditioned on previous residence in state 4 or 5, the absolute value of the interaction term exceeds the baseline effect.
As a result, we should expect democracy to have no effect on human rights abuses, conditional on a high past level of abuse. Indeed, Wald tests confirm this (χ2(1) = 1.52 and χ2(1) = 0.76), leading to evidence that democracy only reduces the level of human rights abuses at low levels of prior abuse. Substantively, in societies that are not characterized by high levels of human rights abuses, democracy can improve the human rights record, but in those most in need of “democratic pacification” – those with the highest levels of past abuse – democracy exercises no pacifying effect. To scholars normatively committed to democratization as a mechanism for reducing repression, this evidence casts doubt on the basic presumption that institutional democracy will improve the human rights record without regard to the immediate history.

To render the dependence of democracy's effect on past history clearer, we reestimate the model using Markov Chain Monte Carlo estimation of an ordered probit model, using MCMCpack by Martin and Quinn (2006), so that we can plot the posterior densities of the state dependent effects. The densities of these estimates are depicted in Figure 1. The baseline effect (not shown) is estimated to be -0.14 (not surprisingly, given the probit link, it is 1.6 times smaller in magnitude than the estimate in the Table), and the top two panels are also clearly different from zero and negative. This is consistent with the claim that democracy reduces human rights abuses. The bottom two panels tell a very different story. The median of both densities lies to the right of zero, indicating weak evidence that democracy may worsen a nation’s human rights record. Certainly, it is hard to argue for democratic pacification in the presence of an immediate past history of widespread repression.
Given prior literature and the almost universal finding that democracy decreases human rights abuses, this result provides an important counterweight and suggests that many of the extant mechanisms linking human rights and democracy may be flawed.[37] With this evidence in mind, we turn to a further exploration of Table 5.

[37] Elsewhere, I assess the robustness of this claim to different operationalizations of the effect and different measures of both democracy and repression; the story does not change.

[Figure 1: MCMC Ordered Probit – State Dependent Effects of Democracy on Human Rights. Four posterior density panels, titled β + γ2, β + γ3, β + γ4, and β + γ5; vertical axis: Density; N = 1000 draws per panel.]

Concluding the discussion of this column, the remaining covariates perform in ways consistent with prior literature. Countries with higher levels of per capita GDP tend to be less repressive; countries with larger populations tend to be more repressive. GDP Growth tends to reduce human rights abuses (though the substantive size of the effect is rather small) and Population Change exhibits no clear influence. Not surprisingly, Civil Wars significantly worsen the human rights records of states, while Internationalized Civil Wars also worsen human rights records, but with less than half the magnitude. International Wars exhibit no clear statistical impact. With these results in mind, we can first examine the robustness of this specification before turning to estimates of the long-run distribution of outcomes, given our description of their existence according to the ergodic theorem. The second through sixth columns display estimates without assuming any form of pooling; all covariates are allowed to have different effects in the model representing each prior state.
Though most of the effects are consistent with respect to sign, there are a few notable discrepancies. For example, the effect of GDP per capita is negative and statistically differentiable from zero for low levels of prior repression, but becomes statistically differentiable from zero and positive at the highest two levels of prior abuse. An omnibus comparison of this model with the model that has total pooling yields a difference in the log-likelihoods of 81, and thus a likelihood-ratio statistic for the nested total pooling model of 162 with 37 degrees of freedom. This result is statistically different from zero to the level of computer precision. A bit of further investigation shows that GDP per capita is primarily responsible for this result because its sign changes across prior states. At the same time, the resulting model is a bit puzzling because the prior states are, generally, no longer differentiable from zero and they also no longer form a strict order.[38] Following from a comparison of the pooled and partially pooled models, we turn to a characterization of the relevant transition matrices to highlight the importance of pooling versus partial pooling. NB: Discuss the key differences that appear in Table 4. Though the differences are subtle, most manifest themselves in the higher present values for the partially-pooled transition matrix. This makes considerable sense when we recall Table 2. The majority of support in the distribution of outcomes lies at low levels of political terror; pooling tends to pull the cutpoints toward the mass of the data, making transitions toward worsening human rights conditions generally less likely.
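The omnibus likelihood-ratio comparison reported above can be reproduced from the two reported quantities alone (a log-likelihood difference of 81 and 37 restrictions). The following is a minimal sketch using only the Python standard library; the helper name `chi2_sf` is ours, and it relies on the standard recurrence for the upper incomplete gamma function at integer degrees of freedom rather than on any routine used in the original analysis.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1.

    With h = x / 2, uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    starting from Q(1/2, h) = erfc(sqrt(h)) for odd df
    or Q(1, h) = exp(-h) for even df.
    """
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

loglik_gap = 81.0             # reported difference in log-likelihoods
lr_stat = 2.0 * loglik_gap    # likelihood-ratio statistic
p_value = chi2_sf(lr_stat, 37)
print(lr_stat, p_value)
```

The statistic is twice the log-likelihood difference, and the resulting tail probability is vanishingly small, in line with the rejection of total pooling described in the text.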
For the broader study of ordered Markov models, columns seven through eleven reestimate the model with partial pooling – the covariate effects are constrained to be equal across equations with the exception of the democracy variable; democracy is allowed to map to human rights abuses in a way that depends on the prior state.[39] The model improves on the fit of the model with total pooling, as evidenced by a statistically significant likelihood-ratio test (χ2 = 31, 10 d.f.).[40] This result is somewhat surprising given the claims of Diggle et al. (2002) and Epstein et al. (2006) that there is an equivalence between separately fitted models and the saturated single-equation model (with controls for the prior state and interaction terms between the relevant covariates and the prior states). Deeper reflection makes clear why this may not be the case. With the regression equations pooled and the variance of the errors fixed, the effect of any covariate, including the prior state, is to shift the latent distribution around while holding the cutpoints fixed. However, if the innate probability measure of the categories depends on the prior state, the pure pooling estimator will recover a convex combination of the cutpoints from the partial pooling estimator. Moreover, the measure of the maximum probability is fairly straightforward to derive because the symmetry and unimodality of the normal and logistic distributions imply that the maximum category probability would be obtained with an expectation that bisects two cutpoints.[41]

[38] Unpacking this puzzle is a subject for a different paper.

Given the fact that none of the parameters in the model is unit or time-varying, we can characterize the invariant or steady-state distribution. Before we do so, a comment is in order. The lack of time or unit-varying parameters is not a complete characterization of the relevant boundaries for the invariant distribution to be of interest. More generally, what is required is a “stationary counterfactual”.
What we mean is that the counterfactual should be reasonable in a dynamic sense. For example, where economic variables are subject to trends or are best characterized by random walks (with or without drift), it is unlikely that a characterization of the invariant distribution is meaningful because the underlying assumption that all else is equal is almost certain not to hold. However, the existence of time-varying or unit-varying Markov matrices need not, in itself, invalidate characterization of the invariant distribution for defensible scenarios. These scenarios are rendered dynamic in attempts to characterize the invariant distribution; thus, an extra burden is placed on the researcher to defend the set of assumptions that are made to define the scenario in which the invariant distribution is characterized.

[39] This was accomplished in R (R Development Core Team 2004) using optim. The function is reported in Table 6.

[40] The downside of estimating this model is that forms of multiple equation constrained ordered probit/logit models do not exist in common statistical software. That said, the ordered regression likelihood given before can be easily programmed and optimized.

[41] For the polar categories, any scenario can be replicated as the parameters run to infinity in absolute value.

Turning to the estimands of interest, all we have statistically shown is that a part of the transition matrix does not respond to changes in the level of democracy. It remains to be shown whether or not democracy may yet result in an improvement in human rights conditions because of the positive probability, by random chance alone, that highly repressive states transition to better outcomes where higher levels of democracy should then improve the human rights situation even further. We display this evidence in Figure 2. Figure 2 displays the invariant distribution obtained from the far left column of Table 4.
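The mechanics of forecasting toward an invariant distribution can be sketched with a stationary matrix. The model-based matrices are not reproduced in this section, so the example below substitutes the raw first-order transition counts from Table 2; that substitution is an assumption made purely for illustration, and the helper names `step` and `forecast` are hypothetical. The sketch starts one chain at the lowest level of abuse and one at the highest and iterates both forward.

```python
# Raw transition counts from Table 2 (rows: scale at t-1, columns: scale at t).
counts = [
    [559, 133,  19,   1,  0],
    [116, 706, 218,  28,  3],
    [  7, 226, 493, 112, 11],
    [  1,  21, 126, 238, 55],
    [  0,   3,   6,  61, 81],
]

# Row-normalize to obtain the empirical first-order transition matrix.
P = [[c / sum(row) for c in row] for row in counts]

def step(dist, P):
    """Advance a distribution over states by one period: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def forecast(start, P, periods):
    """Iterate the chain forward a given number of periods from `start`."""
    dist = list(start)
    for _ in range(periods):
        dist = step(dist, P)
    return dist

# Two very different initial conditions.
from_best = forecast([1, 0, 0, 0, 0], P, 2000)
from_worst = forecast([0, 0, 0, 0, 1], P, 2000)

print([round(p, 4) for p in from_best])
print([round(p, 4) for p in from_worst])
```

Because every state communicates with every other and each state can repeat itself, both starting points settle on the same long-run distribution, which is unchanged by a further application of the matrix. Calling `forecast` with a small number of periods shows how quickly the two paths approach one another.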
The counterfactual being evaluated sets all numeric regressors at their means and the binary indicators to zero. The red lines arise from setting the Polity indicator to 10 and the blue lines arise from setting the Polity indicator to 0. The solid lines reflect the probability of the lowest level of human rights abuses. As we can see, a world full of the most democratic of states would consist of about 37% at the lowest level of human rights abuse while a world full of the least democratic states would consist of about 5% at the lowest level of repression. The short-dashed lines demonstrate that the invariant distribution of nations at Level 2 would not differ dramatically between all democracies (41%) and all non-democracies (39%). The dotted lines reflect the proportion of nations at Level 3 and there are far fewer of these in a world of maximal democracies (17%) than in a world of nondemocracies (41%). The dot-dash lines display the proportion at the second highest level of human rights abuses; this condition is far more likely for nondemocratic nations (14%) than democracies (4%). Finally, the highest level of repression seldom occurs under either scenario, though the invariant distribution makes this more likely under nondemocracies (1.6%) than under democracies (1%). One key finding is that this counterfactual suggests the highest levels of repression to be quite unlikely independent of the impact of democracy on human rights abuses, though democracy does seem, to some extent, to preclude abuse in the long run.

Figure 2: The Invariant Distribution for Table 2. Panel title: Toward the Invariant Distribution. Vertical axis: Pr(y=j), with one line type each for Pr(y=1) through Pr(y=5). Horizontal axis: T (0 to 30). Democracy=10 in red, Democracy=0 in blue.

5.4 Testing Some Extensions

In brief, we consider extended models for describing state dependence in the Political Terror Scale.
We consider a second-order Markov model and briefly describe an extension that may have broader application when the number of data points is insufficient to examine higher-order Markov models. We forego the presentation of the second-order Markov model and instead utilize Markov Chain Monte Carlo techniques to graphically display the intuition for why the model offers an improvement. A formal likelihood ratio test of a first-order specification nested in a second-order specification yields a χ2 statistic of 244.12 with 18 degrees of freedom; this statistic is more than eight times the .05 level critical value (28.87). Graphically, Figure 3 plots the first-order (black dashed) and second-order estimates of the parameters describing state dependence. The purple solid lines represent the second lag equal to one and the red solid lines capture a second lag equal to two. The solid blue densities represent a second lag of three, while the green and brown solid lines represent the densities for second lags of four and five, respectively. The dashed black line represents the density of estimates from a first-order model, and the state at the first lag is given in the title above each panel. Beginning with the first lag equal to one, we see a black dashed density very similar to the solid purple density with a slight shift toward the second lag equal to two. The effect of a second lag equal to three is quite distinct, but there is little support in the data to estimate this parameter. Turning to the first lag equal to two, the first-order density is almost identical to the second-order effect of having resided in state two in both periods, with two preceded by one slightly to the left and the remaining densities to the right. Even from an examination of these two states alone, we can see evidence that first-order dependence does not capture all of the relevant state dependence.
Turning to level three in the prior period, the estimated first-order effect is a convex combination of second-order lag one to the left, almost identical to second-order lag two, with second-order lag three just to the right and the remaining densities some distance away. Level four in the prior period is marked by a somewhat clear distinction between second-order effects of categories one and two that are detectably to the right. The estimated first-order effect of level four is very similar to the reported effects of second-order levels of three and four, with a considerable amount of the density of the second-order effect of lag five clearly distinguishable from the others. Finally, for lag five in the prior period, there is considerable clumping of the second-order effects for all but category five, which is again distinct and to the right of the others. In short, while the first-order estimates are interior to the more refined second-order effects, there is also evidence that the first-order transition model masks some important forms of heterogeneity. However, if we recall the likelihood ratio test cited earlier, we have consumed an additional eighteen degrees of freedom.

To conclude the analysis of the political terror scale, we consider one possible reduction technique and briefly discuss but do not estimate another. One possible alternative that may improve the fit and parsimony of the model is to recognize that having witnessed the highest level of political terror in either of the two most recent periods appears to lead to different dynamics than residence in the remaining states. Because this observation does not depend on when a given nation experienced society-wide terror, we examine and compare phat support dependence as a plausible alternative.
Of course, as we already pointed out, phat dependence and, to a significantly lesser degree, phat support dependence suffer from the same problem of an increasing number of parameters as histories become longer, so some way of shortening the conditioning set is needed. Here, we nest a second-order phat dependent specification in the full second-order Markov model and examine the restriction. The associated χ2 statistic is approximately 800 with 13 degrees of freedom, so we cannot justify the restriction. Indeed, it appears that the trajectory matters, as comparing the phat-support dependence specification to the first-order specification yields AIC/BIC evidence in favor of the first-order Markov specification.

Figure 3: MCMC Ordered Probit – Comparing the Influence of the Prior States for First- and Second-Order Markov Models. Black dashed lines are estimates from the first-order specification. The solid colored densities represent the influence of the prior states obtained from estimates of the second-order Markov model. Panels: Lag 1 = 1, Lag 1 = 2, Lag 1 = 3, Lag 1 = 4, Lag 1 = 5. Legend: Lag 2 = 1, Lag 2 = 2, Lag 2 = 3, Lag 2 = 4, Lag 2 = 5, Pooled.

The summary from these results is that, for political terror, there is clear evidence of state dependence. Though there is also evidence, for all of these models, that the central theoretical claim of interest (that the effect of democracy depends on the prior state) is robust to all of these various specification twists, further investigation is required to precisely pin down the order of the transition model and the appropriate mix of history and parsimony in the modeling enterprise.
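The likelihood-ratio comparisons in this section can be checked with a few lines of code. The sketch below computes the upper-tail probability of a χ2 statistic from the standard recurrence for the regularized incomplete gamma function. The function name chi2_sf is ours, the code is Python rather than the R used for estimation, and only the statistics and degrees of freedom are taken from the text (the second statistic is reported there only as approximately 800).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square, integer df >= 1.

    With z = x / 2, uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    started from Q(1/2, z) = erfc(sqrt(z)) when df is odd and from
    Q(1, z) = exp(-z) when df is even.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    for _ in range((df - 1) // 2):
        # Work on the log scale so large statistics do not overflow.
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

# (statistic, degrees of freedom) as reported in the text.
TESTS = {
    "first-order nested in second-order": (244.12, 18),
    "restricted second-order vs. full second-order": (800.0, 13),
}
P_VALUES = {name: chi2_sf(stat, df) for name, (stat, df) in TESTS.items()}

for name, p_value in P_VALUES.items():
    print(name, p_value)
```

Both restrictions are rejected at any conventional level under these inputs; the tail probabilities are far too small for the precise value of the second statistic to matter.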
Consistent with the value of parsimony, one approach suggested to reduce the number of parameters while retaining second-order dependence is the mixture transition distribution approach of Raftery (1985) and Berchtold and Raftery (2002). The mixture transition distribution model essentially eliminates the cross-lags in the various histories and instead estimates a linear combination of the separate inputs of each lag (which are constrained to have the same effect without regard to the order). In effect, we are pooling over lag order using this model. As a result, in some ways, it approximates a low-order phat dependent process: having resided in states $s_l$ and $s_m$ has a joint impact that is the linear combination of the effect of having been in state $s_l$ and state $s_m$ in the number of periods determined by the order, but whether one precedes the other or not is irrelevant insofar as the model is concerned. To formalize the model, it can be described by a probability model such that, for L relevant lags,

$$\Pr\left(y_t = j \mid h_t^{(L)}\right) = \sum_{l=1}^{L} \eta_l \Pr\left(y_t = j \mid y_{t-l} = s_l\right) = \sum_{l=1}^{L} \eta_l \, \pi_{s_l j} \qquad (12)$$

where $\sum_{l=1}^{L} \eta_l = 1$ and $s_l$ denotes the state occupied at lag $l$.

The simplest intuition for the model is that it pools over lags from different orders of the process with the same transitions. The parameters η define a set of weights (that sum to one) that allow the longer past to be more or less heavily weighted than more recent observations, with the number of free parameters in η equal to L minus 1. More generally, the model requires only one additional parameter for each additional lag, offering promise in confronting the quickly growing number of parameters when deep histories are under consideration. Indeed, Berchtold and Raftery (1999, p. 4) point to this as one of the most significant advantages of applying the mixture transition distribution (MTD) to Markov chains.
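To make equation (12) concrete, the following sketch evaluates the MTD probabilities for a given history and counts free parameters. The three-state matrix Q, the weights ETA, and the function names are invented for illustration (they are not estimates from this paper), and the code is Python rather than R.

```python
def mtd_probabilities(Q, eta, history):
    """Pr(y_t = j | history) under a mixture transition distribution.

    Q       : J x J first-order transition matrix shared by every lag.
    eta     : lag weights (eta[0] for lag 1, eta[1] for lag 2, ...), summing to one.
    history : states occupied at lags 1..L, coded 0..J-1, most recent first.
    """
    if len(eta) != len(history):
        raise ValueError("need one weight per lag")
    if abs(sum(eta) - 1.0) > 1e-9:
        raise ValueError("lag weights must sum to one")
    J = len(Q)
    # Weighted sum of the rows of Q selected by the state at each lag.
    return [sum(w * Q[s][j] for w, s in zip(eta, history)) for j in range(J)]

def n_params_full(J, L):
    """Upper bound for a fully specified order-L chain: J**L rows, J - 1 free each."""
    return J ** L * (J - 1)

def n_params_mtd(J, L):
    """MTD of order L: one J x J matrix plus L - 1 free lag weights."""
    return J * (J - 1) + (L - 1)

# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
Q = [[0.80, 0.15, 0.05],
     [0.20, 0.60, 0.20],
     [0.05, 0.25, 0.70]]
ETA = [0.7, 0.3]

# State 0 one period ago, state 1 two periods ago.
PROBS = mtd_probabilities(Q, ETA, [0, 1])
print([round(p, 4) for p in PROBS])

# Five states, as in the Political Terror Scale.
for order in (1, 2, 3):
    print(order, n_params_full(5, order), n_params_mtd(5, order))
```

With five states, moving from first to second order costs one parameter in the MTD while the fully specified chain grows by a factor of five, which is the parsimony argument made above.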
For a fully specified Markov chain of order L with a state space of cardinality J, the number of free parameters is bounded above by $J^L (J - 1)$ because there are $J^L$ combinations of prior states given L-order dependence and each row of the Markov matrix contains $J - 1$ free probabilities: estimation of the first $J - 1$ establishes the value of the Jth probability, since valid probabilities must sum to one in a closed set. By contrast, the marginal cost of an additional order of dependence in the MTD is simply a single new parameter. For example, a first-order MTD has no lags other than one to pool over and the η is given as one by the summation constraint. Thus, there are $J(J - 1)$ free probabilities. A second-order MTD would contain one free η (because the other is determined by the summation constraint) while relying on the same number of free probabilities to yield a model with $J(J - 1) + 1$ parameters. With these characteristics of the estimation of ordered Markov models in mind, we turn to a brief exposition of something that should be obvious: all of these claims readily and quite simply extend to the multinomial case, also.

6 A Brief Discussion of Exchange Rate Regimes

There is a small literature among economists examining exchange rate transitions. For example, Masson (2001) subjects the “two poles” hypothesis to empirical scrutiny, finding little evidence that intermediate exchange rate regimes are disappearing. Building on this prior work, Masson and Ruge-Murcia (2005) utilize measures of inflation, openness, growth, and reserves as determinants of transition probabilities. We expand their analysis further, applying the same basic methods of inquiry while expanding the study of exchange rate regime transitions to monthly data.
The single most fundamental argument that we advance is that there is no obvious reason to assume that exchange rate regimes can be easily ordered on any single underlying dimension, and this necessitates the examination of models for multinomial choices.

A flurry of recent research has considered the proper measurement of exchange rate regime choice. More than two decades of research on exchange rate regimes was based on data culled from the International Monetary Fund’s Annual Report on Exchange Arrangements and Exchange Restrictions, but these reports are based upon official exchange rates and declarations of exchange rate regimes that may or may not comport with actual practice. In response to overreliance on de jure exchange rate regimes, Ghosh, Gulde and Wolf (2003), Levy-Yeyati and Sturzenegger (1999), Reinhart and Rogoff (2003), and Shambaugh (2004) have produced de facto measures of exchange rate regimes. All the available evidence suggests that the determinants of de jure and de facto exchange rate regimes differ substantially (Simmons and Hainmueller N.d., von Hagen and Zhou N.d.).

Exploring a detailed model of over 77000 monthly observations is a sizable task, in part because few covariates are available spanning the 40 year period covered by the exchange rate regime series. Nonetheless, these data are useful for examining at least two interesting questions. First, is there evidence that the transition matrix is influenced by the end of the Bretton Woods period?42 Second, and of primary interest for those interested in the empirical determinants of exchange rate regimes, simple models of the dynamics of exchange rate regimes provide some information about the usefulness of attempts to treat exchange rate regimes as fundamentally ordered. The nature of the transition matrix, reported in Table 7, reveals minor difficulties for the analysis owing to extreme sparseness in the transition matrix.
A related difficulty arises from how to treat specific categories. Reinhart and Rogoff’s (2003) data were originally classified into 14 categories, though they have combined classes to generate a “gross” coding with six categories. However, two of these categories seem fundamentally different from the others. Category 6 represents exchange rate regimes that are centered around the existence of parallel exchange rates, implying that the market for foreign exchange is not unified. Because, in some sense, foreign exchange is necessarily rationed under parallel markets, it is not clear how to treat them. Similarly, Code 5 belongs to countries that are “freely falling”. Because it is likely that such regimes result from pathological combinations of macroeconomic policies, transitions into and out of freely falling may have little to do with exchange rate policy but with the reconciliation of fundamentally contradictory macroeconomic objectives pursued in violation of Tinbergen’s Law. Whatever the reason, there are compelling reasons to ignore transitions into and out of these two regimes. Much less defensibly, we have also chosen to remove free floats from the sample because all three permissible transitions occur fewer than twice (two occur only once). Thus, we model the submatrix made up of the first three rows and columns of Table 7.

42 Of course, it would be preferable not to be forced to specify, a priori, the time of the split.

Three categories make up the matrix for analysis. Countries with no separate legal tender, pegs and currency boards, horizontal bands of less than 2%, and de facto pegs constitute what we label Fix/Peg Regimes. Crawling pegs and crawling bands, particularly pre-announced policies to limit volatility within small (2%) bands, describe the range of Crawling Pegs that we examine. Finally, Bands/Floats combine de facto bands of less than 5%, moving bands, and managed floats.
In order to answer questions about the impact of the Smithsonian Agreements on the behavior of de facto exchange rate regimes, we begin a simple intervention in January of 1972 and allow it to run through the completion of the time period.43 This intervention will allow us to differentiate both the Markov matrix and the invariant distribution of this Markov chain in the pre- and post-1972 periods. With these ideas in mind, we turn to the model estimates reported in Table 3.

43 In the conclusions, we mention work that we have recently begun to utilize MCMC structural changepoint techniques to isolate a probability distribution over the precise timing of the change.

Table 3: Multinomial Markov Model Estimates: Fix/Peg is the omitted category.

Variable                       Coefficient    (Std. Err.)
Equation: Crawling Pegs
  Fix/Peg                        -7.840**       (0.277)
  Crawling Pegs                   5.106**       (0.268)
  Bands/Floats                   -0.693†        (0.369)
  Fix/Peg (Post-1972)             1.766**       (0.321)
  Crawling Pegs (Post-1972)       1.078**       (0.340)
  Bands/Floats (Post-1972)        1.863**       (0.531)
Equation: Bands/Floats
  Fix/Peg                        -6.577**       (0.148)
  Crawling Pegs                  -1.540*        (0.636)
  Bands/Floats                    5.493**       (0.214)
  Fix/Peg (Post-1972)            -0.245         (0.278)
  Crawling Pegs (Post-1972)       1.931**       (0.691)
  Bands/Floats (Post-1972)        1.443**       (0.396)
N                                77880
Log-likelihood                   -1867.768
χ2(12)                           167384.315
Significance levels: † 10%, * 5%, ** 1%

Recalling that every regressor is simply a realization of the prior state and noting that the reference category is fixed/pegged rates, a fix in the prior period makes a crawling peg strongly less likely, though this effect is, to some degree, mitigated in the post-1972 period. Crawling pegs in the immediate past make crawling pegs in the present more likely and this effect is augmented for the post-1972 period. Finally, bands/floats in the previous period have a weak negative effect on the comparison between crawling pegs and fixed/pegged regimes that becomes weakly positive after 1972.
For the portion of the Table covering bands/floats, fix/pegs in the immediate past make present period bands/floats quite unlikely and this effect is unchanged after 1972. Crawling pegs in the prior period make fix/pegs more likely and bands/floats less likely before 1972, all other things equal, but have no effect after 1972. Finally, bands/floats in the immediate past make bands/floats in the present period much more likely and this effect is strengthened in the post-1972 period.

To offer a simplistic depiction of the difference between the pre- and post-1972 periods, we can summarize the difference in the two Markov matrices. Subtracting the pre-1972 matrix from the post-1972 matrix, we find the entire first column to be negative. Moreover, the change in the probability of a band/float conditional on a prior-period fix/peg is negative; all the remaining differences are positive. This implies that fix/pegs are decreasing in likelihood (albeit by a small magnitude) while more flexible arrangements are becoming more likely.

The second area of interest is in the convergence to the invariant distribution. Figure 4 displays convergence toward the invariant distribution from a starting point in each state with probability one for the transition matrix prior to 1972. There is so little difference in the invariant distribution that we forego a presentation of both.44 In particular, the solid lines report the probability of a fix/peg, the dashed lines represent the probability of a crawling peg, and the dotted lines represent the probability of bands/floats. The colors signify the state in which the process began (red: fix/peg, blue: crawling pegs, and green: bands/floats). As we see, the process takes a considerable amount of time to converge, but no matter the starting state, we converge on the same invariant distribution, as would be expected of an ergodic Markov chain.

44 This difference between statistical and substantive significance is not uncommon in very large samples.
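The comparison of the two Markov matrices, and the slow convergence of the pre-1972 chain, can be reproduced at least approximately from the coefficients in Table 3. The sketch below assumes that the prior-state indicators and their post-1972 interactions are the only regressors, with no separate intercept, so that each row of the transition matrix is a multinomial logit with Fix/Peg as the reference outcome. That reading of the table, the function names, and the 0.01 closeness threshold are ours, and the code is Python rather than R.

```python
import math

# Table 3 coefficients, indexed by prior state:
# 0 = Fix/Peg, 1 = Crawling Pegs, 2 = Bands/Floats.
CRAWL_PRE = [-7.840, 5.106, -0.693]
CRAWL_POST = [1.766, 1.078, 1.863]
BAND_PRE = [-6.577, -1.540, 5.493]
BAND_POST = [-0.245, 1.931, 1.443]

def transition_matrix(post_1972):
    """Rows: prior state. Columns: Fix/Peg, Crawling Pegs, Bands/Floats."""
    rows = []
    for s in range(3):
        eta_crawl = CRAWL_PRE[s] + (CRAWL_POST[s] if post_1972 else 0.0)
        eta_band = BAND_PRE[s] + (BAND_POST[s] if post_1972 else 0.0)
        # Fix/Peg is the omitted category, so its linear predictor is zero.
        weights = [1.0, math.exp(eta_crawl), math.exp(eta_band)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def step(pi, P):
    """One transition: the row vector pi P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def invariant_distribution(P, tol=1e-13, max_iter=200000):
    """Iterate from a uniform start until successive vectors agree within tol."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(max_iter):
        new = step(pi, P)
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("no convergence")

def steps_until_close(P, start, target, tol=0.01, max_iter=200000):
    """Transitions needed, from a point mass at `start`, until every state
    probability is within tol of `target`."""
    pi = [1.0 if i == start else 0.0 for i in range(len(P))]
    for n in range(max_iter + 1):
        if max(abs(a - b) for a, b in zip(pi, target)) < tol:
            return n
        pi = step(pi, P)
    raise RuntimeError("target not reached")

P_PRE = transition_matrix(False)
P_POST = transition_matrix(True)
# Post-1972 minus pre-1972, entry by entry.
DIFF = [[P_POST[i][j] - P_PRE[i][j] for j in range(3)] for i in range(3)]
PI_PRE = invariant_distribution(P_PRE)
STEPS = [steps_until_close(P_PRE, s, PI_PRE) for s in range(3)]

print([[round(d, 5) for d in row] for row in DIFF])
print([round(p, 3) for p in PI_PRE], STEPS)
```

Under these assumptions the signs of DIFF follow the pattern described above, and the step counts give a sense of how slowly a chain this persistent approaches its limit.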
A final comment on convergence, reminiscent of MCMC estimation, is in order. A simple glance at Table 7 reveals extraordinarily high levels of persistence and very few transitions. The more dependent the process, the longer it takes to arrive at the invariant distribution, and this is clearly the case here. Comparing the invariant distribution plots between the human rights and exchange rate regimes examples shows that the human rights process is quite close to the limit in about 30 iterations. The exchange rate regimes example is still quite far from the invariant distribution even after 200 iterations.

7 Concluding Remarks

This paper has highlighted theoretical notions of state dependence and methods for examining state dependence in the general framework of Markov transition models. Following a brief discussion of some classes of dependence, we have focused on econometric estimators for rendering state dependent processes estimable. Paying particular attention to the properties of ergodic Markov chains, we have highlighted the importance of the invariant distribution as an estimand of interest for assessing the long-run outcomes of dynamic discrete processes and showcased methods of measuring and testing central hypotheses arising from theories of dependent processes in an application of Markov models to the relationship between democracy and human rights abuses.

Though these models have promise, considerable work remains. First, the centrality of stationarity in facilitating an examination of long-run outcome distributions merits deep scrutiny. One plausibly valuable application of work by Chib and others regarding changepoint processes is to utilize these techniques to examine classes of time-invariant Markov chains.
Though we have shown an example of what appears to be two distinct (or at least statistically differentiable) transition matrices before and after 1972 (the conclusion of the Smithsonian Agreements), considerable further work remains. Though this and neighboring time periods are plausible, there is value in a less ad hoc understanding of the precise timing of moves between the two stationary transition matrices. Maybe more important, when covariates are introduced, it is the stationarity (in levels, not trends or otherwise) that is key to the existence of an invariant distribution. To mix terms, a stationary counterfactual given the model results in an ability to characterize the invariant distribution, while the introduction of covariates that are both statistically significant and likely subject to temporal dynamics is likely to undercut the ability to estimate this theoretically interesting quantity.

Figure 4: Multinomial Estimates Convergence on the Invariant Distribution. Panel title: Invariant Distribution. Vertical axis: probability (0.0 to 1.0). Horizontal axis: iterations (0 to 1000), Pre-1972.

Second, there is the looming question of distinguishing heterogeneity from state dependence. Trivially, if there is some unobserved unit-specific factor that makes residence in a particular state more likely, this heterogeneity is particularly difficult to uncover for highly persistent processes. At the same time, in posing the initial question of whether or not history matters, this distinction comes to the fore. If we cannot differentiate heterogeneity from state dependence, we cannot answer questions about whether every historical path is idiosyncratic or whether there is something about history, independent of the idiosyncrasies of the units, that is driving the evolution of qualitative time series.

Of utmost importance, we hope to have established a reasonable set of boundaries and classes of candidate models for exploring the general claim that history matters.
While estimation may often, as a practical matter, restrict the richness of history that can be captured in a statistical model, taking temporal dynamics seriously requires further steps toward treating dynamics as a process of interest rather than relegating them to the status of nuisance parameters. Indeed, we would argue it is not farfetched to suggest that dynamics are significantly underexplored and our hope is to expand the interest in studying time series of qualitative variables by making clear the simplicity with which they can be implemented.

Table 4: Comparing Markov Matrices – Cutpoint Pooling

Polity IV Democracy = 0
                All Cutpoints Estimated                 Pooled Cutpoints
              y_t=1  y_t=2  y_t=3  y_t=4  y_t=5      y_t=1  y_t=2  y_t=3  y_t=4  y_t=5
y_{t-1} = 1   .347   .519   .127   .007   0          .317   .616   .062   .004   .0003
y_{t-1} = 2   .064   .634   .264   .035   .004       .069   .622   .281   .027   .002
y_{t-1} = 3   .007   .259   .606   .117   .0104      .013   .269   .575   .134   .01
y_{t-1} = 4   .003   .063   .364   .504   .065       .002   .060   .441   .441   .056
y_{t-1} = 5   0      .038   .076   .556   .330       .0003  .009   .116   .581   .294

Polity IV Democracy = 10
                All Cutpoints Estimated                 Pooled Cutpoints
              y_t=1  y_t=2  y_t=3  y_t=4  y_t=5      y_t=1  y_t=2  y_t=3  y_t=4  y_t=5
y_{t-1} = 1   .819   .163   .017   .001   0          .819   .173   .007   .00045 .00003
y_{t-1} = 2   .146   .706   .132   .014   .0015      .153   .692   .143   .0112  .0007
y_{t-1} = 3   .013   .383   .529   .069   .006       .022   .383   .507   .0824  .0057
y_{t-1} = 4   .003   .063   .364   .504   .065       .002   .0474  .391   .489   .071
y_{t-1} = 5   0      .038   .076   .556   .33        .0003  .0075  .098   .559   .335

References

Amemiya, T. 1985. Advanced Econometrics. Cambridge, MA: Harvard University Press.
Apodaca, Clair. 2001. “Global Economic Patterns and Personal Integrity Rights after the Cold War.” International Studies Quarterly 45:587–602.
Bartoszyński, Robert and Magdalena Niewiadomska-Bugaj. 1996. Probability and Statistical Inference. New York, NY: John Wiley & Sons [Wiley Series in Probability and Statistics].
Beck, Nathaniel, David Epstein, Simon Jackman and Sharyn O’Halloran. N.d.
“Alternative Models of Dynamics in Binary Time-Series-Cross-Section Models: The Example of State Failure.” Paper presented at the 2001 Annual Meeting of the Society for Political Methodology, Emory University (Draft: July 12, 2002).
Berchtold, Andre and Adrian E. Raftery. 1999. The Mixture Transition Distribution Model for High-Order Markov Chains and Non-Gaussian Time Series. Technical Report 360 Department of Statistics, University of Washington.
Berchtold, Andre and Adrian E. Raftery. 2002. “The Mixture Transition Distribution Model for High-Order Markov Chains and Non-Gaussian Time Series.” Statistical Science 17(3):328–56.
Bhattacharya, Rabi N. and Edward C. Waymire. 1990. Stochastic Processes with Applications. New York, NY: John Wiley & Sons [Wiley Series in Probability and Mathematical Statistics].
Boskin, M. J. and F. C. Nold. 1975. “A Markov Model of Turnover in Aid to Families with Dependent Children.” Journal of Human Resources 10:476–81.
Boswell, T. and W. Dixon. 1990. “Dependency and Rebellion: A Cross-National Analysis.” American Sociological Review 55:549–559.
Bueno de Mesquita, B., G. W. Downs, A. M. Smith and F. M. Cherif. 2005. “Thinking Inside the Box: A Closer Look at Democracy and Human Rights.” International Studies Quarterly 49:439–57.
Bueno de Mesquita, Bruce, James D. Morrow, Randolph M. Siverson and Alastair M. Smith. 2003. The Logic of Political Survival. Cambridge, MA: MIT Press.
Cingranelli, D. L. and D. L. Richards. 1999. “Measuring the Level, Pattern, and Sequence of Government Respect for Physical Integrity Rights.” International Studies Quarterly 43(2):407–417.
Davenport, C. 1995. “Multidimensional Threat Perception and State Repression: An Inquiry Into Why States Apply Negative Sanctions.” American Journal of Political Science 39:683–713.
Davenport, C. 1996a.
“Constitutional Promises and Repressive Reality: A Cross-National Time Series Investigation of Why Political and Civil Liberties are Suppressed.” Journal of Politics 58:627–54.
Davenport, C. 1996b. “The Weight of the Past: Exploring the Lagged Determinants of Political Repression.” Political Research Quarterly 49:377–405.
Davenport, C. and D. A. Armstrong. 2004. “Democracy and the Violation of Human Rights: A Statistical Analysis from 1976-1996.” American Journal of Political Science 48(3):Forthcoming.
Davenport, Christian. 2007. “State Repression and Political Order.” Annual Review of Political Science 10:1–23.
Diggle, Peter J., Patrick Heagerty, Kung-Yee Liang and Scott L. Zeger. 2002. Analysis of Longitudinal Data. Second ed. Oxford, UK: Oxford University Press.
Epstein, David, Sharyn O’Halloran, Robert Bates, Jack Goldstone and Ida Kristensen. 2006. “Democratic Transitions.” American Journal of Political Science 50(3):551–69.
Fein, H. 1995. “More Murder in the Middle: Life Integrity Violations and Democracy in the World, 1987.” Human Rights Quarterly 17:170–91.
Freedman, David A. 2006. “On the So-Called “Huber Sandwich Estimator” and “Robust Standard Errors”.” The American Statistician 60(4):299–302.
Gartner, S. S. and P. M. Regan. 1996. “Threat and Repression: The Non-Linear Relationship between Government and Opposition Violence.” Journal of Peace Research 33(3):273–87.
Ghosh, Atish, Anne-Marie Gulde and Holger Wolf. 2003. Exchange Rate Regimes: Choices and Consequences. Cambridge, MA: MIT Press.
Gleditsch, Nils Petter, Peter Wallensteen, Mikael Erikson, Margareta Sollenberg and Haavard Strand. 2002. “Armed Conflict 1946–2001: A New Dataset.” Journal of Peace Research 39(5):615–37.
Hafner-Burton, Emilie M. 2005a. “Right or Robust? The Sensitive Nature of Political Repression in an Era of Globalization.” Journal of Peace Research 42(6):679–98.
Hafner-Burton, Emilie M. 2005b.
“Trading Human Rights: How Preferential Trade Agreements Influence Government Repression.” International Organization 59(3):593–629.
Hafner-Burton, Emilie M. and Kiyoteru Tsutsui. 2005. “Human Rights in a Globalizing World: The Paradox of Empty Promises.” American Journal of Sociology 110(5):1373–1411.
Hathaway, Oona. 2002. “Do Human Rights Treaties Make a Difference?” Yale Law Journal 111:1935–2042.
Hawkins, Darren G. 2003. “Universal Jurisdiction for Human Rights: From Legal Principle to Limited Reality.” Global Governance 9(3):347–65.
Henderson, C. 1991. “Conditions Affecting the Use of Political Repression.” Journal of Conflict Resolution 35:120–42.
Keith, Linda Camp. 1999. “The United Nations Covenant on Civil and Political Rights: Does it Make a Difference in Human Rights Behavior?” Journal of Peace Research 36(1):95–118.
Keith, Linda Camp. 2002. “Constitutional Provisions for Individual Human Rights 1966-1977: Are They More than Mere Window Dressing.” Political Research Quarterly 55:111–43.
Kelton, W. David and Christina M. L. Kelton. 1984. “Hypothesis Tests for Markov Process Models Estimated from Aggregate Frequency Data.” Journal of the American Statistical Association 79(388):922–28.
Landman, Todd. 2005. Protecting Human Rights: A Comparative Study. Washington DC: Georgetown University Press.
Leon, Larry F. and Chih-Ling Tsai. 1998. “Assessment of Model Adequacy for Markov Regression Time Series Models.” Biometrics 54(3):1165–75.
Levy-Yeyati, Eduardo and Frederico Sturzenegger. 1999. “Classifying Exchange Rate Regimes: Deeds vs. Words.” Business School, Universidad Torcuato Di Tella, December.
Lindsey, J. K. 2004. Statistical Analysis of Stochastic Processes in Time. Cambridge Series in Statistical and Probabilistic Mathematics Cambridge University Press.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge, UK: Cambridge University Press.
Marshall, Monty G. and Keith Jaggers. 2002.
Polity IV Project: Political Regime Characteristics and Transitions, 1800-2002. Technical report University of Maryland, College Park and Colorado State University.
Martin, Andrew D., and Kevin M. Quinn. 2006. MCMCpack: Markov chain Monte Carlo (MCMC) Package. R package version 0.7-4. URL: http://mcmcpack.wustl.edu
Masson, Paul R. 2001. “Exchange Rate Regime Transitions.” Journal of Development Economics 64(2):571–86.
Masson, Paul R. and Francisco J. Ruge-Murcia. 2005. “Explaining the Transition between Exchange Rate Regimes.” Scandinavian Journal of Economics 107(2):261–78.
McCormick, J. M. and N. J. Mitchell. 1997. “Human Rights, Umbrella Concepts, and Empirical Analysis.” World Politics 49(4):510–25.
McKinlay, R. D. and A. S. Cohan. 1975. “A Comparative Analysis of the Political and Economic Performance of Military and Civilian Regimes.” Comparative Politics 7(3):1–30.
McKinlay, R. D. and A. S. Cohan. 1976. “Performance and Stability in Military and Nonmilitary Regimes.” American Political Science Review 70:850–64.
Meyer, W. 1996. “Human Rights and MNCs: Theory and Quantitative Evidence.” Human Rights Quarterly 18:368–97.
Mitchell, N. J. and J. M. McCormick. 1988. “Economic and Political Explanations of Human Rights Violations.” World Politics 40:476–98.
Neumayer, Eric. 2005. “Do International Human Rights Treaties Improve Respect for Human Rights?” Journal of Conflict Resolution 49(6):925–53.
Page, Scott E. 2006. “Path Dependence.” Quarterly Journal of Political Science 1(1):87–115.
Park, H. 1987. “Correlates of Human Rights: Global Tendencies.” Comparative Politics 9:405–13.
Pierson, Paul. 2000. “Increasing Returns, Path Dependence, and the Study of Politics.” American Political Science Review 94(2):251–67.
Pierson, Paul. 2004. Politics in Time: History, Institutions, and Social Analysis. Princeton, NJ: Princeton University Press.
Poe, S. C., S. C. Carey and T. C. Vazquez. 2001. “How are These Pictures Different?
A Quantitative Comparison of the US State Department and Amnesty International Human Rights Reports, 1976–1995.” Human Rights Quarterly 23(3):650–77.
Poe, Steven C. and C. Neal Tate. 1994. “Repression of Human Rights to Personal Integrity in the 1980s: A Global Analysis.” American Political Science Review 88(4):853–872.
Poe, Steven C., C. Neal Tate and Linda C. Keith. 1999. “Repression of the Human Right to Personal Integrity Revisited: A Global Cross-National Study Covering the Years 1976.” International Studies Quarterly 43(2):291–313.
Poe, Steven C., Wesley T. Milner and David A. Leblang. 1999. “Security Rights, Subsistence Rights, and Liberties: A Theoretical Survey of the Empirical Landscape.” Human Rights Quarterly 21(2):403–43.
Pratt, John W. 1981. “Concavity of the Log Likelihood.” Journal of the American Statistical Association 76(373):103–106.
Przeworski, A., M. E. Alvarez, J. A. Cheibub and F. Limongi. 2000. Democracy and Development: Political Institutions and Well-Being in the World, 1950–1980. New York, NY: Cambridge University Press.
R Development Core Team. 2004. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-00-3.
Raftery, Adrian E. 1985. “A Model for Higher-Order Markov Chains.” Journal of the Royal Statistical Society, Series B 47:528–39.
Reinhart, Carmen and Kenneth Rogoff. 2003. “The Modern History of Exchange Rate Arrangements: A Reinterpretation.” Quarterly Journal of Economics 119(1):1–48.
Richards, D. L., R. D. Gelleny and D. H. Sacko. 2001. “Money with a Mean Streak? Foreign Economic Penetration and Government Respect for Human Rights.” International Studies Quarterly 45:219–39.
Shambaugh, Jay C. 2004. “The Effect of Fixed Exchange Rates on Monetary Policy.” Quarterly Journal of Economics 119(1):301–52.
Simmons, B. A. and J. Hainmueller. N.d. “Can Domestic Institutions Explain Exchange Rate Regime Choice?
The Political Economy of Monetary Institutions Reconsidered.” Working Paper, Weatherhead Center for International Affairs, Harvard University, 19 May 2005.
Toikka, R. S. 1976. “A Markovian Model of Labor Market Decisions by Workers.” American Economic Review 66:821–34.
von Hagen, Juergen and Jizhong Zhou. N.d. “Fear of Floating and Fear of Pegging: An Empirical Analysis of De Facto Exchange Rate Regimes in Developing Countries.” Working Paper, ZEI, Bonn, February 2004.
Walker, Robert W. N.d. “Democracy and Human Rights Abuse: Implications from a First-Order Markov Model.” Working Paper, Department of Political Science, Washington University in Saint Louis.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Zanger, S. 2000. “A Global Analysis of the Effect of Political Regime Changes on Life Integrity Violations, 1977–93.” Journal of Peace Research 37(2):213–33.
Zeger, Scott L. and Bahjat Qaqish. 1988. “Markov Regression Models for Time Series: A Quasi-Likelihood Approach.” Biometrics 44(4):1019–31.

                                Pooled       Lag=1       Lag=2       Lag=3       Lag=4       Lag=5
Polity 4 Democracy           -0.228***   -0.226***     -0.051*    -0.067**      -0.032       0.005
                               (0.027)     (0.037)     (0.021)     (0.022)     (0.029)     (0.055)
Polity*Lag AI=2               0.139***
                               (0.029)
Polity*Lag AI=3               0.174***
                               (0.032)
Polity*Lag AI=4               0.253***
                               (0.036)
Polity*Lag AI=5               0.247***
                               (0.055)
Civil Wars                    1.377***                   2.148    1.687***    1.899***      0.924*
                               (0.185)                 (1.688)     (0.416)     (0.279)      (0.36)
International Wars               0.603       0.581       0.056       1.086       1.495       -0.03
                               (0.325)     (0.682)     (0.861)     (0.657)      (0.79)     (0.867)
Int. Civil Wars                0.579**       0.636      -0.051     1.386**      1.036*       -0.19
                               (0.219)     (0.647)      (0.46)      (0.45)     (0.452)     (0.614)
Growth in GDP per capita        -0.011       0.032      -0.007      -0.013      -0.009     -0.036*
                               (0.007)     (0.018)     (0.013)     (0.011)      (0.02)     (0.018)
log(GDP per capita)          -0.161***   -0.403***   -0.342***       0.052     0.35***       0.42*
                               (0.031)     (0.081)      (0.05)     (0.063)       (0.1)     (0.176)
Chg. in Population               0.011     -0.204*       0.089     0.176**      -0.089       0.198
                                (0.03)     (0.083)     (0.054)     (0.063)     (0.055)     (0.149)
log(Population)               0.245***     0.228**    0.346***    0.383***     0.196**      -0.134
                               (0.027)     (0.076)     (0.053)     (0.049)     (0.067)     (0.129)
Lag AI=2                      1.839***
                               (0.236)
Lag AI=3                      3.578***
                               (0.245)
Lag AI=4                      5.352***
                               (0.277)
Lag AI=5                      7.307***
                               (0.348)
cut1 Constant                 1.935***      -0.557         0.5       1.791      -0.561      -3.035
                               (0.513)     (1.138)     (0.885)     (1.021)     (1.677)     (2.635)
cut2 Constant                 5.343***       2.043    4.155***    5.711***       2.615      -1.734
                               (0.524)     (1.154)     (0.895)     (0.964)      (1.37)     (2.596)
cut3 Constant                 8.068***    5.143***    6.581***    8.757***    5.048***       1.118
                               (0.539)     (1.511)     (0.918)         (1)     (1.381)     (2.565)
cut4 Constant                10.885***                8.959***   11.415***     8.09***
                                (0.56)                  (1.07)     (1.047)     (1.415)
N                                 3223         712        1071         848         441         151
chi2                           3813.15     122.206     127.446     104.622      76.963      21.382
bic                           5777.135     811.061    1993.763    1724.331     974.423     307.506

PP columns, coefficients:
Polity 4 Democracy           -0.212***  (0.027)
Polity*Lag AI=2                0.12***  (0.029)
Polity*Lag AI=3               0.154***  (0.032)
Polity*Lag AI=4               0.235***  (0.036)
Polity*Lag AI=5               0.229***  (0.055)
Civil Wars                    1.409***  (0.189)
International Wars              0.612*  (0.326)
Int. Civil Wars                0.576**  (0.22)
Growth in GDP per capita        -0.011  (0.007)
log(GDP per capita)          -0.162***  (0.031)
Chg. in Population               0.015  (0.03)
log(Population)               0.249***  (0.027)

PP columns, cut points in the order listed (highest first):
                              PP-Lag=1    PP-Lag=2    PP-Lag=3    PP-Lag=4    PP-Lag=5
                                 4.692       8.359       7.314       5.654       3.632
                               (0.549)     (0.754)     (0.595)     (0.554)     (0.561)
                                 2.159       5.984       4.682       2.713       0.873
                               (0.516)     (0.517)     (0.518)      (0.53)     (0.629)
                                             3.598       1.745       0.306       -0.31
                                           (0.488)     (0.503)      (0.56)     (0.784)
                                              0.08       -2.13
                                           (0.491)     (0.623)

* p<0.05, ** p<0.01, *** p<0.001
Table 5: Markov Models: Variants of Ordered Logit

llik.ologit.verify <- function(par, X1, X2, Xchg, y, ylag) {
  # Negative log-likelihood of the ordered logit Markov model, for use with optim()
  beta <- par
  Y <- as.matrix(y)
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  ylag <- as.vector(ylag)
  democ.var <- Xchg
  beta.fix1 <- beta[1:6]  # No civil wars for lag=1
  beta.fix2 <- beta[1:7]  # Including civil wars
  beta.d1 <- beta[[8]]    # Construct (gamma + beta_democracy_lag)
  beta.d2 <- beta[[8]] + beta[[9]]
  beta.d3 <- beta[[8]] + beta[[10]]
  beta.d4 <- beta[[8]] + beta[[11]]
  beta.d5 <- beta[[8]] + beta[[12]]
  # Linear predictors: shared covariates plus the lag-specific democracy effect
  score.base1 <- as.vector(X1 %*% beta.fix1)
  score.base2 <- as.vector(X2 %*% beta.fix2)
  democ1 <- as.vector(democ.var * beta.d1)
  democ2 <- as.vector(democ.var * beta.d2)
  democ3 <- as.vector(democ.var * beta.d3)
  democ4 <- as.vector(democ.var * beta.d4)
  democ5 <- as.vector(democ.var * beta.d5)
  score1 <- as.vector(score.base1 + democ1)
  score2 <- as.vector(score.base2 + democ2)
  score3 <- as.vector(score.base2 + democ3)
  score4 <- as.vector(score.base2 + democ4)
  score5 <- as.vector(score.base2 + democ5)
  # Log transition probabilities p<lag><outcome>; each lagged state has its own cut points
  # Lag = 1: outcomes 1-4, cut points beta[13:15]
  p11 <- plogis((beta[13] - score1), log.p=TRUE)
  p12 <- log(plogis(beta[14] - score1) - plogis(beta[13] - score1))
  p13 <- log(plogis(beta[15] - score1) - plogis(beta[14] - score1))
  p14 <- log(1 - plogis(beta[15] - score1))
  # Lag = 2: outcomes 1-5, cut points beta[16:19]
  p21 <- plogis((beta[16] - score2), log.p=TRUE)
  p22 <- log(plogis(beta[17] - score2) - plogis(beta[16] - score2))
  p23 <- log(plogis(beta[18] - score2) - plogis(beta[17] - score2))
  p24 <- log(plogis(beta[19] - score2) - plogis(beta[18] - score2))
  p25 <- log(1 - plogis(beta[19] - score2))
  # Lag = 3: outcomes 1-5, cut points beta[20:23]
  p31 <- plogis((beta[20] - score3), log.p=TRUE)
  p32 <- log(plogis(beta[21] - score3) - plogis(beta[20] - score3))
  p33 <- log(plogis(beta[22] - score3) - plogis(beta[21] - score3))
  p34 <- log(plogis(beta[23] - score3) - plogis(beta[22] - score3))
  p35 <- log(1 - plogis(beta[23] - score3))
  # Lag = 4: outcomes 1-5, cut points beta[24:27]
  p41 <- plogis((beta[24] - score4), log.p=TRUE)
  p42 <- log(plogis(beta[25] - score4) - plogis(beta[24] - score4))
  p43 <- log(plogis(beta[26] - score4) - plogis(beta[25] - score4))
  p44 <- log(plogis(beta[27] - score4) - plogis(beta[26] - score4))
  p45 <- log(1 - plogis(beta[27] - score4))
  # Lag = 5: outcomes 2-5, cut points beta[28:30]
  p52 <- plogis((beta[28] - score5), log.p=TRUE)
  p53 <- log(plogis(beta[29] - score5) - plogis(beta[28] - score5))
  p54 <- log(plogis(beta[30] - score5) - plogis(beta[29] - score5))
  p55 <- log(1 - plogis(beta[30] - score5))
  # Each observation contributes the log probability of its observed transition
  phi <- (Y==1 & ylag==1)*p11 + (Y==2 & ylag==1)*p12 + (Y==3 & ylag==1)*p13 + (Y==4 & ylag==1)*p14 +
         (Y==1 & ylag==2)*p21 + (Y==2 & ylag==2)*p22 + (Y==3 & ylag==2)*p23 + (Y==4 & ylag==2)*p24 + (Y==5 & ylag==2)*p25 +
         (Y==1 & ylag==3)*p31 + (Y==2 & ylag==3)*p32 + (Y==3 & ylag==3)*p33 + (Y==4 & ylag==3)*p34 + (Y==5 & ylag==3)*p35 +
         (Y==1 & ylag==4)*p41 + (Y==2 & ylag==4)*p42 + (Y==3 & ylag==4)*p43 + (Y==4 & ylag==4)*p44 + (Y==5 & ylag==4)*p45 +
         (Y==2 & ylag==5)*p52 + (Y==3 & ylag==5)*p53 + (Y==4 & ylag==5)*p54 + (Y==5 & ylag==5)*p55
  return(-sum(phi))
}
Table 6: The optim function

           |                   Reinhart-Rogoff Coarse Codes
  Lag code |         1          2          3          4          5          6 |     Total
-----------+------------------------------------------------------------------+----------
         1 |    49,552         51         64          1         26         16 |    49,710
           |     99.68       0.10       0.13       0.00       0.05       0.03 |    100.00
-----------+------------------------------------------------------------------+----------
         2 |        37     13,466         37          4         15          2 |    13,561
           |      0.27      99.30       0.27       0.03       0.11       0.01 |    100.00
-----------+------------------------------------------------------------------+----------
         3 |        31         40     14,602          3         62          3 |    14,741
           |      0.21       0.27      99.06       0.02       0.42       0.02 |    100.00
-----------+------------------------------------------------------------------+----------
         4 |         2          1          1      1,853          7          0 |     1,864
           |      0.11       0.05       0.05      99.41       0.38       0.00 |    100.00
-----------+------------------------------------------------------------------+----------
         5 |        23         33         60         11      5,777         10 |     5,914
           |      0.39       0.56       1.01       0.19      97.68       0.17 |    100.00
-----------+------------------------------------------------------------------+----------
         6 |        20          3          6          1          8      2,692 |     2,730
           |      0.73       0.11       0.22       0.04       0.29      98.61 |    100.00
-----------+------------------------------------------------------------------+----------
     Total |    49,665     13,594     14,770      1,873      5,895      2,723 |    88,520
           |     56.11      15.36      16.69       2.12       6.66       3.08 |    100.00
Table 7: First-Order Markov Matrix: Monthly Reinhart-Rogoff Measures
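The row percentages in Table 7 can be recovered directly from the transition counts. The following is a minimal sketch in R, assuming the counts are entered by hand from the table; the object names counts, P, and expected.stay are illustrative and are not part of the original analysis. The expected-duration line further assumes a time-homogeneous first-order chain, which the table by itself does not establish.

```r
# Transition counts copied from Table 7 (rows: lagged code, columns: current code).
counts <- matrix(c(
  49552,    51,    64,    1,   26,   16,
     37, 13466,    37,    4,   15,    2,
     31,    40, 14602,    3,   62,    3,
      2,     1,     1, 1853,    7,    0,
     23,    33,    60,   11, 5777,   10,
     20,     3,     6,    1,    8, 2692),
  nrow = 6, byrow = TRUE,
  dimnames = list(lag = as.character(1:6), current = as.character(1:6)))

# Row-normalize the counts to obtain estimated transition probabilities.
P <- prop.table(counts, margin = 1)

# Row percentages, rounded as in Table 7.
print(round(100 * P, 2))

# Illustrative only: expected consecutive months in each state, 1 / (1 - p_ii),
# valid under the assumption of a time-homogeneous first-order chain.
expected.stay <- 1 / (1 - diag(P))
print(round(expected.stay, 1))
```

With no covariates and constant transition probabilities, these row proportions are the maximum likelihood estimates of the transition matrix, so they serve as a baseline against which covariate-driven models can be compared.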