Department of Political Science Campus Box 1063 Washington University
Operationalizing and Testing Spatial Theories of Voting

Kevin M. Quinn
Department of Political Science
Campus Box 1063
Washington University
St. Louis, MO 63130
Email: [email protected]

Andrew D. Martin
Department of Political Science
Campus Box 1063
Washington University
St. Louis, MO 63130
Email: [email protected]

Department of Political Science Paper # 346
Preliminary Draft: Comments Are Welcome
April 15, 1998

Abstract

Spatial models of voting behavior provide the foundation for a substantial number of theoretical results. Nonetheless, empirical work involving the spatial model faces a number of potential difficulties. First, measures of the latent voter and candidate issue positions must be obtained. Second, evaluating the fit of competing statistical models of voter choice is often more complicated than previously realized. In this paper, we discuss precisely these issues. We argue that confirmatory factor analysis applied to mass-level issue preference questions is an attractive means of measuring voter ideal points. We also show how party issue positions can be recovered using a variation of this strategy. We go on to discuss the problems of assessing the fit of competing statistical models (multinomial logit vs. multinomial probit) and competing explanations (those based on spatial theory vs. those derived from other theories of voting, such as sociological theories). We demonstrate not only how the Bayesian perspective provides computational advantages in fitting the multinomial probit model, but also how it facilitates both types of comparison mentioned above. Results from the Netherlands and Denmark suggest that even when the computational cost of multinomial probit is disregarded, the decision whether to use multinomial probit (MNP) or multinomial logit (MNL) is not clear-cut.

Paper presented at the 1998 Midwest Political Science Association Annual Meeting, April 1998, Chicago, Illinois.
This work is based on research supported under National Science Foundation Grants SBR 96-17708 and 97-30275. The authors thank the Center in Political Economy at Washington University for additional financial support. The authors also thank Sid Chib, Ed Greenberg, Norman Schofield, and Itai Sened for helpful comments and discussions. Andrew D. Martin will be joining the political science faculty at SUNY-Stony Brook in the fall of 1998. After June 1, his address will be the Department of Political Science, State University of New York at Stony Brook, Stony Brook, NY, 11794-4392, Email: [email protected]. Please address all correspondence to Kevin M. Quinn.

1 Introduction

The spatial model of voting holds a place of primacy in the formal theoretic literature. Its basic assumption, that a voter chooses the party/candidate who puts forth the policy proposal the voter most prefers, forms the basis of a great deal of theoretical work.1 Theories of coalition formation (Schofield et al., n.d.; Schofield and Sened, 1998) and electoral strategy (Downs, 1957; Shepsle, 1991), among others, all rely to some extent on the assumption that voters act in accordance with the spatial model.

While theoretical work that makes use of the spatial model has progressed rapidly, empirical work has not kept pace. We see this as the result of two distinct problems that confront researchers working in this area. The first problem revolves around the need to obtain measures of latent voter and candidate issue positions. The second problem has to do with the means by which competing models and explanations are compared.2

The question of how to locate voters and candidates in a common space has been the subject of much debate and has been answered with varying degrees of success using a wide range of techniques (Cahoon, 1975; Cahoon et al., 1975; Enelow and Hinich, 1984; Rabinowitz and MacDonald, 1989; Iversen, 1994; Quinn et al., 1996; Jackman, 1997; Dow, 1997).
The adequacy of any of these strategies depends greatly on the task at hand. In what follows, we argue that the use of confirmatory factor analysis on mass-level issue preference questions offers several advantages over competing measurement methods in the context of operationalizing and testing random utility models of voter choice.

The problem of testing competing models and explanations, on the other hand, has largely been ignored by political scientists.3 While the use of multinomial choice models has become commonplace in political science (Alvarez and Nagler, 1995; Whitten and Palmer, 1996; Quinn et al., 1996; Lawrence, 1997; Schofield et al., n.d.; Alvarez and Nagler, 1998), these authors either assert the superiority of the multinomial probit model (MNP) over the multinomial logit model (MNL),4 or assert that, given the oftentimes substantial cost of fitting the MNP model, the MNL model is close enough. Both assertions are tenuous, as ultimately the adequacy of either model is an empirical question.

1 Throughout the paper we use the terms party and candidate interchangeably. This is entirely appropriate in both the Netherlands and Denmark, since both have fairly pure forms of proportional representation. In other contexts, such as the United States, this usage would clearly be inappropriate.

2 Throughout this paper we use model to refer to a statistical model which makes specific distributional assumptions. In this usage, multinomial probit and multinomial logit are two distinct models which may or may not have the same covariates. We use the term explanation to refer to a particular specification of a model (i.e., the choice of covariates included in the model).

3 Researchers in other fields are beginning to see the importance of this. For instance, Chib et al. (1998) demonstrate that in the context of inter-city travel mode choice MNL outperforms MNP.
To compare the performance of the MNP and MNL models, and to assess competing explanations of voting behavior, we adopt a Bayesian framework. We argue that the Bayesian paradigm offers several advantages in the context of fitting polychotomous choice models. Not only does this approach offer computational advantages when fitting the MNP model, but it also allows for comparison of non-nested models in an extremely straightforward and computationally practical manner. While such comparisons may seem to be of little substantive import, the consequences of not assessing the comparative fit of several models and explanations should not be understated. As we demonstrate below using real data from the Netherlands and Denmark, the choice of model can noticeably affect one's inferences. Similarly, neglecting to evaluate the fit of major competing explanations can also produce misleading inferences.

This paper proceeds as follows. In the following section, we discuss various strategies designed to obtain measures of voter and candidate ideal points and elaborate a promising measurement strategy. Actual measures of voter and party positions from the Netherlands and Denmark are also presented. In Section 3, we turn our attention to the MNP and MNL models and discuss Bayesian estimation strategies for each. We also discuss the computational aspects of calculating the marginal likelihood for purposes of model comparison. Section 4 summarizes our data and provides a discussion of the voter choice models we operationalize. Section 5 contains results from the MNP and MNL models of Dutch and Danish voting behavior, and the Bayes factors of competing models and explanations. The final section concludes.

2 Measurement and Spatial Voting Models

At the heart of spatial theories of voting is the idea that a voter's preferences can be represented by a point in a reasonably low-dimensional space. Similarly, party policy declarations can be represented by points in this same space.
Unfortunately, neither these issue locations nor relevant information about the issue space itself (number of dimensions, substantive content of the dimensions, etc.) are directly observable. What is needed is a means by which one can (a) obtain measures of the latent issue locations, and (b) test hypotheses regarding the underlying structure of the issue space. Before demonstrating exactly how we operationalize the spatial model, we briefly critique other methods that have been used to construct spatial issue locations of voters and parties/candidates.

4 Researchers sometimes distinguish between multinomial logit, in which covariates are individual-specific, and conditional logit, where some covariates are choice-specific. For simplicity we refer to both models under the heading of multinomial logit.

2.1 Previous Measurement Strategies

The three most commonly used techniques to estimate the latent issue positions of voters and candidates are those which rely on (a) thermometer scores (Cahoon, 1975; Cahoon et al., 1975; Enelow and Hinich, 1984; Dow, 1997), (b) voter perceptions of their own and the candidates' issue positions (Rabinowitz and MacDonald, 1989; Alvarez and Nagler, 1998), and (c) the actual issue preferences of voters and candidates (Iversen, 1994; Quinn et al., 1996; Jackman, 1997; Schofield et al., n.d.). Ultimately, the usefulness of any of these techniques depends on the problem at hand. Since our interest is in explaining and testing various models of voter choice, we evaluate each of these methods in light of this goal. The minimal requirements for meeting this goal are that the voter positions and party positions are estimated in the same space, and that the measurement strategy employed does not make assumptions regarding voter preferences that we later wish to test.
The first procedure (Cahoon, 1975; Cahoon et al., 1975) assumes that a voter's thermometer score of a candidate (a measure of how an individual feels about each candidate on a cardinal scale) represents the distance between the voter and the candidate plus a disturbance term. After some algebraic manipulation, the thermometer scores are subjected to factor analysis; the end result is that voters and candidates can be placed in the same issue space. The problem with this technique for present purposes is that it raises the question of why some voters view some candidates more favorably. Is it really because of a congruence of policy declarations and issue preferences, or is it due to sociological or psychological factors? Clearly, using such a measurement strategy to test the spatial model against other explanations of voter choice greatly stacks the deck in favor of the spatial model.

A second general method of locating voter and candidate positions relies on the subjective assessments of the voters themselves as to their own issue preferences and the issue locations of the candidates. One version of this method assumes that the ordinal responses of the voters are measured on a common scale and then measures spatial distance as the difference between a voter's declared position on an issue and either (a) her perceptions of the candidates, or (b) some type of summary (usually the mean or median) of the population perceptions of the candidates (Rabinowitz and MacDonald, 1989; Alvarez and Nagler, 1998). Oftentimes, the dimensionality of the issue space is collapsed by creating additive indices of these distances. The problems with this type of approach are multiple. First, the assumption that the ordinal voter responses are actually measured on a common scale is empirically questionable.
Second, the fact that voters oftentimes equate their own policy preferences with those of their preferred party once again greatly stacks the deck in favor of the spatial model when conducting tests vis-à-vis other types of explanations. Third, there is no reason to believe that aggregating voter perceptions of party locations will produce accurate estimates of the actual party positions. We expect this to be especially true for small parties in multi-party systems, where perceptual errors are unlikely to cancel out. Finally, there is no way to assess the goodness-of-fit of such a measurement strategy.

A third type of measurement strategy relies upon matching mass and elite survey data (Iversen, 1994; Quinn et al., 1996; Jackman, 1997; Schofield et al., n.d.). For example, Iversen (1994) performs principal components factor analysis on equivalently worded issue questions posed to European voters and European party elites. The mass and elite responses are then scored. The problem with the Iversen approach is that, since the mass and elite responses are factor analyzed separately, the factor scores project these two groups into non-equivalent spaces. As such, the distance metric needed to operationalize the spatial model of voting is not defined. A separate problem is that the type of factor analysis used by Iversen imposes no theoretical structure on the factor model, so such an analysis is best seen as an exploratory exercise rather than a measurement exercise.

More sophisticated analyses of matching mass and elite survey data are those of Quinn et al. (1996), Jackman (1997), and Schofield et al. (n.d.). These authors use confirmatory factor analysis on mass-level issue preference questions to construct theoretically meaningful factors and to project voters into a common issue space. They then use the mass-level scoring coefficients to score the elite responses and to place the elites into the same space as the voters.
The benefits of this approach are that (a) the factors can be constructed theoretically and subjected to various tests of goodness-of-fit, (b) independent measures of voter and candidate issue preferences are used as input, and (c) voters and candidates are placed in a common space indicative of voter perceptions. For these reasons, this is the method adopted in this paper.5

2.2 Locating Voters and Parties in a Common Space

The motivation for the measurement procedure employed in this paper is the following. First, assume that voter political preferences can be represented by points in some relatively low-dimensional space. Even though these points are not directly observable, we do have measures of voter preferences on specific issues such as abortion availability, governmental control of industry, the need to reduce economic inequality, and the like. Presumably, these specific responses were generated by the voters' more general political preferences (the unobservable spatial locations) plus random error. Consequently, one can use factor analysis to recover estimates of the latent issue positions that presumably generated the observed issue responses. Further, since we hypothesize that voters evaluate parties/candidates in terms of their general political preferences, we use the same scoring coefficients that project voter issue responses into the general issue space to project candidate responses into this same space.

Employing confirmatory factor analysis to construct voter and party issue positions has several advantages. First, we are able to specify the structure of the factor model a priori on theoretical grounds. Second, by putting constraints on the factor loadings, we are able to allow voter issue locations to be correlated and to have unequal variances. As will become more apparent below, this additional flexibility provides a richer picture of Danish voter preferences.
Similar findings have been reported for Australia (Jackman, 1997) and Israel (Ofek et al., 1998). Finally, confirmatory factor analysis allows one to assess the goodness-of-fit of competing factor models.

5 We have not discussed techniques designed solely to measure party/candidate issue locations, such as the use of expert opinions (DeSwaan, 1973; Taylor and Laver, 1973) and the content analysis of manifestos (Budge et al., 1987). Since these methods say nothing about the spatial location of voters, they cannot be directly employed to operationalize spatial models of voter choice. However, in situations where only reliable mass-level surveys are available, these techniques can be used to generate hypothetical issue responses of the party leaders, which can then be scored using the scoring coefficients from the mass-level confirmatory factor analysis. For such a measurement strategy see Ofek et al. (1998).

2.2.1 Data and Measurement Results

Before presenting the results of this measurement strategy, we briefly discuss our data sources. The data used in this paper come from two sources: one that records mass opinion and one that records elite opinion. To study mass opinion, we use the Euro-Barometer 11 data set (Rabier and Inglehart, 1981).6 This survey was conducted in April 1979 with support from the European Community. Although this survey was administered in nine nations, we only use the randomly selected respondents from the Netherlands and Denmark in this paper. To study elite opinion, we use the European Political Parties' Middle-Level Elites (EPPMLE) data set (Institut für Sozialwissenschaften and Europa-Institut of the Universität Mannheim, 1983).7 The EPPMLE study consists of a survey of delegates to the European party conferences in 1979. The EPPMLE study includes survey responses from the four major Dutch parties and the eight major Danish parties.
Because of our relatively small sample size, we have chosen to focus only on the five Danish parties represented in our sample that received more than five percent of the sample vote share. In addition, we use single imputation to fill in missing data in the Danish case. This increases our sample size from 440 (using listwise deletion) to 640.8 Thus, we have two surveys, one of Dutch and Danish political elites and one of Dutch and Danish citizens, with nearly identical issue questions, conducted at nearly the same point in time. See Table A1 in Appendix A for the issue questions administered in both surveys.

The measurement strategy employed here is the following. In each case an initial exploratory factor analysis was conducted to get a sense of how many factors to retain. In both cases, it appears that two or three factors are present.9 Given our prior beliefs and the small number of issue questions, we elected to estimate two-factor CFA models in each country. In the Dutch case, all observed variables had factor loadings greater than 0.25 on the first factor. Responses to questions regarding control of multinational corporations, income inequality, penalties for terrorists, and abortion availability also had factor loadings greater than 0.25 on the second factor. For this reason, the factor loadings of the remaining observed variables on the second factor were constrained to 0. The fit of this model was judged to be sufficient.

6 The Euro-Barometer data sets are distributed in the United States by the Inter-University Consortium for Political and Social Research (ICPSR Study 7752).

7 The EPPMLE data set was initially published through the Institut für Sozialwissenschaften and Europa-Institut of the Universität Mannheim, and is available through the Köln Zentralarchiv. The EPPMLE is a proprietary data set.

8 Details of the imputation procedure can be found in Appendix C.

9 In the Dutch case, principal components factor analysis revealed 3 factors with eigenvalues greater than 1. The eigenvalue of the third factor was approximately equal to 1. The Danish data produced similar findings.

The Danish case was somewhat more complicated. A similar procedure was used to specify the initial CFA model, but the fit of this model was judged to be excessively poor. After fitting a number of other models, we found that a reasonable fit could be obtained by restricting the loadings of the terrorism, nuclear energy, and multinational corporation responses to be equal to zero on the first factor, and the loadings of the remaining observed variables to be equal to zero on the second factor. In addition, we allowed the factors to be correlated. The factor loadings derived from the CFA procedures are presented in Tables 1 and 2.10

[Insert Tables 1 and 2 about here.]

We call the first factor in the Dutch case a general economic (left-right) factor; we identify the second as a measure of preferences over the scope of government. In the Danish case, interpretation of the factors is slightly more difficult. In part, this owes to the high (0.899) correlation between the two factors, which makes interpreting either factor in a vacuum somewhat misleading. In a sense, this result tells us that the Danish issue space is nearly one-dimensional. For this reason, we prefer to interpret the two factors together as co-determinants of a general economic (left-right) dimension.

From the CFA results we obtain the factor scoring coefficients, which we use to place the voters in the issue space implied by the CFA results. We present the regression scoring coefficients for each country in Tables 3 and 4.

[Insert Tables 3 and 4 about here.]

In addition, we use these same scoring coefficients in conjunction with the elite responses to the equivalently worded issue questions to place each member of the party conference in the same issue space as the voters. We then use the median position of each delegation on each dimension as an estimate of the policy position of each party.
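The scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the standard regression (Thurstone) scoring formula $\mathbf{B} = \boldsymbol{\Phi}\boldsymbol{\Lambda}'\boldsymbol{\Sigma}_x^{-1}$, where $\boldsymbol{\Sigma}_x = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta}$ is the model-implied item covariance, and it assumes that elite responses are centered at the mass-level item means before scoring. All function names, and the toy loadings and responses, are hypothetical and are not estimates from this paper.

```python
import numpy as np

def regression_scoring_coefficients(loadings, factor_corr, uniquenesses):
    """Regression (Thurstone) scoring weights B = Phi Lambda' Sigma_x^{-1}.

    loadings     : (q, k) CFA loading matrix Lambda (q items, k factors)
    factor_corr  : (k, k) factor correlation matrix Phi
    uniquenesses : (q,) unique variances, i.e. the diagonal of Theta
    Returns a (k, q) matrix of scoring coefficients.
    """
    implied_cov = loadings @ factor_corr @ loadings.T + np.diag(uniquenesses)
    # Solve rather than invert: Sigma_x^{-1} Lambda Phi, then transpose
    # (valid because Sigma_x and Phi are both symmetric).
    return np.linalg.solve(implied_cov, loadings @ factor_corr).T

def factor_scores(responses, item_means, coefs):
    """Project item responses (one row per respondent) into factor space."""
    return (responses - item_means) @ coefs.T

def party_positions(elite_scores, party_labels):
    """Median delegate position on each dimension, keyed by party label."""
    labels = np.asarray(party_labels)
    return {party: np.median(elite_scores[labels == party], axis=0)
            for party in np.unique(labels)}

# Toy two-factor CFA with correlated factors (illustrative numbers only).
Lam = np.array([[0.7, 0.0], [0.6, 0.3], [0.5, 0.0], [0.0, 0.8]])
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])
Theta = 1.0 - np.sum((Lam @ Phi) * Lam, axis=1)   # gives unit item variances
B = regression_scoring_coefficients(Lam, Phi, Theta)

rng = np.random.default_rng(0)
voters = rng.normal(size=(500, 4))                 # stand-in mass responses
mass_means = voters.mean(axis=0)
voter_pts = factor_scores(voters, mass_means, B)
elites = rng.normal(size=(6, 4))                   # stand-in delegate responses
elite_pts = factor_scores(elites, mass_means, B)   # same coefficients as voters
positions = party_positions(elite_pts, ["A", "A", "A", "B", "B", "B"])
```

The key design point mirrors the text: the elites never enter the factor analysis. They are only scored with the mass-level coefficients, which is what places voters and parties in one space.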
Tables 5 and 6 contain the locations obtained using this procedure for the Dutch and Danish cases.

[Insert Tables 5 and 6 about here.]

10 More details of the methods employed here are given in Appendix C.

Since parties and voters are now projected into the same space, distance measures are easily defined. Figures 1 and 2 present density estimates of Dutch and Danish voter ideal points with the relevant Dutch and Danish party positions superimposed.

[Insert Figures 1 and 2 about here.]

From Figures 1 and 2 we see that our results fit common understandings of Dutch and Danish politics. For example, the left-right orientations of the parties are consistent with nearly all past work. Furthermore, the Dutch party positions are similar to the two-dimensional spatial maps cited by Daalder (1987, pp. 212, 233). The CDA seems to pull the relative party positions away from a uni-dimensional ordering; given the CDA's strong views on abortion, this result is expected (for example, see Andeweg and Irwin, 1993). Similarly, the Danish spatial map is consistent with the one-dimensional orderings presented in Laver and Schofield (1990).

3 Fitting and Evaluating Multinomial Data Models11

Most political scientists now recognize that individual-level voting data drawn from a multi-party democracy are multinomial data, and as such, multinomial response models such as the multinomial probit (MNP) and multinomial logit (MNL) models are better suited to modeling these data than are the more familiar binomial probit and logit models. What many researchers have failed to recognize is that the choice between different types of multinomial response models is not always a simple matter. While the greater flexibility of the MNP model would seem to make it preferable to the more restrictive MNL model, we demonstrate below that once model complexity is accounted for, the MNP model is not always the clear winner.
A related concern stems from our desire to assess the explanatory power of the spatial model vis-à-vis competing explanations of voter choice. In so doing, we can directly assess the explanatory power of a rational choice explanation of political behavior versus other explanations (see Green and Shapiro, 1994). As we discuss below, Bayesian inference provides a consistent and computationally practical means to achieve each of these aims.

11 This section draws heavily from Chib et al. (1998).

The purpose of this section is to demonstrate how two major polychotomous choice models can be fit from a Bayesian perspective, and to demonstrate one of the main attractions of pursuing such a Bayesian model-fitting strategy: the relative ease with which one can assess the relative fit of non-nested models with different functional forms. This is extremely difficult to do from a classical (frequentist) perspective, whereas the Bayesian paradigm provides a relatively simple means to compare the fit of any two models fit to the same data. For a broader discussion of the properties of the MNP and MNL models, we refer the reader to any of several good, general discussions of these models aimed at a political science audience (Alvarez and Nagler, 1995; Lawrence, 1997; Alvarez and Nagler, 1998). These papers also discuss various classical estimation strategies, as do the works of Maddala (1983) and Greene (1997).
3.1 Random Utility Motivation12

Given data from n individuals (voters) choosing between p alternatives (parties), both the MNP model and the MNL model can be motivated by the following random utility model:

$$
\mathbf{z}_i = \mathbf{V}_i\boldsymbol{\delta} + \mathbf{W}_i\boldsymbol{\gamma} + \mathbf{u}_i,
\qquad
y_{ij} = \begin{cases} 1 & \text{if } z_{ij} = \max(\mathbf{z}_i) \\ 0 & \text{otherwise} \end{cases}
\tag{1}
$$

for $i = 1, \ldots, n$ and $j = 1, \ldots, p$, where $\mathbf{z}_i$ is a $p \times 1$ vector, with $z_{ij}$ defined as the utility voter i attaches to voting for party j; $\mathbf{V}_i$ is a $p \times l$ matrix of choice-specific covariates; $\mathbf{W}_i$ is a $p \times m$ matrix of individual-specific covariates;13 $\boldsymbol{\delta}$ and $\boldsymbol{\gamma}$ are vectors of choice-specific and individual-specific coefficients, respectively; $\mathbf{u}_i$ is a $p \times 1$ vector of disturbances; and $\mathbf{y}_i$ is a $p \times 1$ vector representing the observed vote choice of individual i. The probability that alternative j is chosen by individual i is simply the probability that $z_{ij}$ equals $\max(\mathbf{z}_i)$. This is the basic random utility model that can be used to motivate both the MNP and MNL models of voter choice. The difference between these models stems from the assumptions made about how the disturbance terms are distributed.

12 Throughout this paper we use the following notation. Lower case, non-bold Roman letters indicate scalars; lower case, bold Roman letters indicate vectors; and upper case, bold Roman letters indicate matrices. All vectors are assumed to be column vectors unless otherwise noted.

13 Throughout this section we assume that the matrix of individual-specific covariates is formed as $\mathbf{W} \otimes \mathbf{I}_p$, where $\mathbf{W}$ is the original $n \times m$ matrix of m individual-specific attributes from all n individuals.

3.2 Bayesian Estimation of the MNP Model

The multinomial probit model results from the assumption that the errors in Equation 1 are distributed multivariate normal with mean vector $\mathbf{0}$ and variance-covariance matrix $\boldsymbol{\Omega}$. In order to identify the MNP model, Equation 1 has to be slightly reformulated. First, note that one identification problem arises from the fact that an arbitrary constant can be added to both sides of Equation 1 without changing the distribution of $\mathbf{y}_i$.
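In order to remedy this identification problem, it is customary to express each $z_{ij}$ relative to $z_{ip}$. Define $\mathbf{z}^*_i = (z^*_{i1}, \ldots, z^*_{i,p-1})'$, where $z^*_{ij} = z_{ij} - z_{ip}$. The underlying regression model is now

$$
\mathbf{z}^*_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\varepsilon}_i,
\qquad
\boldsymbol{\varepsilon}_i \sim N_{p-1}(\mathbf{0}, \boldsymbol{\Sigma}),
$$

where $\mathbf{X}_i$ is the new matrix of covariates obtained by horizontally concatenating $\mathbf{V}^*_i = \mathbf{V}_i - \mathbf{1}\mathbf{v}'_{ip}$ (with $\mathbf{1}$ a $p \times 1$ vector of ones and $\mathbf{v}'_{ip}$ the pth row of $\mathbf{V}_i$) to $\mathbf{W}_i$, and then deleting the pth row of $\mathbf{X}_i$ and each column of individual-specific attributes for the pth choice category.14 Stacking the $\mathbf{z}^*_i$, the random utility model becomes:

$$
\underbrace{\begin{bmatrix} \mathbf{z}^*_1 \\ \vdots \\ \mathbf{z}^*_n \end{bmatrix}}_{n(p-1) \times 1}
=
\underbrace{\begin{bmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_n \end{bmatrix}}_{n(p-1) \times k}
\underbrace{\boldsymbol{\beta}}_{k \times 1}
+
\underbrace{\begin{bmatrix} \boldsymbol{\varepsilon}_1 \\ \vdots \\ \boldsymbol{\varepsilon}_n \end{bmatrix}}_{n(p-1) \times 1}
$$

A second identification problem inherent in the MNP model stems from the fact that multiplying $\mathbf{z}^*_i$ by a positive constant will not change the value of $\mathbf{y}_i$. This problem is traditionally solved by restricting $\sigma_{11}$ to be equal to 1. For notational purposes, we continue to refer to the restricted matrix as $\boldsymbol{\Sigma}$. The probability that individual i chooses party j is:

$$
\Pr(y_{ij} = 1 \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}) = \int_{A_j} \phi_{p-1}(\mathbf{z}^*_i \mid \mathbf{X}_i\boldsymbol{\beta}, \boldsymbol{\Sigma})\, d\mathbf{z}^*_i
\tag{2}
$$

where $\phi_{p-1}$ represents the (p−1)-variate normal probability density function, and

$$
A_j = \begin{cases}
\{\mathbf{z}^*_i : z^*_{ij} > 0,\ z^*_{ij} > z^*_{ik} \text{ for all } k \neq j\} & \text{for } j \le p-1 \\
\{\mathbf{z}^*_i : z^*_{i1} < 0,\ z^*_{i2} < 0,\ \ldots,\ z^*_{i,p-1} < 0\} & \text{for } j = p.
\end{cases}
$$

The sampling density is then given by:

$$
f(\mathbf{y} \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}) = \prod_{i=1}^{n} \prod_{j=1}^{p} \Pr(y_{ij} = 1 \mid \boldsymbol{\beta}, \boldsymbol{\Sigma})^{y_{ij}}
$$

14 It should be noted that if $\boldsymbol{\Omega} = \mathbf{I}_p$, then $\boldsymbol{\Sigma} = \mathbf{I}_{p-1} + \mathbf{1}\mathbf{1}'$, where $\mathbf{1}$ here is a $(p-1) \times 1$ vector of ones. This can be normalized to $\boldsymbol{\Sigma} = \tfrac{1}{2}\mathbf{I}_{p-1} + \tfrac{1}{2}\mathbf{1}\mathbf{1}'$. This follows directly from the rules for calculating the variance and covariance of sums and differences of random variables (see DeGroot, 1986, p. 216). Not only will $\boldsymbol{\Sigma}$ not be an identity matrix when the undifferenced disturbances are i.i.d., but since the mapping from $\boldsymbol{\Omega}$ to $\boldsymbol{\Sigma}$ is many to one, it is possible to say very little about $\boldsymbol{\Omega}$ from knowledge of $\boldsymbol{\Sigma}$ unless additional assumptions are made.

The posterior density of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$ is given by Bayes' theorem as:

$$
\pi(\boldsymbol{\beta}, \boldsymbol{\Sigma} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\beta}, \boldsymbol{\Sigma})\, \pi(\boldsymbol{\beta})\, \pi(\boldsymbol{\Sigma})
$$

where $\pi(\boldsymbol{\beta})$ and $\pi(\boldsymbol{\Sigma})$ denote the prior densities of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$, respectively.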
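The differencing argument and footnote 14 can be checked numerically. The sketch below is our own illustration with arbitrary coefficients, not part of the authors' method. It builds each individual-specific block as $\mathbf{w}_i' \otimes \mathbf{I}_p$ in the spirit of footnote 13, simulates utilities from Equation 1 with $\boldsymbol{\Omega} = \mathbf{I}_p$, and confirms that the choice implied by the regions $A_j$ applied to the differenced utilities matches $\arg\max_j z_{ij}$. It then verifies that $\boldsymbol{\Omega} = \mathbf{I}_p$ maps to $\tfrac{1}{2}\mathbf{I}_{p-1} + \tfrac{1}{2}\mathbf{1}\mathbf{1}'$ once $\sigma_{11}$ is normalized to 1. The function names are hypothetical.

```python
import numpy as np

def differencing_matrix(p):
    """(p-1) x p matrix D with D @ z = (z_1 - z_p, ..., z_{p-1} - z_p)."""
    return np.hstack([np.eye(p - 1), -np.ones((p - 1, 1))])

def choice_from_differences(z_star):
    """0-based choice implied by the regions A_j of Equation 2."""
    if np.all(z_star < 0):
        return z_star.size           # all z*_ij < 0: the baseline party p is chosen
    return int(np.argmax(z_star))    # otherwise the largest (positive) z*_ij wins

def normalized_difference_cov(Omega):
    """Covariance of D u for u ~ N(0, Omega), rescaled so that sigma_11 = 1."""
    D = differencing_matrix(Omega.shape[0])
    Sigma = D @ Omega @ D.T
    return Sigma / Sigma[0, 0]

# Simulate Equation 1 with Omega = I_p and illustrative coefficients.
rng = np.random.default_rng(1)
p, l, m = 4, 2, 3                    # parties, choice- and individual-specific covariates
delta = rng.normal(size=l)
gamma = rng.normal(size=m * p)       # one coefficient per (attribute, party) pair
D = differencing_matrix(p)
mismatches = 0
for _ in range(1000):
    V_i = rng.normal(size=(p, l))
    W_i = np.kron(rng.normal(size=(1, m)), np.eye(p))   # w_i' kron I_p: p x (m*p)
    z = V_i @ delta + W_i @ gamma + rng.normal(size=p)
    mismatches += choice_from_differences(D @ z) != int(np.argmax(z))
```

Because ties have probability zero under continuous disturbances, `mismatches` should stay at zero: differencing against party p changes the parameterization but never the observed choice.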
While, in theory, Bayesian estimation of the MNP model can proceed by using a suitable method (such as importance sampling or the Metropolis-Hastings algorithm) to draw samples of the model parameters directly from the posterior density, this approach suffers from most of the well-known numerical problems associated with fitting the MNP model: the relatively high computational cost and/or low numerical accuracy of evaluating the integrals in Equation 2. The key to avoiding many of these problems is the concept of data augmentation (Tanner and Wong, 1987; Albert and Chib, 1993). Data augmentation is a very general method that can also be used to deal with missing data (see Schafer, 1997).

Before detailing how the data augmentation algorithm is employed in the MNP model, note that if the latent vector of utilities $\mathbf{z}^*$ were observed, the MNP model would reduce to a seemingly unrelated regression (SUR) model. Since such a model is a linear model with normal disturbances, it is not difficult to fit from either a Bayesian or a classical perspective. The central idea behind data augmentation is that even though the actual value of $\mathbf{z}^*$ is unobserved, we do know how it is distributed conditional on the data and the other model parameters. By including draws of these unobserved values of $\mathbf{z}^*_i$ inside what would otherwise be an MCMC sampling scheme for a slightly reformulated SUR model, we are able to fit the MNP model at a minimal computational cost.

The actual MCMC sampling algorithm employed in this paper is based on work by Chib et al. (1998) and, to a lesser extent, Albert and Chib (1993) and Chib and Greenberg (n.d.). The algorithm employed here is the following:

1. set $g := 1$
2. draw $z^{*(g)}_{ij}$ from $\pi(z^*_{ij} \mid \mathbf{y}_i, \mathbf{z}^*_{i,-j}, \boldsymbol{\beta}^{(g-1)}, \boldsymbol{\Sigma}^{(g-1)})$ for $j = 1, 2, \ldots, p-1$ and $i = 1, 2, \ldots, n$
3. draw $\boldsymbol{\beta}^{(g)}$ from $\pi(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{z}^{*(g)}, \boldsymbol{\Sigma}^{(g-1)})$
4. draw $\boldsymbol{\Sigma}^{(g)}$ from $\pi(\boldsymbol{\Sigma} \mid \mathbf{y}, \mathbf{z}^{*(g)}, \boldsymbol{\beta}^{(g)})$
5. store $\boldsymbol{\beta}^{(g)}$ and $\boldsymbol{\Sigma}^{(g)}$
6. set $g := g + 1$
7. if $g \le G$ goto step 2.
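Steps 2 and 3 of the algorithm can be sketched as below. This is a hedged illustration rather than the authors' implementation. Step 2 uses the conditional normal moments of $z^*_{ij}$ given $\mathbf{z}^*_{i,-j}$ together with the truncation region implied by the sets $A_j$ of Equation 2. Indices are 0-based, so index $p-1$ stands for the baseline party p. Step 3 uses the conjugate normal update for $\boldsymbol{\beta}$ stated in the text that follows. The Metropolis-Hastings draw of $\boldsymbol{\Sigma}$ in step 4 is omitted. The function names are hypothetical, and `scipy.stats.truncnorm` (which takes standardized bounds) performs the truncated normal draw.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_latent_utility(j, z_star, mean, Sigma, chosen, rng):
    """Step 2: update z*_ij in place from its truncated normal full conditional.

    z_star : current (p-1,) differenced utilities for voter i
    mean   : (p-1,) vector X_i @ beta
    chosen : 0-based observed choice; chosen == p-1 denotes the baseline party
    """
    others = [k for k in range(z_star.size) if k != j]
    if others:
        S_oo = Sigma[np.ix_(others, others)]
        S_jo = Sigma[j, others]
        w = np.linalg.solve(S_oo, S_jo)                  # S_oo^{-1} S_oj
        cond_mean = mean[j] + w @ (z_star[others] - mean[others])
        cond_var = Sigma[j, j] - S_jo @ w
    else:                                                # p = 2: nothing to condition on
        cond_mean, cond_var = mean[j], Sigma[j, j]
    sd = np.sqrt(cond_var)

    if chosen == j:                   # z*_ij must exceed 0 and every other z*_ik
        lower, upper = z_star[others].max(initial=0.0), np.inf
    elif chosen == z_star.size:       # baseline chosen: every z*_ik < 0
        lower, upper = -np.inf, 0.0
    else:                             # another non-baseline party was chosen
        lower, upper = -np.inf, z_star[chosen]
    a, b = (lower - cond_mean) / sd, (upper - cond_mean) / sd   # standardized bounds
    z_star[j] = truncnorm.rvs(a, b, loc=cond_mean, scale=sd, random_state=rng)
    return z_star

def draw_beta(X_list, z_list, Sigma, beta0, B0, rng):
    """Step 3: beta | y, z*, Sigma ~ N(beta1, B1) under the prior N(beta0, B0)."""
    Sigma_inv, B0_inv = np.linalg.inv(Sigma), np.linalg.inv(B0)
    B1 = np.linalg.inv(B0_inv + sum(X.T @ Sigma_inv @ X for X in X_list))
    B1 = (B1 + B1.T) / 2                                 # guard against round-off asymmetry
    beta1 = B1 @ (B0_inv @ beta0
                  + sum(X.T @ Sigma_inv @ z for X, z in zip(X_list, z_list)))
    return rng.multivariate_normal(beta1, B1)
```

Each call to `draw_latent_utility` keeps $\mathbf{z}^*_i$ inside the region $A_j$ of the observed choice, which is exactly what makes the augmented data consistent with $\mathbf{y}$.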
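The distribution of $z^*_{ij} \mid \mathbf{y}_i, \mathbf{z}^*_{i,-j}, \boldsymbol{\beta}^{(g-1)}, \boldsymbol{\Sigma}^{(g-1)}$ is a univariate truncated normal distribution whose mean and variance follow from standard normal theory (see McCulloch and Rossi, 1994, for details within the context of the MNP model). For problems with a small number of choices, it may be more computationally efficient to sample $\mathbf{z}^*_i$ directly using the accept-reject method.

We employ a conjugate normal prior on $\boldsymbol{\beta}$, $\pi(\boldsymbol{\beta}) = \phi_k(\boldsymbol{\beta} \mid \boldsymbol{\beta}_0, \mathbf{B}_0)$. From this it follows that $\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{z}^*, \boldsymbol{\Sigma} \sim N(\boldsymbol{\beta}_1, \mathbf{B}_1)$, where

$$
\mathbf{B}_1 = \Big(\mathbf{B}_0^{-1} + \sum_i \mathbf{X}_i'\boldsymbol{\Sigma}^{-1}\mathbf{X}_i\Big)^{-1},
\qquad
\boldsymbol{\beta}_1 = \mathbf{B}_1\Big(\mathbf{B}_0^{-1}\boldsymbol{\beta}_0 + \sum_i \mathbf{X}_i'\boldsymbol{\Sigma}^{-1}\mathbf{z}^*_i\Big).
$$

We use the log-Cholesky parameterization of $\boldsymbol{\Sigma}$ employed by Chib et al. (1998). To summarize, we can use the Cholesky decomposition to factor any symmetric, positive definite, $(p-1) \times (p-1)$ matrix $\mathbf{A}$ as $\mathbf{A} = \mathbf{L}\mathbf{L}'$, where $\mathbf{L}$ is a $(p-1) \times (p-1)$ lower triangular matrix with typical element $l_{rc}$. If $a_{11} = 1$, then $l_{11} = 1$. Using a log transformation to restrict the diagonal elements of $\mathbf{L}$ to be positive, we have the following parameterization:

$$
\boldsymbol{\theta} = \big(l_{21}, \log(l_{22}), l_{31}, \ldots, \log(l_{p-1,p-1})\big)' \equiv (\theta_1, \theta_2, \ldots, \theta_{p^*})'
\tag{3}
$$

where $p^* = (p+1)(p-2)/2$. As Chib et al. (1998) note: "The mapping between $\boldsymbol{\Sigma}[\boldsymbol{\theta}]$ and $\boldsymbol{\theta}$ is one-to-one. This parameterization of $\boldsymbol{\Sigma}[\boldsymbol{\theta}]$ leaves the $p^*$ vector $\boldsymbol{\theta}$ entirely unrestricted. Any $\boldsymbol{\theta} \in \mathbb{R}^{p^*}$ leads to a matrix $\boldsymbol{\Sigma}[\boldsymbol{\theta}]$ that is symmetric, positive definite, and has $\sigma_{11} = 1$" (1998, emphasis in original).

Since the conditional distribution of $\boldsymbol{\theta}$ (and hence of $\boldsymbol{\Sigma}$) is not available in closed form, we employ a Metropolis-Hastings step (Chib and Greenberg, n.d.) to sample from this distribution. Once again, conditioning on the latent vector $\mathbf{z}^*$ makes this step relatively easy. For more specific details of the algorithm employed, we urge the reader to see Chib et al. (1998).

3.3 Bayesian Estimation of the MNL Model

The MNL model results from the assumption that the disturbances in Equation 1 are independently and identically distributed according to the Weibull (type I extreme value) distribution (McFadden, 1989; Greene, 1997).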
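The link between extreme value disturbances and the logit form can be checked by simulation. In the sketch below (our illustration, with arbitrary systematic utilities), i.i.d. standard Gumbel (type I extreme value) draws from NumPy's `Generator.gumbel` are added to fixed utilities $v_1, \ldots, v_p$, and the arg max is taken as in Equation 1. The resulting choice frequencies should match the logit probabilities $\exp(v_j) / \sum_k \exp(v_k)$ up to Monte Carlo error.

```python
import numpy as np

def logit_probs(v):
    """MNL choice probabilities exp(v_j) / sum_k exp(v_k), computed stably."""
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(2)
v = np.array([0.5, -0.2, 0.0])                       # illustrative systematic utilities
z = v + rng.gumbel(size=(200_000, v.size))           # random utilities, one row per voter
freq = np.bincount(z.argmax(axis=1), minlength=v.size) / z.shape[0]
print(np.round(freq, 3), np.round(logit_probs(v), 3))
```

Swapping the Gumbel draws for normal draws would instead give (independent) probit probabilities, which is precisely the modeling choice the paper sets out to compare.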
As McFadden has shown, the choice probabilities then take the following form:

$$\Pr(y_{ij} = 1 \mid \beta, \gamma) = \frac{\exp(v_{ij}'\beta + w_{ij}'\gamma)}{\sum_{j'=1}^{p} \exp(v_{ij'}'\beta + w_{ij'}'\gamma)}$$

where, for reasons of identification, the $p$th (baseline) row of $W_i$ has been set equal to zeros, and accordingly every $p$th column of $W_i$ has also been deleted. In a slight abuse of notation, we continue to use $\gamma$ to denote the coefficient vector which conforms to this reformulated matrix. Grouping $\beta$ and $\gamma$ into a single vector $\theta$, the sampling density is then given by:

$$f(y \mid \theta) = \prod_{i=1}^{n} \prod_{j=1}^{p} \Pr(y_{ij} = 1 \mid \theta)^{y_{ij}}$$

It follows from Bayes' theorem that the posterior density of $\theta$ is given by $\pi(\theta \mid y) \propto f(y \mid \theta)\pi(\theta)$. While this density is not available in closed form, a sequence of draws from it can be constructed using the M-H algorithm, as follows.

1. Calculate the mode and curvature of $\pi(\theta \mid y)$ using a standard maximization algorithm such as Newton-Raphson or BFGS. Denote the posterior mode $\hat\theta$ and the inverse of the posterior information matrix $\hat V$.
2. Set $g := 1$.
3. Draw $\theta^\dagger$ from $N(\hat\theta, \tau\hat V)$.
4. If $\text{rndu} \le \min\left\{1, \dfrac{\pi(\theta^\dagger \mid y)\,\phi(\theta^{(g-1)} \mid \hat\theta, \tau\hat V)}{\pi(\theta^{(g-1)} \mid y)\,\phi(\theta^\dagger \mid \hat\theta, \tau\hat V)}\right\}$, then $\theta^{(g)} := \theta^\dagger$; else $\theta^{(g)} := \theta^{(g-1)}$.
5. Store $\theta^{(g)}$.
6. Set $g := g + 1$.
7. If $g \le G$, go to step 3.

Here $\tau$ is a user-specified tuning parameter (usually between 1 and 2), and rndu denotes a draw from a uniform distribution with support on the unit interval. It is not difficult to show that this series of draws converges in distribution to $\pi(\theta \mid y)$.

3.4 Calculating the Marginal Likelihood of Multinomial Response Models

The Bayes factor is the primary means by which Bayesian models are compared. In order to calculate the Bayes factor comparing any two models, it is necessary to compute the marginal likelihood of each model. Using the method of Chib (1995), the marginal likelihood of an MNL model can be computed directly within the MCMC sampling scheme outlined above.
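For a one-parameter MNL, the M-H steps of Section 3.3 and Chib's (1995) marginal likelihood estimate can both be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' GAUSS code: a single choice-specific covariate (for example, a negative squared distance), an assumed N(0, 10²) prior, Newton-Raphson with finite-difference derivatives for step 1, the proposal variance taken as $\tau\hat V$, the chain started at the mode, and the posterior ordinate at $\theta^* = \hat\theta$ estimated with a biweight kernel and the bandwidth rule reported in Appendix C. All function names are hypothetical.

```python
import math
import random
import statistics

PRIOR_SD = 10.0  # assumed prior for the single coefficient: theta ~ N(0, PRIOR_SD**2)


def log_prior(theta):
    return -0.5 * (theta / PRIOR_SD) ** 2 - math.log(PRIOR_SD * math.sqrt(2 * math.pi))


def log_lik(theta, W, y):
    """One choice-specific covariate: Pr(y_i = j) = exp(theta*W[i][j]) / sum_k exp(theta*W[i][k])."""
    total = 0.0
    for w, yi in zip(W, y):
        m = max(theta * wk for wk in w)  # log-sum-exp shift for stability
        total += theta * w[yi] - m - math.log(sum(math.exp(theta * wk - m) for wk in w))
    return total


def log_post(theta, W, y):  # log of likelihood times (normalized) prior
    return log_lik(theta, W, y) + log_prior(theta)


def posterior_mode(W, y, h=1e-4):
    """Step 1: Newton-Raphson with finite-difference derivatives and step halving.
    Returns the mode and V = -1 / (second derivative of the log posterior)."""
    theta = 0.0
    for _ in range(100):
        f0, fp, fm = [log_post(t, W, y) for t in (theta, theta + h, theta - h)]
        d1, d2 = (fp - fm) / (2 * h), (fp - 2 * f0 + fm) / h ** 2
        step = -d1 / d2
        while log_post(theta + step, W, y) < f0 and abs(step) > 1e-10:
            step /= 2
        theta += step
        if abs(step) < 1e-8:
            break
    return theta, -1.0 / d2


def independence_mh(W, y, tau=1.5, G=4000, seed=3):
    """Steps 2-7: independence-chain M-H with proposal N(mode, tau * V)."""
    rng = random.Random(seed)
    mode, V = posterior_mode(W, y)
    sd = math.sqrt(tau * V)
    theta, lp = mode, log_post(mode, W, y)  # assumed starting value: the mode
    draws, accepted = [], 0
    for _ in range(G):
        cand = rng.gauss(mode, sd)
        lp_cand = log_post(cand, W, y)
        # ln of [pi(cand|y) q(theta)] / [pi(theta|y) q(cand)], q = proposal density
        log_r = lp_cand - lp + (((cand - mode) / sd) ** 2 - ((theta - mode) / sd) ** 2) / 2
        if rng.random() < math.exp(min(0.0, log_r)):
            theta, lp, accepted = cand, lp_cand, accepted + 1
        draws.append(theta)
    return mode, draws, accepted / G


def biweight_kde(x0, draws):
    """Biweight kernel density at x0, bandwidth 0.9 * min(s, IQR/1.34) * n**(-1/5)."""
    n = len(draws)
    q1, _, q3 = statistics.quantiles(draws, n=4)
    h = 0.9 * min(statistics.stdev(draws), (q3 - q1) / 1.34) * n ** -0.2
    us = ((x0 - d) / h for d in draws)
    return 15 / 16 * sum((1 - u * u) ** 2 for u in us if abs(u) < 1) / (n * h)


def chib_log_marginal(W, y, theta_star, draws):
    """ln m(y) = ln f(y|theta*) + ln pi(theta*) - ln pi_hat(theta*|y)."""
    return log_lik(theta_star, W, y) + log_prior(theta_star) - math.log(biweight_kde(theta_star, draws))
```

With a single parameter, $\ln m(y)$ can also be obtained by brute-force quadrature of $\int f(y \mid \theta)\pi(\theta)\,d\theta$, which gives a direct check on the Chib estimate; in higher dimensions that check is unavailable, which is why the kernel-based ordinate (possibly computed in blocks) matters.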
Even in the more complex MNP model, only minor additions to the standard MCMC sampling strategy are needed to compute an estimate of the marginal likelihood. For the MNL model, calculating the marginal likelihood is quite easy: both the sampling density and the prior density in Chib's identity (Appendix B.3) are available in closed form, and kernel density estimation can be used to calculate the remaining posterior ordinate $\pi(\theta^* \mid y)$. When the dimension of $\theta$ is large (greater than 6), estimating the posterior ordinate in reasonably sized blocks will often improve the accuracy of the density estimate. To do this, note that any joint density $\pi(\theta_1^*, \theta_2^*, \ldots, \theta_n^* \mid y)$ can be factored as:

$$\pi(\theta_1^*, \theta_2^*, \ldots, \theta_n^* \mid y) = \pi(\theta_1^* \mid y)\,\pi(\theta_2^* \mid y, \theta_1^*)\,\pi(\theta_3^* \mid y, \theta_1^*, \theta_2^*) \cdots \pi(\theta_n^* \mid y, \theta_1^*, \theta_2^*, \ldots, \theta_{n-1}^*)$$

The marginal ordinate $\pi(\theta_1^* \mid y)$ can be calculated from a kernel density estimate constructed from the original MCMC iterations. Each conditional ordinate can be calculated from a kernel density estimate constructed from a set of reduced MCMC iterations in which the conditioning parameters are held constant at the fixed values from $\theta^*$. For more details, we refer the reader to Chib (1995) and Chib and Greenberg (n.d.).

The MNP model is slightly more difficult to work with. To avoid a double superscript, let $\tilde\sigma = \sigma^*$. To compute the value of $f(y \mid \beta^*, \tilde\sigma)$, the GHK algorithm is used (Geweke, 1991; Hajivassiliou, 1990; Keane, 1994). Both $\pi(\beta^*)$ and $\pi(\tilde\sigma)$ are available in closed form. To calculate $\pi(\beta^*, \tilde\sigma \mid y)$, note that

$$\pi(\beta^*, \tilde\sigma \mid y) = \pi(\beta^* \mid y, \tilde\sigma)\,\pi(\tilde\sigma \mid y).$$

Once again, kernel density estimation can be used to calculate $\pi(\tilde\sigma \mid y)$ from the original series of MCMC draws. To calculate $\pi(\beta^* \mid y, \tilde\sigma)$, note that

$$\pi(\beta^* \mid y, \tilde\sigma) = \int \pi(\beta^* \mid y, z, \tilde\sigma)\, f(z \mid y, \tilde\sigma)\, dz.$$

As such, an accurate estimate of $\pi(\beta^* \mid y, \tilde\sigma)$ can be obtained from an additional series of reduced MCMC iterations through $z \mid y, \beta, \tilde\sigma$ and $\beta \mid z, \tilde\sigma$.
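The GHK simulator used for $f(y \mid \beta^*, \tilde\sigma)$ estimates multivariate normal rectangle probabilities by drawing the Cholesky-transformed errors one at a time from truncated normals and multiplying the successive truncation probabilities. A minimal sketch, assuming each choice probability has already been rewritten as $\Pr(u \le b)$ for $u \sim N(0, S)$ (in the MNP model this means differencing utilities relative to the chosen alternative, a step not shown); the function names and replicate count are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def cholesky(S):
    """Lower-triangular L with S = L L' (S symmetric positive definite)."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def ghk(b, S, reps=20000, seed=5):
    """GHK estimate of Pr(u_1 <= b_1, ..., u_K <= b_K) for u ~ N(0, S)."""
    rng = random.Random(seed)
    L = cholesky(S)
    total = 0.0
    for _ in range(reps):
        e, weight = [], 1.0
        for k in range(len(b)):
            # u_k = sum_m L[k][m] e_m, so u_k <= b_k  <=>  e_k <= c
            c = (b[k] - sum(L[k][m] * e[m] for m in range(k))) / L[k][k]
            p = PHI.cdf(c)
            weight *= p
            if p == 0.0:
                break
            # e_k ~ N(0, 1) truncated to (-inf, c], by inverse CDF
            e.append(PHI.inv_cdf(max(rng.random() * p, 1e-300)))
        total += weight
    return total / reps
```

For a bivariate orthant with correlation $\rho$, the exact value $1/4 + \arcsin(\rho)/(2\pi)$ provides a quick check on the simulator.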
The estimate of $\pi(\beta^* \mid y, \tilde\sigma)$ is given by

$$\hat\pi(\beta^* \mid y, \tilde\sigma) = \frac{1}{G}\sum_{g=1}^{G} \phi(\beta^* \mid \beta_1^{(g)}, B_1^{(g)}),$$

where $G$ is the total number of reduced MCMC iterations, and $\beta_1$ and $B_1$ are as defined for step 3 of the MNP sampling algorithm discussed above, computed from the draw $z^{(g)}$ with $\Sigma$ held at the value implied by $\tilde\sigma$. For a more detailed treatment of the estimation strategies discussed in this section, we urge the reader to see Chib et al. (1998).

4 Data and Research Design

As noted above, our source of mass data is Euro-Barometer 11. In each case we operationalize two explanations: one based on the spatial theory of voting, the other a composite explanation which incorporates hypothesized sociological/structural determinants of voter choice along with the issue concerns associated with our measure of spatial distance. For each explanation we fit an MNP and an MNL model of voter choice, each of which could plausibly have generated the observed data according to the explanation in question.

To operationalize the spatial model, we use the results from the measurement procedure discussed in Section 2.2 to calculate the negative squared distance between each voter and each party. This measure is included as a choice-specific covariate. To operationalize sociological/structural theories, we include three variables that capture notions of religion and class: Religious Importance (scored from 0 for those indicating religion is not important to 4 for those indicating religion is very important), Income (measured on an ordinal 12-point scale), and Manual Labor (coded 1 if the respondent is a manual laborer, and 0 otherwise). We also include other demographic characteristics commonly used in structural voting studies. Education measures the age at which the respondent finished formal education (ranging from 1, indicating 14 years or younger, to 9, indicating 22 years or older). Town Size captures urban/rural splits in the electorate by measuring subjective town size (1 indicating a small town or rural area, 2 a middle-size town, and 3 a city).
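The spatial covariate can be sketched as follows, assuming voter ideal points and party positions are already expressed in the common space produced by the measurement model; the function names and coordinates are hypothetical. Because the covariate is a negative squared distance, a positive coefficient on it means that nearer parties receive higher utility.

```python
def neg_sq_distance(voters, parties):
    """W[i][j] = -||x_i - p_j||^2; values closer to zero mean party j is nearer voter i."""
    return [[-sum((xv - pv) ** 2 for xv, pv in zip(x, p)) for p in parties]
            for x in voters]


def nearest_party(W):
    """Index of the party with the largest covariate value (the closest party) for each voter."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]
```

For example, `neg_sq_distance([(0.0, 0.0), (1.0, -1.0)], [(1.0, 0.0), (-0.5, 2.0)])` gives one row per voter and one column per party, ready to enter the MNL or MNP models as a choice-specific covariate.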
5 Results

For both the Netherlands and Denmark, we estimate an MNP and an MNL model for each explanation. Thus, for each country, we estimate four models: a spatial MNP, a spatial MNL, a joint MNP of spatial distance and our sociological/structural covariates, and a joint MNL. For each model we report the marginal likelihood, which we use for model comparison. In the remainder of this section, we present our results.

5.1 The Netherlands

For the Netherlands, we begin our analysis by estimating an MNP and an MNL model that include spatial distance and three constants as the only covariates. We summarize the posterior samples for both models in Table 7.

[Table 7 about here.]

As expected, the posterior mean for the spatial distance measure is positive in both models. In addition, the 95% Bayesian credible interval (BCI), which contains the central 95% of the posterior density, lies entirely above zero. Because at most 2.5% of the posterior mass lies below the lower end of this interval, the posterior probability that the coefficient on the (negative squared) spatial distance measure is positive is at least 97.5%; that is, voters are more likely to choose parties closer to them. Both models predict the vote share for each party well. Additionally, the MNP predicts 47.6% of the cases correctly, while the MNL predicts slightly fewer, classifying 45.2% of the voters correctly. Thus, judging from the coefficients and the percent correctly predicted, the models are quite comparable. In Table 8 we present the results from the joint MNP and MNL models of Dutch voting behavior.

[Table 8 about here.]

Here we include our spatial distance measure along with demographic covariates that should affect voter choice. Again, the spatial distance measure is clearly positive, as in both models the 95% BCI lies above zero. The demographic covariates perform as one would expect. In both the MNP and the MNL, those individuals who are manual laborers have a higher probability of voting for the PvdA (the Dutch workers' party) than for the other alternatives.
Similarly, those who profess religious beliefs are more likely to vote for the CDA (the Christian Democrats) than for the other parties. The BCI for the income coefficient on the PvdA lies below zero in the MNP model (it is indistinguishable from zero in the MNL), while the coefficient for the VVD (the party which espouses the most conservative economic policies and is often referred to as the bourgeois party) is positive in both the MNP and the MNL. Both of these income coefficients are consistent with the structure of Dutch politics. Town size does not seem to be related to voter choice in any systematic way. As expected, one's education affects for whom one votes: the BCI on the PvdA coefficient lies below zero in both models, and the BCI on the CDA coefficient lies below zero in the MNL (it is indistinguishable from zero in the MNP). The coefficients thus leave us with a story often repeated in the Dutch context: workers vote for the PvdA, people professing religious beliefs vote for the CDA, and those with high incomes vote for the VVD. Both the MNP and MNL models point to this same story. The predicted vote shares for both the MNP and the MNL are close to the sample marginals. The MNP does slightly better at predicting individual votes: it predicts 57.1% of the votes correctly, while the MNL correctly predicts 55.5%. Again, judging only by coefficient estimates, predicted vote shares, and the percent correctly predicted, both the MNP and the MNL perform reasonably well. As we discuss in Section 3.4 and in Appendix B, one can use the Bayes factor to compare competing models and competing explanations of voting behavior at the same time. We present the Bayes factors for the four Dutch models in Table 9.

[Table 9 about here.]

In the Dutch case, the spatial MNP is the best explanatory model of voting behavior.
The Bayes factors (on a logarithmic scale) of the spatial MNP versus each of the other three models are greater than five, showing very strong support for the spatial MNP as the best explanatory model. Substantively, this indicates that in the Netherlands the spatial model provides a very parsimonious and reasonably accurate account of voting behavior: this rational choice explanation of voter choice outperforms a traditional alternative. If we had used only MNP models of voter choice in our analysis, we would have reached the same conclusion, because the Bayes factor between the spatial MNP and the joint MNP, which favors the spatial MNP, does not depend on whether the MNL models are also considered. If, however, we had relied only on MNL models, we would have incorrectly chosen the joint model as the best explanatory model, because the Bayes factor between the joint MNL and the spatial MNL favors the joint model. This illustrates how the choice of statistical model can influence the substantive conclusions one reaches about politics. By adopting the Bayesian framework, we can compare not only explanations (which is easily done using frequentist techniques) but also competing statistical models, using probability as our scale. This case highlights the necessity of comparing both explanations and models when studying voter choice.

5.2 Denmark

Does the same pattern hold in Denmark? To answer this question, we estimate four models of Danish voting behavior. We present the results from the spatial models in Table 10.

[Table 10 about here.]

As expected, the 95% BCI on the negative spatial distance coefficient lies above zero for both the MNP and the MNL models. This clearly shows that individuals tend to vote for the party closest to them in the Danish issue space. The predicted vote shares for both models are nearly identical to the observed vote shares. Both models, however, perform much worse than in the Dutch case in terms of the percent correctly predicted.
Indeed, the MNP correctly classifies only 31.0% of the cases, while the MNL does a bit better at 37.1%. While a null model of simply choosing the modal category (SD) would be predictively better, such a model would not be informative about the other parties, which are of vital importance when attempting to understand coalition politics. In this case, both the MNP and the MNL are consistent with the same theoretical story of voters choosing the closest parties. In Table 11 we present results from the joint MNL and MNP models.

[Table 11 about here.]

The spatial distance measure stays positive even when controlling for the demographic factors included in this model. Very few of the demographic coefficients differ from zero. Notable exceptions include the religion coefficient for SD: as expected, all else being equal, religious voters are more likely to vote for SD than for the more radically leftist SFP (the baseline party). The 95% BCI of this parameter lies above zero in both the MNP and the MNL models. The town size coefficient for SD is negative in both the MNP and the MNL, which is consistent with SD's urban base. The education coefficients for KFP (the conservative bourgeois party) and for Venstre (the somewhat more moderate right-wing party) are negative in the MNP model. As is the case in many West European countries, more educated voters tend to vote for left-wing parties. In the MNL model, the effect for KFP persists, while the coefficient for Venstre is indistinguishable from zero. These results are consistent with our understanding of Danish politics. Both models predict vote shares well for all parties. The MNP again correctly classifies just 31.0% of the votes, while the MNL does better at 39.3%. From a predictive standpoint, the MNL model seems to outshine the MNP in Denmark (see footnote 15). To compare both explanations and statistical models, we calculate the Bayes factors for all four models and present them in Table 12.

[Table 12 about here.]
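The Bayes factor comparisons in this section reduce to simple arithmetic on log marginal likelihoods. A minimal sketch with hypothetical values (not those in Tables 9 or 12): $\ln B_{jk}$ is a difference of log marginal likelihoods, the evidence categories follow Kass and Raftery's (1995) scale for $2\ln B$ (0 to 2, 2 to 6, 6 to 10, above 10), and with equal prior probabilities over the four models, posterior model probabilities follow by normalizing $\exp(\ln m_k)$. Natural logarithms and the function names are assumptions of the sketch.

```python
import math


def log_bayes_factor(log_m_j, log_m_k):
    """ln B_jk = ln m(y|M_j) - ln m(y|M_k)."""
    return log_m_j - log_m_k


def kass_raftery_label(log_b):
    """Evidence for M_j over M_k on Kass and Raftery's (1995) 2 ln B scale.
    Boundary values are assigned to the lower category (a convention chosen here)."""
    two_log_b = 2.0 * log_b
    if two_log_b < 0:
        return "favors M_k"
    if two_log_b <= 2:
        return "not worth more than a bare mention"
    if two_log_b <= 6:
        return "positive"
    if two_log_b <= 10:
        return "strong"
    return "very strong"


def posterior_model_probs(log_ms):
    """Pr(M_k | y) with equal prior model probabilities, via log-sum-exp."""
    top = max(log_ms)
    w = [math.exp(lm - top) for lm in log_ms]
    total = sum(w)
    return [wk / total for wk in w]
```

On this scale, a log Bayes factor greater than five, as reported for the Dutch spatial MNP, corresponds to $2\ln B > 10$, the "very strong" category.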
Table 12 shows that the spatial MNL is the best explanatory model of Danish voting behavior. The evidence in its favor is very strong relative to the spatial MNP and the joint MNL, and it is only slightly favored over the joint MNP. Substantively, then, we reach the same conclusion as in the Netherlands: the spatial model is indeed the best explanatory model. However, in Denmark the MNL model outperforms the MNP. If we had relied only on the MNP results to reach our conclusions, we would have selected the joint MNP over the spatial MNP. Again, choosing the wrong statistical model would have led us to incorrect inferences about politics. It seems clear, therefore, that it is necessary to compare both explanations and models when studying voting behavior in the West European context.

6 Conclusion

This paper evaluates many of the methodological choices researchers make when studying voting behavior in multi-party democracies. While many strategies exist for operationalizing the spatial model of voting, we advocate confirmatory factor analysis (CFA) as a measurement strategy. Not only can one use CFA to place voters and parties in a common space, but one can also test the goodness-of-fit of one measurement model against another. The second methodological choice scholars make concerns the explanations being tested. Our results demonstrate that the spatial model is a better explanatory model of both Dutch and Danish voting behavior than a model based on sociological/structural considerations.

Footnote 15: These results should be viewed with some caution. Because of time constraints, our Danish MNP estimates are based on only 5000 MCMC iterations. While the estimates presented here are probably not grossly inaccurate, we will have more confidence in these findings after running the sampler for a longer time.
These results lend credence to the use of the spatial model of voting as a starting point for the analysis of coalition formation and electoral strategy, among other theoretical enterprises. In addition, this paper provides solid empirical evidence that this particular rational choice explanation of political behavior outperforms a sociological alternative. The final methodological choice faced by scholars interested in multi-party democracy is that of a statistical model. Political scientists have moved beyond the use of binomial choice models or ordered response models, which do not reliably capture the multinomial nature of the data generating mechanism. However, it is far from clear whether the computationally easy multinomial logit model or the computationally difficult multinomial probit model should be used. The Bayesian methods advocated in this paper not only provide computational tools that make estimating the MNP model easier than other approaches, but also allow the direct comparison of the MNP and the MNL on the scale of probability using the Bayes factor. Our results show that in the Netherlands the MNP model is best, while in Denmark the MNL model is superior. The choice thus seems to be application dependent, and it is something that scholars need to address in future substantive analyses. Our results also demonstrate that choosing the wrong model for analysis can lead to incorrect inferences. Thus, the conclusion to take from this paper is clear: the choice of covariates (model specification) as well as the choice of statistical model (functional form) dramatically affects the inferences one can make about politics. It is therefore necessary to compare many alternative specifications and models when studying voter choice in a multi-party democracy.

A Appendix. Question Wording of Issue Questions

[Table A1 about here (Question Wording).]

B Appendix. Bayesian Inference and Model Comparison

While several textbooks provide comprehensive treatments of Bayesian inference (Gelman et al., 1995, for example), the theory and practice of Bayesian inference remain unfamiliar to most political scientists. In this appendix, we briefly discuss Bayesian inference, Markov chain Monte Carlo (MCMC) simulation, and Bayesian model comparison to give the reader a better sense of the statistical results in the body of the paper.

B.1 Bayesian Inference

At the heart of Bayesian inference are probability statements. After observing data, we are interested in stating the probability that a set of parameters $\theta$ takes particular values. Thus, we are interested in making statements about the distribution $\pi(\theta \mid y)$, known as the posterior density, where $y$ represents the observed values of a dependent variable (the data). To make such probabilistic statements, one applies Bayes' theorem:

$$\pi(\theta \mid y) = \frac{f(y \mid \theta)\pi(\theta)}{\int f(y \mid \theta)\pi(\theta)\,d\theta}.$$

The normalizing constant $\int f(y \mid \theta)\pi(\theta)\,d\theta$ is called the marginal likelihood. In most cases, one works with the unnormalized posterior density, $\pi(\theta \mid y) \propto f(y \mid \theta)\pi(\theta)$, where $f(y \mid \theta)$ is the sampling density and $\pi(\theta)$ represents the researcher's prior beliefs about the value of $\theta$. Note that in classical (frequentist) statistics one maximizes $f(y \mid \theta)$ with respect to $\theta$ when performing maximum likelihood estimation. The goal of Bayesian inference is to calculate the posterior distribution $\pi(\theta \mid y)$ so that probability statements about $\theta$ and functionals of $\theta$ can be formed. In most applications, the posterior $\pi(\theta \mid y)$ is not of standard form and therefore cannot be investigated analytically. Nonetheless, it is relatively straightforward to generate a series of draws from this distribution using MCMC simulation. One MCMC algorithm is the Gibbs sampling algorithm.
Using this algorithm, one simulates draws from a given joint posterior distribution $\pi(\theta \mid y)$ using only the full conditional distributions

$$\pi(\theta_1 \mid y, \theta_2, \ldots, \theta_n),\quad \pi(\theta_2 \mid y, \theta_1, \theta_3, \ldots, \theta_n),\quad \ldots,\quad \pi(\theta_n \mid y, \theta_1, \ldots, \theta_{n-1}),$$

for $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The algorithm is often easy to implement because the full conditional distributions can be sampled directly in many models. Because the Gibbs sampler constitutes a Markov chain whose stationary distribution is the target distribution $\pi(\theta \mid y)$, convergence can be assured as long as very mild regularity conditions are met. The sampler works by iteratively sampling from the full conditional distributions, each conditioned on the most recent draws of the other components of $\theta$. Let $\theta_1^{(0)}, \ldots, \theta_n^{(0)}$ denote arbitrary starting values in the support of $\pi(\theta \mid y)$. At the $g$th iteration of the sampler, the following series of draws is made from the full conditional distributions:

$$\begin{aligned}
\theta_1^{(g)} &\sim \pi(\theta_1 \mid y, \theta_2^{(g-1)}, \theta_3^{(g-1)}, \ldots, \theta_n^{(g-1)})\\
\theta_2^{(g)} &\sim \pi(\theta_2 \mid y, \theta_1^{(g)}, \theta_3^{(g-1)}, \ldots, \theta_n^{(g-1)})\\
&\;\;\vdots\\
\theta_n^{(g)} &\sim \pi(\theta_n \mid y, \theta_1^{(g)}, \theta_2^{(g)}, \ldots, \theta_{n-1}^{(g)})
\end{aligned}$$

These draws are stored and used to compute estimates of posterior moments, probability intervals, and other quantities of interest. It is standard practice to discard the first $m$ burn-in draws and use only draws $m+1$ to $m+G$ to compute posterior quantities of interest. This helps eliminate sensitivity to initial conditions and helps ensure that the remaining draws are representative of the target distribution. For an accessible introduction to the Gibbs sampler, we refer the reader to Casella and George (1992), Chib and Greenberg (1996), and Albert and Chib (1996). For a more advanced theoretical discussion of the Gibbs sampler, we refer the reader to Geman and Geman (1984), Tanner and Wong (1987), Gelfand and Smith (1990), and Tierney (1994).
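A minimal sketch of this scheme for a toy target whose full conditionals are known exactly (an assumed example, not one of the paper's models): a standard bivariate normal with correlation $\rho$, for which $\theta_1 \mid \theta_2 \sim N(\rho\theta_2, 1-\rho^2)$ and $\theta_2 \mid \theta_1 \sim N(\rho\theta_1, 1-\rho^2)$. The chain starts far from the center of the distribution, and the first $m$ draws are discarded as burn-in; the function name and settings are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, m=500, G=20000, start=(5.0, -5.0), seed=2):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: theta1 | theta2 ~ N(rho*theta2, 1 - rho^2), and symmetrically."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)
    t1, t2 = start  # arbitrary starting values in the support
    kept = []
    for g in range(m + G):
        t1 = rng.gauss(rho * t2, sd)  # theta_1^(g) given theta_2^(g-1)
        t2 = rng.gauss(rho * t1, sd)  # theta_2^(g) given theta_1^(g)
        if g >= m:  # discard the first m burn-in draws
            kept.append((t1, t2))
    return kept
```

The retained draws should have means near zero and a sample correlation near $\rho$, even though the sampler never draws from the joint distribution directly.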
The Gibbs sampling algorithm is a special case of the Metropolis-Hastings (M-H) algorithm, another MCMC algorithm, in which candidate draws from a proposal density are accepted or rejected with a probability chosen so that the resulting Markov chain has the target distribution as its stationary distribution (see Chib and Greenberg, 1995). Typically one uses the M-H algorithm when one or more of the conditional distributions are not of standard form. We employ both algorithms in this paper.

B.2 The Bayes Factor

One distinct advantage of Bayesian inference over classical (frequentist) approaches is the ability to test non-nested models. Central to the idea of Bayesian model comparison and hypothesis testing is the Bayes factor (Kass and Raftery, 1995; Jeffreys, 1961). The Bayes factor provides a convenient means to assess the amount of evidence in favor of one scientific theory vs. that for another (Kass and Raftery, 1995, p. 777). The Bayes factor for model $j$ vs. model $k$ (denoted $B_{jk}$) is the ratio of the marginal likelihood of the data under model $j$ to the marginal likelihood of the data under model $k$; if the two models are equally likely a priori, it is also the posterior odds of model $j$ vs. model $k$. Somewhat more formally, in the Bayesian tradition, we assign prior probabilities $\Pr(M_j)$ and $\Pr(M_k)$ that the data $y$ were generated by models $M_j$ and $M_k$, respectively. In practice, we assume $\Pr(M_j) = \Pr(M_k) = 1/2$. After observing the data $y$, we are interested in the posterior probabilities $\Pr(M_j \mid y)$ and $\Pr(M_k \mid y)$. Applying Bayes' theorem, Kass and Raftery (1995) demonstrate that

$$\frac{\Pr(M_j \mid y)}{\Pr(M_k \mid y)} = \frac{\Pr(y \mid M_j)}{\Pr(y \mid M_k)} \cdot \frac{\Pr(M_j)}{\Pr(M_k)}.$$

The first ratio on the right-hand side is defined as the Bayes factor. Thus, the Bayes factor $B_{jk}$ between models $M_j$ and $M_k$ is

$$B_{jk} = \frac{\Pr(y \mid M_j)}{\Pr(y \mid M_k)} = \frac{\int f(y \mid \theta, M_j)\pi(\theta \mid M_j)\,d\theta}{\int f(y \mid \theta, M_k)\pi(\theta \mid M_k)\,d\theta} = \frac{m(y \mid M_j)}{m(y \mid M_k)},$$

and with uniform prior model probabilities it equals the posterior odds. Unlike frequentist hypothesis tests, the Bayes factor is not interpreted with respect to critical values. Instead, as Kass and Raftery (1995, p. 777) note, "[p]robability itself provides a meaningful scale defined by betting."
Reworking a table first suggested by Jeffreys (1961), Kass and Raftery suggest the following rough description of the information provided by the Bayes factor for scientific purposes (see Table B1).

[Table B1 about here.]

B.3 Computing Marginal Likelihoods

Because of the difficulties involved in calculating the integrals in the marginal likelihoods, Bayes factors have, until recently, played only a small role in applied work. However, recent advances in MCMC simulation have greatly enhanced our ability to calculate accurate estimates of the marginal likelihood. In this article, we use the reduced Gibbs sampling algorithm offered by Chib (1995) to estimate this quantity. Chib relies on the identity

$$m(y) = \frac{f(y \mid \theta)\pi(\theta)}{\pi(\theta \mid y)},$$

which he terms the basic marginal likelihood identity. Fixing $\theta$ at $\theta^*$, one can estimate the marginal likelihood (on the logarithmic scale) as

$$\ln \hat m(y) = \ln f(y \mid \theta^*) + \ln \pi(\theta^*) - \ln \hat\pi(\theta^* \mid y).$$

This is an appealing formulation because it requires only an evaluation of the likelihood at one point, an evaluation of the prior at the same point, and an estimate of the posterior ordinate at that point. Chib (1995) discusses the simulation error of this estimate and illustrates how one can estimate the marginal likelihood to any degree of precision using this computational technique. Because the identity holds for every $\theta$, the marginal likelihood can be computed at any point $\theta^*$ in the parameter space $\Theta$; in practice, one usually chooses a high-density value for each parameter. The first quantity to calculate, $f(y \mid \theta^*)$, is simply the likelihood evaluated at $\theta^*$. The second quantity is the prior evaluated at $\theta^*$. The final quantity, $\hat\pi(\theta^* \mid y)$, is the estimated posterior density ordinate. Although Chib considers the general case, assume that a Gibbs sampling algorithm has been applied to one vector block $\theta = \theta_1$. The output from the Gibbs sampling algorithm is therefore $\{\theta_1^{(g)}\}_{g=1}^{G}$.
Chib demonstrates that an appropriate Monte Carlo estimate of $\pi(\theta_1^* \mid y)$ at the point $\theta_1^*$ is

$$\hat\pi(\theta_1^* \mid y) = G^{-1}\sum_{g=1}^{G} \pi(\theta_1^* \mid y, z^{(g)}),$$

where $z^{(g)}$ denotes the draw of the latent data (such as the latent utilities of the MNP model) made alongside $\theta_1$ at iteration $g$. Chib demonstrates that this estimate is simulation consistent. Extending this strategy to more than one vector block is relatively straightforward; in Section 3.4 we demonstrate how this can be done. Once $f(y \mid \theta^*)$, $\pi(\theta^*)$, and $\hat\pi(\theta^* \mid y)$ are in hand, computing the marginal likelihood is trivial.

C Appendix. Computation

The confirmatory factor analysis of the Dutch data was performed via maximum likelihood in SAS using PROC CALIS. The confirmatory factor analysis of the Danish data was performed using Browne's asymptotically distribution free (ADF) method in the LISREL package; in the Danish case, a polychoric correlation matrix was analyzed. The Dutch factor analysis used a product-moment (Pearson) correlation matrix as input. Re-analysis of these data with the LISREL package, using a polychoric correlation matrix and Browne's ADF method, produces results nearly identical to those reported here. Preliminary, exploratory factor analysis was conducted in STATA.

The missing Danish data were imputed using the S-PLUS function norm written by Joseph Schafer. This function is freely available at http://www.stat.psu.edu/jls/misoftwa.html. Because of the computational complexity of some of the models employed here, single imputation was judged to be a satisfactory compromise between full multiple imputation and simple listwise deletion. The imputations were constructed using the maximum likelihood estimates of the means, standard deviations, and correlations from the EM algorithm. The real-valued imputations of the ordinal variables were then rounded to the nearest ordinal category; see Schafer (1997) for a discussion of this practice.

The bivariate kernel density estimates of voter ideal points were calculated using the S-PLUS function kde2D of Guy Nason and Martin Mächler. This function is available via statlib: http://lib.stat.cmu.edu/.
The default bandwidth was used to construct the estimates appearing in the paper; other reasonable choices of bandwidth do not change the general appearance of Figures 1 and 2.

Estimation of the MNL and MNP models was done in GAUSS. The MNL code was written by the authors. The code used to estimate the MNP models was generously provided by Sid Chib, Ed Greenberg, and Yuxin Chen. The kernel density estimation routines used to estimate the posterior ordinates are Ruud H. Koning's GAUSS implementations of the algorithms presented in Härdle (1990) and Silverman (1986). For all of the results in this paper, a biweight kernel was used with bandwidth $h = 0.9\min(s, \mathrm{IQR}/1.34)\,n^{-1/5}$, where $s$ denotes the sample standard deviation, IQR denotes the interquartile range of the data points, and $n$ is the sample size (see Silverman, 1986). These routines are freely available at http://www.xs4all.nl/rhkoning/gauss.htm. The MNL models were fit at minimal computational cost (a few minutes on a 300 MHz Pentium II). The MNP models took between 12 and 17 hours, depending on the dataset and the specifics of the algorithm employed. All figures were made using S-PLUS.

References

Albert, James H., and Siddhartha Chib. 1993. "Bayesian Analysis of Binary and Polychotomous Response Data." Journal of the American Statistical Association 88(June):669-679.
Albert, Jim, and Siddhartha Chib. 1996. "Computation in Bayesian Econometrics: An Introduction to Markov Chain Monte Carlo." Advances in Econometrics A 11(June):3-24.
Alvarez, R. Michael, and Jonathan Nagler. 1995. "Economics, Issues and the Perot Candidacy: Voter Choice in the 1992 Presidential Election." American Journal of Political Science 39(August):714-744.
Alvarez, R. Michael, and Jonathan Nagler. 1998. "When Politics and Models Collide: Estimating Models of Multi-Candidate Elections." American Journal of Political Science 42(January):55-96.
Andeweg, Rudy B., and Galen A. Irwin. 1993. Dutch Government and Politics. New York: St. Martin's Press.
Budge, Ian, David Robertson, and David Hearl. 1987. Ideology, Strategy, and Party Change. Cambridge: Cambridge University Press.
Cahoon, L. 1975. "Locating a Set of Points Using Range Information Only." Ph.D. Dissertation, Carnegie-Mellon University.
Cahoon, L., Melvin Hinich, and Peter Ordeshook. 1975. "A Multi-Dimensional Statistical Procedure for Spatial Analysis." VPI&SU and Carnegie-Mellon University: Typescript.
Casella, George, and Edward I. George. 1992. "Explaining the Gibbs Sampler." The American Statistician 46(August):167-174.
Chib, Siddhartha. 1995. "Marginal Likelihood From the Gibbs Output." Journal of the American Statistical Association 90(December):1313-1321.
Chib, Siddhartha, and Edward Greenberg. 1995. "Understanding the Metropolis-Hastings Algorithm." The American Statistician 49(November):327-336.
Chib, Siddhartha, and Edward Greenberg. 1996. "Markov Chain Monte Carlo Simulation Methods in Econometrics." Econometric Theory 12(August):409-431.
Chib, Siddhartha, and Edward Greenberg. n.d. "Analysis of Multivariate Probit Models." Biometrika. Forthcoming.
Chib, Siddhartha, Edward Greenberg, and Yuxin Chen. 1998. "MCMC Methods for Fitting and Comparing Multinomial Response Models." Washington University in St. Louis: Typescript.
Daalder, Hans. 1987. "The Dutch Party System: From Segmentation to Polarization - And Then?" In Party Systems in Denmark, Austria, Switzerland, the Netherlands, and Belgium. New York: St. Martin's Press.
DeGroot, Morris H. 1986. Probability and Statistics. Reading, MA: Addison Wesley.
DeSwaan, Abram. 1973. Coalition Theories and Cabinet Formation. Amsterdam: Elsevier.
Dow, Jay K. 1997. "Voter Choice and Strategies in French Presidential Elections." Paper presented at the Annual Meeting of the Midwest Political Science Association.
Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper & Row.
Enelow, James, and Melvin Hinich. 1984. The Spatial Theory of Voting: An Introduction. Cambridge: Cambridge University Press.
Gelfand, Alan E., and Adrian F. M. Smith. 1990. "Sampling-Based Approaches to Calculating Marginal Densities." Journal of the American Statistical Association 85(December):398-409.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995. Bayesian Data Analysis. London: Chapman & Hall.
Geman, S., and D. Geman. 1984. "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images." IEEE Transactions on Pattern Analysis and Machine Intelligence 6(November):721-741.
Geweke, John M. 1991. "Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints." In Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface. Fairfax, VA: Interface Foundation of America.
Green, Donald P., and Ian Shapiro. 1994. Pathologies of Rational Choice Theory: A Critique of Applications in Political Science. New Haven: Yale University Press.
Greene, William. 1997. Econometric Analysis. Upper Saddle River, NJ: Prentice-Hall, third edition.
Hajivassiliou, V. A. 1990. "Smooth Simulation Estimation of Panel LDV Models." Typescript.
Härdle, Wolfgang. 1990. Applied Non-Parametric Regression. Oxford: Oxford University Press.
Institut für Sozialwissenschaften and Europa-Institut of the Universität Mannheim. 1983. European Elections Study: European Political Parties' Middle-Level Elites. Köln: Köln Zentralarchiv.
Iversen, Torben. 1994. "Political Leadership and Representation in West European Democracies: A Test of Three Models of Voting." American Journal of Political Science 38(February):45-74.
Jackman, Simon. 1997. "Pauline, the Mainstream, and Political Elites: the Place of Race in Australian Political Ideology." Manuscript.
Jeffreys, H. 1961. Theory of Probability. Oxford: Oxford University Press, third edition.
Kass, Robert E., and Adrian E. Raftery. 1995. "Bayes Factors." Journal of the American Statistical Association 90(June):773-795.
Keane, Michael P. 1994. "A Computationally Practical Simulation Estimator for Panel Data." Econometrica January:95-116.
Laver, Michael, and Norman Schofield. 1990. Multiparty Government: The Politics of Coalition in Europe. Oxford: Oxford University Press.
Lawrence, Eric. 1997. "A Simulated Maximum Likelihood Approach to the 1988 Democratic Primary." Paper presented at the Annual Meeting of the Midwest Political Science Association.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
McCulloch, Robert, and Peter E. Rossi. 1994. "An Exact Likelihood Analysis of the Multinomial Probit Model." Journal of Econometrics 64(September-October):207-240.
McFadden, Daniel. 1989. "A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration." Econometrica 57(September):995-1026.
Ofek, Dganit, Kevin M. Quinn, and Itai Sened. 1998. "Voters, Parties, and Coalition Formation in Israel: Theory and Evidence." Paper presented at the Annual Meeting of the Midwest Political Science Association.
Quinn, Kevin M., Andrew D. Martin, and Andrew B. Whitford. 1996. "Explaining Voter Choice in MultiParty Democracy: A Look at Data from the Netherlands." Paper presented at the Annual Meeting of the American Political Science Association.
Rabier, Jacques-René, and Ronald Inglehart. 1981. Euro-Barometer 11 - April, 1979: The Year of the Child in Europe. Ann Arbor, MI: Inter-University Consortium for Political and Social Research.
Rabinowitz, George, and Stuart Elaine MacDonald. 1989. "A Directional Theory of Issue Voting." American Political Science Review 83(1):93-121.
Schafer, Joseph L. 1997. Analysis of Incomplete Multivariate Data. London: Chapman & Hall.
Schofield, Norman, and Itai Sened. 1998. "Political Equilibrium in Multiparty Democracies." Paper presented at the Annual Meeting of the Midwest Political Science Association.
Schofield, Norman J., Andrew D. Martin, Kevin M. Quinn, and Andrew B. Whitford. n.d. "Multiparty Electoral Competition in the Netherlands and Germany: A Model Based on Multinomial Probit." Public Choice. Forthcoming.
Shepsle, Kenneth A. 1991. Models of Multiparty Electoral Competition. Chur: Harwood Academic Publishers.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
Tanner, M. A., and W. Wong. 1987. "The Calculation of Posterior Distributions by Data Augmentation." Journal of the American Statistical Association 82(June):528-550.
Taylor, Michael, and Michael Laver. 1973. "Government Coalitions in Western Europe." European Journal of Political Research 1:205-248.
Tierney, Luke. 1994. "Markov Chains for Exploring Posterior Distributions." Annals of Statistics 22(August):1701-1762.
Whitten, Guy D., and Harvey D. Palmer. 1996. "Heightening Comparativists' Concern for Model Choice: Voting Behavior in Great Britain and the Netherlands." American Journal of Political Science 40(February):231-260.