Operationalizing and Testing Spatial Theories of Voting
Kevin M. Quinn
Department of Political Science
Campus Box 1063
Washington University
St. Louis, MO 63130
Email: [email protected]

Andrew D. Martin
Department of Political Science
Campus Box 1063
Washington University
St. Louis, MO 63130
Email: [email protected]
Department of Political Science Paper # 346
Preliminary Draft - Comments Are Welcome
April 15, 1998
Abstract
Spatial models of voting behavior provide the foundation for a substantial number of theoretical
results. Nonetheless, empirical work involving the spatial model faces a number of potential difficulties.
First, measures of the latent voter and candidate issue positions must be obtained. Second, evaluating
the fit of competing statistical models of voter choice is often more complicated than previously realized.
In this paper, we discuss precisely these issues. We argue that confirmatory factor analysis applied to
mass-level issue preference questions is an attractive means of measuring voter ideal points. We also show
how party issue positions can be recovered using a variation of this strategy. We go on to discuss the
problems of assessing the fit of competing statistical models (multinomial logit vs. multinomial probit)
and competing explanations (those based on spatial theory vs. those derived from other theories of
voting such as sociological theories). We demonstrate how the Bayesian perspective not only provides
computational advantages in the case of fitting the multinomial probit model, but also how it facilitates
both types of comparison mentioned above. Results from the Netherlands and Denmark suggest that
even when the computational cost of multinomial probit is disregarded, the decision whether to use
multinomial probit (MNP) or multinomial logit (MNL) is not clear-cut.
Paper presented at the 1998 Midwest Political Science Association Annual Meeting, April 1998, Chicago, Illinois. This work
is based on research supported under National Science Foundation Grants SBR 96-17708 and 97-30275. The authors thank the
Center in Political Economy at Washington University for additional financial support. The authors also thank Sid Chib, Ed
Greenberg, Norman Schofield, and Itai Sened for helpful comments and discussions. Andrew D. Martin will be joining the political science faculty at SUNY-Stony Brook in the fall of 1998. After June 1, his address will be the Department of Political Science,
State University of New York at Stony Brook, Stony Brook, NY, 11794-4392, Email: [email protected]. Please
address all correspondence to Kevin M. Quinn.
1 Introduction
The spatial model of voting holds a place of primacy in the formal theoretic literature. Its basic assumption
that a voter chooses the party / candidate who puts forth the policy proposal most preferred by the voter
forms the basis of a great deal of theoretical work.1 Theories of coalition formation (Schofield et al., n.d.;
Schofield and Sened, 1998), and electoral strategy (Downs, 1957; Shepsle, 1991), among others all rely to
some extent on the assumption of voters who act in accordance with the spatial model. While theoretical
work which makes use of the spatial model has progressed rapidly, empirical work has not progressed to the
same degree. We see this as the result of two distinct problems which confront researchers working in this
area. The first problem revolves around the need to obtain measures of latent voter and candidate issue
positions. The second problem has to do with the means by which competing models and explanations are
compared.2
The question of how to locate voters and candidates in a common space has been the subject of much
debate and has been answered with varying degrees of success using a wide range of techniques (Cahoon,
1975; Cahoon et al., 1975; Enelow and Hinich, 1984; Rabinowitz and MacDonald, 1989; Iversen, 1994; Quinn
et al., 1996; Jackman, 1997; Dow, 1997). The adequacy of any of these strategies depends greatly on the task
at hand. In what follows, we argue that the use of confirmatory factor analysis on mass-level issue preference
questions offers several advantages over competing measurement methods in the context of operationalizing
and testing random utility models of voter choice.
On the other hand, the problem of testing competing models and explanations has largely been ignored
by political scientists.3 While the use of multinomial choice models has become commonplace in political
science (Alvarez and Nagler, 1995; Whitten and Palmer, 1996; Quinn et al., 1996; Lawrence, 1997; Schofield
et al., n.d.; Alvarez and Nagler, 1998), these authors either assert the superiority of the multinomial probit
model (MNP) over the multinomial logit model (MNL),4 or assert that given the oftentimes substantial cost
of fitting the MNP model, the MNL model is close enough. Both assertions are tenuous, as ultimately, the
adequacy of either model is an empirical question.
1 Throughout the paper we use the terms party and candidate interchangeably. This is entirely appropriate in both the
Netherlands and Denmark, since both have fairly pure forms of proportional representation. In other contexts, such as in the
United States, this usage would be clearly inappropriate.
2 Throughout this paper we use model to refer to a statistical model which makes specific distributional assumptions. In this
usage, multinomial probit and multinomial logit are two distinct models which may or may not have the same covariates. We
use the term explanation to refer to a particular specification of a model (i.e., the choice of covariates included in the model).
3 Researchers in other fields are beginning to see the importance of this. For instance, Chib et al. (1998) demonstrate that
in the context of inter-city travel mode choice MNL outperforms MNP.
To compare the performance of both the MNP and MNL models, and to assess competing explanations
of voting behavior, we adopt a Bayesian framework. We argue that the Bayesian paradigm offers several
advantages in the context of fitting polychotomous choice models. Not only does this approach offer computational
advantages when fitting the MNP model, but it also allows for comparison of non-nested models in
an extremely straightforward and computationally practical manner. While such comparisons may seem to
be of little substantive import, the consequences of not assessing the comparative fit of several models and
explanations should not be understated. For, as we demonstrate below using real data from the Netherlands
and Denmark, the choice of model can noticeably affect one's inferences. Similarly, neglecting to evaluate
the fit of major competing explanations can also produce misleading inferences.
This paper proceeds as follows. In the following section, we discuss various strategies designed to obtain
measures of voter and candidate ideal points and elaborate a promising measurement strategy. Actual
measures of voter and party positions from the Netherlands and Denmark are also presented. In Section
3, we turn our attention to the MNP and MNL models, and discuss Bayesian estimation strategies for
each. We also discuss the computational aspects of calculating the marginal likelihood for purposes of
model comparison. Section 4 summarizes our data and provides a discussion of the voter choice models
we operationalize. Section 5 contains results from the MNP and MNL models of Dutch and Danish voting
behavior, and the Bayes factors of competing models and explanations. The final section concludes.
2 Measurement and Spatial Voting Models
At the heart of spatial theories of voting is the idea that a voter's preferences can be represented by a point
in a reasonably low-dimensional space. Similarly, party policy declarations can be represented by points in
this same space. Unfortunately, neither these issue locations, nor relevant information about the issue space
itself (number of dimensions, substantive content of the dimensions, etc.) are directly observable. What is
needed is a means by which one can (a) obtain measures of the latent issue locations, and (b) test hypotheses
regarding the underlying structure of the issue space. Before demonstrating exactly how we operationalize
the spatial model, we briefly critique other methods which have been used to construct spatial issue locations
of voters and parties/candidates.
4 Researchers sometimes distinguish between multinomial logit in which covariates are individual-specific and conditional
logit where some covariates are choice-specific. For simplicity we refer to both models under the heading of multinomial logit.
2.1 Previous Measurement Strategies
The three most commonly-used techniques to estimate the latent issue positions of voters and candidates
are those methods which rely on (a) thermometer scores (Cahoon, 1975; Cahoon et al., 1975; Enelow and
Hinich, 1984; Dow, 1997), (b) voter perceptions of their own and candidates' issue positions (Rabinowitz and
MacDonald, 1989; Alvarez and Nagler, 1998), and (c) the actual issue preferences of voters and candidates
(Iversen, 1994; Quinn et al., 1996; Jackman, 1997; Schofield et al., n.d.). Ultimately, the usefulness of any
of these techniques depends on the problem at hand. Since our interest is in explaining and testing various
models of voter choice, we evaluate each of these methods in light of this goal. The minimal requirements
needed to meet this goal are that the voter positions and party positions are estimated in the same space
and that the measurement strategy employed does not make assumptions regarding voter preferences that
we wish to later test.
The first procedure (Cahoon, 1975; Cahoon et al., 1975) assumes that a voter's thermometer score of a
candidate (a measure of how an individual feels about each candidate on a cardinal scale) represents the
distance between the voter and the candidate plus a disturbance term. After some algebraic manipulation,
the thermometer scores are subjected to factor analysis, the end result being that voters and candidates can
be placed in the same issue space. The problem with this technique for the present purposes is that it begs
the question of why some voters view some candidates more favorably. Is it really because of a congruence of
policy declarations and issue preferences, or is it really due to sociological or psychological factors? Clearly,
using such a measurement strategy to test the spatial model vs. other explanations of voter choice greatly
stacks the deck in favor of the spatial model.
A second general method of locating voter and candidate positions relies on the subjective assessments of
the voters themselves as to their own issue preferences and the issue locations of the candidates. One version
of this method assumes that the ordinal responses of the voters are measured on a common scale and then
proceeds to measure spatial distance as the difference between a voter's declared position on an issue and
either (a) her perceptions of the candidates, or (b) some type of summary (usually the means or medians) of
the population perceptions of the candidates (Rabinowitz and MacDonald, 1989; Alvarez and Nagler, 1998).
Oftentimes, the dimensionality of the issue space is collapsed by creating additive indices of these distances.
The problems with this type of approach are multiple. First, the assumption that the ordinal voter responses
are actually measured on a common scale is empirically questionable. Second, the fact that voters oftentimes
equate their own policy preferences with those of their preferred party once again greatly stacks the deck
in favor of the spatial model when conducting tests vis-a-vis other types of explanations. Third, there is no
reason to believe that aggregating voter perceptions of party locations will produce accurate estimates of
the actual party positions. We expect this to be especially the case for small parties in multi-party systems
where perceptual errors are unlikely to cancel out. Finally, there is no way to assess the goodness-of-fit of such
a measurement strategy.
A third type of measurement strategy relies upon matching mass and elite survey data (Iversen,
1994; Quinn et al., 1996; Jackman, 1997; Schofield et al., n.d.). For example, Iversen (1994) performs principal
components factor analysis on equivalently worded issue questions posed to European voters and European
party elites. The mass and elite responses are then scored. The problem with the Iversen approach is that
since the mass and elite responses are factor analyzed separately, the factor scores project these two
groups into non-equivalent spaces. As such, the distance metric needed to operationalize the spatial model
of voting is not defined. A separate problem is that the type of factor analysis used by Iversen imposes no
theoretical structure on the factor model. Such an analysis is therefore best seen as an exploratory exercise
as opposed to a measurement exercise.
More sophisticated analyses of matching mass and elite survey data are those of Quinn et al. (1996),
Jackman (1997), and Schofield et al. (n.d.). These authors use confirmatory factor analysis on mass-level
issue preference questions to construct theoretically meaningful factors and to project voters into a common
issue space. They then use the mass-level scoring coefficients to score the elite responses and to place the
elites into the same space as the voters. The benefits of this approach are that (a) the factors can be
constructed theoretically and subjected to various tests of goodness-of-fit, (b) independent measures of voter
and candidate issue preferences are used as input, and (c) voters and candidates are placed in a common
space indicative of voter perceptions. For these reasons, this is the method adopted in this paper.5
2.2 Locating Voters and Parties in a Common Space
The motivation for the measurement procedure which is employed in this paper is the following. First,
assume that voter political preferences can be represented by points in some relatively low-dimensional
space. Even though these points are not directly observable, we do have measures of voter preferences on
specific issues such as abortion availability, governmental control of industry, the need to reduce economic
inequality, and the like. Presumably, these specific responses were generated as a result of the voters' more
general political preferences (the unobservable spatial locations), and a random error. Consequently, one
can use factor analysis to recover estimates of the latent issue positions which presumably generated the
observed issue responses. Further, since we hypothesize that voters evaluate parties/candidates in terms of
their general political preferences, we use the same scoring coefficients which we used to project voter issue
responses into the more general issue space to project candidate responses into this same issue space.
Employing confirmatory factor analysis to construct voter and party issue positions has several advantages.
First, we are able to specify the structure of the factor model a priori on theoretical grounds. Second,
by putting constraints on the factor loadings, we are able to allow voter issue locations to be correlated
and to have unequal variances. As will become more apparent below, this additional flexibility provides a
richer picture of Danish voter preferences. Similar findings have been reported for Australia (Jackman, 1997)
and Israel (Ofek et al., 1998). Finally, confirmatory factor analysis allows one to assess goodness-of-fit of
competing factor models.
5 We have not discussed techniques designed solely to measure party/candidate issue locations such as the use of expert
opinions (DeSwaan, 1973; Taylor and Laver, 1973), and the content analysis of manifestos (Budge et al., 1987). Since these
methods say nothing about the spatial location of voters, they cannot be directly employed to operationalize spatial models
of voter choice. However, in situations where only reliable mass-level surveys are available, these techniques can be used to
generate hypothetical issue responses of the party leaders which can then be scored using the scoring coefficients from the
mass-level confirmatory factor analysis. For such a measurement strategy see Ofek et al. (1998).
2.2.1 Data and Measurement Results
Before presenting the results of this measurement strategy we briefly discuss our data sources. The data
used in this paper come from two sources: one which records mass opinion and one which records
elite opinion. To study mass opinion, we use the Euro-Barometer 11 data set (Rabier and Inglehardt, 1981).6
This survey was conducted in April 1979 with support from the European Community. Although this survey
was administered in nine nations, we only use the randomly-selected respondents from the Netherlands and
Denmark in this paper. To study elite opinion, we use the European Political Parties' Middle-Level Elites
(EPPMLE) data set (Institut für Sozialwissenschaften and Europa-Institut of the Universität Mannheim,
1983).7 The EPPMLE study consists of a survey of delegates to the European party conferences in 1979.
The EPPMLE study includes survey responses from the four major Dutch parties and the eight major Danish
parties. Because of our relatively small sample size, we have chosen to focus only on the five Danish parties
represented in our sample that received more than five percent of the sample vote share. In addition, we use
single imputation to fill in missing data in the Danish case. This increases our sample size from 440 (using
listwise deletion) to 640.8 Thus, we have two surveys, one of Dutch and Danish political elites and one
of Dutch and Danish citizens, with nearly identical issue questions, conducted at nearly the same point in
time. See Table A1 in Appendix A for the issue questions administered in both surveys.
The measurement strategy employed here is the following. In each case an initial exploratory factor
analysis was conducted to get a sense as to how many factors to retain. In both cases, it appears that two or
three factors are present.9 Given our prior beliefs and small number of issue questions, we elected to estimate
two-factor CFA models in each country. In the Dutch case, all observed variables had factor loadings greater
than 0.25 on the first factor. Responses to questions regarding control of multinational corporations, income
inequality, penalties for terrorists, and abortion availability had factor loadings greater than 0.25 on the
second factor. For this reason, the factor loadings of the remaining observed variables on the second factor
were constrained to 0.
6 The Euro-Barometer data sets are distributed in the United States by the Inter-University Consortium for Political and
Social Research (ICPSR Study 7752).
7 The EPPMLE data set was initially published through the Institut für Sozialwissenschaften and Europa-Institut of the
Universität Mannheim, and is available through the Köln Zentralarchiv. The EPPMLE is a proprietary data set.
8 Details of the imputation procedure can be found in Appendix C.
9 In the Dutch case principal components factor analysis revealed 3 factors with eigenvalues greater than 1. The eigenvalue
of the third factor was approximately equal to 1. The Danish data produced similar findings.
The fit of this model was judged to be sufficient.
The Danish case was somewhat more complicated. A similar procedure was used to specify the initial
CFA model. The fit of this model was judged to be excessively poor. After fitting a number of other models
we found that a reasonable fit could be obtained by restricting the loadings of the terrorism, nuclear energy,
and multinational corporation responses to be equal to zero on the first factor, and the loadings of the
remaining observed variables to be equal to 0 on the second factor. In addition we allowed for the factors to
be correlated. The factor loadings derived from the CFA procedures are presented in Tables 1 and 2.10
[Insert Tables 1 and 2 about here.]
We call the first factor in the Dutch case a general economic (left-right) factor; we identify the second as
a measure of preferences over the scope of government. In the Danish case, interpretation of the factors
is slightly more difficult. In part, this owes to the high (0.899) correlation between the two factors, which
makes interpreting either factor in a vacuum somewhat misleading. In a sense, this result is telling us that
the Danish issue space is nearly one-dimensional. For this reason, we prefer to interpret the two factors
together as co-determinants of a general economic (left-right) dimension.
From the CFA factor results we obtain the factor scoring coefficients which we use to place the voters in
the issue space implied by the CFA results. We present the regression scoring coefficients for each country
in Tables 3 and 4.
[Insert Tables 3 and 4 about here.]
In addition, we use these same scoring coefficients in conjunction with the elite responses to the equivalently
worded issue questions to place each member of the party conference in the same issue space as the voters.
We then use the median position of each delegation on each dimension as an estimate of the policy position
of each party. Tables 5 and 6 contain the locations obtained using this procedure for the Dutch and Danish
cases.
[Insert Tables 5 and 6 about here.]
10 More detail on the methods employed here is given in Appendix C.
Since parties and voters are now projected into the same space, distance measures are easily defined. Figures
1 and 2 present density estimates of Dutch and Danish voter ideal points with the relevant Dutch and Danish
party positions superimposed.
[Insert Figures 1 and 2 about here.]
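The scoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scoring coefficients, the issue responses, and the party labels below are hypothetical numbers invented for the example, not values from Tables 3 through 6, and the helper name `score` is our own. The sketch applies one set of mass-level scoring coefficients to both voters and delegates, takes the median delegate position on each dimension as the party position, and computes Euclidean voter-party distances.

```python
from math import dist
from statistics import median

def score(responses, coefs):
    """Project issue responses onto the latent dimensions.

    coefs[d][q] is the (hypothetical) scoring coefficient of question q
    on dimension d, estimated from the mass-level factor model.
    """
    return tuple(sum(c * r for c, r in zip(row, responses)) for row in coefs)

# Hypothetical scoring coefficients: 2 dimensions x 3 issue questions.
coefs = [[0.5, 0.3, 0.0],
         [0.0, 0.2, 0.6]]

# Hypothetical standardized issue responses.
voters = [[1.0, 2.0, 0.0], [-1.0, 0.0, 1.0]]
delegates = {
    "Party A": [[2.0, 1.0, 0.0], [1.0, 1.0, 1.0], [3.0, 0.0, -1.0]],
    "Party B": [[-2.0, 0.0, 1.0], [-1.0, -1.0, 2.0], [-2.0, 1.0, 0.0]],
}

# Voters and delegates are scored with the SAME coefficients,
# so both end up in one common issue space.
voter_points = [score(v, coefs) for v in voters]

party_points = {}
for party, members in delegates.items():
    scored = [score(m, coefs) for m in members]
    # Party position: median delegate position on each dimension.
    party_points[party] = tuple(
        median(s[d] for s in scored) for d in range(len(coefs))
    )

# Euclidean distance from every voter to every party.
distances = [
    {party: dist(v, pos) for party, pos in party_points.items()}
    for v in voter_points
]
nearest = [min(d, key=d.get) for d in distances]
```

Because the same coefficients are used for both groups, the distances are defined in a single metric, which is the property the separate mass and elite factor analyses criticized in Section 2.1 lack.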
From Figures 1 and 2 we see that our results fit common understandings of Dutch and Danish politics. For
example, the left-right orientations of the parties are consistent with nearly all past work. Furthermore, the
Dutch party positions are similar to the two-dimensional spatial maps cited by Daalder (1987, pp. 212, 233).
As we expect, the CDA seems to force the relative party positions away from a uni-dimensional ordering.
Given the CDA's strong views on abortion, this result is expected (for example, see Andeweg and Irwin,
1993). Similarly, the Danish spatial map is consistent with the one-dimensional orderings presented in Laver
and Schofield (1990).
3 Fitting and Evaluating Multinomial Data Models11
Most political scientists now recognize that individual-level voting data drawn from a multi-party democracy
are multinomial data, and as such, multinomial response models such as the multinomial probit (MNP) and
multinomial logit (MNL) models are better suited to the modeling of this data than are the more familiar
binomial probit and logit models. What many researchers have failed to recognize is that the choice between
different types of multinomial response model is not always a simple matter. While it would seem that
the greater flexibility of the MNP model would make it preferable to the more restrictive MNL model, we
demonstrate below that once model complexity is accounted for, the MNP model is not always the clear
winner.
A related concern stems from our desire to assess the relative explanatory power of the spatial model
vis-a-vis competing explanations of voter choice. In so doing, we can directly assess the explanatory power of
a rational choice explanation of political behavior versus other explanations (see Green and Shapiro, 1994).
As we discuss below, Bayesian inference provides a consistent and computationally practical means to achieve
each of these aims.
11 This section draws heavily from Chib et al. (1998).
The purpose of this section is to demonstrate how two major polychotomous choice models can be fit
from a Bayesian perspective, and to demonstrate one of the main attractions of pursuing such a Bayesian
model-fitting strategy: the relative ease with which one can assess the relative fit of non-nested models with
different functional forms. This is something which is extremely difficult to do from a classical (frequentist)
perspective. On the other hand, the Bayesian paradigm provides a relatively simple means to compare the
fit of any two models fit to the same data.
For a broader discussion of the properties of the MNP and MNL models we refer the reader to any of the
several good, general discussions of these models (Alvarez and Nagler, 1995; Lawrence, 1997; Alvarez and
Nagler, 1998) aimed at a political science audience. These papers also discuss various classical estimation
strategies, as do the works of Maddala (1983) and Greene (1997).
3.1 Random Utility Motivation12
Given data from $n$ individuals (voters) choosing between $p$ alternatives (parties), both the MNP model and
the MNL model can be motivated by the following random utility model:
\[
\mathbf{z}_i = \mathbf{V}_i \alpha + \mathbf{W}_i \gamma + \mathbf{u}_i, \qquad
y_{ij} = \begin{cases} 1 & \text{if } z_{ij} = \max(\mathbf{z}_i) \\ 0 & \text{otherwise} \end{cases}
\tag{1}
\]
for $i = 1, \ldots, n$ and $j = 1, \ldots, p$,
where $\mathbf{z}_i$ is a $p \times 1$ vector, with $z_{ij}$ defined as the utility voter $i$ attaches to voting for party $j$; $\mathbf{V}_i$ is a $p \times l$
matrix of choice-specific covariates; $\mathbf{W}_i$ is a $p \times m$ matrix of individual-specific covariates;13 $\alpha$ and $\gamma$ are
vectors of choice-specific and individual-specific coefficients, respectively; $\mathbf{u}_i$ is a $p \times 1$ vector of disturbances;
and $\mathbf{y}_i$ is a $p \times 1$ vector representing the observed vote choice of individual $i$. The probability that alternative
$j$ is chosen by individual $i$ is simply the probability that $z_{ij}$ is equal to $\max(\mathbf{z}_i)$. This is the basic random
utility model which can be used to motivate both MNP and MNL models of voter choice. The difference
between these models stems from decisions as to how the disturbance terms are assumed to be distributed.
12 Throughout this paper we use the following notation. Lower case, non-bold Roman letters indicate scalars; lower case, bold
Roman letters indicate vectors; and upper case, bold Roman letters indicate matrices. All vectors are assumed to be column
vectors unless otherwise noted.
13 Throughout this section we assume that $\mathbf{W}_i$ is formed as $\tilde{\mathbf{W}}_i \otimes \mathbf{I}_p$ where $\tilde{\mathbf{W}}$ is the original $n \times m$ matrix of $m$
individual-specific attributes from all $n$ individuals.
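To make the role of the disturbance distribution concrete, the following Python sketch simulates Equation 1 for three parties under two error assumptions. It is an illustration only: the systematic utilities, the number of simulated voters, and the function names are our own assumptions. Independent standard normal errors correspond to a special case of the MNP model, while independent type I extreme value errors yield the MNL model, whose choice probabilities have a closed form that the simulated shares should approximate.

```python
import math
import random

def normal_error(rng):
    """Independent standard normal disturbance (a special case of MNP)."""
    return rng.gauss(0.0, 1.0)

def gumbel_error(rng):
    """Type I extreme value disturbance via the inverse CDF (MNL)."""
    # Clamp away from 0 and 1 so both logarithms are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def simulate_shares(systematic, draw_error, n_voters, seed):
    """Share of simulated voters choosing each alternative under Equation 1."""
    rng = random.Random(seed)
    counts = [0] * len(systematic)
    for _ in range(n_voters):
        utilities = [v + draw_error(rng) for v in systematic]
        # y_ij = 1 for the alternative with the largest utility.
        chosen = max(range(len(utilities)), key=utilities.__getitem__)
        counts[chosen] += 1
    return [c / n_voters for c in counts]

# Hypothetical systematic utilities for three parties.
systematic = [0.5, 0.0, -0.5]

shares_normal = simulate_shares(systematic, normal_error, 20000, seed=1)
shares_gumbel = simulate_shares(systematic, gumbel_error, 20000, seed=1)

# Closed-form MNL probabilities implied by the same systematic utilities.
denom = sum(math.exp(v) for v in systematic)
mnl_closed_form = [math.exp(v) / denom for v in systematic]
```

The two sets of simulated shares differ even though the systematic utilities are identical, which is the sense in which the distributional assumption alone separates the two models.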
3.2 Bayesian Estimation of the MNP Model
The multinomial probit model results from the assumption that the errors in Equation 1 are distributed
multivariate normal with mean vector $\mathbf{0}$ and variance-covariance matrix $\Omega$. In order to identify the MNP
model, Equation 1 has to be slightly reformulated. First, note that one identification problem arises from the
fact that an arbitrary constant can be added to both sides of Equation 1 without changing the distribution of
$\mathbf{y}_i$. In order to remedy this identification problem, it is customary to express each $z_{ij}$ relative to $z_{ip}$. Define
$\mathbf{z}_i^* = (z_{i1}^*, \ldots, z_{i,p-1}^*)'$ where $z_{ij}^* = z_{ij} - z_{ip}$. The underlying regression model is now $\mathbf{z}_i^* = \mathbf{X}_i \beta + \varepsilon_i$, where
$\varepsilon_i \sim N_{p-1}(\mathbf{0}, \Sigma)$ and $\mathbf{X}_i$ is the new matrix of covariates obtained by horizontally concatenating $\mathbf{V}_i^* = \mathbf{V}_i - \mathbf{v}_{ip}'$
to $\mathbf{W}_i$ and then deleting the $p$th row of $\mathbf{X}_i$ and each column of individual-specific attributes for the $p$th choice
category.14 Stacking the $\mathbf{z}_i^*$s, the random utility model now becomes:
\[
\underset{n(p-1) \times 1}{\begin{bmatrix} \mathbf{z}_1^* \\ \vdots \\ \mathbf{z}_n^* \end{bmatrix}}
=
\underset{n(p-1) \times k}{\begin{bmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_n \end{bmatrix}}
\underset{k \times 1}{\beta}
+
\underset{n(p-1) \times 1}{\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}}
\]
A second identification problem inherent in the MNP model stems from the fact that multiplying $\mathbf{z}_i^*$ by a
positive constant will not change the value of $\mathbf{y}_i$. This problem is traditionally solved by restricting $\sigma_{11}$ to
be equal to 1. For notational purposes, we refer to the restricted matrix as $\Sigma$.
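The probability that individual $i$ chooses party $j$ is:
\[
\Pr(y_{ij} = 1 \mid \beta, \Sigma) = \int_{A_j} \phi_{p-1}(\mathbf{z}_i^* \mid \mathbf{X}_i \beta, \Sigma) \, d\mathbf{z}_i^*
\tag{2}
\]
where $\phi_{p-1}$ represents the $(p-1)$-variate normal probability density function, and
\[
A_j = \begin{cases}
\{\mathbf{z}_i^* : z_{ij}^* > 0, \; z_{ij}^* > z_{i,-j}^*\} & \text{for all } j \le p - 1 \\
\{\mathbf{z}_i^* : z_{i1}^* < 0, \; z_{i2}^* < 0, \ldots, z_{i,p-1}^* < 0\} & \text{for } j = p.
\end{cases}
\]
The sampling density is then given by:
\[
f(\mathbf{y} \mid \beta, \Sigma) = \prod_{i=1}^{n} \prod_{j=1}^{p} \Pr(y_{ij} = 1 \mid \beta, \Sigma)^{y_{ij}}
\]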
14 It should be noted that if $\Omega = \mathbf{I}_p$, then $\Sigma = \mathbf{I}_{p-1} + \mathbf{1}\mathbf{1}'$. This can be normalized to $\Sigma = \frac{1}{2} \mathbf{I}_{p-1} + \frac{1}{2} \mathbf{1}\mathbf{1}'$. This follows
directly from the rules for calculating the variance and covariance of sums and differences of random variables (see DeGroot,
1986, p. 216). Not only will $\Sigma$ not be an identity matrix when the undifferenced disturbances are i.i.d., but since the mapping
from $\Omega$ to $\Sigma$ is many to one, it is possible to say very little about $\Omega$ from knowledge of $\Sigma$ unless additional assumptions are
made.
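The effect of differencing on the disturbance covariance can be checked numerically. The Python sketch below is an illustration with function names of our own choosing: it differences a $p \times p$ covariance matrix against the last alternative and confirms, for $p = 3$, that an identity covariance for the undifferenced disturbances yields $\mathbf{I}_{p-1} + \mathbf{1}\mathbf{1}'$, which normalizes to $\frac{1}{2}\mathbf{I}_{p-1} + \frac{1}{2}\mathbf{1}\mathbf{1}'$. It also shows one instance of the many-to-one mapping: adding a common variance component to every undifferenced disturbance leaves the differenced covariance unchanged.

```python
def differenced_covariance(omega):
    """Covariance of (z_i1 - z_ip, ..., z_i,p-1 - z_ip) given Cov(z_i) = omega."""
    p = len(omega)
    # Row r of d picks out alternative r and subtracts the baseline (last) one.
    d = [[(1.0 if c == r else 0.0) - (1.0 if c == p - 1 else 0.0)
          for c in range(p)]
         for r in range(p - 1)]
    # sigma = d * omega * d'
    return [[sum(d[r][a] * omega[a][b] * d[s][b]
                 for a in range(p) for b in range(p))
             for s in range(p - 1)]
            for r in range(p - 1)]

def normalize(sigma):
    """Rescale so that the (1,1) element equals 1."""
    scale = sigma[0][0]
    return [[value / scale for value in row] for row in sigma]

p = 3
identity = [[1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
# Same matrix plus a common component shared by all alternatives.
shifted = [[identity[a][b] + 1.0 for b in range(p)] for a in range(p)]

sigma_identity = differenced_covariance(identity)
sigma_shifted = differenced_covariance(shifted)
sigma_normalized = normalize(sigma_identity)
```

Here `sigma_identity` and `sigma_shifted` coincide although the two undifferenced covariance matrices differ, so the differenced covariance alone cannot distinguish between them.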
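The posterior density of $\beta$ and $\Sigma$ is given by Bayes theorem as:
\[
\pi(\beta, \Sigma \mid \mathbf{y}) \propto f(\mathbf{y} \mid \beta, \Sigma) \, \pi(\beta) \, \pi(\Sigma)
\]
where $\pi(\beta)$ and $\pi(\Sigma)$ denote the prior densities of $\beta$ and $\Sigma$ respectively.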
While, in theory, Bayesian estimation of the MNP model can proceed by using a suitable method (such as
importance sampling or the Metropolis-Hastings algorithm) to draw samples of the model parameters directly
from the posterior density, this approach suffers most of the well-known numerical problems associated with
fitting the MNP model, namely the relatively high computational cost and/or low numerical accuracy of evaluating
the integrals in Equation 2. The key to avoiding many of these problems is the concept of data augmentation
(Tanner and Wong, 1987; Albert and Chib, 1993).
Data augmentation is a very general method which can be used to deal with missing data (see Schafer,
1997). Before detailing how the data augmentation algorithm is employed in the MNP model, note that
if the latent vector of utilities ($\mathbf{z}^*$) were observed, the MNP model would reduce to a seemingly unrelated
regression (SUR) model. Since such a model is a linear model with normal disturbances, it is not difficult to
fit from either a Bayesian or classical perspective. The central idea behind data augmentation is that even
though the actual value of $\mathbf{z}^*$ is unobserved, we do know how it is distributed conditional on the data and
other model parameters. By including draws of these unobserved values of $\mathbf{z}_i^*$ inside what would otherwise
be an MCMC sampling scheme for a slightly reformulated SUR model, we are able to fit the MNP model at
a minimal computational cost.
The actual MCMC sampling algorithm employed in this paper is based on work by Chib et al. (1998),
and to a lesser extent Albert and Chib (1993) and Chib and Greenberg (n.d.). The algorithm employed here
is the following:
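1. set $g := 1$
2. draw $z_{ij}^{*(g)}$ from $\pi(z_{ij}^* \mid \mathbf{y}_i, \mathbf{z}_{i,-j}^*, \beta^{(g-1)}, \Sigma^{(g-1)})$ for $j = 1, 2, \ldots, p-1$ and $i = 1, 2, \ldots, n$
3. draw $\beta^{(g)}$ from $\pi(\beta \mid \mathbf{y}, \mathbf{z}^*, \Sigma^{(g-1)})$
4. draw $\Sigma^{(g)}$ from $\pi(\Sigma \mid \mathbf{y}, \mathbf{z}^*, \beta^{(g)})$
5. store $\beta^{(g)}$ and $\Sigma^{(g)}$
6. set $g := g + 1$
7. if $g \le G$ goto step 2.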
The distribution of $z_{ij}^* \mid \mathbf{y}_i, \mathbf{z}_{i,-j}^*, \beta^{(g-1)}, \Sigma^{(g-1)}$ is a univariate truncated normal distribution whose mean and
variance follow from standard normal theory (see McCulloch and Rossi, 1994, for details within the context
of the MNP model). For problems with a small number of choices, it may be more computationally efficient
to sample $\mathbf{z}_i^*$ directly using the accept-reject method.
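The latent-utility step can be illustrated for the simplest interesting case of three parties, where each voter has two differenced utilities. The Python sketch below is a minimal illustration under our own assumptions, not the sampler used in the paper: the function names and example values are hypothetical, the conditional mean and variance are the standard bivariate normal formulas, and the truncated draw uses a simple inverse-CDF method that can be inaccurate far in the tails. The truncation region follows the definition of $A_j$: the chosen alternative must have the largest differenced utility and exceed zero, and all differenced utilities must be negative when the baseline party is chosen.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_truncated_normal(mean, sd, lower, upper, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [lower, upper]."""
    a = STD_NORMAL.cdf((lower - mean) / sd)
    b = STD_NORMAL.cdf((upper - mean) / sd)
    u = a + (b - a) * rng.random()
    # Keep u strictly inside (0, 1) so inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    x = mean + sd * STD_NORMAL.inv_cdf(u)
    # Guard against rounding pushing the draw outside the region.
    return min(max(x, lower), upper)

def update_latent(z, choice, mu, sigma, rng):
    """One pass over z = [z_i1*, z_i2*] for a voter facing three parties.

    choice is 0 or 1 for the first two parties and 2 for the baseline;
    mu is the mean X_i * beta and sigma the 2 x 2 covariance matrix.
    """
    z = list(z)
    for j in (0, 1):
        k = 1 - j
        cond_mean = mu[j] + sigma[j][k] / sigma[k][k] * (z[k] - mu[k])
        cond_sd = math.sqrt(sigma[j][j] - sigma[j][k] ** 2 / sigma[k][k])
        if choice == j:        # chosen: above zero and above the other utility
            lower, upper = max(0.0, z[k]), math.inf
        elif choice == k:      # other non-baseline party chosen: stay below it
            lower, upper = -math.inf, z[k]
        else:                  # baseline chosen: all differences negative
            lower, upper = -math.inf, 0.0
        z[j] = draw_truncated_normal(cond_mean, cond_sd, lower, upper, rng)
    return z

rng = random.Random(0)
mu = [0.2, -0.1]
sigma = [[1.0, 0.5], [0.5, 1.0]]
z_first = update_latent([0.1, -0.1], 0, mu, sigma, rng)
z_base = update_latent([-0.1, -0.1], 2, mu, sigma, rng)
```

Each component is drawn conditional on the current value of the other, so repeated passes of `update_latent` inside the larger sampler keep the latent utilities consistent with the observed vote.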
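We employ a conjugate, normal prior on $\beta$, $\pi(\beta) = \phi_k(\beta_0, \mathbf{B}_0)$. From this it follows that
\[
\beta \mid \mathbf{y}, \mathbf{z}^*, \Sigma \sim N(\beta_1, \mathbf{B}_1)
\]
where $\mathbf{B}_1 = (\mathbf{B}_0^{-1} + \sum_i \mathbf{X}_i' \Sigma^{-1} \mathbf{X}_i)^{-1}$ and $\beta_1 = \mathbf{B}_1 (\mathbf{B}_0^{-1} \beta_0 + \sum_i \mathbf{X}_i' \Sigma^{-1} \mathbf{z}_i^*)$. We use the log-Choleski
parameterization of $\Sigma$ employed by Chib et al. (1998). To summarize, we can use the Choleski decomposition
to factor any symmetric, positive definite, $(p-1) \times (p-1)$ matrix $\mathbf{A}$ as $\mathbf{A} = \mathbf{L}\mathbf{L}'$ where $\mathbf{L}$ is a $(p-1) \times (p-1)$
lower triangular matrix with typical element $l_{rc}$. If $a_{11} = 1$, then $l_{11} = 1$. Using a log transformation to
restrict the diagonal elements of $\mathbf{L}$ to be positive, we have the following parameterization:
\[
\theta = (l_{21}, \log(l_{22}), l_{31}, \ldots, \log(l_{p-1,p-1}))' \equiv (\theta_1, \theta_2, \ldots, \theta_{p^*})'
\tag{3}
\]
where $p^* = (p+1)(p-2)/2$. As Chib et al. (1998) note: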
The mapping between $\Sigma[\theta]$ and $\theta$ is one-to-one. This parameterization of $\Sigma[\theta]$ leaves the
$p^*$ vector $\theta$ entirely unrestricted. Any $\theta \in R^{p^*}$ leads to a matrix $\Sigma[\theta]$ that is symmetric, positive
definite, and has $\sigma_{11} = 1$ (1998, emphasis in original).
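The log-Choleski mapping can be written out directly. The Python sketch below is an illustration with a function name and example vector of our own choosing: it fills a lower triangular factor row by row, fixing the first diagonal element at 1, exponentiating the entries that correspond to the remaining diagonal elements, and then forming the product of the factor with its transpose. Any real-valued input vector of the right length therefore yields a symmetric matrix whose first diagonal element equals 1.

```python
import math

def sigma_from_theta(theta, dim):
    """Map an unrestricted vector to a dim x dim covariance matrix.

    dim is p - 1, and theta must have dim * (dim + 1) / 2 - 1 elements,
    which equals (p + 1)(p - 2) / 2. Elements are ordered row by row:
    (l21, log l22, l31, l32, log l33, ...).
    """
    expected = dim * (dim + 1) // 2 - 1
    if len(theta) != expected:
        raise ValueError(f"theta must have {expected} elements")
    lower = [[0.0] * dim for _ in range(dim)]
    lower[0][0] = 1.0  # fixes the (1,1) element of the result at 1
    values = iter(theta)
    for r in range(1, dim):
        for c in range(r + 1):
            value = next(values)
            # Diagonal entries are stored on the log scale.
            lower[r][c] = math.exp(value) if r == c else value
    # Return lower * lower'.
    return [[sum(lower[r][k] * lower[c][k] for k in range(dim))
             for c in range(dim)]
            for r in range(dim)]

# Hypothetical example with p = 4 parties, so dim = 3 and five free elements.
theta = [0.5, 0.0, -0.3, 0.2, math.log(2.0)]
sigma = sigma_from_theta(theta, 3)
```

Working with the unrestricted vector means a proposal distribution never has to check positive definiteness or the normalization constraint, since both hold by construction.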
Since the conditional distribution of $\theta$ is not available in closed form, we employ a Metropolis-Hastings
step (Chib and Greenberg, n.d.) to sample from this distribution. Once again, conditioning on the latent
vector $\mathbf{z}^*$ makes this step relatively easy. For more specific details of the algorithm employed, we urge the
reader to see Chib et al. (1998).
3.3 Bayesian Estimation of the MNL Model
The MNL model results from the assumption that the disturbances in Equation 1 are independently and identically distributed according to the type I extreme value distribution (often called the Weibull distribution in this literature; McFadden, 1989; Greene, 1997). As McFadden has shown, the choice probabilities then take the following form:
$$\Pr(y_{ij} = 1 \mid \beta, \delta) = \frac{\exp(v_{ij}'\beta + w_{ij}'\delta)}{\sum_{k=1}^{p} \exp(v_{ik}'\beta + w_{ik}'\delta)}$$

where, for reasons of identification, the pth (baseline) row of $W_i$ has been set equal to zeros, and accordingly every pth column of $W_i$ has also been deleted. In a slight abuse of notation, we continue to use $\delta$ to denote the coefficient vector which conforms to this reformulated matrix.
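The choice probabilities have the familiar softmax form. The sketch below takes the systematic utilities as already computed (one number per alternative, with the baseline alternative at zero); the function name and the example utilities are hypothetical.

```python
import math


def mnl_probabilities(utilities):
    """Return exp(u_j) / sum_k exp(u_k) for one voter's systematic utilities."""
    # Subtracting the maximum leaves the probabilities unchanged
    # and avoids overflow in exp().
    shift = max(utilities)
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utilities for three parties; the last is the baseline (zero).
probs = mnl_probabilities([0.4, -0.1, 0.0])
```

Adding the same constant to every utility leaves the probabilities unchanged, which is why one alternative has to be fixed as a baseline for identification.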
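Grouping $\beta$ and $\delta$ into a single vector $\theta$, the sampling density is then given by:

$$f(y \mid \theta) = \prod_{i=1}^{n} \prod_{j=1}^{p} \Pr(y_{ij} = 1 \mid \theta)^{y_{ij}}$$

It follows from Bayes theorem that the posterior density of $\theta$ is given by:

$$\pi(\theta \mid y) \propto f(y \mid \theta)\pi(\theta)$$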
While this density is not available in closed form, a sequence of draws from it can be constructed using the
M-H algorithm. This works as follows.
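1. Calculate the mode and curvature of $\pi(\theta \mid y)$ using a standard maximization algorithm such as Newton-Raphson or BFGS. Denote the posterior mode $\hat{\theta}$ and the inverse of the posterior information matrix $\hat{V}$.
2. set $g := 1$
3. draw $\theta^{\dagger}$ from $N(\hat{\theta}, \tau\hat{V})$
4. if $\mathrm{rndu} \le \min\left\{1, \frac{\pi(\theta^{\dagger} \mid y)\,\phi(\theta^{(g-1)} \mid \hat{\theta}, \tau\hat{V})}{\pi(\theta^{(g-1)} \mid y)\,\phi(\theta^{\dagger} \mid \hat{\theta}, \tau\hat{V})}\right\}$ then set $\theta^{(g)} := \theta^{\dagger}$, else set $\theta^{(g)} := \theta^{(g-1)}$
5. store $\theta^{(g)}$
6. set $g := g + 1$
7. if $g \le G$ goto step 3.

where $\tau$ is a user specified tuning parameter (usually between 1 and 2), and rndu denotes a draw from a uniform distribution with support on the unit interval. It is not difficult to show that this series of draws converges in distribution to $\pi(\theta \mid y)$.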
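The steps above can be sketched for a one-parameter problem. This is a toy illustration, not the GAUSS code used for the paper: the target is a hypothetical unnormalized posterior, the mode and curvature are supplied directly rather than found by numerical maximization, and the chain is started at the mode, which the text does not specify.

```python
import math
import random


def log_target(theta):
    # Hypothetical unnormalized log posterior: the kernel of N(2, 0.5^2).
    return -0.5 * ((theta - 2.0) / 0.5) ** 2


def log_normal_kernel(x, mean, var):
    # Log of the normal density up to a constant that cancels in the ratio.
    return -0.5 * (x - mean) ** 2 / var


def independence_mh(n_draws, mode, var, tau, rng):
    """Metropolis-Hastings with a fixed N(mode, tau * var) proposal."""
    proposal_var = tau * var
    proposal_sd = math.sqrt(proposal_var)
    current = mode  # assumed starting value
    draws, accepted = [], 0
    for _ in range(n_draws):
        candidate = rng.gauss(mode, proposal_sd)
        # Log of pi(cand) phi(curr) / (pi(curr) phi(cand)).
        log_ratio = (log_target(candidate) - log_target(current)
                     + log_normal_kernel(current, mode, proposal_var)
                     - log_normal_kernel(candidate, mode, proposal_var))
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) <= min(0.0, log_ratio):
            current = candidate
            accepted += 1
        draws.append(current)
    return draws, accepted / n_draws


draws, acceptance_rate = independence_mh(20000, mode=2.0, var=0.25, tau=1.5,
                                         rng=random.Random(1))
posterior_mean = sum(draws) / len(draws)
```

Setting the tuning parameter above 1 makes the proposal wider than the curvature-based approximation, which keeps the ratio of target to proposal bounded in the tails.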
3.4 Calculating the Marginal Likelihood of Multinomial Response Models
The Bayes factor is the primary means by which Bayesian models are compared. In order to calculate the
Bayes factor comparing any two models, it is necessary to compute the marginal likelihood of each model.
Using the method of Chib (1995), the marginal likelihood of an MNL model can be computed directly within
the MCMC sampling scheme outlined above. Even in the more complex MNP model, only minor additions
to the normal MCMC sampling strategy are needed to compute an estimate of the marginal likelihood.
For the MNL model, calculating the marginal likelihood is quite easy to do as both the sampling density and the prior density are available in closed form. Kernel density estimation can be used to calculate the remaining posterior ordinate. When the dimension of $\theta$ is large (greater than 6), estimating the posterior ordinate in reasonable-sized blocks will often improve the accuracy of the density estimate. To do this, note that any joint density $\pi(\theta_1, \theta_2, \ldots, \theta_n \mid y)$, evaluated at the point $\theta^*$, can be factored as:

$$\pi(\theta_1^*, \theta_2^*, \ldots, \theta_n^* \mid y) = \pi(\theta_1^* \mid y)\, \pi(\theta_2^* \mid y, \theta_1^*)\, \pi(\theta_3^* \mid y, \theta_1^*, \theta_2^*) \cdots \pi(\theta_n^* \mid y, \theta_1^*, \theta_2^*, \ldots, \theta_{n-1}^*).$$

The marginal ordinate $\pi(\theta_1^* \mid y)$ can be calculated from a kernel density estimate constructed from the original MCMC iterations. Each conditional ordinate can be calculated from a kernel density estimate constructed from a set of reduced MCMC iterations in which the conditioning parameters are held constant at the fixed values from $\theta^*$. For more details, we refer the reader to Chib (1995) and Chib and Greenberg (n.d.).
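The MNP model is slightly more difficult to work with. To avoid a double superscript, let $\tilde{\Sigma} = \Sigma^*$. To compute the value of $f(y \mid \beta^*, \tilde{\Sigma})$ the GHK algorithm is used (Geweke, 1991; Hajivassiliou, 1990; Keane, 1994). Both $\pi(\beta^*)$ and $\pi(\tilde{\Sigma})$ are available in closed form. To calculate $\pi(\beta^*, \tilde{\Sigma} \mid y)$ note that

$$\pi(\beta^*, \tilde{\Sigma} \mid y) = \pi(\beta^* \mid y, \tilde{\Sigma})\,\pi(\tilde{\Sigma} \mid y)$$

Once again, kernel density estimation can be used to calculate $\pi(\tilde{\Sigma} \mid y)$ from the original series of MCMC draws. To calculate $\pi(\beta^* \mid y, \tilde{\Sigma})$ note that

$$\pi(\beta^* \mid y, \tilde{\Sigma}) = \int \pi(\beta^* \mid y, z, \tilde{\Sigma})\, f(z \mid y, \tilde{\Sigma})\, dz$$

As such, an accurate estimate of $\pi(\beta^* \mid y, \tilde{\Sigma})$ can be obtained from an additional series of reduced MCMC iterations through $z \mid y, \beta, \tilde{\Sigma}$ and $\beta \mid z, \tilde{\Sigma}$. The estimate of $\pi(\beta^* \mid y, \tilde{\Sigma})$ is given by

$$\hat{\pi}(\beta^* \mid y, \tilde{\Sigma}) = G^{-1} \sum_{g=1}^{G} \phi(\beta^* \mid \beta_1^{(g)}, B_1^{(g)})$$

where $G$ is the total number of reduced MCMC iterations, and $\beta_1$ and $B_1$ are as given in step three of the MNP sampling algorithm discussed above. For a more detailed treatment of the estimation strategies discussed in this section we urge the reader to see Chib et al. (1998).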
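Putting the pieces of this section together gives the following expression for the estimated log marginal likelihood of the MNP model. It is a sketch of how the quantities combine under the basic marginal likelihood identity of Chib (1995), assuming that the prior factors into separate densities for the coefficient vector and the covariance matrix, as the separate closed-form priors suggest. Here $\beta^*$ and $\tilde{\Sigma}$ denote the fixed coefficient vector and covariance matrix at which the ordinates are evaluated, and hats mark the simulated quantities (the GHK likelihood, the averaged conditional ordinate, and the kernel density estimate).

```latex
\ln \hat{m}(y) = \ln \hat{f}(y \mid \beta^*, \tilde{\Sigma})
               + \ln \pi(\beta^*) + \ln \pi(\tilde{\Sigma})
               - \ln \hat{\pi}(\beta^* \mid y, \tilde{\Sigma})
               - \ln \hat{\pi}(\tilde{\Sigma} \mid y)
```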
4 Data and Research Design
As noted above, our source of mass data is Euro-Barometer 11. In each case we operationalize two explanations: one based on the spatial theory of voting, the other a composite explanation which incorporates hypothesized sociological/structural determinants of voter choice along with the issue concerns associated with our measure of spatial distance. For each explanation we fit an MNP and an MNL model of voter choice, each of which could plausibly have generated the observed data according to the explanations in question. To operationalize the spatial model, we use the results from the measurement procedure discussed in Section 2.2 to calculate the negative squared distance between each voter and each party. This measure is included as a choice specific covariate.
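The choice specific covariate can be computed directly from the estimated positions. The sketch below assumes voters and parties are represented as coordinate tuples in the same issue space; the function name, party labels and positions are hypothetical.

```python
def negative_squared_distance(voter, party):
    """Negative squared Euclidean distance between a voter and a party."""
    return -sum((v - p) ** 2 for v, p in zip(voter, party))


# Hypothetical two-dimensional positions.
voter = (0.5, -1.0)
parties = {"A": (1.5, 1.0), "B": (0.5, -0.5)}
covariate = {name: negative_squared_distance(voter, position)
             for name, position in parties.items()}
```

Because the distance enters with a negative sign, a positive coefficient on this covariate means voters are more likely to choose parties closer to them.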
To operationalize sociological / structural theories, we include three variables that capture notions of
religion and class: Religious Importance (scored 0 for those indicating religion is not important, to 4 for
those indicating religion is very important), Income (measured on an ordinal 12 point scale), and Manual
Labor (coded 1 if the respondent is a manual laborer, and 0 otherwise). We also include other demographic
characteristics commonly used in structural voting studies. Education measures the age at which the respondent
finished formal education (ranging from 1 indicating 14 years or younger to 9 indicating 22 years or older).
Town Size captures urban / rural splits in the electorate by measuring subjective town size (with 1 indicating
small town / rural, 2 indicating middle-size town, and 3 indicating city).
5 Results
For both the Netherlands and Denmark, we estimate an MNP and an MNL model for each explanation. Thus, for
each country, we estimate four models: a spatial MNP, a spatial MNL, a joint MNP of spatial distance and
our sociological / structural covariates, and a joint MNL. For each model we report the marginal likelihood
that we use for model comparison. In the remainder of this section, we present our results.
5.1 The Netherlands
For the Netherlands we begin our analysis by estimating an MNP and MNL model that includes spatial
distance and three constants as the only covariates. We summarize the posterior density samples for both
models in Table 7.
[Table 7 about here.]
As is expected, the posterior mean of the coefficient on the spatial distance measure is positive in both models. In addition, the 95% Bayesian Credible Interval (BCI), which contains the central 95% of the posterior density, lies entirely above zero. This implies that with probability no less than 97.5%, the spatial distance measure is positively related to vote choice. Both models predict the vote share for each party well. Additionally, the MNP predicts 47.6% of the cases correctly, while the MNL predicts slightly fewer, with 45.2% of the voters classified correctly. Thus, judging by the coefficients of the models and the percent correctly predicted, the models are quite comparable.
In Table 8 we present the results from the joint MNP and MNL of Dutch voting behavior.
[Table 8 about here.]
Here we include our spatial distance measure, along with demographic covariates that should impact voter choice. Again, the coefficient on the spatial distance measure is clearly positive, as in both models the 95% BCI resides above zero. The demographic covariates perform as one would expect. In both the MNP and the MNL, those individuals who are manual laborers have a higher probability of voting for PvdA (the Dutch workers' party) than for the other alternatives. Similarly, those who profess religious beliefs are more likely to vote for the CDA (the Christian democrats) than for the other parties. The BCI for the income coefficient on PvdA is negative in the MNP model (it is indistinguishable from zero in the MNL), while the coefficient for VVD (the party which espouses the most conservative economic policies and is often referred to as the bourgeois party) is positive for both the MNP and the MNL. Both of these income coefficients are consistent with the structure of Dutch politics. Town size does not seem to be related to voter choice in any systematic way. As expected, one's education impacts for whom one votes. The BCI on the PvdA coefficient lies below zero for both models, and the BCI on the CDA coefficient lies below zero in the MNL (it is indistinguishable from zero in the MNP).
The coefficients thus leave us with a story often repeated in the Dutch context: workers vote for the PvdA, people professing religious beliefs vote for the CDA, and those with high incomes vote for the VVD. Both the MNP and MNL models point to this same story. The predicted vote shares for both the MNP and the MNL are close to the sample marginals. The MNP does slightly better at predicting individual votes correctly; it predicts 57.1% of the votes correctly while the MNL correctly predicts 55.5%. Again, judging only by coefficient estimates, predicted vote shares, and the percent correctly predicted, both the MNP and the MNL perform reasonably well.
As we discuss in Section 3.4 and in Appendix B, one can use the Bayes factor to compare competing
models and competing explanations of voting behavior at the same time. We present the Bayes factors for
the four Dutch models in Table 9.
[Table 9 about here.]
In the Dutch case, the spatial MNP is the best explanatory model of voting behavior. The Bayes factors (on a logarithmic scale) of the spatial MNP versus the other three models are all greater than five, showing very strong support for the spatial MNP as the best explanatory model. Substantively this indicates that in the Netherlands, the spatial model provides a very parsimonious and reasonably accurate account of voting behavior. This rational choice explanation of voter choice outperforms a traditional alternative.
If we had only used MNP models of voter choice in our analysis, we would have reached the same conclusion because the Bayes factor between the spatial MNP and the joint MNP would remain the same. If, however, we had relied only on MNL models, we would have incorrectly chosen the joint model as the best explanatory model. This is because the Bayes factor between the joint MNL and the spatial MNL picks the joint model. This illustrates how the choice of statistical models can influence the substantive conclusions one reaches about politics. By adopting the Bayesian framework, we can compare not only explanations (which is easily done using frequentist techniques), but we can also compare competing statistical models using probability as our scale. This case highlights the necessity of comparing both explanations and models when studying voter choice.
5.2 Denmark
Does the same pattern hold in Denmark? To answer this question, we estimate four models of Danish voting
behavior. We present the results from the spatial model in Table 10.
[Table 10 about here.]
As expected, the 95% BCI on the negative spatial distance coefficient lies above zero for both the MNP and the MNL models. This clearly shows that individuals tend to vote for the party closest to them in the Danish issue space. The predicted vote shares for both models are nearly identical to those of the population. Both models, however, perform much worse than in the Dutch case when looking at the percent correctly predicted. Indeed, the MNP only correctly classifies 31.0% of the cases, while the MNL does a bit better at 37.1%. While a null model of simply choosing the modal category (SD) would be predictively better, such a model would not be informative about the other parties, which are of vital importance when attempting to understand coalition politics. In this case, both the MNP and the MNL are consistent with the same theoretical story of voters choosing the closest parties.
In Table 11 we present results from the joint MNL and MNP models.
[Table 11 about here.]
The coefficient on the spatial distance measure stays positive even when controlling for the demographic factors included in this model. Very few of the demographic coefficients differ from zero. There are some notable exceptions, beginning with the religion coefficient for SD. As expected, all else being equal, religious voters are more likely to vote for SD than the more radically leftist SFP (the baseline party). The 95% BCI of this parameter lies above zero for both the MNP and the MNL models. The town size coefficient for SD is negative in the MNP and the MNL, which is consistent with SD's urban base. The education coefficients for KFP (the conservative bourgeois party) and for Venstre (the somewhat more moderate right wing party) are negative in the MNP model. As is the case in many West European countries, the more educated voters tend to vote with left-wing parties. In the MNL model, the effect for KFP persists, while the coefficient for Venstre is indistinguishable from zero. These results are consistent with our understanding of Danish politics. Both of the models predict vote share well for all parties. The MNP again correctly classifies just 31.0% of the vote, while the MNL does better at 39.3%. From a predictive standpoint, the MNL model seems to outshine the MNP in Denmark.15
To compare both explanations and statistical models, we calculate the Bayes factors for all four models,
and present them in Table 12.
[Table 12 about here.]
Table 12 shows that the spatial MNL is the best explanatory model of Danish voting behavior. The evidence very strongly favors it over the spatial MNP and the joint MNL, and slightly favors it over the joint MNP. Substantively, then, we reach the same conclusion as we did in the Netherlands: the spatial model is indeed the best explanatory model. However, in Denmark, the MNL model outperforms the MNP. If we had only relied on the MNP results to reach our conclusions, we would have selected the joint MNP over the spatial MNP. Again in this case, choosing the wrong statistical model would have led us to incorrect inferences about politics. It seems clear, therefore, that it is necessary to compare both explanations and models when studying voting behavior in the West European context.
6 Conclusion
This paper serves as an evaluation of many methodological choices individuals have made when studying voting behavior in multi-party democracies. While many strategies exist for operationalizing the spatial model of voting, we advocate the use of confirmatory factor analysis as a measurement strategy. Not only can one use CFA to place voters and parties in a common space, but one can test the goodness-of-fit of one measurement model vs. another.

15 These results should be viewed with a bit of caution. Because of time constraints, our Danish MNP estimates are based on only 5000 MCMC iterations. While the estimates presented here are probably not grossly inaccurate, we will have more confidence in these findings after running the sampler for a longer time.

The second methodological choice scholars have made deals with the explanations being tested. Our
results demonstrate that the spatial model is a better explanatory model of both Dutch and Danish voting
behavior than a model based on sociological / structural considerations. These results lend credence to the
use of the spatial model of voting as a starting point for the analysis of coalition formation, and electoral
strategy among other theoretical enterprises. In addition, this paper provides solid empirical evidence that
this particular rational choice explanation of political behavior outperforms a sociological alternative.
The final methodological choice faced by scholars interested in multi-party democracy is that of a statistical model. Political scientists have moved beyond the use of binomial choice models or ordered response models that do not reliably capture the multinomial nature of the data generating mechanism. However, it is far from clear whether the computationally easy multinomial logit model or the computationally difficult multinomial probit model should be used. The use of Bayesian methods advocated in this paper not only provides computational tools that make estimating the MNP model easier than other approaches, but it allows the direct comparison of the MNP and the MNL on the scale of probability using the Bayes factor. Our results show that in the Netherlands, the MNP model is best, while in Denmark the MNL model is superior. The choice thus seems application dependent, and is something that scholars need to address in future substantive analysis. Our results also demonstrate that choosing the wrong model for analysis can lead to incorrect inference. Thus, the conclusion to take from this paper is clear. The choice of covariates (model specification) as well as the choice of statistical models (functional form) dramatically impacts the inferences one can make about politics. It is thus necessary to compare many alternative specifications and models when studying voter choice in a multi-party democracy.
A Appendix. Question Wording of Issue Questions
[Table A1 about here (Question Wording).]
B Appendix. Bayesian Inference and Model Comparison
While several textbooks provide comprehensive treatments of Bayesian inference (Gelman et al., 1995, for
example), the theory and practice of Bayesian inference remains unfamiliar to most political scientists. In
this appendix, we briefly discuss Bayesian inference, Markov chain Monte Carlo (MCMC) simulation, and
Bayesian model comparison to provide the reader with a better sense of the statistical results in the body
of the paper.
B.1 Bayesian Inference
At the heart of Bayesian inference are probability statements. After observing data, we are interested in stating the probability of a set of parameters taking particular values. Thus, we are interested in making statements about the distribution $\pi(\theta \mid y)$, which is known as the posterior density. Here $y$ represents observed values of a dependent variable, that is, the data. To make such probabilistic statements, one applies Bayes theorem, yielding

$$\pi(\theta \mid y) = \frac{f(y \mid \theta)\pi(\theta)}{\int f(y \mid \theta)\pi(\theta)\,d\theta}.$$

The normalizing constant $\int f(y \mid \theta)\pi(\theta)\,d\theta$ is called the marginal likelihood. In most cases, one works with the unnormalized posterior density. Thus,

$$\pi(\theta \mid y) \propto f(y \mid \theta)\pi(\theta).$$

$f(y \mid \theta)$ is the sampling density, and $\pi(\theta)$ represents the researcher's prior beliefs about the value of $\theta$. Note that in classical (frequentist) statistics one maximizes $f(y \mid \theta)$ with respect to $\theta$ when performing maximum likelihood estimation. The goal of Bayesian inference is to calculate the posterior distribution $\pi(\theta \mid y)$ so that probability statements about $\theta$ and functionals of $\theta$ can be formed.
In most applications, the posterior $\pi(\theta \mid y)$ is not of standard form, and therefore cannot be investigated analytically. Nonetheless, it is relatively straightforward to generate a series of draws from this distribution using Markov chain Monte Carlo (MCMC) simulation. One MCMC algorithm is the Gibbs sampling algorithm. Using this algorithm, one simulates draws from a given joint posterior distribution $\pi(\theta \mid y)$ using information only from the full conditional distributions

$$\pi(\theta_1 \mid y, \theta_2, \ldots, \theta_n),\; \pi(\theta_2 \mid y, \theta_1, \theta_3, \ldots, \theta_n),\; \ldots,\; \pi(\theta_n \mid y, \theta_1, \ldots, \theta_{n-1}),$$

for $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The algorithm is often easy to implement because the full conditional distributions can be sampled directly in many models. Because the Gibbs sampler constitutes a Markov chain whose stationary distribution is equivalent to the target distribution $\pi(\theta \mid y)$, convergence can be assured as long as very mild regularity conditions are met.
The sampler works by iteratively sampling from the full conditional distributions, conditioned on the most recent draw of $\theta$. Let $\theta_1^{(0)}, \ldots, \theta_n^{(0)}$ denote arbitrary starting values which are in the support of $\pi(\theta \mid y)$. At each iteration $g$ of the sampler, the following series of draws is made from the full conditional distributions:

$$\begin{aligned}
&\theta_1^{(g)} \mid y, \theta_2^{(g-1)}, \theta_3^{(g-1)}, \ldots, \theta_n^{(g-1)} \\
&\theta_2^{(g)} \mid y, \theta_1^{(g)}, \theta_3^{(g-1)}, \ldots, \theta_n^{(g-1)} \\
&\qquad\vdots \\
&\theta_n^{(g)} \mid y, \theta_1^{(g)}, \theta_2^{(g)}, \ldots, \theta_{n-1}^{(g)}.
\end{aligned}$$

These draws are stored and used to compute estimates of the posterior moments, probability intervals, and other quantities of interest. It is standard practice to discard the first $m$ burn-in draws and use only draws $m + 1$ to $m + G$ to compute posterior quantities of interest. This helps eliminate sensitivity to initial conditions, and helps to ensure that remaining draws are representative of the target distribution. For an accessible introduction to the Gibbs sampler, we refer the reader to Casella and George (1992), Chib and Greenberg (1996), and Albert and Chib (1996). For a more advanced theoretical discussion of the Gibbs sampler, we refer the reader to Geman and Geman (1984), Tanner and Wong (1987), Gelfand and Smith (1990), and Tierney (1994). The Gibbs sampling algorithm is a special case of the Metropolis-Hastings (M-H) algorithm, another MCMC algorithm, in which candidate draws are accepted or rejected with a specified probability (see Chib and Greenberg, 1995). Typically one uses the M-H algorithm when one or more of the conditional distributions are not of standard form. We employ both algorithms in this paper.
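A toy Gibbs sampler makes the iteration and the burn-in step concrete. The sketch below targets a standard bivariate normal with a chosen correlation, a textbook case in which both full conditionals are univariate normals; it is not one of the models estimated in the paper, and the function names and settings are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, burn_in, n_keep, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    conditional_sd = math.sqrt(1.0 - rho ** 2)
    theta1, theta2 = 5.0, -5.0  # arbitrary starting values
    kept = []
    for g in range(burn_in + n_keep):
        # Each block is drawn conditional on the most recent value of the other.
        theta1 = rng.gauss(rho * theta2, conditional_sd)
        theta2 = rng.gauss(rho * theta1, conditional_sd)
        if g >= burn_in:  # discard the first burn_in draws
            kept.append((theta1, theta2))
    return kept


def sample_correlation(pairs):
    n = len(pairs)
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs) / n
    var1 = sum((a - mean1) ** 2 for a, _ in pairs) / n
    var2 = sum((b - mean2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(var1 * var2)


kept_draws = gibbs_bivariate_normal(0.6, burn_in=500, n_keep=20000,
                                    rng=random.Random(42))
estimated_rho = sample_correlation(kept_draws)
```

The deliberately poor starting values are forgotten within a few iterations here; in harder problems the burn-in period needs to be considerably longer.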
B.2 The Bayes Factor
One distinct advantage of Bayesian inference over classical (frequentist) approaches is the ability to test non-nested models. Central to the idea of Bayesian model comparison and hypothesis testing is the Bayes factor (Kass and Raftery, 1995; Jeffreys, 1961).
The Bayes factor provides a convenient means to assess the amount of evidence in favor of one scientific theory vs. that for another (Kass and Raftery, 1995, p. 777). If model $j$ and model $k$ are equally likely a priori, the Bayes factor for model $j$ vs. model $k$ (denoted $B_{jk}$) is simply the ratio of the marginal likelihood of the data conditional on model $j$ to the marginal likelihood of the data given model $k$. Somewhat more formally, in the Bayesian tradition, we assign prior probabilities that the data $y$ was generated by each model: $\Pr(M_j)$ and $\Pr(M_k)$. In practice, we assume $\Pr(M_j) = \Pr(M_k) = 1/2$. After observing the data $y$, we are interested in the posterior probabilities: $\Pr(M_j \mid y)$ and $\Pr(M_k \mid y)$. Applying Bayes theorem, Kass and Raftery (1995) demonstrate that

$$\frac{\Pr(M_j \mid y)}{\Pr(M_k \mid y)} = \frac{\Pr(y \mid M_j)}{\Pr(y \mid M_k)} \cdot \frac{\Pr(M_j)}{\Pr(M_k)}.$$

The first ratio on the right-hand side is defined as the Bayes factor; with uniform priors over the models it equals the posterior odds. Thus, the Bayes factor $B_{jk}$ between models $M_j$ and $M_k$ is

$$B_{jk} = \frac{\Pr(y \mid M_j)}{\Pr(y \mid M_k)} = \frac{\int f(y \mid \theta, M_j)\,\pi(\theta \mid M_j)\,d\theta}{\int f(y \mid \theta, M_k)\,\pi(\theta \mid M_k)\,d\theta} = \frac{m(y \mid M_j)}{m(y \mid M_k)}.$$
Unlike frequentist hypothesis tests, the Bayes factor is not interpreted with respect to critical values. Instead, as Kass and Raftery (1995, p. 777) note, "[p]robability itself provides a meaningful scale defined by betting." Reworking a table first suggested by Jeffreys (1961), Kass and Raftery suggest the following rough description of the information provided by the Bayes factor for scientific purposes (see Table B1).
[Table B1 about here.]
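Given two estimated log marginal likelihoods, the log Bayes factor and the implied posterior model probability follow in a few lines. The sketch assumes equal prior model probabilities and uses the Kass and Raftery (1995) cut points of 1, 3 and 5 on the natural-log scale, which match the "greater than five" reading used in the results; Table B1 itself is not reproduced here, so the labels are an assumption. The numerical inputs are hypothetical.

```python
import math


def log_bayes_factor(log_ml_j, log_ml_k):
    """ln B_jk from two log marginal likelihoods."""
    return log_ml_j - log_ml_k


def evidence_for_j(log_bf):
    """Rough verbal category for ln B_jk (assumed Kass-Raftery cut points)."""
    if log_bf < 0.0:
        return "favors model k"
    if log_bf <= 1.0:
        return "not worth more than a bare mention"
    if log_bf <= 3.0:
        return "positive"
    if log_bf <= 5.0:
        return "strong"
    return "very strong"


def posterior_probability_j(log_bf):
    """Pr(M_j | y) under equal prior model probabilities."""
    if log_bf >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    odds = math.exp(log_bf)  # at most 1 here, so no overflow
    return odds / (1.0 + odds)


# Hypothetical log marginal likelihoods for two competing models.
log_bf = log_bayes_factor(-1000.0, -1006.5)
category = evidence_for_j(log_bf)
probability = posterior_probability_j(log_bf)
```

Working on the log scale throughout avoids the underflow that would result from exponentiating marginal likelihoods of realistic size.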
B.3 Computing Marginal Likelihoods
Because of the difficulties involved in calculating the integrals in the marginal likelihoods, Bayes factors have, until recently, played only a small role in applied work. However, recent advances in MCMC simulation have greatly enhanced our ability to calculate accurate estimates of the marginal likelihood. In this article, we use the reduced Gibbs sampling algorithm of Chib (1995) to estimate this quantity. Chib relies on the identity,

$$m(y) = \frac{f(y \mid \theta)\,\pi(\theta)}{\pi(\theta \mid y)},$$

which he terms the basic marginal likelihood identity. Fixing $\theta$ at $\theta^*$, one can estimate the marginal likelihood (on the logarithmic scale) as,

$$\ln \hat{m}(y) = \ln f(y \mid \theta^*) + \ln \pi(\theta^*) - \ln \hat{\pi}(\theta^* \mid y).$$

This is an appealing formulation because it only requires an evaluation of the likelihood at one point, the evaluation of the prior at one point, and an estimate of the posterior ordinate at this same point. Chib (1995) discusses the simulation error of this estimate, and illustrates how one can estimate the marginal likelihood to any degree of precision using this computational technique.

The marginal likelihood can be computed at any point $\theta^*$ in the parameter space. In practice, one usually chooses values of each parameter at a high density point. The first quantity to calculate, $f(y \mid \theta^*)$, is simply the likelihood evaluated at $\theta^*$. The second quantity is the prior, evaluated at the point $\theta^*$. The final quantity $\hat{\pi}(\theta^* \mid y)$ is the posterior density ordinate. Although Chib considers the general case, assume that a Gibbs sampling algorithm with latent data $z$ has been applied to one vector block $\theta = \theta_1$. The output from the Gibbs sampling algorithm is therefore $\{\theta_1^{(g)}, z^{(g)}\}_{g=1}^{G}$. Chib demonstrates that an appropriate Monte Carlo estimate of $\pi(\theta_1^* \mid y)$ at point $\theta_1^*$ is,

$$\hat{\pi}(\theta_1^* \mid y) = G^{-1} \sum_{g=1}^{G} \pi(\theta_1^* \mid y, z^{(g)}).$$

Chib demonstrates that this estimate is simulation consistent. Extending this strategy to more than one vector block is relatively straightforward. In Section 3.4 we demonstrate how this can be done. Once $f(y \mid \theta^*)$, $\pi(\theta^*)$, and $\hat{\pi}(\theta^* \mid y)$ are in hand, computing the marginal likelihood is trivial.
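The basic marginal likelihood identity can be checked in a conjugate toy problem where the posterior ordinate is known exactly, so no simulation is needed. The sketch assumes normal data with known variance and a normal prior on the mean; this is a hypothetical stand-in chosen for transparency, not one of the models in the paper, where the ordinate has to be estimated from MCMC output.

```python
import math


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_marginal_likelihood(y, theta_star, prior_mean, prior_var, data_var=1.0):
    """ln m(y) = ln f(y | theta*) + ln pi(theta*) - ln pi(theta* | y)."""
    n = len(y)
    # Conjugate normal update for the mean with known data variance.
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    post_mean = post_var * (prior_mean / prior_var + sum(y) / data_var)
    log_likelihood = sum(log_normal_pdf(obs, theta_star, data_var) for obs in y)
    log_prior = log_normal_pdf(theta_star, prior_mean, prior_var)
    log_posterior = log_normal_pdf(theta_star, post_mean, post_var)
    return log_likelihood + log_prior - log_posterior


# Hypothetical data; the identity should give the same answer at any theta*.
y = [0.8, 1.3, 0.4, 1.9, 1.1]
at_high_density = log_marginal_likelihood(y, 1.0, prior_mean=0.0, prior_var=10.0)
at_low_density = log_marginal_likelihood(y, -0.5, prior_mean=0.0, prior_var=10.0)
```

The two evaluations agree up to rounding error, which illustrates why the choice of the evaluation point affects only the precision of a simulated ordinate and not the quantity being estimated.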
C Appendix. Computation
The confirmatory factor analysis of the Dutch data was performed via maximum likelihood in SAS using PROC CALIS. The confirmatory factor analysis of the Danish data was performed using Browne's asymptotically distribution free (ADF) method in the LISREL package. In the Danish case, a polychoric correlation matrix was analyzed. The Dutch factor analysis used a product-moment (Pearson) correlation matrix as input. Re-analysis of this data with the LISREL package using a polychoric correlation matrix and Browne's ADF method produces nearly identical results to those reported here. Preliminary, exploratory factor analysis was conducted in STATA.
The missing Danish data was imputed using the S-PLUS function norm written by Joseph Schafer. This
function is freely available at http://www.stat.psu.edu/jls/misoftwa.html. Because of the computational complexity of some of the models employed here, single imputation was judged to be a satisfactory
compromise between full multiple imputation and simple listwise deletion. The imputations were constructed
using the maximum likelihood estimates of the means, standard deviations, and correlations from the EM
algorithm. The real-valued imputations of the ordinal variables were then rounded to the nearest ordinal
category. See Schafer (1997) for a discussion of this practice.
The bivariate kernel density estimates of voter ideal points were calculated using the S-PLUS function kde2D of Guy Nason and Martin Mächler. This function is available via statlib: http://lib.stat.cmu.edu/. The default bandwidth was used to construct the estimates appearing in the paper. Other reasonable choices of bandwidth do not change the general appearance of Figures 1 and 2.
Estimation of the MNL and MNP models was done in GAUSS. The MNL code was written by the authors. The code used to estimate the MNP models was generously provided by Sid Chib, Ed Greenberg, and Yuxin Chen. The kernel density estimation routines used to estimate the posterior ordinates are Ruud H. Koning's GAUSS implementations of the algorithms presented in Härdle (1990) and Silverman (1986). For all of the results in this paper, a biweight kernel was used with bandwidth $= 0.9 \min(s, \mathrm{IQR}/1.34) / n^{1/5}$, where $s$ denotes the sample standard deviation, IQR denotes the interquartile range of the datapoints, and $n$ is the sample size (see Silverman, 1986). These routines are freely available at: http://www.xs4all.nl/rhkoning/gauss.htm. The MNL models were fit at a minimal computational cost: a few minutes on a 300 MHz Pentium II. The MNP models took between 12 and 17 hours, depending on the dataset and the specifics of the algorithm employed.

All figures were made using S-PLUS.
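The bandwidth rule quoted in Appendix C, 0.9 times the smaller of the standard deviation and IQR/1.34, divided by the fifth root of the sample size, is easy to compute. The sketch below is an illustration only: it is not the GAUSS routine used for the paper, the function name and data are hypothetical, and the quartile convention of Python's `statistics.quantiles` may differ slightly from the one used there.

```python
import statistics


def rule_of_thumb_bandwidth(data):
    """0.9 * min(s, IQR / 1.34) * n^(-1/5) for a one-dimensional sample."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return 0.9 * min(s, iqr / 1.34) * n ** (-1.0 / 5.0)


# Hypothetical sample standing in for a series of posterior draws.
sample = [float(value) for value in range(1, 101)]
bandwidth = rule_of_thumb_bandwidth(sample)
```

Taking the minimum of the two spread measures keeps a few outlying draws from inflating the bandwidth and oversmoothing the density estimate.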
References
Albert, James H., and Siddhartha Chib. 1993. "Bayesian Analysis of Binary and Polychotomous Response Data." Journal of the American Statistical Association 88(June):669-679.
Albert, Jim, and Siddhartha Chib. 1996. "Computation in Bayesian Econometrics: An Introduction to Markov Chain Monte Carlo." Advances in Econometrics A 11(June):3-24.
Alvarez, R. Michael, and Jonathan Nagler. 1995. "Economics, Issues and the Perot Candidacy: Voter Choice in the 1992 Presidential Election." American Journal of Political Science 39(August):714-744.
Alvarez, R. Michael, and Jonathan Nagler. 1998. "When Politics and Model Collide: Estimating Models of Multi-Candidate Elections." American Journal of Political Science 42(January):55-96.
Andeweg, Rudy B., and Galen A. Irwin. 1993. Dutch Government and Politics. New York: St. Martin's Press.
Budge, Ian, David Robertson, and David Hearl. 1987. Ideology, Strategy, and Party Change. Cambridge: Cambridge University Press.
Cahoon, L. 1975. "Locating a Set of Points Using Range Information Only." Ph.D. Dissertation, Carnegie-Mellon University.
Cahoon, L., Melvin Hinich, and Peter Ordeshook. 1975. "A Multi-Dimensional Statistical Procedure for Spatial Analysis." VPI&SU and Carnegie-Mellon University: Typescript.
Casella, George, and Edward I. George. 1992. "Explaining the Gibbs Sampler." The American Statistician 46(August):167-174.
Chib, Siddhartha. 1995. "Marginal Likelihood From the Gibbs Output." Journal of the American Statistical Association 90(December):1313-1321.
Chib, Siddhartha, and Edward Greenberg. 1995. "Understanding the Metropolis-Hastings Algorithm." The American Statistician 49(November):327-336.
Chib, Siddhartha, and Edward Greenberg. 1996. "Markov Chain Monte Carlo Simulation Methods in Econometrics." Econometric Theory 12(August):409-431.
Chib, Siddhartha, and Edward Greenberg. n.d. "Analysis of Multivariate Probit Models." Biometrika Forthcoming.
Chib, Siddhartha, Edward Greenberg, and Yuxin Chen. 1998. "MCMC Methods for Fitting and Comparing Multinomial Response Models." Washington University in St. Louis: Typescript.
Daalder, Hans. 1987. "The Dutch Party System: From Segmentation to Polarization - And Then?" In Party Systems in Denmark, Austria, Switzerland, the Netherlands, and Belgium, New York: St. Martin's Press.
DeGroot, Morris H. 1986. Probability and Statistics. Reading, MA: Addison Wesley.
DeSwaan, Abram. 1973. Coalition Theories and Cabinet Formation. Amsterdam: Elsevier.
Dow, Jay K. 1997. "Voter Choice and Strategies in French Presidential Elections." Paper presented at the Annual Meeting of the Midwest Political Science Association.
Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper & Row.
Enelow, James, and Melvin Hinich. 1984. The Spatial Theory of Voting: An Introduction. Cambridge: Cambridge University Press.
Gelfand, Alan E., and Adrian F. M. Smith. 1990. \Sampling-Based Approaches to Calculating Marginal
Densities." Journal of the American Statistical Association 85(December):398{409.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995. Bayesian Data Analysis .
London: Chapman & Hall.
Geman, S., and D. Geman. 1984. \Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration
of Images." IEEE Transactions of Pattern Analysis and Machine Intelligence 6(November):721{741.
Geweke, John M. 1991. \Ecient Simulation form the Multivariate Normal and Student-t Distributions
Subject to Linear Constraints." In Computer Science and Statistics: Proceedings of the Twenty-Third
Symposium on the Interface , Fairfax, VA: Interface Foundation of America.
Green, Donald P., and Ian Shapiro. 1994. Pathologies of Rational Choice Theory: A Critique of Applications
in Political Science . New Haven: Yale University Press.
Greene, William. 1997. Econometric Analysis . Upper Saddle River, NJ: Prentice-Hall, third edition.
Hajivassiliou, V. A. 1990. \Smooth Simulation Estimation of Panel LDV Models." Typescript.
Härdle, Wolfgang. 1990. Applied Non-Parametric Regression. Oxford: Oxford University Press.
Institut für Sozialwissenschaften and Europa-Institut of the Universität Mannheim. 1983. European Elections
Study: European Political Parties' Middle-Level Elites. Köln: Köln Zentralarchiv.
Iversen, Torben. 1994. "Political Leadership and Representation in West European Democracies: A Test of
Three Models of Voting." American Journal of Political Science 38(February):45–74.
Jackman, Simon. 1997. "Pauline, the Mainstream, and Political Elites: the Place of Race in Australian
Political Ideology." Manuscript.
Jeffreys, H. 1961. Theory of Probability. Oxford: Oxford University Press, third edition.
Kass, Robert E., and Adrian E. Raftery. 1995. "Bayes Factors." Journal of the American Statistical Association 90(June):773–795.
Keane, Michael P. 1994. "A Computationally Practical Simulation Estimator for Panel Data." Econometrica
January:95–116.
Laver, Michael, and Norman Schofield. 1990. Multiparty Government: The Politics of Coalition in Europe.
Oxford: Oxford University Press.
Lawrence, Eric. 1997. "A Simulated Maximum Likelihood Approach to the 1988 Democratic Primary."
Paper presented at the Annual Meeting of the Midwest Political Science Association.
Maddala, G. S. 1983. Limited-dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge
University Press.
McCulloch, Robert, and Peter E. Rossi. 1994. "An Exact Likelihood Analysis of the Multinomial Probit
Model." Journal of Econometrics 64(September-October):207–240.
McFadden, Daniel. 1989. "A Method of Simulated Moments for Estimation of Discrete Response Models
without Numerical Integration." Econometrica 57(September):995–1026.
Ofek, Dganit, Kevin M. Quinn, and Itai Sened. 1998. "Voters, Parties, and Coalition Formation in Israel: Theory and Evidence." Paper presented at the Annual Meeting of the Midwest Political Science
Association.
Quinn, Kevin M., Andrew D. Martin, and Andrew B. Whitford. 1996. "Explaining Voter Choice in Multi-Party Democracy: A Look at Data from the Netherlands." Paper presented at the Annual Meeting of the
American Political Science Association.
Rabier, Jacques-René, and Ronald Inglehart. 1981. Euro-Barometer 11 - April, 1979: The Year of the
Child in Europe. Ann Arbor, MI: Inter-University Consortium for Political and Social Research.
Rabinowitz, George, and Stuart Elaine MacDonald. 1989. "A Directional Theory of Issue Voting." American
Political Science Review 83(1):93–121.
Schafer, Joseph L. 1997. Analysis of Incomplete Multivariate Data. London: Chapman & Hall.
Schofield, Norman, and Itai Sened. 1998. "Political Equilibrium in Multiparty Democracies." Paper presented
at the Annual Meeting of the Midwest Political Science Association.
Schofield, Norman J., Andrew D. Martin, Kevin M. Quinn, and Andrew B. Whitford. n.d. "Multiparty
Electoral Competition in the Netherlands and Germany: A Model Based on Multinomial Probit." Public
Choice, forthcoming.
Shepsle, Kenneth A. 1991. Models of Multiparty Electoral Competition. Chur: Harwood Academic Publishers.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Tanner, M. A., and W. Wong. 1987. "The Calculation of Posterior Distributions by Data Augmentation."
Journal of the American Statistical Association 82(June):528–550.
Taylor, Michael, and Michael Laver. 1973. "Government Coalitions in Western Europe." European Journal
of Political Research 1:205–248.
Tierney, Luke. 1994. "Markov Chains for Exploring Posterior Distributions." Annals of Statistics
22(December):1701–1762.
Whitten, Guy D., and Harvey D. Palmer. 1996. "Heightening Comparativists' Concern for Model
Choice: Voting Behavior in Great Britain and the Netherlands." American Journal of Political Science
40(February):231–260.