Estimating voter preference distributions from
individual-level voting data
(with application to split-ticket voting)
Preliminary and incomplete, comments welcome, version 1.0
Jeffrey B. Lewis, Department of Politics
Princeton University
August 30, 1998
Abstract
In the last decade a great deal of progress has been made in estimating spatial models of legislative roll-call voting. There are now several well-known and effective methods of estimating the ideal points of legislators from their roll-call votes. Similar progress has not been made in the empirical modeling of the distribution of preferences in the electorate. Progress has been slower not because the question is less important, but because of limitations of data and a lack of tractable methods. In this paper, I present a method for inferring the distribution of voter ideal points on a single dimension from individual-level voting returns on ballot propositions. The statistical model and estimation technique draw heavily on the psychometric literature on test taking and, in particular, on the work of Bock and Aitkin (1981). The method yields semi-parametric estimates of the distribution of voters along an unobserved spatial dimension. The model is applied to data from the 1992 general election in Los Angeles County. I present the distribution of voter ideal points for each of 17 Congressional districts. Finally, I consider the issue of split-ticket voting, estimating for two Congressional districts the distribution of voters who split their tickets and of those who did not.
Prepared for presentation at the Annual Meeting of the American Political Science Association, Boston MA, September 1998.
Addresses: Department of Politics, Corwin Hall, Princeton University, Princeton NJ 08544 and
[email protected].
This research was supported by a grant from the Woodrow Wilson School of Public and International Affairs, Princeton University.
1 Introduction
Over the last decade great strides have been made in the empirical application of spatial models of policy preferences. Developments such as Poole and Rosenthal's Nominate scores (1985, 1997) and Heckman-Snyder (1997) scores and the application of these estimators to roll-call voting in the U.S. Congress have greatly increased our understanding of legislative behavior. While progress has been made in the mapping of legislative preferences, similar progress has not been made in the mapping of voter preferences. This is unfortunate because, in the end, it is the preferences of voters that are crucial to our understanding of the democratic process. While locating legislators is critical to understanding the workings of legislatures, ultimately we want to understand how voter preferences and legislator preferences are linked. In this paper, drawing on models of test taking from the psychometrics literature, I present a method for describing the distribution of voter preferences (ideal points) over a single spatial dimension. I then apply the method by estimating the distribution of voter preferences in Los Angeles County Congressional districts using individual-level voting returns from the 1992 general elections. To demonstrate the potential application of this method to a substantive question, I consider the problem of split-ticket voting by mapping the preference distributions of voters who split their Presidential and Congressional votes versus voters who did not split their votes in each of Los Angeles' seventeen Congressional districts.
The empirical estimation of voter ideal points has lagged behind the estimation of legislative ideal points because of both methodological and data limitations. As described below, the sorts of data that we generally have for voters are not amenable to estimation techniques such as Heckman-Snyder or Poole-Rosenthal. Not only do we tend to observe many fewer bits of data from which to infer each voter's preferences, but voter behavior also tends to be more stochastic than legislator behavior. Putting aside the possibility that the highly stochastic nature of the observed behavior of voters means that they have no political preferences (Converse and Markus 1979), at the very least the amount of information about voter preferences that can be gleaned from a particular vote or answer to a particular survey item is much smaller for members of the public than for members of the legislature. Thus, not only do we observe relatively few choices for each voter, but each choice contains relatively little information.

Footnote 1: An excellent example of these advances is the "committee outlier" literature; see Groseclose (1994) or Londregan and Snyder (1994).
In general, the "actions" of voters that we observe are answers to survey questions. The methods described below could be used to model answers to survey questions. However, the model is developed specifically for a relatively rare but potentially very interesting data source: actual individual-level ballots. These data are available for a series of elections in Los Angeles County, California, and are probably available for other jurisdictions that employ the same vote-tabulating equipment. The way in which these data were collected is described in the appendix. The advantage of this sort of data is that it measures the actual decisions of voters. That is, it directly measures the behavior of interest. In developing a model of how people express their political preferences, what could be better than actual measurements of that behavior? The Los Angeles data include votes on referendum and initiative measures. These data are particularly valuable because proposition votes should be relatively free of the strategic considerations that may be present in candidate elections (such as the balancing incentives posited by Alesina and Rosenthal (1994)). Indeed, votes on ballot propositions can be seen as directly analogous to the roll-call votes of legislators. In many cases, identical or similar measures had been considered in the legislature (see Gerber 1994; or Lewis 1996).
However, these data do present some drawbacks. The main drawback is that they contain no demographic or socio-economic information about the individual voters. This loss is costly because these variables can be used as instruments in estimating voter preferences. If we know that blacks tend to be more liberal than whites, or women more liberal than men, we could use information on race and gender in deriving estimates of individual voters' preferences. Since the individual votes themselves only noisily reflect voters' preferences, this auxiliary information is potentially quite valuable.
In other instances, individual voter data are missing altogether. Attempts to measure voter preference distributions from aggregate data are relatively common in the literature (for examples, see Lewis 1997). While the process of aggregation may "average out" much of the apparent randomness in individual voting or survey data, aggregation introduces new methodological pitfalls. Inferring characteristics of individuals from aggregate data is dangerous at best, despite recent advances in this area such as King (1996). In this paper, I put aside the question of using aggregate data to estimate the distribution of voter preferences (see Heron (1998) for a careful consideration of this problem).
1.1 Differences between voter and legislator decision data
The two most striking differences between the data from which we can estimate legislators' preferences and the data from which we can estimate voter preferences are the number of observed actions per individual and the greater degree of randomness in the responses of voters. In this section, I briefly describe these differences and the challenges they pose for extending models of legislators to voters.
The first difference is rather obvious. Assuming that voting on roll calls is generally sincere (that there is little vote trading or log rolling), we can use votes on roll calls to infer legislators' preferences. In any recent U.S. Congress, we observe at least 500 roll-call votes. If we assume members' preferences are constant over time, we have literally thousands of votes from which to estimate a given member's spatial location. This situation is not unique to the U.S. Congress. For example, all votes of the California State Legislature are recorded, yielding several thousand choice observations per legislator. In contrast, rarely do we observe more than a handful of actions from which to infer voter preferences. With fewer observations on the decisions of any given individual, our ability to infer a given individual's preferences is greatly reduced.
Two factors mitigate this loss of information. While there are generally fewer observations per voter, there tend to be many more observations per vote. Even a large legislature, such as the U.S. House, has only 435 members, compared with a typical survey of 2,000 voters or the Los Angeles County voting data, which contain over 2 million observations. With relatively few observations per decision, it is difficult to infer the spatial locations of the alternatives being voted over. Intuitively, when we cannot identify the alternatives involved in a decision, it is hard to infer preferences from observations on that decision. As I describe in the next section, many of the models that have previously been developed require both the number of decision makers and the number of decisions to be very large. Also mitigating the relatively small number of observed decisions is that we are generally less interested in the preferences of a particular voter than we are in the preferences of a particular legislator. The method developed in this paper capitalizes on this fact by estimating the distribution of voter preferences rather than the preferences of each individual voter. Considering the distribution of voters effectively averages away much of the individual-level uncertainty.

Footnote 2: Unless otherwise noted, I will use the term "vote" or "roll call" to refer to a particular measure to be decided.
The second difference between legislator and voter data is that the voter data appear far more stochastic. As mentioned above, there is a long literature on whether the stochastic nature of survey data implies that voters simply do not have coherent policy preferences at all. While the empirical results presented in this paper certainly suggest that there is underlying structure to the data that is consistent with a spatial model, it is nevertheless true that the voting data are quite random.
It has sometimes been suggested that voter data appear random because respondents do not take survey instruments seriously or because survey questions are phrased in ways that are unfamiliar to voters. I present evidence that this randomness in fact extends to actual voting data as well. In Table 1, I show the correlations among votes on the statewide propositions on the Los Angeles County ballot. These correlations are generally very low. For example, even the highly charged welfare reform (Proposition 165) and term limits (Proposition 164) questions were correlated at only 0.36. The large degree of randomness in each observed decision creates the same problem for estimation as observing relatively few decisions: it makes inferring preferences more difficult.
Given these differences in the features of the data, I now turn to the methods that have been used to infer legislators' preferences, describing them with particular attention to these differences.

Footnote 3: The total number of voters in the 1992 Los Angeles County election was 2,831,077. In what follows, unless otherwise noted, I consider only those voters who voted for major-party candidates and who did not abstain on any of the propositions. While this roughly 1/3 of the electorate probably differs from the rest in some interesting and relevant ways, the models developed in this paper do not address abstention or voting for minor-party candidates.
Correlations among ballot proposition votes in Los Angeles, 1992

Proposition   p155   p156   p157   p158   p160   p161   p162   p163   p164   p165   p166
p156          0.54
p157          0.13   0.20
p158          0.36   0.41   0.17
p160          0.14   0.15   0.20   0.13
p161          0.11   0.09   0.07   0.10   0.06
p162          0.16   0.17   0.15   0.16   0.17   0.09
p163         -0.07  -0.06   0.17  -0.05   0.20  -0.02   0.07
p164         -0.19  -0.19  -0.04  -0.23   0.06   0.01  -0.01   0.16
p165         -0.22  -0.19  -0.05  -0.18  -0.02   0.04  -0.06   0.07   0.36
p166          0.16   0.08   0.08   0.02   0.12   0.13   0.14   0.07   0.08   0.02
p167          0.26   0.21   0.14   0.21   0.15   0.19   0.20   0.04  -0.09  -0.17   0.29

Table 1: The table shows the correlations among proposition votes on the 1992 Los Angeles County ballot. n = 826,159 (all voters who voted for major-party candidates for each federal office and voted on all of the propositions).
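Assuming the entries in Table 1 are ordinary Pearson correlations between 0/1 vote indicators (for binary data this is the phi coefficient), such a matrix can be computed directly from a voters-by-measures array. The sketch below assumes votes are coded 1 = yes and 0 = no; the six toy ballots are hypothetical and are not drawn from the Los Angeles data.

```python
import numpy as np


def vote_correlations(votes):
    """Pearson (phi) correlations among the columns of a 0/1 vote matrix.

    votes: array of shape (n_voters, n_measures), coded 1 = yes, 0 = no.
    """
    votes = np.asarray(votes, dtype=float)
    # rowvar=False: each column is a ballot measure, each row is a voter
    return np.corrcoef(votes, rowvar=False)


# Hypothetical ballots: 6 voters x 3 measures
toy_votes = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])
R = vote_correlations(toy_votes)
print(np.round(R, 2))
```

With real ballots the same call applies unchanged; only the lower triangle is reported in Table 1 because the matrix is symmetric with ones on the diagonal.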
1.2 Spatial models of legislators' preferences or spatial locations
Despite the differences in the observed data, models of legislators' preferences are the natural place to start in constructing a technique for inferring voter preferences. Thinking about differences in legislators' voting records in a spatial way has a long history; perhaps the earliest example of this sort of analysis is Thurstone (1931). The methods employed in this early work were not based on any explicit theoretical model of preferences and were mainly concerned with identifying voting coalitions in the Congress, various state legislatures, and even the United Nations [ADD CITES].
Recent advances in the spatial analysis of roll-call data are explicitly grounded in formal models of voting. In these spatial theories of voting (see Hinich and Enelow (1984)), public policy alternatives are represented as points in space, and legislators' preferences over those points are defined by the distance between the policy point that the legislator would most like to see implemented (the ideal point) and each of the alternatives. With this theoretical framework (articulated further below), the estimated locations of the legislators take on a new significance: they can be interpreted as the legislators' ideal points. As it turns out, a spatial voting model interpretation can be given (roughly speaking) to many of the earlier methods (Heckman and Snyder 1997).

Methods of Estimating Legislator/Voter Locations

Method                           Estimator type  Consistent as                   Citation/example
Vote index                       NP              n/a                             ADA score
Guttman scaling                  NP              n/a                             Anderson et al., 1966
Heckman-Snyder scores            GLS             $n_v \to \infty$                Heckman and Snyder (1997)
Nominate scores                  ML              $n_v, n_L, n_v/n_L \to \infty$  Poole and Rosenthal (1997)
Random proposal models           MML             $n_v \to \infty$                Londregan (n.d.)
Rasch models                     CML             $n_v \to \infty$                Ladha (1991)
Covariate models                 ML              $n_L \to \infty$                Peltzman (1985a)
Random effects covariate models  MML             $n_v, n_L \to \infty$           Bailey (1998)

Table 2: Commonly used methods of ideal point estimation. NP = non-parametric, GLS = generalized least squares, ML = maximum likelihood, CML = conditional maximum likelihood, MML = marginal maximum likelihood, $n_L$ = number of legislators/voters, $n_v$ = number of votes.
Table 2 lists a number of techniques that have been used to estimate legislators' spatial
locations or ideal points. Each of these methods represents a potential approach to the
problem of estimating voter ideal points. In the end, none are particularly suited to the
problem. However, some consideration of their properties highlights the relevant data and
statistical issues involved.
While some of these models are explicitly designed to place legislators in a multidimensional space, others are restricted to a single dimension. A few can be extended to multiple dimensions only with a great increase in computational cost (e.g., Rasch models and random-effects models). Since I seek only to describe voters along a single spatial dimension, my consideration of these existing methods deals only with their one-dimensional variants.
The most important statistical issue is the consistency of the estimated ideal points. The problem is most easily seen from a maximum-likelihood perspective. Supposing a maximum-likelihood estimator for each of the ideal point models could be written, it would have the form $L(V \mid \psi, \theta)$, where $V$ is an $n_L \times n_v$ matrix of observed votes (the data), $\psi$ is a vector of parameters describing characteristics of the proposals, and $\theta$ is a vector of parameters describing the distribution of ideal points. Suppose for starters that each of the $n_v$ proposals to be voted on has one or more elements of $\psi$ associated with it and that $\theta$ is a vector of $n_L$ ideal points. Consider the ML estimators for $\psi$ and $\theta$ as the size of the data matrix increases. As the number of legislators $n_L$ grows by one, so does the size of $\theta$. Similarly, as $n_v$ grows, so does the size of $\psi$. This proliferation of parameters as the sample size increases is well known to undermine the standard consistency results for ML estimators (Neyman and Scott 1948).

Footnote 4: In discussing the methods, I use the terms "location" and "ideal point" interchangeably.

Footnote 5: In future work, I plan to extend the model developed below to multiple dimensions. However, most previous work on the dimensionality of politics suggests that a single dimension can account for a great deal of the observed variation in legislators' voting records (Poole and Rosenthal 1997).
One way around this problem has been to show that estimates of $\theta$ will be consistent under a so-called "triple" asymptotic condition (Haberman 1977). In these cases, $\theta$ can be consistently estimated if the following three conditions hold: (1) the number of roll calls goes to infinity, (2) the number of legislators goes to infinity, and (3) the ratio of roll calls to legislators goes to infinity. In other words, these estimators will work if you have a large legislature that takes a lot of roll-call votes. While I have not extended Haberman's triple asymptotic result to Poole and Rosenthal's Nominate procedure, I believe their method is consistent under these conditions. From the standpoint of estimating voters' preferences (as opposed to legislators' preferences), this is certainly not a compelling result. While it may be correct to think of the number of voters as approaching infinity, the number of votes is not large, and in no way could we think of the ratio of votes to voters as large.
In cases where the triple asymptotic condition seems unlikely to hold, several alternative solutions have been presented. The first is to specify the model in such a way that there exists a sufficient statistic ($S$) of the data for $\psi$. In this case the likelihood can be reformulated as:

$$L(V \mid \psi, \theta) = L_c(V \mid \theta, S)\, g(S \mid \psi)$$

Since the corresponding log likelihood,

$$\ln L(V \mid \psi, \theta) = \ln L_c(V \mid \theta, S) + \ln g(S \mid \psi),$$

is additively separable in $\theta$ and $\psi$, the value that maximizes $L(\cdot)$ over $\theta$ will be the same as the value that maximizes $L_c(\cdot)$ over $\theta$. Assuming $L_c(\cdot)$ meets the usual conditions for consistent ML estimation, we can get consistent estimates of $\theta$ as $n_v$ grows large by maximizing the likelihood of $\theta$ conditional on $S$. This method is referred to as Conditional Maximum Likelihood. Both the Heckman-Snyder and (one-parameter) Rasch-type models take this approach. From the standpoint of estimating voter preferences, even these models require that a large number of votes be observed.

Footnote 6: It should be noted that under this triple asymptotic condition $\psi$ will still not be consistently estimated.
The above approaches can all be put under the general heading of fixed-effects models. That is, they treat all of the vote-characteristic and ideal point parameters as fixed constants to be estimated. Another approach would be to treat the set of proposals as a draw from some distribution, $g(\cdot)$. The likelihood $L(V \mid \theta)$ can then be written as

$$L(V \mid \theta) = \int L(V \mid \psi, \theta)\, dG(\psi). \qquad (1)$$

Integrating out the nuisance parameters ($\psi$) and maximizing (1) over $\theta$ is referred to as Marginal Maximum Likelihood. Londregan's (n.d.) random coefficient model takes this approach. Londregan presents a model in which $g(\cdot)$ is conditioned on a set of proposal-specific covariates (e.g., party of the proposer). This use of auxiliary information weakens the importance of the arbitrary distributional assumption, $g(\cdot)$. The coefficients on the covariates are also of direct substantive interest. The model is consistent as the number of votes grows large. Since Londregan's method still requires the number of proposals to grow large, it is not appropriate to the problem at hand, but it does suggest a possible course.
Rather than using auxiliary information to help identify the distribution of proposal characteristics, we could parameterize the ideal points using covariates. This is quite common in the literature. The most common form of the "covariate" models are those in which the legislators' ideal points are assumed to be a deterministic linear function of a set of covariates,

$$\theta_i = X_i \gamma$$

where $\theta_i$ is legislator $i$'s ideal point and $X_i$ is a vector of observed legislator attributes (e.g., party, constituency characteristics, or previous occupation). These deterministic models are commonly used (at least implicitly) in both models of legislative voting and candidate elections. Generally these models involve running a probit or logit regression of a single roll call or vote choice on a vector of covariates. The ideal points can then be estimated as $X_i \hat{\gamma}$. While these models can be applied to situations in which many decisions are observed, the estimated ideal points are consistent even with a single observed roll call as the number of voters or legislators grows large.
The obvious problem with the deterministic covariate model is that it is highly restrictive to assume that the ideal points can be written as a deterministic function of a set of covariates. Surely, unobserved traits must also affect the values of $\theta$. Bailey (1998) generalizes the deterministic covariate model by assuming that the ideal points are a linear function of a set of covariates and a legislator-specific random shock,

$$\theta_i = X_i \gamma + \epsilon_i$$

Bailey's model can be thought of as a mirror image of Londregan's. Bailey treats $\psi$ as a set of fixed effects to be estimated and integrates over the random $\theta$,

$$L(V \mid \psi, \gamma) = \int L(V \mid \psi, \theta)\, dG(\theta \mid X, \gamma).$$

Having assumed a distribution for $\epsilon$, Bailey can then consistently estimate the distribution of legislator ideal points as the number of legislators grows large. To find a particular legislator's ideal point, the a posteriori expectation $E(\theta_i \mid V, \psi, \gamma)$ can be estimated using the estimates $\hat{\psi}$ and $\hat{\gamma}$ in place of $\psi$ and $\gamma$. The consistency of these estimates requires that the number of legislators and the number of roll calls grow large.
Because it is only the distribution of voter ideal points, and not each individual's ideal point, that is of interest, the random-effects covariate model seems promising. It provides estimates of the parameters of the conditional ideal point distribution as the number of legislators (voters) grows large. However, the voting data contain no covariates. The challenge is to develop a model that will allow us to estimate consistently the distribution of ideal points without covariates.
2 The basic spatial model
My statistical model begins from the standard spatial model of voting (Hotelling 1929, Downs 1957, Black 1958, Hinich and Enelow 1984). In the spatial model, it is assumed that policy choices can be represented by points in Euclidean space. In what follows, I will assume that this space is one-dimensional, that is, that much of politics is contested over a single left-right dimension. Each voter is assumed to have a most preferred policy position, $\theta$, in the space. A voter's utility for various policy alternatives is defined by a function of the distance between the position of the alternative and the voter's ideal point. Following the usual convention in the literature, I assume this function is a simple quadratic. In order to introduce uncertainty into the vote choice, I assume that voters' utilities for various alternatives are not solely determined by their spatial positions but are also determined by an additive idiosyncratic shock $\epsilon$. Thus, the utility for a voter at $\theta$ from the implementation of a policy $A$ is

$$U(\theta, A) = -(\theta - A)^2 + \epsilon$$

where, by assumption, $\epsilon \sim N(0, \sigma)$ (with $\sigma$ denoting the variance) and $\epsilon$ is i.i.d. across alternatives.
Assume that all choices are over exactly two alternatives (there is no abstention). The difference between the utility provided by any two alternatives $A_1$ and $A_2$ is:

$$U(\theta, A_1) - U(\theta, A_2) = -(\theta - A_1)^2 + (\theta - A_2)^2 + \epsilon_1 - \epsilon_2 = (A_2^2 - A_1^2) + 2(A_1 - A_2)\theta - (\epsilon_2 - \epsilon_1)$$
Define the cutpoint ($c$) as the point such that a voter located at $\theta = c$ is indifferent in expectation between $A_1$ and $A_2$. Setting $E(U(c, A_1) - U(c, A_2)) = 0$ and noting that $E(\epsilon_2 - \epsilon_1) = 0$, we have:

$$0 = (A_2^2 - A_1^2) + 2(A_1 - A_2)c$$
$$2(A_2 - A_1)c = A_2^2 - A_1^2$$
$$c = \frac{A_1 + A_2}{2}$$
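Assuming sincere voting, in the sense that voters vote for the alternative that they prefer, the probability that a voter with ideal point $\theta$ votes for alternative $A_1$ over alternative $A_2$ is:

$$\text{Prob}(\text{Vote for } A_1 \mid \theta) = \text{Prob}\left[(A_2^2 - A_1^2) + 2(A_1 - A_2)\theta > \epsilon_2 - \epsilon_1\right]$$

Because $\epsilon \sim N(0, \sigma)$, $\epsilon_2 - \epsilon_1 \sim N(0, 2\sigma)$, so

$$\text{Prob}\left[(A_2^2 - A_1^2) + 2(A_1 - A_2)\theta > \epsilon_2 - \epsilon_1\right] = \Phi\left(\frac{(A_2^2 - A_1^2) + 2(A_1 - A_2)\theta}{\sqrt{2\sigma}}\right)$$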
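where $\Phi(\cdot)$ is the standard normal cumulative distribution function. Letting $Y_{ij}$ represent voter $i$'s choice over $\{A_{1j}, A_{2j}\}$, where $Y_{ij} = 1$ denotes the choice of $A_{1j}$ and $Y_{ij} = 0$ denotes the choice of $A_{2j}$, and letting $\alpha_j = \frac{A_{2j}^2 - A_{1j}^2}{\sqrt{2\sigma_j}}$ and $\beta_j = \frac{2(A_{1j} - A_{2j})}{\sqrt{2\sigma_j}}$, we find the familiar probit model:

$$\text{Prob}(Y_{ij} = y_{ij}) = \Phi(\alpha_j + \beta_j \theta_i)^{y_{ij}} \left(1 - \Phi(\alpha_j + \beta_j \theta_i)\right)^{1 - y_{ij}}$$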
By the definition of $\alpha$, $\beta$, and $c$, it is easy to derive that $c_j = -\alpha_j/\beta_j$. Thus, the cutpoint $c_j$ does not depend on $\sigma_j$. As we will see, this is convenient because the parameter $\sigma_j$ is not identified. While the presence of the unidentified $\sigma_j$ precludes estimation of $A_1$ and $A_2$, $c$ can still be estimated.

Footnote 7: Of course, since each voter has virtually no chance of changing the outcome of the election with her vote, voting for her preferred outcome is not a strictly dominating strategy.

Footnote 8: Defining $\theta_i = X_i \gamma$, where $X_i$ is a vector of observed characteristics, we have the deterministic covariates model of ideal point estimation described above.
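The mapping from the spatial primitives to the probit parameters can be checked numerically. The sketch below (function names are illustrative, not from the paper) turns two alternatives $A_1$, $A_2$ and a shock variance $\sigma$ into $\alpha$, $\beta$, and the cutpoint $c = -\alpha/\beta$. Changing $\sigma$ rescales $\alpha$ and $\beta$ but leaves the cutpoint at $(A_1 + A_2)/2$, which is why $c$ remains estimable even though $\sigma$ is not identified. The proposal locations used are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), via the error function in the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_params(a1, a2, sigma):
    """Map alternatives A1, A2 and shock variance sigma to (alpha, beta, cutpoint).

    Uses U(theta, A) = -(theta - A)**2 + eps with eps ~ N(0, sigma) i.i.d.,
    so eps2 - eps1 ~ N(0, 2 * sigma).
    """
    scale = math.sqrt(2.0 * sigma)
    alpha = (a2 ** 2 - a1 ** 2) / scale
    beta = 2.0 * (a1 - a2) / scale
    cutpoint = -alpha / beta  # algebraically (a1 + a2) / 2, free of sigma
    return alpha, beta, cutpoint


def prob_vote_a1(theta, a1, a2, sigma):
    """Prob(vote for A1 | theta) = Phi(alpha + beta * theta)."""
    alpha, beta, _ = probit_params(a1, a2, sigma)
    return std_normal_cdf(alpha + beta * theta)


# Hypothetical proposal: A1 = -1.0 (to the left), A2 = 0.5 (to the right)
for sigma in (0.25, 1.0, 4.0):
    alpha, beta, cut = probit_params(-1.0, 0.5, sigma)
    print(f"sigma={sigma}: alpha={alpha:.3f}, beta={beta:.3f}, cutpoint={cut:.3f}, "
          f"P(A1 | theta=-1)={prob_vote_a1(-1.0, -1.0, 0.5, sigma):.3f}")
```

A voter exactly at the cutpoint votes for either alternative with probability one half, and a larger $\sigma$ pulls every voter's choice probability toward one half.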
3 Estimating ideal point distributions
Given this basic model of each vote choice, I now turn to the question of estimating the parameters of the model. In order to go from the theoretical model presented above to a statistical model, I must make some additional assumptions. First, I assume that, conditional on the parameters, the vote choice probabilities are independent across votes and voters. That is, each decision made by each voter is an independent draw. Thus, the likelihood of observing a vector of votes ($Y_i$) by a voter $i$ located at $\theta_i$ is

$$L(Y_i \mid \alpha, \beta, \theta_i) \propto \prod_j \Phi(\alpha_j + \beta_j \theta_i)^{y_{ij}} \left(1 - \Phi(\alpha_j + \beta_j \theta_i)\right)^{1 - y_{ij}}.$$
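Under the conditional independence assumption, one voter's likelihood is simply a product of probit terms, one per measure. A minimal sketch with hypothetical parameter values; summing the likelihood over all $2^J$ possible vote vectors returns one, as it should for a proper probability model.

```python
import math
from itertools import product


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def vote_vector_likelihood(y, alpha, beta, theta):
    """L(Y_i | alpha, beta, theta_i) for one voter: product over measures j,
    assuming the votes are independent conditional on theta."""
    lik = 1.0
    for y_j, a_j, b_j in zip(y, alpha, beta):
        p = std_normal_cdf(a_j + b_j * theta)  # Prob(Y_ij = 1 | theta)
        lik *= p if y_j == 1 else 1.0 - p
    return lik


# Hypothetical parameters for three measures and a voter at theta = 0.5
alpha = [0.2, -0.4, 0.0]
beta = [1.0, 0.7, -1.2]
print(vote_vector_likelihood([1, 0, 1], alpha, beta, theta=0.5))

# Sanity check: the likelihoods of all 2**3 vote vectors sum to one
print(sum(vote_vector_likelihood(y, alpha, beta, 0.5) for y in product([0, 1], repeat=3)))
```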
I treat the unobserved ideal points ($\theta$'s) as random parameters. Following Bock and Aitkin (1981), I begin by assuming that the distribution of $\theta$ is standard normal. Given this assumption, I estimate the vectors of vote parameters $\alpha = (\alpha_1, \ldots, \alpha_J)$ and $\beta = (\beta_1, \ldots, \beta_J)$ by Marginal Maximum Likelihood. The likelihood function to be maximized is

$$L(Y_i \mid \alpha, \beta) = \int L(Y_i \mid \alpha, \beta, \theta)\, \phi(\theta)\, d\theta$$

where $\phi(\cdot)$ is the standard normal density. Because there is no closed-form solution to this integral, I approximate it with Gauss-Hermite quadrature (Stroud and Secrest 1966). The integral is replaced by a sum,

$$L(Y_i \mid \alpha, \beta) \approx \sum_q L(Y_i \mid \alpha, \beta, z_q)\, A(z_q), \qquad (2)$$

over a set of $Q$ points $z = \{z_1, \ldots, z_Q\}$, and the density $\phi(\cdot)$ is replaced by a set of weights $A(z_q)$. Stroud and Secrest (1966) give the sets of points and weights for various values of $Q$. The larger $Q$ is, the more accurate the approximation. In general, the order of the approximation is not less than $2Q - 1$. In what follows, I have selected a 51-point quadrature ($Q = 51$). Intuitively, the continuous distribution $\phi(\cdot)$ is being approximated by a discrete distribution that has support on $z$ and for which $\text{Prob}(\theta = z_q) = A(z_q)$.
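As a sketch of the quadrature step, NumPy's `numpy.polynomial.hermite_e.hermegauss` (substituted here for the Stroud and Secrest tables) returns Gauss-Hermite points and weights for the weight function $e^{-z^2/2}$; dividing the weights by $\sqrt{2\pi}$ gives probabilities $A(z_q)$ that sum to one. The last line checks the rule against the closed form $E[\Phi(\alpha + \beta\theta)] = \Phi(\alpha/\sqrt{1+\beta^2})$ for $\theta \sim N(0, 1)$, using hypothetical values $\alpha = 0.3$, $\beta = 1.5$.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_quadrature(Q):
    """Points z_q and weights A(z_q) with sum_q f(z_q) A(z_q) ~ E[f(theta)], theta ~ N(0, 1).

    hermegauss integrates against exp(-z**2 / 2); dividing its weights by
    sqrt(2 * pi) turns them into probabilities.
    """
    z, w = hermegauss(Q)
    return z, w / math.sqrt(2.0 * math.pi)


z, A = normal_quadrature(51)
print(A.sum())             # total probability, close to 1
print(np.sum(A * z ** 2))  # variance of N(0, 1), close to 1

# Marginal "yes" probability for hypothetical alpha = 0.3, beta = 1.5
Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
print(np.sum(Phi(0.3 + 1.5 * z) * A))
```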
Given that vote choices are assumed to be independent across voters, the total log likelihood can be written as

$$\ln L = \sum_{i=1}^{N} \ln\left(\sum_q L(Y_i \mid \alpha, \beta, z_q)\, A(z_q)\right)$$

Given that we would like to fit this model to a data set with over 800 thousand observations, this rendering of the likelihood, which involves $N \times Q$ evaluations of $L(Y_i \mid \alpha, \beta, z_q)$, each of which itself involves the product of $J$ calls of the normal CDF, is unattractive. However, the situation can be greatly simplified. If $J$ is not overly large, the number of possible patterns in the data, $2^J$, is considerably smaller than the number of voters $N$. Since the vector of votes is all that I observe for each voter, the value of $\ln\left(\sum_q L(Y_i \mid \alpha, \beta, z_q) A(z_q)\right)$ is clearly the same for all voters who share the vote vector $Y_i$. Thus, defining $r_k$ to be the total number of voters that share a vote profile $X_k$, the likelihood can be rewritten as

$$\ln L = \sum_{k=1}^{2^J} r_k \ln\left(\sum_q L(X_k \mid \alpha, \beta, z_q)\, A(z_q)\right). \qquad (3)$$

This rendering allows data sets with an arbitrarily large number of voters to be estimated using a conventional statistical package such as Gauss (tm), so long as the number of observed vote choices is relatively small. Equation (3) can be maximized directly using standard numerical maximization techniques.
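The saving in equation (3) comes from evaluating each distinct vote profile once: with $J = 12$ binary choices, for example, there are at most $2^{12} = 4{,}096$ profiles no matter how many voters there are. The sketch below collapses a voters-by-measures 0/1 array into profiles $X_k$ with counts $r_k$ and evaluates (3). It uses simulated random ballots rather than the Los Angeles data, the helper names are hypothetical, and tail probabilities are computed with `math.erfc` and floored so that the log is never taken of zero.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_erfc = np.vectorize(math.erfc)


def log_phi_pair(x):
    """log Phi(x) and log(1 - Phi(x)); erfc avoids cancellation in the tails."""
    p = 0.5 * _erfc(-x / math.sqrt(2.0))
    q = 0.5 * _erfc(x / math.sqrt(2.0))
    return np.log(np.maximum(p, 1e-300)), np.log(np.maximum(q, 1e-300))


def collapse_patterns(votes):
    """Distinct vote profiles X_k (rows) and the number of voters r_k sharing each."""
    patterns, counts = np.unique(np.asarray(votes), axis=0, return_counts=True)
    return patterns.astype(float), counts.astype(float)


def pattern_log_likelihood(patterns, r, alpha, beta, z, A):
    """Equation (3): sum_k r_k * log(sum_q L(X_k | alpha, beta, z_q) * A(z_q))."""
    eta = alpha[None, :] + beta[None, :] * z[:, None]           # shape (Q, J)
    log_p, log_q = log_phi_pair(eta)
    # log L(X_k | z_q) = sum_j x_kj log Phi + (1 - x_kj) log(1 - Phi), shape (K, Q)
    log_L = patterns @ log_p.T + (1.0 - patterns) @ log_q.T
    marginal = np.exp(log_L) @ A                                 # shape (K,)
    return float(np.sum(r * np.log(marginal)))


# Hypothetical data: 500 random ballots on 4 measures (illustration only)
rng = np.random.default_rng(0)
votes = rng.integers(0, 2, size=(500, 4))
patterns, r = collapse_patterns(votes)
z, w = hermegauss(21)
A = w / w.sum()
alpha = np.array([0.1, -0.2, 0.3, 0.0])
beta = np.array([1.0, 0.5, -0.8, 1.2])
print(len(patterns), "profiles instead of", len(votes), "voters")
print(pattern_log_likelihood(patterns, r, alpha, beta, z, A))
```

Passing every voter as its own "profile" with count one reproduces the voter-by-voter sum, which is a quick way to confirm that the collapsed form loses nothing.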
However, even greater numerical efficiencies can be achieved through an application of the EM algorithm. While I will not present a full discussion of the algorithm, it can be thought of as a method for maximizing certain likelihood functions in the presence of missing information (Dempster, Laird and Rubin 1977). The method alternates between Expectation steps and Maximization steps. In the E-step, the expectation of the missing information given the data and provisional values of the parameters to be estimated is calculated. Then, in the M-step, the likelihood is maximized, substituting the expected values from the E-step for the missing information. The procedure is then repeated, with the next E-step using the new parameter estimates from the M-step. The process ends when the estimated parameters are stable from one iteration to the next.
Bock and Aitkin (1981) demonstrate that the likelihood derived here is amenable to the EM method. They show that the likelihood (2) can be reformulated in the following way. Let $n_q$ be the number of voters at quadrature point $q$ (by assumption of the quadrature approximation, all voters are located at one of the $Q$ points, that is, $\sum_q n_q = N$) and let $r_{jq}$ be the number of voters at point $z_q$ for whom $Y_{ij} = 1$. The log likelihood can then be written as:

$$\ln L = \sum_j \sum_q A(z_q)\left[(n_q - r_{jq}) \ln\left(1 - \Phi(\alpha_j + \beta_j z_q)\right) + r_{jq} \ln \Phi(\alpha_j + \beta_j z_q)\right] \qquad (4)$$

This would be straightforward to estimate if $n_q$ and $r_{jq}$ were observed. Since the $\{\alpha_j, \beta_j\}$ pairs are additively separated in (4), the likelihood could be maximized over each $\{\alpha_j, \beta_j\}$ pair separately. Indeed, we see that (4) is simply a series of weighted grouped probit likelihoods, each of which can be maximized very efficiently using standard techniques. However, the $n$'s and $r$'s are not observed. They are the missing information to be calculated in the E-step. Let $\bar{r}_{jq}$ be the estimate of $E(r_{jq} \mid Y, \hat{\alpha}, \hat{\beta})$ and $\bar{n}_q$ be the estimate of $E(n_q \mid Y, \hat{\alpha}, \hat{\beta})$. The EM algorithm proceeds in the following steps.
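1. Pick starting values for $\hat{\alpha}$ and $\hat{\beta}$.
2. E-step: Calculate $\bar{r}_{jq}$ and $\bar{n}_q$.
3. M-step: Find new estimates of $\alpha$ and $\beta$ (call them $\tilde{\alpha}$ and $\tilde{\beta}$) by maximizing
$$\ln L = \sum_j \sum_q A(z_q)\left[(\bar{n}_q - \bar{r}_{jq}) \ln\left(1 - \Phi(\alpha_j + \beta_j z_q)\right) + \bar{r}_{jq} \ln \Phi(\alpha_j + \beta_j z_q)\right]$$
over $\alpha$ and $\beta$.
4. Repeat from step 2 using the new estimates $\tilde{\alpha}$ and $\tilde{\beta}$.

The iterations stop when $\tilde{\alpha}$ and $\tilde{\beta}$ are stable from one iteration to the next.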
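Steps 1 through 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's Gauss implementation: the starting values ($\alpha = 0$, $\beta = 1$), the Fisher-scoring M-step, $Q = 21$, the stopping rule (largest parameter change below $10^{-4}$), and the simulated data are all choices made for the example. The E-step computes each profile's posterior over the points $z_q$ and accumulates $\bar{n}_q$ and $\bar{r}_{jq}$; in this sketch the prior weights $A(z_q)$ enter only through those posteriors, following the usual Bock-Aitkin arrangement, so each grouped probit in the M-step is weighted by the expected counts alone.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_erfc = np.vectorize(math.erfc)


def phi_cdf_pair(x):
    """Phi(x) and 1 - Phi(x) via erfc, floored away from zero before taking logs."""
    p = np.maximum(0.5 * _erfc(-x / math.sqrt(2.0)), 1e-300)
    q = np.maximum(0.5 * _erfc(x / math.sqrt(2.0)), 1e-300)
    return p, q


def phi_pdf(x):
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def e_step(patterns, r, alpha, beta, z, A):
    """Expected counts n_bar[q] and r_bar[j, q] given provisional alpha, beta."""
    p, q = phi_cdf_pair(alpha[None, :] + beta[None, :] * z[:, None])    # (Q, J)
    log_post = patterns @ np.log(p).T + (1.0 - patterns) @ np.log(q).T  # (K, Q)
    log_post = log_post + np.log(A)[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # guard against underflow
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # Prob(theta = z_q | X_k)
    n_bar = r @ post                                  # (Q,)
    r_bar = (patterns * r[:, None]).T @ post          # (J, Q)
    return n_bar, r_bar


def m_step_item(n_bar, r_bar_j, z, start, max_iter=25):
    """Fisher scoring for one grouped probit: r_bar_j 'yes' votes out of n_bar at each z_q."""
    X = np.column_stack([np.ones_like(z), z])
    coef = np.array(start, dtype=float)
    for _ in range(max_iter):
        eta = X @ coef
        p, q = phi_cdf_pair(eta)
        d = phi_pdf(eta)
        score = X.T @ ((r_bar_j - n_bar * p) * d / (p * q))
        info = (X * (n_bar * d ** 2 / (p * q))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        coef += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return coef


def fit_em(patterns, r, Q=21, tol=1e-4, max_iter=1000):
    """EM steps 1-4 with a fixed standard normal prior A(z_q)."""
    z, w = hermegauss(Q)
    A = w / w.sum()
    J = patterns.shape[1]
    alpha, beta = np.zeros(J), np.ones(J)                      # step 1 (assumed)
    for _ in range(max_iter):
        n_bar, r_bar = e_step(patterns, r, alpha, beta, z, A)  # step 2
        new = np.array([m_step_item(n_bar, r_bar[j], z, (alpha[j], beta[j]))
                        for j in range(J)])                    # step 3
        change = np.max(np.abs(new - np.column_stack([alpha, beta])))
        alpha, beta = new[:, 0].copy(), new[:, 1].copy()       # step 4
        if change < tol:
            break
    return alpha, beta


# Hypothetical simulation: 50,000 voters, 5 measures, theta ~ N(0, 1)
rng = np.random.default_rng(1)
true_alpha = np.array([0.0, 0.5, -0.5, 0.3, -0.2])
true_beta = np.array([1.0, 1.5, 0.8, 1.2, 0.6])
theta = rng.standard_normal(50_000)
prob_yes = 0.5 * _erfc(-(true_alpha + np.outer(theta, true_beta)) / math.sqrt(2.0))
votes = (rng.random(prob_yes.shape) < prob_yes).astype(float)
patterns, r = np.unique(votes, axis=0, return_counts=True)
r = r.astype(float)
alpha_hat, beta_hat = fit_em(patterns, r)
print(np.round(alpha_hat, 2), np.round(beta_hat, 2))
```

Because the E-step works on the $K \le 2^J$ distinct profiles, its cost does not grow with the number of voters.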
Using this EM algorithm greatly increases the speed of convergence. However, the estimation is still quite computationally intensive. Implemented in Gauss (tm), the large runs shown below take several hours to converge on a 300-MHz Pentium II-based computer.
Up to now, I have assumed that the voter ideal points are normally distributed in the voter
population. This assumption is highly restrictive. We are interested in possible features of
the distribution (such as multi-modality and skew) that are assumed away by the imposition
9
9
In the applications that follow, iterations were stopped when the largest parameter change was less than
10,4.
15
of normality. Again following the suggestion of Bock and Aitken (1981), I extend the model
above to a semi-parametric quasi-Bayes model. I begin by estimating the parameters α and β using the normal prior distribution of θ in the way described above. More exactly, I assume that the prior distribution of θ can be approximated by a discrete distribution A(z) over the points z, where z and A(z) are defined by the Gauss–Hermite quadrature formulas (as above). After maximizing the likelihood using this prior distribution A(z), I calculate the a posteriori ideal-point distribution over the points z. Applying Bayes' rule,
$$\tilde{A}_t(z) = \frac{\sum_k r_k \, L(X_k \mid \hat{\alpha}, \hat{\beta}, z) \, A_{t-1}(z)}{\sum_k \sum_q r_k \, L(X_k \mid \hat{\alpha}, \hat{\beta}, z_q) \, A_{t-1}(z_q)}.$$
This posterior estimate of the distribution of θ is then substituted for A(·) in the likelihood, and the EM procedure described above is reapplied using this new "prior" distribution for θ. This process is repeated until A_t(z) ≈ A_{t-1}(z) for all z ∈ {z_1, ..., z_Q}, where the subscript on A(·) indicates the iteration number. In practice, A(·) converges after about 15 iterations.
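A minimal NumPy sketch of this Bayes'-rule update follows. It assumes that the posterior is formed pattern by pattern (each response pattern X_k gets a posterior over the grid that sums to one) and that these posteriors are then averaged with weights r_k; this per-pattern normalization is an assumption about how the update is computed. Unlike the full procedure, the sketch holds α and β fixed between updates. The evenly spaced grid, the starting prior, and all names are illustrative.

```python
import math
import numpy as np

# Standard normal CDF, vectorized over numpy arrays.
Phi = np.vectorize(lambda x: 0.5 * math.erfc(-x / math.sqrt(2.0)))


def posterior_weights(Y, alpha, beta, z, A):
    """Bayes' rule per response pattern: row k is P(z_q | X_k), summing to one over q."""
    P = np.clip(Phi(alpha[None, :] + beta[None, :] * z[:, None]), 1e-12, 1.0 - 1e-12)  # (Q, J)
    logpost = Y @ np.log(P).T + (1.0 - Y) @ np.log1p(-P).T        # ln L(X_k | z_q), (K, Q)
    logpost = logpost + np.log(np.maximum(A, 1e-300))[None, :]    # times the prior A_{t-1}
    logpost -= logpost.max(axis=1, keepdims=True)                 # guard against underflow
    post = np.exp(logpost)
    return post / post.sum(axis=1, keepdims=True)


def update_prior(Y, counts, alpha, beta, z, A_prev):
    """One update: average the pattern posteriors with weights r_k (counts)."""
    post = posterior_weights(Y, alpha, beta, z, A_prev)
    return counts @ post / counts.sum()


def iterate_prior(Y, counts, alpha, beta, z, A0, tol=1e-8, max_iter=500):
    """Repeat the update, alpha and beta held fixed, until A_t is close to A_{t-1}."""
    A = A0
    for t in range(1, max_iter + 1):
        A_new = update_prior(Y, counts, alpha, beta, z, A)
        if np.max(np.abs(A_new - A)) < tol:
            return A_new, t
        A = A_new
    return A, max_iter


# Hypothetical inputs: an evenly spaced grid with a normal-shaped starting prior.
z = np.linspace(-2.0, 2.0, 9)
A0 = np.exp(-0.5 * z ** 2)
A0 /= A0.sum()
Y = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]], dtype=float)
counts = np.array([10.0, 5.0, 3.0])
alpha = np.array([0.1, -0.2, 0.5])
beta = np.array([1.0, -0.5, 0.3])
A1 = update_prior(Y, counts, alpha, beta, z, A0)
```

One property worth checking: if every β_j is zero, the votes carry no information about θ and the update returns the prior unchanged.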
Several aspects of the model are still left for future work. First, I have yet to estimate standard errors for the model. Calculating standard errors conditional on A(·) is straightforward; more challenging is the calculation of the unconditional standard errors and of the standard errors of the elements of Ã(·). For the applications given below, the lack of standard-error estimates is relatively unimportant because of the huge sample size. With a sample of nearly one million observations, the sampling uncertainty is negligible, especially in relation to specification uncertainty. Thus even correctly calculated standard errors would greatly understate the true uncertainty of the estimates. The second aspect of the model that must be left for future work is a test of model fit. Likelihood-ratio tests for this sort of model have been presented in the literature (conditional on Q(·)) and should be relatively easy to apply. However, with such a large sample, the rejection of any particular null is very likely.
Finally, I need to conduct a systematic set of Monte Carlo experiments. Casual Monte Carlo experiments show that the model can accurately recover the proposal parameters (α's and β's) when the true distribution of θ is standard normal. When the distribution of θ is non-normal the experiments are trickier because the maximum likelihood estimates are not unique. In particular, the location and dispersion of the distribution of θ are not identified.¹⁰ The same likelihood can be achieved for θ̂ = bθ + a as for θ simply by defining α̂ = α − aβ/b and β̂ = β/b. The iterative method used to infer the distribution of θ does not make any a priori parameter restrictions that identify the dispersion and location of θ. Rather, the imposition of the normal prior and the stepwise way in which the distribution of θ is determined define which of the (observationally equivalent) distributions is selected. Because the "true" values of the parameters are not uniquely defined a priori, formal Monte Carlo analysis is complicated: it is not clear how to measure the deviations of the parameter estimates from their "true" values. However, casual examination suggests that the model works fairly well. For example, the model is able to recover skewed and multimodal distributions. However, more Monte Carlo work needs to be done to determine the statistical properties of the estimator.
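A quick numerical check of the identification argument: shifting and stretching the ideal-point scale, θ → bθ + a, leaves every probit index α_j + β_jθ (and therefore the likelihood) unchanged once the proposal parameters are rescaled to β_j/b and α_j − aβ_j/b. The parameter values are arbitrary and the helper names are illustrative; the cutpoint helper shows that the point where Φ(·) = 1/2 simply moves with the scale.

```python
import math


def probit_index(alpha, beta, theta):
    """The linear index alpha_j + beta_j * theta inside Phi(.)."""
    return alpha + beta * theta


def rescale(alpha, beta, a, b):
    """Proposal parameters that offset the change of scale theta -> b * theta + a."""
    beta_hat = beta / b
    alpha_hat = alpha - a * beta_hat
    return alpha_hat, beta_hat


def cutpoint(alpha, beta):
    """Ideal point at which Phi(alpha + beta * theta) = 1/2."""
    return -alpha / beta


def same_index(alpha, beta, a, b, thetas):
    """True if every voter's index (hence the likelihood) is unchanged by the rescaling."""
    alpha_hat, beta_hat = rescale(alpha, beta, a, b)
    return all(
        math.isclose(probit_index(alpha, beta, t),
                     probit_index(alpha_hat, beta_hat, b * t + a), abs_tol=1e-12)
        for t in thetas
    )


# Arbitrary illustrative values: shift by a = 0.7, stretch by b = 2.5.
print(same_index(0.3, -1.2, a=0.7, b=2.5, thetas=[-2.0, -0.5, 0.0, 1.3]))
```

Since any (a, b) passes this check, the data alone cannot pin down the location or spread of θ; some normalization, here the normal starting prior, has to do it.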
4 The Distribution of Voter Ideal Points in Los Angeles County
Having described the method, I now apply it to inferring the distribution of voter preferences in each of Los Angeles County's 17 Congressional districts. To do this, I extend the model in a way that is notationally cumbersome but intuitively transparent. I rewrite the likelihood so that the proposal parameters (α's and β's) are constrained to be constant across districts while each district has its own distribution A(·). Thus, assuming that α_j and β_j are constant across districts, the estimated distributions are directly comparable across districts. That is, all of the districts are mapped onto the same scale. While I cannot say whether the variance of preferences in a given district is large or small in absolute terms, I can say that one district's variance is larger than another district's or that the median voter in one district is to the left of the median voter in another district.
The data used for the estimation are the votes of all individuals who voted on all statewide ballot propositions and who supported a major-party candidate for both President and Congress in the 1992 general election. While the selection of this subset of voters most likely introduces some bias, it does have some advantages. The model I developed does not explicitly incorporate the decision to abstain. Nor does it attempt to explain voting for third-party candidates. Thus, without extending the model in a large (though perhaps tractable) way, I would have to treat these voters in an ad hoc way. Also, since the ultimate goal is to test theories of split-ticket voting that assume no abstention and no minor-party candidates, the friendliest testing ground is clearly those voters who never abstain and only support major parties. If the models do not fit this subset of the electorate, they probably do not fit the overall electorate either. Except in the two districts where there were no Republican Congressional candidates (districts 32 and 37), the selection effect is unlikely to be large. Even in those districts, the number of voters who abstained because of the lack of a Republican alternative is probably relatively small (presumably that is why no Republican candidate appeared!).
10 Since θ is not observed, its scale is completely arbitrary. Thus, this lack of identification is purely a technical, rather than a substantive, challenge.
In principle, all voter decisions could be used to help identify the ideal points. Indeed, even sets of offices that are not common to all of the voters in the sample, such as state assembly and state senate, could be incorporated into the likelihood.¹¹ However, because I am ultimately interested in possible strategic voting behavior, I apply the model only to proposition votes. In doing so I assume that propositions are (at least relative to candidate elections) voted on sincerely and independently. With 12 statewide proposition votes on the California ballot in 1992, I still have a relatively large number of decisions from which to identify the distribution of θ.
11 Use of the EM algorithm makes such disjoint sets of votes easy to incorporate.
Table 3 shows the estimated α's, β's, and c's in the upper panel and the means, variances, and medians of the ideal-point distributions in the lower panel. Looking first to the proposal parameters, we see that the preference dimension uncovered in the analysis has a standard left–right interpretation. Remember that β = √2(P − SQ)/σ, where P is the location that would be implemented if a given proposition passed and SQ is the location of the status quo that would prevail if the proposition failed. Thus the sign of β describes the direction of the proposed policy move. In this case negative β's are consistent with "left" moves. For example, progressive tax reform (prop. 167) and the bond issues (props. 155 and 156) have large negative β's. Conversely, the proposed moves to the "right", such as term limits (prop. 164) and welfare reform (prop. 165), are associated with positive β's. The magnitude of β indicates how far the proposal is from the status quo relative to the importance of the idiosyncratic preference shock. Propositions on topics that are outside the realm of traditional left–right politics (P ≈ SQ along this dimension), such as the assisted suicide measure (prop. 161), have the smallest β's. Among the propositions that bear on standard left–right issues, the "easier" (less complex) questions (the bonds) have larger β's than the more complex or less advertised proposals, such as the sweeping tax reform (prop. 167) or the reorganization of the legislative analyst's office (prop. 158).

Estimated proposition parameters (α, β, c)
Proposition  Description                                           α        β        c
Prop. 155    School bonds                                        0.144   -1.641    0.088
Prop. 156    Rail bonds                                          0.061   -1.580    0.038
Prop. 157    Allow toll roads                                   -0.601   -0.390   -1.543
Prop. 158    Reorganizing the office of the legislative analyst -0.251   -0.959   -0.262
Prop. 160    Property tax exemption                              0.089   -0.307    0.289
Prop. 161    Assisted suicide                                   -0.277   -0.229   -1.211
Prop. 162    Alter state employee retirement fund               -0.072   -0.323   -0.223
Prop. 163    Repeal sales tax on snack foods                     0.398    0.054   -7.315
Prop. 164    Congressional term limits                           0.298    0.471   -0.634
Prop. 165    Welfare and budget reform                          -0.016    0.527    0.030
Prop. 166    Single-payer health care                           -0.572   -0.275   -2.083
Prop. 167    Progressive tax reform                             -0.461   -0.619   -0.744

Estimated medians, means, and variances of the voters' ideal points
District     Member                  Med(θ)     E(θ)     V(θ)
District 24  Beilenson (D,85)         0.000    0.154    0.902
District 25  McKeon (R,5)             0.438    0.559    0.553
District 26  Berman (D,95)            0.000   -0.162    0.984
District 27  Moorhead (R,0)           0.438    0.418    0.687
District 28  Dreier (R,0)             0.438    0.460    0.626
District 29  Waxman (D,100)           0.000    0.146    0.829
District 30  Becerra (D,95)          -0.438   -0.335    0.938
District 31  Martinez (D,90)          0.000   -0.127    0.916
District 32  Dixon (D,80)            -0.876   -0.872    1.111
District 33  Roybal-Allard (D,95)    -0.438   -0.240    0.699
District 34  Torres (D,90)            0.000   -0.007    0.708
District 35  Waters (D,100)          -0.438   -0.625    1.552
District 36  Harman (D,85)            0.000    0.018    0.840
District 37  Tucker (D,90)           -0.438   -0.576    0.935
District 38  Horn (R,40)              0.438    0.342    0.695
District 39  Royce (R,20)             0.438    0.474    0.632
District 41  Kim (R,10)               0.000    0.200    0.632

Table 3: n = 920,833. In parentheses following the members' names are their party and their 1993 ADA score. * = Districts that are only partially in Los Angeles County. For these districts only the preferences of the Los Angeles County portion of the district are mapped.
Eight of the twelve estimated cutpoints are within the range of the district means or medians. Thus, for these eight measures, there were some districts in which the median or average voter was expected to support the measure while the average voter in some other district was expected to oppose it. The estimated cutpoint of −7.315 for Prop. 163, far outside the range of any district, may be taken as evidence of some multidimensionality in the data. Prop. 163 repealed a short-lived sales tax on prepared snack foods. It seems likely that the strongest supporters of the measure were those on the extreme left, who saw the snack tax as regressive, and those on the extreme right, who always vote to reduce taxes. Thus, the question may have had both a "how much to tax" and a "who to tax" dimension.
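The tabulated cutpoints in Table 3 agree, up to rounding, with c_j = −α_j/β_j, the ideal point at which Φ(α_j + β_jθ) = 1/2. This relation is read off the table rather than stated in this section, and the larger gap for Prop. 163 comes from rounding its very small β. The short Python sketch below recomputes the cutpoints from the published α's and β's and counts how many fall inside the span of the district means and medians; the variable names are illustrative.

```python
# (alpha_j, beta_j, tabulated c_j) from the upper panel of Table 3
PROPS = {
    155: (0.144, -1.641, 0.088),   156: (0.061, -1.580, 0.038),
    157: (-0.601, -0.390, -1.543), 158: (-0.251, -0.959, -0.262),
    160: (0.089, -0.307, 0.289),   161: (-0.277, -0.229, -1.211),
    162: (-0.072, -0.323, -0.223), 163: (0.398, 0.054, -7.315),
    164: (0.298, 0.471, -0.634),   165: (-0.016, 0.527, 0.030),
    166: (-0.572, -0.275, -2.083), 167: (-0.461, -0.619, -0.744),
}
# District means E(theta) and medians Med(theta) from the lower panel of Table 3
MEANS = [0.154, 0.559, -0.162, 0.418, 0.460, 0.146, -0.335, -0.127, -0.872,
         -0.240, -0.007, -0.625, 0.018, -0.576, 0.342, 0.474, 0.200]
MEDIANS = [0.000, 0.438, 0.000, 0.438, 0.438, 0.000, -0.438, 0.000, -0.876,
           -0.438, 0.000, -0.438, 0.000, -0.438, 0.438, 0.438, 0.000]


def cutpoint(alpha, beta):
    """Ideal point at which a voter is equally likely to vote yes or no."""
    return -alpha / beta


lo, hi = min(MEANS + MEDIANS), max(MEANS + MEDIANS)
inside = sorted(p for p, (a, b, _) in PROPS.items() if lo <= cutpoint(a, b) <= hi)
print(len(inside), inside)
```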
The lower panel of Table 3 lends a certain empirical validity to the estimated spatial dimension. The mean voter in every district held by a Democrat is to the left of the mean voter in every Republican-held district, and the same is true of the median voter, except that Kim's (R-41) district ties several Democratic districts with a median of 0.000.¹² The estimates that Dixon, Waters, and Tucker held the most "liberal" or left districts in Los Angeles County are certainly consistent with conventional wisdom. The data also suggest that there are considerable cross-district differences in the variance of the voter distribution. McKeon's (R-25) district is estimated to have only half the preference variance of Dixon's (D-32) (0.553 versus 1.111). There is a long history of inquiry into the effects of heterogeneous versus homogeneous constituencies on representation (see Huntington 19xx, Fiorina 19xx). The estimated preference variances presented here are perhaps the best data available on the degree of Congressional district heterogeneity. Applying these estimates to the old questions of heterogeneity and representation will be a focus of future work.
12 The estimated district medians fall on a discrete set of points because the voter ideal points are themselves assumed to fall on a discrete set of points z. A continuous approximation to the assumed discrete ideal-point distribution (such as the one used in Figure 2) would yield unique median estimates for each district. However, the estimates would be highly dependent on the smoothing technique employed.
Figure 1: Distribution of the voter ideal points by Congressional district. Shows the estimated distribution of voter ideal points in each of Los Angeles County's Congressional districts.
Figure 2: Distribution of the voter ideal points by Congressional district. Shows the normal-kernel-smoothed distribution of voter ideal points in each of Los Angeles County's Congressional districts. The standard deviation of the kernel is set to 0.25.
Even more interesting, perhaps, than the estimated moments of the preference distributions are the estimated voter-preference histograms and densities shown in Figures 1 and 2. The most striking aspect of these figures is that several districts have strongly bimodal preference distributions. For example, Waters' district (35) appears to be a mixture of extreme liberals and moderate liberals. More generally, the shape of the distributions varies greatly across districts. Thus methods of inferring features of these distributions from aggregate data that assume that districts differ from one another only in location or dispersion (see Heron (1998) or Snyder (1996)) may lead to false inferences.
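Figure 2 smooths each district's discrete histogram with a normal kernel whose standard deviation is 0.25. The Python sketch below shows one way to do that for weights A(z_q) on a grid, plus a bisection for the median of the smoothed distribution. The five-point grid (spaced like the medians in Table 3) and its weights are made up, and the function names are illustrative.

```python
import math


def smoothed_density(x, z, A, h=0.25):
    """Normal-kernel smoothing of the discrete distribution A(z_q), evaluated at x."""
    scale = h * math.sqrt(2.0 * math.pi)
    return sum(a * math.exp(-0.5 * ((x - zq) / h) ** 2) / scale for zq, a in zip(z, A))


def smoothed_cdf(x, z, A, h=0.25):
    """CDF of the smoothed distribution: a mixture of normal CDFs."""
    return sum(a * 0.5 * math.erfc(-(x - zq) / (h * math.sqrt(2.0))) for zq, a in zip(z, A))


def smoothed_median(z, A, h=0.25, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for the point where the smoothed CDF reaches 1/2."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(mid, z, A, h) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical histogram on an evenly spaced grid
z = [-0.876, -0.438, 0.0, 0.438, 0.876]
A = [0.30, 0.15, 0.25, 0.20, 0.10]
print(smoothed_median(z, A, h=0.25), smoothed_median(z, A, h=0.5))
```

The discrete median of this histogram is a grid point, while the smoothed median is unique but moves when the bandwidth h changes. That is the sense in which a smoothed median depends on the smoothing choice.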
5 Split-ticket voting
Putting the statistical model for estimating voters' preferences aside for the moment, the individual voting data make it possible to tabulate the incidence of split-ticket voting in a way that has not previously been possible. In Tables 4 and 5, I present (perhaps for the first time anywhere) a tabulation of the joint distribution of all votes cast in a given jurisdiction across two federal offices.
Table 4 shows the county-wide distribution of President–Congress vote pairs. The conditional distribution of votes among those who voted Republican for President is a bit misleading because the data include two districts for which no Republican Congressional candidate was present. However, the table is still quite revealing. Perhaps most interesting is that the majority of "splits" are not combinations of the two major parties. For example, among those voting Democrat for President (all of whom could have voted for a Democrat for Congress), only 9 percent supported a Republican for Congress, whereas 14 percent either supported minor-party candidates or abstained from voting for Congress. There was only a very slight tendency for one combination of votes for the two major parties to be favored over the other. While 14 percent of Republican presidential voters supported Democrats for Congress, 9 percent of Democratic presidential voters supported Republicans for Congress. Though this difference is consistent with Fiorina's (1994) conjecture that voters tend to see Republicans as relatively better suited to the Presidency and Democrats as better suited to Congress (at least in the pre-1994 period), it may be largely an artifact of the two missing Republican candidates. A final interesting point is that third-party voters seem to be a breed unto themselves. A much larger fraction of Perot voters supported minor-party Congressional candidates than did Bush or Clinton voters (17 percent, against 5 and 7 percent). Similarly, nearly half (44 percent) of those supporting minor-party candidates for President followed up with a minor-party vote for Congress.

Votes for President and U.S. House of Representatives, Los Angeles County, 1992
                                              Vote for President
Vote for U.S. House  Democrat          Republican      Perot           Other          Abstain         Total
Democrat             1,119,057 (0.77)  114,182 (0.14)  170,184 (0.35)  4,337 (0.24)   26,421 (0.34)   1,434,181 (0.51)
Republican           131,221 (0.09)    583,687 (0.73)  192,767 (0.39)  4,185 (0.23)   15,633 (0.20)   927,493 (0.33)
Other                96,263 (0.07)     42,758 (0.05)   83,532 (0.17)   7,887 (0.44)   5,352 (0.07)    235,792 (0.08)
Abstain              99,988 (0.07)     58,980 (0.07)   42,141 (0.09)   1,514 (0.08)   30,988 (0.40)   233,611 (0.08)
Total                1,446,529 (0.51)  799,607 (0.28)  488,624 (0.17)  17,923 (0.01)  78,394 (0.03)   2,831,077
Table 4: Joint distribution of votes for U.S. House and President. Column proportions in parentheses (in the Total row and column, proportions of all ballots). Data are all valid ballots cast in the 1992 general election in Los Angeles County.

Distribution of President/Congress vote pairs by Congressional district, Los Angeles County, 1992
                           President/Congress vote pair
District  Incumbent        {d,d}  {d,r}  {r,r}  {r,d}  {p,d}  {p,r}  Other
24        Beilenson (D)    0.43   0.03   0.20   0.04   0.08   0.08   0.14
25        Open             0.23   0.04   0.31   0.02   0.05   0.11   0.23
26        Berman (D)       0.43   0.04   0.16   0.04   0.07   0.07   0.19
27        Moorhead (R)     0.29   0.07   0.30   0.02   0.05   0.08   0.19
28        Dreier (R)       0.25   0.07   0.35   0.02   0.06   0.11   0.14
29        Waxman (D)       0.49   0.05   0.14   0.03   0.04   0.05   0.22
30        Open             0.43   0.03   0.15   0.04   0.04   0.03   0.28
31        Martinez (D)     0.40   0.06   0.20   0.08   0.08   0.07   0.11
32        Dixon (D)        0.65   0.00   0.00   0.05   0.05   0.00   0.25
33        Open†            0.46   0.07   0.15   0.04   0.05   0.05   0.18
34        Torres (D)       0.40   0.05   0.20   0.07   0.08   0.06   0.14
35        Waters (D)       0.67   0.02   0.08   0.03   0.04   0.03   0.13
36        Open             0.30   0.05   0.26   0.05   0.09   0.08   0.17
37        Open             0.60   0.00   0.00   0.07   0.06   0.00   0.27
38        Open             0.30   0.08   0.25   0.03   0.06   0.10   0.18
39        Open             0.25   0.07   0.34   0.03   0.07   0.09   0.14
41        Open             0.27   0.09   0.30   0.03   0.06   0.10   0.15
Table 5: Proportion of voters in each Congressional district who cast each President–Congress vote pair; in each pair the first letter is the presidential vote and the second the Congressional vote. p = Perot, d = Democrat, r = Republican. * = Districts that are only partially within Los Angeles County; only the Los Angeles County votes are shown for these districts. † = The thirty-third district was open; however, the winning candidate, Lucille Roybal-Allard (D), was the daughter of the retiring incumbent (Edward Roybal).
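The percentages quoted in the discussion of Table 4 are column shares: within each presidential vote, the fraction casting a given House vote. A short Python sketch recomputing them from the published counts follows; the dictionary layout and function name are illustrative.

```python
# Counts from Table 4: COUNTS[house_vote] lists votes by presidential vote, in PRES order.
PRES = ["Democrat", "Republican", "Perot", "Other", "Abstain"]
COUNTS = {
    "Democrat":   [1_119_057, 114_182, 170_184, 4_337, 26_421],
    "Republican": [131_221, 583_687, 192_767, 4_185, 15_633],
    "Other":      [96_263, 42_758, 83_532, 7_887, 5_352],
    "Abstain":    [99_988, 58_980, 42_141, 1_514, 30_988],
}


def column_share(house_votes, pres_vote):
    """Share of one presidential column that cast any of the listed House votes."""
    j = PRES.index(pres_vote)
    column_total = sum(row[j] for row in COUNTS.values())
    return sum(COUNTS[h][j] for h in house_votes) / column_total


print(round(column_share(["Republican"], "Democrat"), 2))          # Clinton voters -> GOP House
print(round(column_share(["Other", "Abstain"], "Democrat"), 2))    # Clinton voters -> minor/abstain
print(round(column_share(["Democrat"], "Republican"), 2))          # Bush voters -> Dem House
```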
Estimated mean and variance of voter ideal points by President–Congress vote pair
                                      Vote pair
District                    {d,d}     {d,r}     {r,d}     {r,r}
District 36 (Harman-D)
  Mean                     -0.773    -0.238     0.137     0.472
  Variance                  0.793     0.399     0.455     0.633
  Percent of voters          45.4       7.6       7.6      39.4
District 38 (Horn-R)
  Mean                     -0.660    -0.233     0.206     0.489
  Variance                  0.790     0.455     0.497     0.626
  Percent of voters          45.4      12.1       4.5      37.9
Table 6: Mean and variance of the ideal-point distribution of voters casting the various President–Congress vote pairs. The "Percent of voters" rows sum to 100 (minor-party voters and abstainers are omitted). The name and party of the winning Congressional candidate are shown in parentheses.
Turning to Table 5, we see evidence of what Gerber and Many (1996) have called incumbency-led ticket splitting. The table shows, district by district, the distribution of the various vote pairs. The first thing to notice is that the effect of (major-party) ticket splitting on many of the elections is fairly modest. The fraction voting {d,r} minus the fraction voting {r,d} is never more than about 7 percentage points (in absolute value) and is often less than 3 percentage points. Although the presence of Perot in this election may be masking the importance of major-party ticket splitting, these data make it seem unlikely that ticket splitting is responsible for divided government. However, as we will soon see, in two closely contested districts the ability of the Democratic Congressional candidate to obtain the votes of those voting Democrat for President was a critical factor in the election outcome.
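The net-splitting figures can be read directly off Table 5. The Python sketch below takes the {d,r} and {r,d} columns (presidential vote listed first) and computes, for each district, {d,r} minus {r,d} in whole percentage points; working in integers avoids floating-point noise from the two-decimal proportions. Names are illustrative.

```python
# ({d,r}, {r,d}) proportions by district from Table 5 (presidential vote listed first)
SPLITS = {
    24: (0.03, 0.04), 25: (0.04, 0.02), 26: (0.04, 0.04), 27: (0.07, 0.02),
    28: (0.07, 0.02), 29: (0.05, 0.03), 30: (0.03, 0.04), 31: (0.06, 0.08),
    32: (0.00, 0.05), 33: (0.07, 0.04), 34: (0.05, 0.07), 35: (0.02, 0.03),
    36: (0.05, 0.05), 37: (0.00, 0.07), 38: (0.08, 0.03), 39: (0.07, 0.03),
    41: (0.09, 0.03),
}


def net_split_points(dr, rd):
    """{d,r} minus {r,d}, in whole percentage points."""
    return round(100 * dr) - round(100 * rd)


net = {d: net_split_points(dr, rd) for d, (dr, rd) in SPLITS.items()}
largest = max(abs(v) for v in net.values())
under_three = sum(1 for v in net.values() if abs(v) < 3)
print(largest, under_three)
```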
While the incidence of major-party ticket splitting is relatively low (generally less than 10 percent of all ballots cast), it could still have an important effect on election outcomes at the margin. The question, then, is whether split-ticket voting can be understood within the spatial voting framework developed above. Many authors have suggested that it can. Both Alesina and Rosenthal (1994) and Fiorina (1992) have argued that split-ticket voting is a rational behavior of voters with moderate policy preferences. To test these hypotheses, I re-fit the statistical model developed above using data from California's 36th and 38th Congressional districts. These two districts had the most hotly contested Congressional elections in Los Angeles County in 1992. We might expect that if voters ever condition their votes on expectations of what policy will be implemented following the election, it will be when the election is closely contested and a voter's own vote has some chance of altering the outcome.
Both the 36th and 38th Congressional districts had open-seat elections in 1992. In the 36th district, which includes Venice and San Pedro, Democrat Jane Harman defeated Republican Susan Brooks by 800 votes despite outspending Brooks by a ratio of over 2 to 1.¹³ In the 38th district (Long Beach), Republican Stephen Horn defeated Democrat Peter Mathews by about 10 thousand votes. Neither Horn nor Harman had held elected office before winning their House seats.
Table 6 shows a few summary statistics on the distribution of voter ideal points across those who cast the various President–Congress vote pairs. The results are consistent with the idea that it is political moderates who split their tickets. In both districts, the mean {d,d} voter is to the left of the means of all other vote pairs. Similarly, the mean {r,r} voter is to the right of all other vote-pair means. In both cases, those who split {d,r} were to the left of those who cast {r,d}. This finding suggests that voters perceive the presidency as more important than Congress in the determination of policy: the presidential vote, rather than the Congressional vote, tracks which side of the center a splitter is on. Another interesting feature of these data is that the variance of the distribution of ideal points is much higher among those who cast straight tickets than among those who split. This may suggest that party identification plays an important role in candidate elections independent of policy preferences. That is, a fairly large fraction of voters who would be predicted to split based on their preferences continue to vote straight tickets.
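The comparisons in the preceding paragraph can be checked mechanically against Table 6. The Python sketch below encodes each district's shares, means, and variances, tests the ordering claims, and, as an extra illustration, pools the four groups into one mean and variance for the major-party voters with the law of total variance (overall variance = average within-group variance + variance of the group means). The pooled figures come from the district-specific re-fit, so they need not equal the Table 3 entries for the same districts. Names are illustrative.

```python
# Table 6: (percent of voters, mean, variance) per vote pair, presidential vote first
TABLE6 = {
    36: {"dd": (45.4, -0.773, 0.793), "dr": (7.6, -0.238, 0.399),
         "rd": (7.6, 0.137, 0.455), "rr": (39.4, 0.472, 0.633)},
    38: {"dd": (45.4, -0.660, 0.790), "dr": (12.1, -0.233, 0.455),
         "rd": (4.5, 0.206, 0.497), "rr": (37.9, 0.489, 0.626)},
}


def ordering_claims(groups):
    """The orderings described in the text, each evaluated as True/False."""
    m = {k: v[1] for k, v in groups.items()}
    var = {k: v[2] for k, v in groups.items()}
    return {
        "dd_leftmost": m["dd"] < min(m["dr"], m["rd"], m["rr"]),
        "rr_rightmost": m["rr"] > max(m["dd"], m["dr"], m["rd"]),
        "dr_left_of_rd": m["dr"] < m["rd"],
        "straight_more_dispersed": min(var["dd"], var["rr"]) > max(var["dr"], var["rd"]),
    }


def pooled_moments(groups):
    """Mean and variance of all major-party voters via the law of total variance."""
    total = sum(s for s, _, _ in groups.values())
    mean = sum(s * mu for s, mu, _ in groups.values()) / total
    within = sum(s * v for s, _, v in groups.values()) / total
    between = sum(s * (mu - mean) ** 2 for s, mu, _ in groups.values()) / total
    return mean, within + between


for district, groups in TABLE6.items():
    print(district, ordering_claims(groups), pooled_moments(groups))
```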
13 The Almanac of American Politics. Washington: National Journal, 1994.
Figure 3: Upper panel, "Distribution of voter ideal points by President–Congress vote pair"; lower panel, "Probability of voting for each President–Congress vote pair by voter ideal point" (legend: {d,d}, {d,r}, {r,d}, {r,r}). The upper panel shows the distribution of voter preferences for the various President–Congress vote pairs. The lower panel shows the estimated probability that a voter with a given ideal point voted for each President–Congress pair. These density functions are kernel-density smoothings of the estimated histograms.
The top panel of Figure 3 shows the distribution of voter preferences by President–Congress vote pair. The distributions are normalized so that each integrates to the proportion of the total electorate that chose each pair. The first thing to note is that the spatial dimension, which is based solely on proposition votes, does a reasonably good job of sorting Democrats and Republicans. Those on the extreme left of the preference distributions (call them "liberals") and those on the extreme right (call them "conservatives") are nearly all {d,d} and {r,r} voters, respectively. However, we also see that there is considerable overlap in the distributions of straight-ticket voters, and the ticket splitters inhabit the region of this overlap. Looking at the distribution of voter ideal points in the 38th district reveals the electoral downfall of the Democratic candidate, Peter Mathews. While Bill Clinton did quite well in the district (44 percent to Bush's 33 percent), the figure shows that a substantial fraction of Clinton voters defected to Republican Horn.
The lower panel of Figure 3 shows the estimated probability that a voter with a given ideal point votes for a given President–Congress pair. The first thing to note here is that there is no strong spatial ordering of the four alternatives. That is, there are no ideal points at which the most likely vote pair is a split ticket. Thus, while almost all ticket splitters are centrists, the majority of centrists are not splitters. As mentioned above, this suggests that a large fraction of the centrist electorate has some sort of identification with one party or the other. We also see starkly in the lower panel that as we move from the extremes to the center, it is the Congressional candidates who are abandoned first. That is, {r,d} splitting is more prevalent among conservatives and {d,r} splitting is more pronounced among liberals. This is an interesting result for which I do not have a good explanation. Perhaps, as suggested above, voters do think of the Presidency as having greater weight in policy formation than the Congress, or at least than any given member of Congress.
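The lower panel of Figure 3 follows from the upper panel by Bayes' rule: if each pair's density is scaled to that pair's share of voters, the probability of a pair at ideal point x is its scaled density at x divided by the sum over the four pairs. The figure uses kernel-smoothed histograms; the Python sketch below instead assumes, purely to stay short, that each group is normal with the Table 6 mean and variance for District 36. Names and the evaluation grid are illustrative.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# District 36 rows of Table 6: (percent of voters, mean, variance); presidential vote first
DIST36 = {"dd": (45.4, -0.773, 0.793), "dr": (7.6, -0.238, 0.399),
          "rd": (7.6, 0.137, 0.455), "rr": (39.4, 0.472, 0.633)}


def pair_probabilities(x, groups):
    """P(vote pair | ideal point x) = share * density / sum over pairs (Bayes' rule)."""
    mass = {k: s * normal_pdf(x, mu, v) for k, (s, mu, v) in groups.items()}
    total = sum(mass.values())
    return {k: m / total for k, m in mass.items()}


grid = [i / 100.0 for i in range(-300, 301)]
modal = [max(pair_probabilities(x, DIST36).items(), key=lambda kv: kv[1])[0] for x in grid]
print(pair_probabilities(0.0, DIST36))
```

Under this normal approximation the splitters' smaller shares and variances keep either straight ticket ahead at every point on the grid, which matches the pattern described for the figure.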
6 Conclusion
In this paper, I have developed an estimator of the distribution of voter preferences, applied it to an underused but potentially rich data set, and illustrated several substantive questions to which the estimator might be applied. However, this work is still preliminary. The statistical properties of the estimator must be studied more carefully. Standard-error estimates and tests of fit must be developed.
However, the preliminary results are interesting in their own right. The major substantive findings are the following. First, the distribution of voter preferences across districts appears to correspond to inter-district variations in legislative behavior. The districts with more conservative mean and median voters appear to elect more conservative representatives. Second, the shape of the distribution of voter ideal points varies considerably across Congressional districts. While some districts have unimodal, symmetric voter preference distributions, others have highly skewed or even bimodal preference distributions. This finding has important implications for models of aggregate vote shares, which often assume that the shape of the within-district preference distribution is the same across districts.
The third set of findings relates to split-ticket voting. Here I find that ticket splitting involving the two major parties is relatively uncommon. Generally less than one-half of all ticket splits involve both of the major parties. Also, among those who do split their tickets across the two major parties, there is no strong tendency for one pattern of splitting to appear more often than the other. Consistent with existing theory, I find that ticket splitters are generally centrists. However, even among centrists, ticket splitting is not the modal voting pattern. Thus, while it may be that ticket splitters are attempting to perform a sort of ideological balancing, most voters with the same incentives do not follow this strategy.
In future work, I hope to incorporate the candidate elections with their possible strategic
incentives directly into the statistical model. I will also be applying the model to several
more election cycles and to state assembly and senate districts. With this enlarged set of
district-representative pairs, I will be able to test formally the theories of representation
discussed casually in the text.
References
Achen, Christopher H. 1977. "Measuring Representation: Perils of the Correlation Coefficient." American Journal of Political Science. 21:805–815.
Achen, Christopher H. and W. Phillips Shively. 1995. Cross-Level Inference. Chicago: University of Chicago Press.
Alesina, Alberto and Howard Rosenthal. 1994. Partisan Politics, Divided Government, and the Economy. New York: Cambridge University Press.
Andersen, Erling B. 1972. "The Numerical Solution of a Set of Conditional Estimation Equations." XXXXX. XXX(1):42–54.
Andersen, Erling B. and Mette Madsen. 1977. "Estimating the Parameters of the Latent Population Distribution." Psychometrika. 42(3):357–374.
Anderson, Lee F., Meredith W. Watts and Allen Wilcox. 1966. Legislative Roll-Call Analysis. Evanston: Northwestern University Press.
Bailey, Michael. 1998. "A Random Effects Approach to Legislative Ideal Point Estimation." Presented at the 1998 Midwest Political Science Association Meeting, Chicago.
Black, Duncan. 1958. The Theory of Committees and Elections. Cambridge, England: Cambridge University Press.
Bock, R. Darrell and Marcus Lieberman. 1970. "Fitting a Response Model for n Dichotomously Scored Items." Psychometrika. 35(2):179–197.
Bock, R. Darrell and Murray Aitken. 1981. "Marginal Maximum Likelihood Estimation of Item Parameters: Application of the EM Algorithm." Psychometrika. 46(4):443–459.
Brady, Henry E. 1989. "Factor and Ideal Point Analysis for Interpersonally Incomparable Data." Psychometrika. 54(2):181–202.
Butler, J. S. and Robert Moffitt. 1982. "A Computationally Efficient Quadrature Procedure for the One-Factor Multinomial Probit Model." Econometrica. 50(3):761–764.
Converse, Philip and Gregory A. Markus. 1979. "Plus ça change...: The New CPS Election Study Panel." American Political Science Review. 73(1):32–49.
Cressie, Noel and Paul W. Holland. 1983. "Characterizing the Manifest Probabilities of Latent Trait Models." Psychometrika. 48(1):129–143.
Dempster, A. P., D. B. Rubin and R. K. Tsutakawa. 1981. "Estimation in Covariance Components Models." Journal of the American Statistical Association. 76(June):341–353.
Dempster, A. P., N. M. Laird and D. B. Rubin. 1977. "Maximum Likelihood from Incomplete Data via the EM Algorithm." Journal of the Royal Statistical Society, Series B. 39(1):1–38.
Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper & Row.
Dubin, Jeffrey A. and Elisabeth R. Gerber. 1992. "Patterns of Voting on Ballot Propositions: A Mixture Model of Voter Types." Social Science Working Paper 795, California Institute of Technology.
Fienberg, Stephen E. and Michael M. Meyer. 1983. "Loglinear Models and Categorical Data Analysis with Psychometric and Econometric Applications." Journal of Econometrics. 22:191–214.
Fiorina, Morris. 1992. "An Era of Divided Government." Political Science Quarterly. 33(2):387–410.
Follmann, Dean. 1988. "Consistent Estimation in the Rasch Model Based on Nonparametric Margins." Psychometrika. 53(4):553–562.
Follmann, Dean A. and Diane Lambert. 1989. "Generalizing Logistic Regression by Nonparametric Mixing." Journal of the American Statistical Association. 84(March):295–300.
Gerber, Elisabeth R. 1991. Legislative Politics and the Direct Ballot: Comparing Policy Outcomes across Institutional Arrangements. PhD thesis, University of Michigan, Ann Arbor.
Gerber, Elisabeth R. 1996a. "Legislative Response to the Threat of Popular Initiatives." American Journal of Political Science. 40:99–128.
Gerber, Elisabeth R. 1996b. "Legislatures, Initiatives, and Representation: The Effects of Institutions on Policy." Political Research Quarterly. 49:263–286.
Gerber, Elisabeth R. and Adam S. Many. 1996. "Incumbent-Led Ideological Balancing: A Hybrid Theory of Split-Ticket Voting." Presented at the 1996 Midwest Political Science Association Annual Meeting, Chicago.
Groseclose, Tim. 1994. "The Committee Outlier Debate: A Review and a Reexamination of Some of the Evidence." Public Choice. 80.
Haberman, Shelby J. 1977. "Maximum Likelihood Estimation in Exponential Response Model." The Annals of Statistics. 6(5):815–841.
Heckman, James J. and Bo E. Honoré. 1990. "The Empirical Content of the Roy Model." Econometrica. 58(5):1121–1149.
Heckman, James J. and James M. Snyder. 1997. "Linear Probability Models of the Demand for Attributes with an Empirical Application to Estimating the Preferences of Legislators." The RAND Journal of Economics. 28:S142–169.
Heckman, James J. and Robert J. Willis. 1977. "A Beta-Logistic Model for the Analysis of Sequential Labor Force Participation by Married Women." Journal of Political Economy. 85(1):27–58.
Heinen, Ton. 1996. Latent Class and Discrete Latent Trait Models. Thousand Oaks: Sage Publications.
Heron, Michael. 1998. "Some consequences of the lack of micro-foundations in aggregate voting data." Prepared for presentation at the 1998 annual meetings of the American Political Science Association. Boston MA.
Hinich, Melvin J. and James M. Enelow. 1984. The spatial theory of voting: an introduction. Cambridge, England: Cambridge University Press.
Hotelling, H. 1929. "Stability in Competition." Economic Journal. 39:41-57.
Kiefer, J. and J. Wolfowitz. 1956. "Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters." Annals of Mathematical Statistics. 27:887-906.
King, Gary. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton: Princeton University Press.
Kuklinski, James H. 1978. "Representation and Elections: A Policy Analysis." American Political Science Review. 72:165-177.
Ladha, Krishna K. 1991. "A Spatial Model of Legislative Voting with Perceptual Error." Public Choice. 68:151-174.
Laird, Nan. 1978. "Nonparametric Maximum Likelihood Estimation of a Mixing Distribution." Journal of the American Statistical Association. 73(December):805-811.
Lewis, Jeffrey B. 1996. "Referendums, Roll-calls, and Constituencies." Presented at the 1996 Annual Meetings of the American Political Science Association, San Francisco, CA.
Lewis, Jeffrey B. 1997a. "To the victors go the rollcalls." Presented at the 1997 Annual Meetings of the Midwest Political Science Association, Chicago, IL.
Lewis, Jeffrey B. 1997b. Who do representatives represent? The importance of electoral coalition preferences in California. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.
Lindsay, Bruce, Clifford C. Clogg and John Grego. 1991. "Semiparametric Estimation in the Rasch Model or Related Exponential Response Models, Including Simple Latent Class Model of Item Analysis." Journal of the American Statistical Association. 86(March):96-107.
Londregan, John. n.d. "Estimating Preferred Points in Small Legislatures: The Case of the Chilean Senate Committees." Unpublished Working Paper, University of California, Los Angeles.
Londregan, John and James M. Snyder. 1994. "Comparing Committee and Floor Preferences." Legislative Studies Quarterly. 19(2):233-265.
Mislevy, Robert J. 1984. "Estimating Latent Distributions." Psychometrika. 49(3):359-381.
Mroz, Thomas A. 1997. "Discrete Factor Approximations in Simultaneous Equations Models: Estimating the Impact of a Dummy Endogenous Variable on a Continuous Outcome." Unpublished working paper 97-2, Department of Economics, University of North Carolina, Chapel Hill.
Neyman, J. and Elizabeth L. Scott. 1948. "Consistent estimates based on partially consistent observations." Econometrica. 16(1):1-32.
Peltzman, Sam. 1985a. "Constituent Interest and Congressional Voting." Journal of Law & Politics. 27:181-210.
Peltzman, Sam. 1985b. "An Economic Interpretation of the History of Congressional Voting in the Twentieth Century." American Economic Review. 75:657-675.
Poole, Keith and Howard Rosenthal. 1985. "A Spatial Model for Legislative Roll Call Analysis." American Journal of Political Science. 29:355-384.
Poole, Keith T. and Howard Rosenthal. 1991. "Patterns of Congressional Voting." American Journal of Political Science. 35(1):228.
Poole, Keith T. and Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll Call Voting. Oxford: Oxford University Press.
Redner, Richard A. and Homer F. Walker. 1984. "Mixture Densities, Maximum Likelihood and the EM Algorithm." SIAM Review. 26(2):195-239.
Rigdon, Steven F. and Robert R. Tsutakawa. 1983. "Parameter Estimation in Latent Trait Models." Psychometrika. 48(4):567-574.
Sanathanan, Lalitha and Saul Blumenthal. 1978. "The Logistic Model and Estimation of Latent Structures." Journal of the American Statistical Association. 73(December):794-799.
Snyder, James M. 1996. "The Dimensions of Constituency Preferences: Evidence from California Ballot Propositions, 1974-1990." Legislative Studies Quarterly. 21(4):463-488.
Stroud, A. H. and Don Secrest. 1966. Gaussian Quadrature Formulas. Englewood Cliffs NJ: Prentice-Hall.
Thurstone, L.L. 1931. "The isolation of Blocs in a Legislative Body by the Voting Records of Its Members." Journal of Social Psychology. 3:425-433.
Tong, Y. L. 1988. "Some Majorization Inequalities in Multivariate Statistical Analysis." SIAM Review. 30(4):602-622.
Zwinderman, Aeilko H. 1991. "A Generalized Rasch Model for Manifest Predictors." Psychometrika. XX:589-599.
A Appendix: The Los Angeles County Voter Data
Voting in Los Angeles County uses a punch card system. Each voter punches holes in a computer card corresponding to particular candidate choices. The cards are then read by a computer and the vote totals tallied. As a by-product of this process, an "electronic image" (a series of numbers representing the punch pattern) of each cast ballot is made and stored on magnetic tape. These so-called ballot image tapes have been made available to the public for several recent election cycles. Some analysis of these data has been undertaken by (among others) scholars at Caltech (Ken McCue) and UCSD (Elisabeth Gerber).
The image files distributed by LA County were never intended to be used for data analysis. Rather, the file is simply a record of every card that was fed through the machine. Getting the data into a usable form presents a major challenge. The first challenge is simply mapping the computer record that represents each cast ballot to a particular pattern of holes. The next problem is to match that pattern of holes with a particular set of candidate and proposition votes. The final problem is to determine which ballot images were actually used in tallying the vote (many ballots are run through the counting machine several times!).
The first problem, matching the data records with a particular pattern of hole punches, is relatively straightforward. The problem of matching a pattern of hole punches with votes for particular candidates is cumbersome but transparent. In 1992, 235 different ballot layouts were used. Matching the ballot punches to votes for candidates simply requires a mapping between the punch positions and candidates for each of the 235 ballot layouts.
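The layout mapping described above can be sketched as one lookup table per ballot layout. This is a minimal Python illustration, not the procedure used for this paper: the layout id 101, the punch positions, and the contest and choice names are hypothetical, and two rules are assumptions, namely that punch positions absent from the layout are ignored and that a contest with more than one punch (an overvote) yields no valid vote.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical table for one ballot layout: punch position -> (contest, choice).
Layout = Dict[int, Tuple[str, str]]


def decode_ballot(punches: Iterable[int], layout: Layout) -> Dict[str, str]:
    """Translate one ballot's punch positions into a single choice per contest.

    Assumptions (not from the paper): positions missing from the layout are
    ignored, and a contest with more than one punch (an overvote) is dropped.
    """
    picks: Dict[str, List[str]] = {}
    for position in sorted(set(punches)):
        if position in layout:
            contest, choice = layout[position]
            picks.setdefault(contest, []).append(choice)
    # Keep only contests with exactly one punch.
    return {contest: chosen[0] for contest, chosen in picks.items() if len(chosen) == 1}


# One hypothetical layout, keyed by a hypothetical layout id.
LAYOUTS: Dict[int, Layout] = {
    101: {
        12: ("Proposition A", "Yes"),
        13: ("Proposition A", "No"),
        20: ("House", "Candidate 1"),
        21: ("House", "Candidate 2"),
    },
}

print(decode_ballot([12, 20, 21, 99], LAYOUTS[101]))  # prints {'Proposition A': 'Yes'}
```

In this example the House contest is overvoted and position 99 is not on the layout, so only the proposition vote survives. Covering 1992 would mean filling `LAYOUTS` with 235 such tables, one per layout.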
Determining which ballots were used in calculating the official total is the most difficult aspect of reading the data. Ballots are run through the counting machines in batches. Each batch has a header record followed by the individual ballot records. The problem is further complicated because the different batches are processed on several card readers simultaneously and the image file interlaces records from the different readers. If a particular batch is not going to be used in the final tally, a card indicating that the batch will not be used is supposed to be run through the machine. However, in practice some batches that are not counted do not have a "delete" record in the data.
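Because the image file interlaces records from several card readers, the batches must be regrouped before they can be counted. The sketch below is a hypothetical illustration of that step. It assumes each record can be reduced to a `(reader, kind, payload)` triple and that each reader's header, ballot, and delete records appear in order. The actual tape format is not described here, so the record shape and the kind names `"header"`, `"ballot"`, and `"delete"` are assumptions. Since some uncounted batches carry no delete record, the `deleted` flag this produces cannot by itself identify the counted batches.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Assumed record shape: (reader_id, kind, payload).
Record = Tuple[str, str, object]


@dataclass
class Batch:
    reader: str
    precinct: str
    ballots: List[frozenset] = field(default_factory=list)
    deleted: bool = False


def split_batches(records: Iterable[Record]) -> List[Batch]:
    """Regroup interlaced records from several card readers into batches.

    Assumed kinds: "header" opens a new batch on that reader (payload is the
    precinct id); "ballot" appends punch positions to the reader's open batch;
    "delete" flags the reader's open batch as not counted.
    """
    batches: List[Batch] = []
    open_batch: Dict[str, Batch] = {}  # the batch currently open on each reader
    for reader, kind, payload in records:
        if kind == "header":
            batch = Batch(reader=reader, precinct=str(payload))
            batches.append(batch)
            open_batch[reader] = batch
        elif kind == "ballot":
            open_batch[reader].ballots.append(frozenset(payload))
        elif kind == "delete":
            open_batch[reader].deleted = True
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
    return batches


records = [
    ("R1", "header", "0001"),
    ("R2", "header", "0002"),
    ("R1", "ballot", [12, 20]),
    ("R2", "ballot", [13]),
    ("R1", "ballot", [13, 21]),
    ("R2", "delete", None),
]
for b in split_batches(records):
    print(b.reader, b.precinct, len(b.ballots), b.deleted)
```

Keying the open batch by reader is what undoes the interlacing: records from reader R2 never land in a batch opened on R1, however the two streams are mixed in the file.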
Many authors have recognized this problem and chosen to treat the records they extract from the tapes as a sample, making no attempt to reconcile the individual voting data to the precinct-level statement of the vote (Dubin and Gerber 1992 and Gerber and Many 1996). I have solved the problem of determining which vote batches were used in the final tally and am able to generate precinct-level vote totals that exactly match the official statement of the vote. My method works in the following way. First, I read through the data and extract for each batch of votes: the precinct the batch was from, the total number of votes in the batch, and the number of valid votes for several candidates in the batch. I also note whether the batch is marked as deleted. In the 1992 election, there were over 17,000 batches used to record the votes in the 6,305 precincts. Then for each precinct,
I take the reported vote total and determine which set of batches for that precinct would yield the reported vote total. If more than one set of batches matches the total, the vote totals for particular offices are checked until all but one set of batches has been eliminated as a potential generator of the precinct totals. Having compiled a complete (tentative) list of which ballot batches were used, I then reread the entire image file, compiling the votes for those batches that were identified as having been used in tallying the totals. As a final check, the individual-level votes are aggregated to the precinct level and checked against the reported precinct totals.
The data presented in the paper reproduce the precinct totals to the vote for all federal and state offices and on all county and statewide propositions.
14 Michael Alvarez informs me that, due to a change in the computer system used to total the votes, the ballot image files are no longer being compiled.
15 I thank Liz Gerber for making the raw data files available to me.
16 I thank Ken McCue and Liz Gerber for providing model code for operation.
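The batch-selection step and the final check described above can be sketched as follows. This is an illustration under stated assumptions, not the author's code. Batch summaries are hypothetical dicts with a `"ballots"` key for the batch total and one key per office, official totals use the same keys, and the sketch ignores the deleted flag entirely. Exhaustive search over subsets is practical here because over 17,000 batches spread across 6,305 precincts averages only about 2.7 batches per precinct, and a precinct with n batches has just 2^n subsets to test.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def select_batches(
    batches: Sequence[Dict[str, int]],
    official: Dict[str, int],
    offices: Sequence[str],
) -> Optional[Tuple[int, ...]]:
    """Find the unique subset of a precinct's batches that reproduces the
    official totals; return its indices, or None if no subset or several fit.

    Assumed keys: "ballots" holds a batch's total votes; other keys hold the
    batch's valid votes for a particular office.
    """
    # Step 1: every subset whose total vote count equals the reported total.
    candidates: List[Tuple[int, ...]] = [
        subset
        for size in range(len(batches) + 1)
        for subset in combinations(range(len(batches)), size)
        if sum(batches[i]["ballots"] for i in subset) == official["ballots"]
    ]
    # Step 2: while several subsets remain, check office totals one at a time.
    for office in offices:
        if len(candidates) <= 1:
            break
        candidates = [
            subset
            for subset in candidates
            if sum(batches[i].get(office, 0) for i in subset) == official.get(office, 0)
        ]
    return candidates[0] if len(candidates) == 1 else None


def totals_match(
    ballots: Iterable[Dict[str, str]], official: Dict[Tuple[str, str], int]
) -> bool:
    """Final check: aggregate decoded ballots and compare them with every
    (contest, choice) total listed in `official` for the precinct."""
    counts = Counter(
        (contest, choice) for ballot in ballots for contest, choice in ballot.items()
    )
    return all(counts[key] == total for key, total in official.items())


# Hypothetical precinct with four batches.
batches = [
    {"ballots": 100, "President": 90},
    {"ballots": 50, "President": 45},
    {"ballots": 50, "President": 40},
    {"ballots": 30, "President": 28},
]
print(select_batches(batches, {"ballots": 150, "President": 130}, ["President"]))
```

In this example both (0, 1) and (0, 2) reproduce the 150-vote total, and the presidential total of 130 then eliminates (0, 1). If the office checks still leave more than one subset, the sketch returns `None` rather than guessing.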