Ecological inference reversed: Estimating aggregate features of voter ideal-point distributions from
Ecological inference reversed: Estimating aggregate features of voter ideal-point distributions from individual-level data

Incomplete, comments welcome, version 0.1

Jeffrey B. Lewis
Department of Politics
Princeton University
July 13, 1999

Abstract

In the last decade a great deal of progress has been made in estimating spatial models of legislative roll-call voting. There are now several well-known and effective methods of estimating the ideal points of legislators from their roll-call votes. Similar progress has not been made in the empirical modeling of the distribution of preferences in the electorate. Progress has been slower, not because the question is less important, but because of limitations of data and a lack of tractable methods. In this paper, I describe the existing technologies for inferring ideal points. I then develop a method for recovering the relative means and variances of the voter ideal-point distribution across two (or more) groups of voters from individual-level binary response data. I extend the model to multiple dimensions and describe tests for dimensionality and inter-group differences. I then present Monte Carlo results demonstrating the efficacy of the method.

Prepared for presentation at the Annual Meetings of the Political Methodology Society, College Station TX, July 1999. Previous related work was presented at the 1999 Midwest Political Science Association Meetings and at the 1998 American Political Science Association meetings. Addresses: Department of Politics, Corwin Hall, Princeton University, Princeton NJ 08544 and [email protected]. This research was supported by a grant from the Woodrow Wilson School of Public and International Affairs, Princeton University. Thanks to Liz Gerber and David Epstein for comments on a related paper.
1 Introduction

Political analysts are often interested in whether there are differences in the distribution of preferences, ideology, or policy positions across two (or more) groups of voters or other political actors. For example, we may wish to know whether incumbent Democratic House candidates are on average more liberal than Democratic challengers. Or we may want to know whether there is greater heterogeneity in abortion preferences among men than among women. However, existing methodologies for such inference are generally inadequate. In this paper, I draw on the psychometrics literature on test taking to develop a method of inferring the relative differences in ideal-point distributions across groups using individual-level binary response data. The method could be used to recover the distribution of latent characteristics other than ideal points, and the individual actors need not be voters. For example, in the psychometric literature on item response, the individuals are usually “students” and the latent characteristic might be “intelligence.” However, I will use the convention of referring to the actors as “voters” and to the characteristic whose distribution we are inferring as “ideal points.”

At first glance, the problem seems rather trivial. Surveys such as the American National Election Studies ask voters to place themselves on policy scales. If we take responses to such questions as indicative of voters’ ideal points, tests of differences in the mean or the variance of ideal points across groups are straightforward. However, such seemingly direct measures may be problematic. It is unclear how people conceive of such scales. Does the midpoint of a 7-point scale reflect the same degree of liberalness to all voters? This problem may be particularly acute for inter-group comparisons.
For example, a right-of-center Republican may consider herself to be somewhat liberal by comparison to her largely Republican peers, while a similarly right-wing Democrat may see himself as moderately conservative. Moreover, it remains an open question whether voters even conceive of such scales in a way that is consistent with the spatial voting model (see Hinich & Enelow (1984) and Rabinowitz & MacDonald (1989)). Finally, responses are known to be unreliable (Achen, 19XX), a fact that is not surprising given the somewhat unnatural nature of the question.

Alternatively, it might be preferable to infer voters’ spatial positions from responses to a series of policy questions. These questions might be votes on propositions (see Lewis 1999) or they may be answers to survey questions. All that is required is that in each case the voter be asked to express her preference over a specific policy question. For example, abortion preferences might be inferred from a series of yes/no responses on different conditions under which access to abortion should be allowed.[1] While these “questions” may be actual votes or answers to survey questions, I will follow the convention of referring to the observed voter choices as votes. Such an approach has the advantage that the basic items are, more or less, objective in the sense that they mean the same thing to all voters.

Of course, researchers have been estimating voter preferences by constructing indices of binary policy questions for some time. However, such indices are rather crude and do not (in general) have desirable statistical properties. More sophisticated factor-analytic techniques have also been employed. These factor-analytic models, including those specifically designed for binary choice items (like Poole & Rosenthal’s NOMINATE), are fundamentally as problematic as the simple index for these data. In particular, all such methods require the estimation of each individual’s ideal point.
These individual estimates can then be used to test for differences in ideal-point distributions across groups. However, as noted below, the individual-level ideal-point estimates are (at best) only consistent as the number of questions asked of each respondent grows large. In most cases, we have only a very few decisions upon which to make inferences about voter preferences. Inconsistent estimates of each voter’s ideal point lead to inconsistent estimates of the distribution of ideal points within each group.

One obvious alternative is to use the individual-level response data to estimate features of the group-level ideal-point distributions directly, without the intermediate step of estimating each voter’s ideal point. While techniques such as covariance structure modeling (LISREL) have this flavor, very few examples of this sort of approach have been employed in the political science literature. On the other hand, such approaches have been used extensively in the psychological literature on test taking (see, for example, Lord & Novick (1968), Andersen & Madsen (1977), Bock & Aitkin (1981)). In what follows, I apply some of the insights from item response theory to the problem of recovering the distribution of voter ideal points. In particular, the model I develop allows me to estimate inter-group differences in the means and variances of ideal-point distributions along two latent dimensions. I then demonstrate the efficacy of the technique through Monte Carlo simulation. The Monte Carlo experiments reveal that with as few as seven observed votes per voter, inter-group differences in ideal-point distributions in a two-dimensional policy space can be recovered.

[1] The questions need not be binary, though I will only consider binary questions in what follows.

The paper unfolds as follows. In the next section, I review the literature on recovering ideal points from voting or other binary choice data.
In section three, I develop a model for estimating inter-group differences in means and variances of voter ideal points across multiple spatial dimensions. In section four, I consider measures of fit and tests of dimensionality for the model. Section five presents some preliminary Monte Carlo results. Section six concludes.

2 Spatial models of legislators’ preferences or spatial locations

The obvious place to start in any consideration of recovering spatial positions from binary choice data is with models of roll-call voting in legislatures. Over the last twenty years there have been great advances in the methods for mapping legislators in policy spaces based on their roll-call voting records. Unfortunately, these models are not directly importable to the question at hand. Almost all existing models of roll-call voting have desirable statistical properties only as the number of observed votes grows large. In a legislative setting this is not particularly problematic, because in most legislatures the number of recorded votes is indeed quite large. Despite the differences in the observed data, models of legislators’ preferences are the natural place to start in constructing a technique for inferring voter preferences.

Thinking about differences in legislators’ voting records in a spatial way has a long history; perhaps the earliest example of this sort of analysis is Thurstone (1931). The methods employed in this early work were not based on any explicit theoretical model of preferences and were mainly concerned with identifying voting coalitions in Congress, various state legislatures, and even the United Nations [ADD CITES]. Recent advances in the spatial analysis of roll-call data are explicitly grounded in formal models of voting.
In these spatial theories of voting (see Hinich & Enelow (1984)), public policy alternatives are represented as points in space, and legislators’ preferences over those points are defined by the distance between the policy point that the legislator would most like to see implemented (the ideal point) and each of the alternatives. With this theoretical framework (articulated further below), estimated locations of the legislators take on a new significance. That is, the spatial locations of the legislators can be interpreted as their ideal points. As it turns out, a spatial voting model interpretation can be given (roughly speaking) to many of the earlier methods (Heckman & Snyder 1997).

Table 1: Methods of estimating legislator/voter locations

Estimator                          Method type   Consistent as   Citation/example
Vote index                         NP            n/a             ADA score
Guttman scaling                    NP            n/a             Anderson et al. (1966)
Heckman–Snyder scores              GLS           …               Heckman & Snyder (1997)
NOMINATE scores                    ML            …               Poole & Rosenthal (1997)
Random proposal models             MML           …               Londregan (n.d.)
Rasch models                       CML           …               Ladha (1991)
Covariate models                   ML            …               Peltzman (1985a)
Random effects covariate models    MML           …               Bailey (1998)

The table lists commonly used methods of ideal-point estimation. NP = non-parametric, GLS = generalized least squares, ML = maximum likelihood, CML = conditional maximum likelihood, MML = marginal maximum likelihood; N = number of legislators/voters, J = number of votes.

Table 1 lists a number of techniques that have been used to estimate legislators’ spatial locations or ideal points.[2] Each of these methods represents a potential approach to the problem of estimating voter ideal points. In the end, none is particularly suited to the problem. However, some consideration of their properties highlights the relevant data and statistical issues involved. While some of these models are explicitly designed to place legislators in a multidimensional space, others are restricted to a single dimension.
A few can be extended to multiple dimensions only with a great increase in computational cost (e.g., Rasch models and random-effects models).

[2] In discussing the methods, I use the terms “location” and “ideal point” interchangeably.

The most important statistical issue is the consistency of the estimated ideal points. The problem is most easily seen from a maximum-likelihood perspective. Supposing a maximum-likelihood estimator for each of the ideal-point models could be written, it would have the form

$$(\hat\theta, \hat\psi) = \arg\max_{\theta,\,\psi} L(V \mid \theta, \psi),$$

where $V$ is an $N \times J$ matrix of observed votes (the data), $\psi$ is a vector of parameters describing characteristics of the proposals, and $\theta$ is a vector of parameters describing the distribution of ideal points. Suppose for starters that each of the $J$ proposals to be voted on has one or more elements of $\psi$ associated with it and that $\theta$ is a vector of ideal points. Consider the ML estimators for $\psi$ and $\theta$ as the size of the data matrix increases. As the number of legislators $N$ grows by one, so does the size of $\theta$. Similarly, as $J$ grows so does the size of $\psi$. This proliferation of parameters as the sample size increases is well known to undermine the standard consistency results for ML estimators (Neyman & Scott 1948).

One way around this problem has been to show that estimates of $\theta$ will be consistent under a so-called “triple” asymptotic condition (Haberman 1977). In these cases, $\theta$ can be consistently estimated if the following three conditions hold: (1) the number of roll calls goes to infinity, (2) the number of legislators goes to infinity, and (3) the ratio of votes to legislators goes to infinity. In other words, these estimators will work if you have a large legislature that takes a lot of roll-call votes.[3] While I have not extended Haberman’s triple asymptotic result to Poole and Rosenthal’s NOMINATE procedure, I believe their method is consistent under these conditions.
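The incidental-parameter problem cited above (Neyman & Scott 1948) can be seen in its textbook form, used here as a stand-in rather than a roll-call model. In the minimal sketch below, each unit i contributes two draws from N(μ_i, σ²) with its own mean μ_i, so one new parameter arrives with every unit, much as one new ideal point arrives with every legislator. The function name, the range of the unit means, and the seeds are illustrative choices, not anything from the paper.

```python
import random


def neyman_scott_sigma2(n_pairs, sigma=1.0, seed=0):
    """ML estimate of a common variance when every unit has its own mean.

    Unit i contributes two draws x_i1, x_i2 ~ N(mu_i, sigma^2). The ML
    estimate of each incidental mean mu_i is the pair average, and the ML
    estimate of sigma^2 averages squared deviations over all 2 * n_pairs draws.
    """
    rng = random.Random(seed)
    sum_sq = 0.0
    for _ in range(n_pairs):
        mu_i = rng.uniform(-5.0, 5.0)      # incidental parameter for unit i
        x1 = rng.gauss(mu_i, sigma)
        x2 = rng.gauss(mu_i, sigma)
        mu_hat = (x1 + x2) / 2.0           # ML estimate of mu_i
        sum_sq += (x1 - mu_hat) ** 2 + (x2 - mu_hat) ** 2
    return sum_sq / (2 * n_pairs)


if __name__ == "__main__":
    # Adding units does not help: the estimate settles near sigma^2 / 2 = 0.5.
    for n in (100, 10_000, 100_000):
        print(n, round(neyman_scott_sigma2(n), 3))
```

Because each pair mean absorbs one degree of freedom, the expected squared deviation per draw is σ²/2, so the estimator converges to the wrong value no matter how many units are added.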
From the standpoint of estimating voters’ preferences (as opposed to legislators’ preferences), the triple asymptotic result is certainly not compelling. While it may be correct to think of the number of voters as approaching infinity, the number of votes is not large, and in no way could we think of the ratio of votes to voters as large.

In cases where the triple asymptotic condition seems unlikely to hold, several alternative solutions have been presented. The first is to specify the model in such a way that there exists a sufficient statistic $S$ of the data for the proposal parameters $\psi$. In this case the likelihood can be reformulated as

$$L(V \mid \theta, \psi) = L_c(V \mid \theta, S)\, h(S \mid \psi).$$

Since the corresponding log likelihood,

$$\log L(V \mid \theta, \psi) = \log L_c(V \mid \theta, S) + \log h(S \mid \psi),$$

is additively separable in $\theta$ and $\psi$, the value that maximizes $\log L$ over $\theta$ will be the same as the value that maximizes $\log L_c$ over $\theta$. Assuming $\log L_c$ meets the usual conditions for consistent ML estimation, we can get consistent estimates of the ideal-point parameters $\theta$ as $J$ grows large by maximizing the likelihood of $\theta$ conditional on $S$. This method is referred to as conditional maximum likelihood. Both the Heckman–Snyder and (one-parameter) Rasch-type models take this approach. From the standpoint of estimating voter preferences, even these models require that a large number of votes be observed.

[3] It should be noted that under this triple asymptotic condition the … will still not be consistently estimated.

The above approaches can all be put under the general heading of fixed-effects models. That is, they treat all of the vote-characteristic and ideal-point parameters as fixed constants to be estimated. Another approach would be to treat the set of proposals as a draw from some distribution, $G(\psi)$.
The likelihood of $V$ can then be written as

$$L(V \mid \theta) = \prod_{j=1}^{J} \int \prod_{i=1}^{N} P(v_{ij} \mid \theta_i, \psi)\, dG(\psi), \qquad (1)$$

where $G(\psi)$ is the distribution from which the proposal parameters $\psi$ are drawn. Integrating out the nuisance parameters ($\psi$) and maximizing (1) over $\theta$ is referred to as marginal maximum likelihood. Londregan’s (n.d.) random coefficient model takes this approach. Londregan presents a model in which $G(\psi)$ is conditioned on a set of proposal-specific covariates (e.g., the party of the proposer). This use of auxiliary information weakens the importance of the arbitrary distributional assumption $G(\psi)$. The coefficients on the covariates are also of direct substantive interest. The model is consistent as the number of votes grows large.

Since Londregan’s method still requires the number of proposals to grow large, it is not appropriate to the problem at hand, but it does suggest a possible course. Rather than using auxiliary information to help identify the distribution of proposal characteristics, we could parameterize the ideal points using covariates. This is quite common in the literature. The most common covariate models are those in which the legislators’ ideal points are assumed to be a deterministic linear function of a set of covariates,

$$\theta_i = z_i'\gamma,$$

where $\theta_i$ is legislator $i$’s ideal point and $z_i$ is a vector of observed legislator attributes (e.g., party, constituency characteristics, or previous occupation). These deterministic models are commonly used (at least implicitly) in both models of legislative voting and candidate elections. Generally these models involve running a probit or logit regression of a single roll call or vote choice on a vector of covariates. The ideal points can then be estimated as $z_i'\hat\gamma$. While these models can be applied to situations in which many decisions are observed, the estimated ideal points are consistent with a single observed roll call as the number of voters or legislators grows large.
The obvious problem with the deterministic covariate model is that it is highly restrictive to assume that the ideal points can be written as a deterministic function of a set of covariates. Surely, unobserved traits must also affect the values of $\theta$. Bailey (1998) generalizes the deterministic covariate model by assuming that the ideal points are a linear function of a set of covariates $z_i$ and a legislator-specific random shock,

$$\theta_i = z_i'\gamma + u_i.$$

Bailey’s model can be thought of as a mirror image of Londregan’s. Bailey treats the proposal parameters $\psi$ as a set of fixed effects to be estimated and integrates over the random $u_i$,

$$L(V \mid \psi, \gamma) = \prod_{i=1}^{N} \int L(V_i \mid z_i'\gamma + u, \psi)\, dF(u).$$

Having assumed a distribution $F$ for $u$, Bailey can then consistently estimate the distribution of legislator ideal points as the number of legislators grows large. To find a particular legislator’s ideal point, the a posteriori expectation $E[\theta_i \mid V_i, z_i, \psi, \gamma]$ can be estimated using the estimates $\hat\gamma$ and $\hat\psi$ in place of $\gamma$ and $\psi$. The consistency of these estimates requires that the number of legislators and the number of roll calls grow large.

Because it is only the distribution of voter ideal points and not each individual’s ideal point that is of interest, the random-effects covariate model seems promising. It provides estimates of the parameters of the conditional ideal-point distribution as the number of legislators (voters) grows large. However, the voting data contain no covariates. The challenge is to develop a model that will allow us to estimate consistently the distribution of ideal points without covariates.

3 The basic spatial model

My statistical model begins from the standard spatial model of voting (Hotelling 1929, Downs 1957, Black 1958, Hinich & Enelow 1984). In the spatial model, it is assumed that policy choices can be represented by points in Euclidean space. Each voter is assumed to have a most preferred policy position in a $K$-dimensional space, $x = (x_1, x_2, \ldots, x_K)$.
A voter’s utility for various policy alternatives is defined by a function of the distance between the position of the alternative and the voter’s ideal point. Following the usual convention in the literature, I assume this function is a simple quadratic. In order to introduce uncertainty into the vote choice, I assume that voters’ utilities for various alternatives are not solely determined by their spatial positions but are also determined by an additive idiosyncratic shock $\varepsilon$. Thus, the utility for a voter at $x$ from the implementation of a policy $z = (z_1, z_2, \ldots, z_K)$ is

$$U(x, z) = -\sum_{k=1}^{K} (x_k - z_k)^2 + \varepsilon,$$

where, by assumption, $\varepsilon \sim N(0, \sigma^2)$ and is i.i.d. across alternatives. Assume that all choices are over exactly two alternatives (there is no abstention). The difference between the utility provided by any two alternatives $a = (a_1, \ldots, a_K)$ and $b = (b_1, \ldots, b_K)$ is

$$U(x, a) - U(x, b) = \sum_{k=1}^{K} (b_k^2 - a_k^2) + 2\sum_{k=1}^{K} x_k (a_k - b_k) + \varepsilon_a - \varepsilon_b.$$

Assuming sincere voting, in the sense that voters vote for the alternative that they prefer,[4] the probability that a voter with ideal point $x$ votes for alternative $a$ over alternative $b$ is

$$\Pr(a \succ b \mid x) = \Pr\Big(\varepsilon_b - \varepsilon_a < \sum_{k=1}^{K} (b_k^2 - a_k^2) + 2\sum_{k=1}^{K} x_k (a_k - b_k)\Big).$$

Because $\varepsilon \sim N(0, \sigma^2)$, we have $\varepsilon_b - \varepsilon_a \sim N(0, 2\sigma^2)$, and so

$$\Pr(a \succ b \mid x) = \Phi\left(\frac{\sum_{k=1}^{K} (b_k^2 - a_k^2) + 2\sum_{k=1}^{K} x_k (a_k - b_k)}{\sqrt{2}\,\sigma}\right),$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution.

[4] Of course, since each voter has virtually no chance of changing the outcome of the election with her vote, voting for her preferred outcome is not a strictly dominating strategy.
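The closed-form choice probability above can be checked by simulation. This sketch assumes exactly the quadratic utility and independent N(0, σ²) shocks stated in the text; the function names and the particular x, a, b, and σ are hypothetical. It compares the Φ expression with the share of simulated draws in which alternative a yields the higher realized utility.

```python
import random
from statistics import NormalDist


def prob_choose_a(x, a, b, sigma):
    """Closed form: Pr(a preferred to b) for a voter with ideal point x.

    Utility is U(x, z) = -sum_k (x_k - z_k)^2 + eps, eps ~ N(0, sigma^2),
    drawn independently for each alternative, so eps_b - eps_a ~ N(0, 2 sigma^2).
    """
    systematic = sum(bk ** 2 - ak ** 2 + 2.0 * xk * (ak - bk)
                     for xk, ak, bk in zip(x, a, b))
    return NormalDist().cdf(systematic / (2.0 ** 0.5 * sigma))


def simulate_choose_a(x, a, b, sigma, draws=100_000, seed=1):
    """Monte Carlo share of draws in which the realized utility of a exceeds b."""
    rng = random.Random(seed)

    def utility(z):
        spatial = -sum((xk - zk) ** 2 for xk, zk in zip(x, z))
        return spatial + rng.gauss(0.0, sigma)   # fresh idiosyncratic shock

    wins = sum(utility(a) > utility(b) for _ in range(draws))
    return wins / draws


# Hypothetical two-dimensional example.
x, a, b, sigma = (0.2, -0.1), (0.5, 0.0), (-0.3, 0.4), 0.8
print(round(prob_choose_a(x, a, b, sigma), 3), round(simulate_choose_a(x, a, b, sigma), 3))
```

With 100,000 draws the two numbers should agree to about two decimal places; the √2 in the denominator is exactly what makes them agree, since the difference of the two shocks has variance 2σ².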
Letting $y_{ij}$ represent voter $i$’s choice over $\{a_j, b_j\}$, where $y_{ij} = 1$ denotes the choice of $a_j$ and $y_{ij} = 0$ denotes the choice of $b_j$, and letting

$$\alpha_j = \frac{\sum_{k=1}^{K} (b_{jk}^2 - a_{jk}^2)}{\sqrt{2}\,\sigma} \quad\text{and}\quad \beta_{jk} = \frac{\sqrt{2}\,(a_{jk} - b_{jk})}{\sigma},$$

with $\beta_j = (\beta_{j1}, \ldots, \beta_{jK})$, we find the familiar probit model:[5]

$$\Pr(y_{ij} \mid x_i) = \Phi(\alpha_j + \beta_j' x_i)^{y_{ij}} \big(1 - \Phi(\alpha_j + \beta_j' x_i)\big)^{1 - y_{ij}}.$$

[5] Defining $x_i = z_i'\gamma$, where $z_i$ is a vector of observed characteristics, we have the deterministic covariates model of ideal-point estimation described above.

4 Estimating the model

Given this basic model of each vote choice, I now turn to the question of estimating the parameters of the model. In order to go from the theoretical model presented above to a statistical model, I must make some additional assumptions. First, I assume that, conditional on the parameters, the vote choice probabilities are independent across votes and voters. That is, each decision made by each voter is an independent draw. Thus, the likelihood of observing a vector of votes ($y_i$) by a voter $i$ located at $x_i$ is

$$L(y_i \mid \alpha, \beta, x_i) = \prod_{j=1}^{J} \Phi(\alpha_j + \beta_j' x_i)^{y_{ij}} \big(1 - \Phi(\alpha_j + \beta_j' x_i)\big)^{1 - y_{ij}}.$$

As shown here, the likelihood is conditioned on the unobserved ideal point $x_i$. Following the random-effects approaches discussed above, I place a multivariate density $f(\cdot)$ over the vector $x$ and then integrate over this density:

$$L(y_i \mid \alpha, \beta) = \int \!\cdots\! \int L(y_i \mid \alpha, \beta, x)\, f(x_1, x_2, \ldots, x_K)\, dx_1\, dx_2 \cdots dx_K.$$

The resulting likelihood expresses the observed data only in terms of the parameters of interest. Because the above integral is very unlikely to have a closed form, it is approximated by quadrature. That is, the integral is replaced by a weighted sum over a given number of points,

$$L(y_i \mid \alpha, \beta) \approx \sum_{q=1}^{Q} w_q\, L(y_i \mid \alpha, \beta, X_q). \qquad (2)$$

Given that vote choices are assumed to be independent across voters, the total log likelihood can be written as

$$\log L = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} w_q\, L(y_i \mid \alpha, \beta, X_q).$$

However, the situation can be greatly simplified. If $J$ (the number of items) is not overly large, the number of possible patterns in the data, $2^J$, is considerably smaller than the number of voters $N$. Since the vector of votes is all that I observe for each voter, the value of the likelihood is the same for all voters who share the same vote vector. Thus, defining $n_p$ to be the total number of voters that share a vote profile $y_p$, the log likelihood can be rewritten as

$$\log L = \sum_{p=1}^{2^J} n_p \log \sum_{q=1}^{Q} w_q\, L(y_p \mid \alpha, \beta, X_q). \qquad (3)$$

Equation (3) can be maximized directly using standard numerical maximization techniques.

Table 2: Form of the data

Pattern (votes 1 to 7)    Frequency
(0,0,0,0,0,0,0)           5
(0,0,0,0,0,0,1)           2
(0,0,0,0,0,1,0)           5
…                         …
(1,1,1,1,1,0,1)           13
(1,1,1,1,1,1,0)           16
(1,1,1,1,1,1,1)           75
Total                     1000

A listing of the frequency of each voting pattern completely captures the information in the data. Thus, the estimation problem can be thought of as fitting a multinomial distribution with $2^J$ categories (one for each pattern). The ability to render the data in this multinomial way is very convenient, as it allows all the outer sums in the log likelihood to run over the $2^J$ categories rather than the $N$ voters. With large data sets the computational advantages can be very large.

All that remains is the selection of a distribution $f$. The usual convention is to construct the distribution such that the ideal-point dimensions are orthogonal. Because the likelihood is a function of a linear combination of the ideal-point elements (i.e., $\alpha_j + \sum_k \beta_{jk} x_k$), one cannot jointly estimate the covariance of the ideal-point distribution and the $\beta$s. Thus, the usual approach is to assume independence. Under independence, we have

$$f(x) = f_1(x_1)\, f_2(x_2) \cdots f_K(x_K).$$
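Equation (3) gives the same value as the voter-by-voter sum; it only regroups identical terms. The sketch below checks this in the one-dimensional case. It assumes K = 1, uses an equally weighted 7-point midpoint rule on [−1, 1] as a stand-in for whatever quadrature rule one prefers, and uses made-up item parameters and votes; the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist

PHI = NormalDist().cdf


def pattern_likelihood(y, alpha, beta, x):
    """L(y | alpha, beta, x) for a one-dimensional ideal point x (probit items)."""
    like = 1.0
    for yj, aj, bj in zip(y, alpha, beta):
        p = PHI(aj + bj * x)
        like *= p if yj == 1 else 1.0 - p
    return like


def marginal_likelihood(y, alpha, beta, nodes, weights):
    """Quadrature approximation (2): sum_q w_q * L(y | alpha, beta, X_q)."""
    return sum(w * pattern_likelihood(y, alpha, beta, xq) for xq, w in zip(nodes, weights))


def loglik_by_voter(votes, alpha, beta, nodes, weights):
    """Outer sum over all N voters."""
    return sum(math.log(marginal_likelihood(y, alpha, beta, nodes, weights)) for y in votes)


def loglik_by_pattern(votes, alpha, beta, nodes, weights):
    """Equation (3): outer sum over distinct patterns, weighted by counts n_p."""
    counts = Counter(tuple(y) for y in votes)
    return sum(n_p * math.log(marginal_likelihood(pattern, alpha, beta, nodes, weights))
               for pattern, n_p in counts.items())


# Hypothetical inputs: 7-point equal-weight midpoint rule on [-1, 1].
nodes = [-1.0 + (2 * q + 1) / 7.0 for q in range(7)]
weights = [1.0 / 7.0] * 7
alpha, beta = (0.1, 0.2, 0.3), (0.5, -0.4, 1.2)
votes = [(1, 1, 0), (1, 1, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
print(loglik_by_voter(votes, alpha, beta, nodes, weights))
print(loglik_by_pattern(votes, alpha, beta, nodes, weights))
```

In practice one would pass the negated pattern form to a numerical optimizer; its cost depends on the number of distinct patterns rather than on N.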
Under independence, the quadrature approximation can be written as a sum over the product of the weights along each dimension,

$$L(y_p \mid \alpha, \beta) \approx \sum_{q_1=1}^{Q} \sum_{q_2=1}^{Q} \cdots \sum_{q_K=1}^{Q} w_{q_1} w_{q_2} \cdots w_{q_K}\, L\big(y_p \mid \alpha, \beta, (X_{q_1}, X_{q_2}, \ldots, X_{q_K})\big).$$

At this point, I have fully specified the model for a single group. Extending the model to test for inter-group differences is straightforward. The basic intuition is the following. Suppose that the distribution $f$ has parameters $\lambda$. For example, if $f$ were taken to be spherical normal, $\lambda$ would be comprised of the means and variances along each dimension. Further suppose that those parameters differ across groups $g = 1, \ldots, G$. We can then write the log likelihood as

$$\log L = \sum_{g=1}^{G} \sum_{p=1}^{2^J} n_{gp} \log \sum_{q_1=1}^{Q} \cdots \sum_{q_K=1}^{Q} w_{q_1 \cdots q_K}(\lambda_g)\, L\big(y_p \mid \alpha, \beta, (X_{q_1}, \ldots, X_{q_K})\big),$$

where $n_{gp}$ is the number of voters in group $g$ with vote profile $y_p$ and $w_{q_1 \cdots q_K}(\lambda_g)$ are the quadrature weights implied by $f(\cdot \mid \lambda_g)$. For identification, (at least some of) the elements of $\lambda$ must be fixed for one of the groups.[6] Notice that the data continue to have a multinomial form, where the number of “patterns” in the data will be $G \cdot 2^J$. Assuming a well-behaved $f$, the above can be estimated by MML.

Note that the leverage in this model comes from the restriction that the $\alpha$s and $\beta$s are the same across groups. Without this restriction no comparison between the group distributions would be possible. In effect, the item parameters (the $\alpha$s and $\beta$s) are used to place the groups in the same space.

[6] Alternatively, the sum or mean of the groups’ $\lambda$s might be fixed.

5 An example assuming ideal points are distributed uniformly over rectangles

In this section, I consider two examples of the general model developed above. In these examples, the number of votes ($J$) is set at 7, and the number of dimensions is fixed at one for the first example and two for the second. The number of groups is fixed at two. The data are simulated from a known process.
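For the two-group examples in this section, the multi-group log likelihood can be sketched with a product quadrature rule. This is an illustration under stated assumptions, not the paper’s code: rather than recomputing weights w(λ_g) for each group, it shifts and stretches a common set of nodes group by group, which is one valid way to handle a location-scale family such as uniform intervals and rectangles. The nodes, item parameters, counts, and function names are hypothetical, and group 0 is held at shift 0 and stretch 1 for identification.

```python
import itertools
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def item_likelihood(y, alpha, beta, x):
    """Probit likelihood of pattern y for a voter at the K-vector x."""
    like = 1.0
    for yj, aj, bj in zip(y, alpha, beta):
        p = PHI(aj + sum(bjk * xk for bjk, xk in zip(bj, x)))
        like *= p if yj == 1 else 1.0 - p
    return like


def multigroup_loglik(counts_by_group, alpha, beta, nodes, weights, shifts, stretches):
    """Multi-group MML log likelihood using a K-fold product quadrature rule.

    counts_by_group[g] maps each observed pattern (tuple of 0/1) to n_gp.
    Group g's node on dimension k is shifts[g][k] + stretches[g][k] * X_q
    (an assumed location-scale parameterization). alpha and beta are shared
    by all groups; fix shifts[0] = 0 and stretches[0] = 1 for identification.
    """
    K = len(beta[0])
    total = 0.0
    for g, counts in enumerate(counts_by_group):
        for pattern, n_gp in counts.items():
            like = 0.0
            # Product rule: every K-tuple of 1-D nodes, weight = product of 1-D weights.
            for idx in itertools.product(range(len(nodes)), repeat=K):
                w = math.prod(weights[q] for q in idx)
                x = [shifts[g][k] + stretches[g][k] * nodes[q] for k, q in enumerate(idx)]
                like += w * item_likelihood(pattern, alpha, beta, x)
            total += n_gp * math.log(like)
    return total


# Hypothetical two-dimensional, two-group example with three items.
nodes = [-1.0 + (2 * q + 1) / 7.0 for q in range(7)]
weights = [1.0 / 7.0] * 7
alpha = (0.1, 0.2, 0.3)
beta = ((0.5, 0.2), (0.3, -0.2), (0.2, 0.6))
counts = [{(1, 0, 1): 30, (0, 0, 1): 20, (1, 1, 1): 50},
          {(1, 0, 1): 25, (0, 1, 0): 15, (1, 1, 1): 60}]
print(multigroup_loglik(counts, alpha, beta, nodes, weights,
                        shifts=[(0.0, 0.0), (0.4, -0.2)],
                        stretches=[(1.0, 1.0), (1.2, 1.0)]))
```

Setting every second-dimension item parameter to zero makes the two-dimensional value coincide with the one-dimensional one, which is the kind of nesting that likelihood ratio tests for dimensionality exploit.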
The distribution of voter ideal points is assumed to be uniform over a rectangle in the two-dimensional case and uniform over a closed interval in the case of one dimension. The choice of the uniform distribution is restrictive.[7] Much less restrictive distributions might be used.[8] While spherical distributions such as the bivariate normal would perhaps be more appropriate for empirical applications, the rectangular distribution is preferable for the Monte Carlo experiments presented below. The problem with spherical distributions is that the dimensions are only defined up to an orthogonal rotation of the axes. In general this is not problematic and may even be an advantage, allowing the analyst to pick the rotation that, for example, captures the most inter-group difference along a single dimension. However, for Monte Carlo experiments invariance to rotation makes the results difficult to interpret, since the true data-generating process can be any one of a continuum of orthogonal rotations. This is not true for the uniform distribution on a rectangle. Because this distribution is not invariant to rotation, the transformations that leave the likelihood unchanged are very few, and they are easy to compare to one another. In particular, the same maximum likelihood can be achieved only by exchanging the axes, reflecting the axes, or some combination of the two. Thus, by the judicious selection of starting values, the parameters estimated in each iteration of the Monte Carlo can be directly compared to the parameters used to generate the data.

[7] To avoid the issue of picking the optimal quadrature, the simulated data are drawn from the 7-point discrete distribution that corresponds to the 7-point quadrature of the continuous uniform distributions described in the text.

[8] See Lewis (1999) for a semi-parametric estimator of ideal-point distributions in the one-dimensional case.

[Figure: Possible distribution of ideal points for each of two groups]

Figure 1: Graph shows possible distributions of ideal points for each of two groups. For identification, the location of one group must be fixed. Here we assume that the ideal points of members of group one are uniformly distributed over a square centered at the origin. Given this assumed distribution of group one ideal points (the solid square in the graph), we can then estimate the distribution of ideal points for group two. The three dashed boxes are examples of possible rectangles over which group two members might be uniformly distributed.

Table 3: Results of MML estimation on simulated data: one dimension

Item parameters
Item   α true   α estimated   β true   β estimated
1      0.1      0.087         0.5      0.489
2      0.2      0.222         0.3      0.329
3      0.3      0.306         0.1      0.089
4      0.4      0.413         -0.4     -0.386
5      0.5      0.452         1.2      1.204
6      0.6      0.574         0.5      0.459
7      0.7      0.643         0.8      0.810

Group 2 parameters
Parameter   True   Estimated
Shift       0.4    0.433
Stretch     1.2    1.205

Log likelihood   …

The table shows the estimated parameters of the one-dimensional model. The data are simulated. The true parameters used to generate the data are given in the table.

The distribution of ideal points for each group is defined as follows. Group one voters are assumed to be uniformly distributed over the interval $[-h, h]$, for a fixed half-width $h$, in the unidimensional case and uniformly distributed over the square $[-h, h] \times [-h, h]$ in the two-dimensional case. Group two members are assumed to be uniformly distributed over the interval $[\delta - s h,\ \delta + s h]$ in the case of one dimension and uniformly over the rectangle $[\delta_1 - s_1 h,\ \delta_1 + s_1 h] \times [\delta_2 - s_2 h,\ \delta_2 + s_2 h]$ in the two-dimensional case. The shift parameters $\delta$ determine how far the mean group two ideal point is shifted away from the mean group one ideal point.
The stretch parameters describe how stretched or compressed the group two ideal points are relative to the group one ideal points. Examples of possible ideal-point distributions for group two are shown in figure 1. The two groups could be found to have relatively similar central tendencies along both dimensions, with group two being much more homogeneous. On the other hand, group two could have much more extreme positions and be more homogeneous. Or, group two could be more heterogeneous along one dimension and more homogeneous along the other.

Table 3 shows the estimates of the model on simulated data with a single dimension. In this example, seven items were used. Notice that in general the estimates are very close to the true values. The estimated $\alpha$s and $\beta$s are all quite good, as is the estimate of the stretch parameter. Somewhat farther off is the estimate of the mean shift. In the next section, Monte Carlo experiments will confirm that the mean shift parameter is …

The two-dimensional model (see Table 4) was similarly effective at recovering the true parameters of the model. In this case, we were able to correctly infer that members of group two had positions on dimension one that were higher on average and more variable than the positions of group one members. Similarly, on dimension two, the model correctly found that group two members had on average lower positions and about the same dispersion as group one members.

[STANDARD ERRORS FOR THE ESTIMATED PARAMETERS ARE IN PRINCIPLE STRAIGHTFORWARD TO CALCULATE FROM THE INVERSE OF THE HESSIAN. HOWEVER, I HAVE YET TO ACHIEVE WORKING CODE TO DO THIS.]

6 Assessing the fit of the model

Because we do not estimate each voter’s ideal point individually, criteria such as the percentage of votes correctly classified are not available. That is, because we do not estimate a particular ideal point ($x_i$) for each voter, we cannot say how well we were able to predict that particular voter’s vote.
However, the fit of the model can be considered both from a likelihood and from a graphical perspective. Both will be considered below.

Table 4: Results of MML estimation on simulated data: two dimensions

Item parameters
Item   α true   α est.   β1 true   β1 est.   β2 true   β2 est.
1      0.1      0.091    0.5       0.486     0.2       0.203
2      0.2      0.199    0.3       0.316     -0.2      -0.189
3      0.3      0.274    0.2       0.158     0.6       0.797
4      0.4      0.438    -0.4      -0.389    0.2       0.127
5      0.5      0.344    1.2       1.146     0.0       0.097
6      0.6      0.559    0.5       0.500     0.3       0.263
7      0.7      0.668    0.8       0.872     0.1       0.111

Group 2 parameters
Parameter              True   Estimated
Shift, dimension 1     0.4    0.52
Shift, dimension 2     -0.2   -0.10
Stretch, dimension 1   1.2    1.21
Stretch, dimension 2   1.0    1.05

Log likelihood   …

The table shows the estimated parameters of the two-dimensional model. The data are simulated. The true parameters used to generate the data are given in the table.

As noted above, the data are inherently multinomial in nature. That is, the estimation problem can be thought of as picking the parameter values that most closely reproduce the frequency distribution of the voting patterns in the data (as shown in Table 2). Thus, the spatial models developed above can be thought of as nested within a general multinomial alternative. That is, we could estimate a general model in which we estimate the parameters of the multinomial distribution of the data directly. This distribution has $2^J - 1$ parameters (one parameter describing the probability of falling into each of the first $2^J - 1$ patterns). The likelihood function for this general model is

$$L_M(\pi) = \prod_{p=1}^{2^J} \pi_p^{n_p}, \qquad \pi_{2^J} = 1 - \sum_{p=1}^{2^J - 1} \pi_p,$$

where $n_p$ is the number of voters with pattern $p$ and $N = \sum_p n_p$. Noting that the ML estimator for each $\pi_p$ is $\hat\pi_p = n_p / N$, the value of the log likelihood at its maximum will be

$$\log L_M = \sum_{p=1}^{2^J} n_p \log \frac{n_p}{N}.$$

Since the spatial models developed above (the MML estimated models) place restrictions on the feasible values of $\pi_p$, these models can be thought of as nested under the general model.
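The saturated log likelihood log L_M can be computed directly from the pattern counts, and together with a fitted spatial log likelihood it gives the likelihood ratio comparison discussed next. The sketch below uses only the Python standard library; scipy.stats.chi2.sf would give the same tail probability where SciPy is available. The function names are hypothetical, the counts must list all 2^J cells of a single group (zeros included), and with G groups one would sum saturated_loglik over groups and use G(2^J − 1) free cells.

```python
import math


def saturated_loglik(counts):
    """log L_M = sum_p n_p * log(n_p / N); empty cells contribute nothing."""
    n_total = sum(counts)
    return sum(n * math.log(n / n_total) for n in counts if n > 0)


def chi2_sf(x, df):
    """Upper-tail probability Pr(chi^2_df > x) for integer df >= 1.

    Uses Q(df/2, x/2) with the closed forms available for integer and
    half-integer shape parameters.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = sum_{j<m} exp(-y) * y^j / j!
        return sum(math.exp(-y + j * math.log(y) - math.lgamma(j + 1)) for j in range(df // 2))
    # Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{j<m} exp(-y) * y^(j+1/2) / Gamma(j + 3/2)
    tail = sum(math.exp(-y + (j + 0.5) * math.log(y) - math.lgamma(j + 1.5))
               for j in range(df // 2))
    return math.erfc(math.sqrt(y)) + tail


def lr_test(counts, spatial_loglik, n_spatial_params):
    """G^2 test of a fitted spatial model against the general multinomial model."""
    g2 = 2.0 * (saturated_loglik(counts) - spatial_loglik)
    df = (len(counts) - 1) - n_spatial_params   # free cells minus spatial parameters
    if df < 1:
        raise ValueError("spatial model has too many parameters for this test")
    return g2, df, chi2_sf(g2, df)


# Hypothetical example: J = 2 items (4 cells) and a 2-parameter spatial fit.
counts = [40, 10, 15, 35]
g2, df, p = lr_test(counts, spatial_loglik=saturated_loglik(counts) - 1.5, n_spatial_params=2)
print(round(g2, 3), df, round(p, 3))
```

The same two functions serve for the nested comparisons (one versus two dimensions, equal versus different group distributions): replace the saturated log likelihood with that of the larger model and set df to the difference in parameter counts.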
If, in fact, the data were generated by the spatial model, then the restrictions should not bind, and the spatial model and the general multinomial alternative should achieve similar likelihoods. On the other hand, if the data were generated by a different process (perhaps a spatial model with more dimensions), then the general model should produce a significantly higher likelihood. Whether the general multinomial alternative dominates any particular spatial model can be tested by constructing the familiar likelihood ratio statistic

$$G^2 = 2\left[\ln L_G - \ln L_S\right],$$

where $\ln L_G$ is the maximized log likelihood of the general multinomial model and $\ln L_S$ is that of the spatial model. $G^2$ is asymptotically distributed $\chi^2$ with degrees of freedom equal to the number of free parameters in the general multinomial model minus the number of parameters used in the spatial model. Tables 3 and 4 report the value of this test statistic and its corresponding p-value. Notice that in each case we cannot reject the spatial model in favor of the general multinomial alternative. This is, of course, what we would expect, given that the data were indeed generated by the spatial models described above.

Similarly, likelihood ratio tests can be constructed to test for intergroup differences and for dimensionality. To see this, note that the one-dimensional model is nested within the two-dimensional model: the two-dimensional model can achieve the same likelihood as the one-dimensional model simply by setting the second-dimension parameters equal to zero. Likewise, the models that constrain the groups to have the same ideal point distribution are nested within models that allow the distributions to differ. The availability of statistical tests for dimensionality and parameter differences is an advantage of the MML approach over fixed-effect models such as NOMINATE. However, these tests do have some problems. If there are relatively few observations, the predicted cell probabilities for many of the multinomial cells will be very small.
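Given two maximized log likelihoods, the test itself is short. The sketch below uses a hypothetical helper `lr_test` built on SciPy's chi-square survival function, and it covers both the comparison with the general multinomial alternative and nested comparisons such as one versus two dimensions. The log-likelihood values are invented for illustration, and the parameter counts rest on the assumptions stated in the comments. As noted above, with many sparse patterns the chi-square reference is only an approximation.

```python
from scipy.stats import chi2


def lr_test(loglik_general, loglik_restricted, k_general, k_restricted):
    """Likelihood ratio test of a restricted model nested in a general one.

    G2 = 2 * (lnL_general - lnL_restricted) is referred to a chi-square
    distribution with k_general - k_restricted degrees of freedom. The
    reference distribution is only asymptotic.
    """
    g2 = 2.0 * (loglik_general - loglik_restricted)
    df = k_general - k_restricted
    return g2, df, chi2.sf(g2, df)


# Hypothetical inputs, for illustration only (not results from the paper).
# Spatial model vs. general multinomial with J = 7 items and two groups,
# assuming a separate multinomial per group: 2 * (2**7 - 1) = 254 free
# parameters, versus 7 + 7 item parameters plus shift and stretch = 16.
print(lr_test(-7400.0, -7510.0, 2 * (2**7 - 1), 7 + 7 + 2))

# One dimension nested in two: hypothetical counts that ignore any extra
# identifying restrictions the two-dimensional model may need.
print(lr_test(-7480.0, -7510.0, 7 + 14 + 4, 7 + 7 + 2))
```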
Since the $\chi^2$ distribution of the statistic holds only asymptotically, these small predicted cell frequencies cast some doubt on the validity of the test. For example, in the Monte Carlo experiments below, it is not uncommon for as many as twenty of the patterns to be missing from the data, while many others appear fewer than 5 times. To get around this problem, the patterns could be grouped and tests run on the grouped patterns. Unfortunately, there is no obviously best way to group the patterns, and this grouping discards information, lowering the power of the test.

On the other hand, very large data sets introduce other problems. With very large data sets, the likelihood ratio statistic will more closely approximate its limiting $\chi^2$ distribution. However, with such large data sets, it is very likely that the general multinomial alternative will always dominate the spatial models when they are applied to real-world data. Whatever distribution we put over the voter ideal points and whatever independence assumptions we make about the questions are unlikely to hold exactly in practice. With very large data sets, we will likely reject our spatial model even when the violations of its assumptions are quite slight. In such a case, comparing the value of the spatial likelihood to that of the general multinomial alternative would be rather uninformative. On the other hand, with large data sets, tests for dimensionality and for intergroup differences should be very effective, though again, in very large data sets, LR tests may find significant differences that are of little substantive importance.

The fit of the model can also be assessed using a graphical approach that is similar in spirit to the LR test against the general multinomial model. Figures 2 and 3 show the predicted and actual pattern frequencies for each of the two examples presented above. Note that in both cases the model fits the data quite nicely.
Nearly all of the predicted frequencies fall very near the 45-degree line. The graphs also highlight the small-cell-frequency problem noted above. Much of the discrepancy between the true and estimated pattern frequencies occurs near the bottom left-hand corner of each graph. In these cases, the discrepancies may be due in part not to poor estimated frequencies but, in a sense, to “poor” actual frequencies. That is, where we expect to find 3.2 cases but find none, it may be that 3.2 is a better estimate of how many we should find than is none.

[Figure 2: scatter plot of true frequency against estimated frequency for each vote pattern]

Figure 2: Plots the actual number of observations casting each of the roughly 256 vote patterns against the number predicted by the model. That almost all of the points fall close to the 45-degree line is indicative of the good fit of the model.

[Figure 3: scatter plot of true frequency against estimated frequency for each vote pattern]

Figure 3: Plots the actual number of observations casting each of the roughly 256 vote patterns against the number predicted by the model. That almost all of the points fall close to the 45-degree line is indicative of the good fit of the model.

7 Monte Carlo results

In this section, I run Monte Carlo experiments on the one- and two-dimensional models developed above. Overall, the experiments show the method to be quite effective in recovering the parameters of the model. For comparison, I also estimate the αs and βs directly by running a probit of each vote on the usually unobserved ideal points θ. While the sampling variation in the estimates made without knowledge of θ is larger, it is not as much larger as one would imagine. Even with as few as 7 observed “votes” per voter, the estimates of the αs and βs are generally quite good. Table 5 shows the results of the model with one spatial dimension. Overall, the technique seems to work well. In only a very few instances do the average estimated αs and βs fall as much as … away from their true values.
Moreover, the standard deviations of the estimates are not much larger than those of the direct probits: in many cases, the sampling variation in these estimates is only about one-third larger than that found by estimating the αs and βs by running probits directly. The only case where the estimation seems to have broken down is in the estimates of the parameters related to items (or votes) 5 and 7. In both of these cases, I believe the problem is one of numerical inaccuracy. In each case the true βs were very large. With such large βs, the likelihood of a yes vote is very close to zero or one for nearly all respondents. As this probability approaches zero or one, the exact value of the estimated β matters less and less to the likelihood. [IN THE FUTURE, I WILL EITHER REDUCE THE VARIANCE OF THETA OR MAKE THESE BETAS SMALLER.]

The estimates of the intergroup differences also seem very good on average. Somewhat surprisingly, the mean shift is measured much less accurately than the variance stretch. Table 6 shows the results of the Monte Carlo experiments using the two-dimensional model. As was the case with a single dimension, the αs and βs are quite well estimated, with the exception of the very large values of β. As before, the estimates of the stretch parameters were considerably more accurate than those of the shift parameters.

A few more Monte Carlo experiments would be very instructive. Figure 4 shows how the sampling variation in the parameters falls as a function of the sample size. Other experiments would also be valuable; the most obvious would be one in which the number of votes per voter is manipulated. These are left for future work.
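The “direct probit” benchmark in Tables 5 and 6 regresses each vote on the ideal points, which are known only in simulation. A minimal sketch of that benchmark inside a small Monte Carlo loop follows; it is not the code behind the tables. The helper `probit_fisher_scoring` is hypothetical: it fits each probit by Fisher scoring (Newton's method with the expected information in place of the Hessian) and returns standard errors from the inverse of that information matrix. The simulation uses the one-dimensional values from Table 5 and assumes, for illustration, $P(\text{yes}_j \mid \theta) = \Phi(\alpha_j + \beta_j\theta)$, standard-normal group-one ideal points, group-two ideal points equal to shift + stretch × $z$, and an even split of voters between the groups. The trial count is cut to 50 for speed.

```python
import numpy as np
from scipy.stats import norm


def probit_fisher_scoring(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1 | x) = Phi(a + b * x) by Fisher scoring.

    Returns the estimates (a, b) and their standard errors from the inverse
    of the expected information matrix evaluated at the last iterate.
    """
    X = np.column_stack([np.ones_like(x), x])
    coef = np.zeros(2)
    for _ in range(max_iter):
        eta = X @ coef
        p = np.clip(norm.cdf(eta), 1e-12, 1 - 1e-12)
        dens = norm.pdf(eta)
        score = X.T @ ((y - p) * dens / (p * (1 - p)))         # gradient of log L
        info = X.T @ (X * (dens**2 / (p * (1 - p)))[:, None])   # expected information
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef, np.sqrt(np.diag(np.linalg.inv(info)))


rng = np.random.default_rng(1999)
alpha = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])   # true values, Table 5
beta = np.array([0.5, 0.3, 0.1, -0.4, 1.2, 0.5, 0.8])
shift, stretch = 0.4, 1.2                                 # group two, Table 5
n_obs, n_trials = 1000, 50       # Table 5 used 250 trials; fewer here for speed

estimates = np.empty((n_trials, alpha.size, 2))
ses = np.empty_like(estimates)
for t in range(n_trials):
    in_group2 = rng.random(n_obs) < 0.5   # hypothetical: half the voters in group two
    theta = rng.standard_normal(n_obs)
    theta[in_group2] = shift + stretch * theta[in_group2]
    # Probit data-generating process: vote yes when alpha + beta * theta + e > 0
    latent = alpha + np.outer(theta, beta) + rng.standard_normal((n_obs, alpha.size))
    votes = (latent > 0).astype(float)
    for j in range(alpha.size):
        estimates[t, j], ses[t, j] = probit_fisher_scoring(theta, votes[:, j])

mc_mean = estimates.mean(axis=0)   # rows: items; columns: (alpha, beta)
mc_std = estimates.std(axis=0)     # Monte Carlo sampling standard deviation
mean_se = ses.mean(axis=0)         # average inverse-information standard error
print(np.round(mc_mean, 3))
print(np.round(mc_std, 3))
print(np.round(mean_se, 3))
```

Comparing `mc_std` with `mean_se` is a quick check that the inverse-information standard errors track the sampling variation seen across trials.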
Monte Carlo results: One dimension

Item parameters: α

| Item | True | Estimated mean | Estimated std | Direct probit mean | Direct probit std |
|---|---|---|---|---|---|
| 1 | 0.1 | 0.105 | 0.067 | 0.102 | 0.047 |
| 2 | 0.2 | 0.204 | 0.050 | 0.203 | 0.045 |
| 3 | 0.3 | 0.299 | 0.044 | 0.299 | 0.044 |
| 4 | 0.4 | 0.398 | 0.050 | 0.401 | 0.050 |
| 5 | 0.5 | 0.572 | 0.355 | 0.501 | 0.083 |
| 6 | 0.6 | 0.609 | 0.079 | 0.606 | 0.057 |
| 7 | 0.7 | 0.713 | 0.118 | 0.710 | 0.068 |

Item parameters: β

| Item | True | Estimated mean | Estimated std | Direct probit mean | Direct probit std |
|---|---|---|---|---|---|
| 1 | 0.5 | 0.498 | 0.036 | 0.502 | 0.026 |
| 2 | 0.3 | 0.299 | 0.027 | 0.301 | 0.021 |
| 3 | 0.1 | 0.100 | 0.022 | 0.100 | 0.019 |
| 4 | -0.4 | -0.399 | 0.032 | -0.402 | 0.022 |
| 5 | 1.2 | -1.290 | 0.383 | -1.220 | 0.086 |
| 6 | 0.5 | -0.503 | 0.038 | 0.506 | 0.028 |
| 7 | 0.8 | -0.799 | 0.083 | 0.803 | 0.044 |

Group 2 parameters (estimated)

| Parameter | True | Mean | Std. |
|---|---|---|---|
| Shift | 0.4 | 0.414 | 0.218 |
| Stretch | 1.2 | 1.216 | 0.077 |

Table 5: Results of 250 trials of a Monte Carlo experiment in which a data set with 1000 observations generated by the given parameters was fit using the MML method described in the text. The “Direct probit” columns show the results of fitting the model by running probit regressions of each vote directly on θ, something which is, of course, not possible in real-life applications.

Monte Carlo results: Two dimensions

Item parameters: α

| Item | True | Estimated mean | Estimated std | Direct probit mean | Direct probit std |
|---|---|---|---|---|---|
| 1 | 0.1 | 0.100 | 0.034 | 0.101 | 0.025 |
| 2 | 0.2 | 0.200 | 0.030 | 0.200 | 0.023 |
| 3 | 0.3 | 0.301 | 0.026 | 0.301 | 0.022 |
| 4 | 0.4 | 0.460 | 0.292 | 0.402 | 0.033 |
| 5 | 0.5 | 0.505 | 0.081 | 0.503 | 0.040 |
| 6 | 0.6 | 0.600 | 0.039 | 0.602 | 0.028 |
| 7 | 0.7 | 0.698 | 0.057 | 0.699 | 0.035 |

Item parameters: β₁

| Item | True | Estimated mean | Estimated std | Direct probit mean | Direct probit std |
|---|---|---|---|---|---|
| 1 | 0.5 | 0.500 | 0.022 | 0.502 | 0.015 |
| 2 | 0.3 | 0.300 | 0.024 | 0.300 | 0.011 |
| 3 | 0.1 | 0.099 | 0.027 | 0.099 | 0.011 |
| 4 | -0.4 | -0.467 | 0.387 | -0.400 | 0.019 |
| 5 | 1.2 | 1.217 | 0.107 | 1.204 | 0.040 |
| 6 | 0.5 | 0.499 | 0.024 | 0.501 | 0.015 |
| 7 | 0.8 | 0.801 | 0.039 | 0.800 | 0.022 |

Item parameters: β₂

| Item | True | Estimated mean | Estimated std | Direct probit mean | Direct probit std |
|---|---|---|---|---|---|
| 1 | 0.2 | 0.200 | 0.038 | 0.200 | 0.013 |
| 2 | 0.4 | 0.400 | 0.026 | 0.401 | 0.012 |
| 3 | -0.4 | -0.401 | 0.024 | -0.400 | 0.012 |
| 4 | 1.1 | 1.194 | 0.372 | 1.100 | 0.032 |
| 5 | 0.4 | 0.409 | 0.114 | 0.401 | 0.021 |
| 6 | 0.3 | 0.299 | 0.037 | 0.300 | 0.015 |
| 7 | 0.0 | 0.001 | 0.065 | -0.001 | 0.065 |

Group 2 parameters (estimated)

| Parameter | True | Mean | Std. |
|---|---|---|---|
| Shift, dimension 1 | 0.4 | 0.401 | 0.103 |
| Shift, dimension 2 | 0.0 | -0.002 | 0.055 |
| Stretch, dimension 1 | 1.2 | 1.198 | 0.048 |
| Stretch, dimension 2 | 1.0 | 0.997 | 0.043 |

Table 6: Results of 250 trials of a Monte Carlo experiment in which a data set with 1000 observations generated by the given parameters was fit using the MML method described in the text. The “Direct probit” columns show the results of fitting the model by running a probit of each vote directly on θ, something which is, of course, not possible in real-life applications.

[Figure 4: three panels, “Monte Carlo standard deviations of α, β₁, and β₂, by sample size”; each plots standard deviation against observations for MML and direct probit]

Figure 4: Shows the average standard errors of the αs, β₁s, and β₂s across the 7 items for various sample sizes. The standard error estimates are based on Monte Carlo simulations. The model parameters are the same as those used in table 6. For each sample size, 200 simulations were conducted.
8 Conclusion

In this paper, I have developed a general model for estimating intergroup differences in ideal point distributions from individual-level binary choice (voting) data. The main advantage of the model is that it skips the usual intermediate step of estimating each individual's ideal point. This is desirable because, by most existing methods, these individual-level ideal point estimates are known to be inconsistent. The Monte Carlo experiments show the promise of the model. While the applications are obvious, they have been omitted from this paper and will be the focus of future work. [MORE TO COME]

References

Achen, Christopher H. 1977. “Measuring Representation: Perils of the Correlation Coefficient.” American Journal of Political Science. 21:805–815.
Achen, Christopher H. & W. Phillips Shively. 1995. Cross-Level Inference. Chicago: University of Chicago Press.
Alesina, Alberto & Howard Rosenthal. 1994. Partisan Politics, Divided Government, and the Economy. New York: Cambridge University Press.
Andersen, Erling B. 1972. “The Numerical Solution of a Set of Conditional Estimation Equations.” XXXXX. XXX(1):42–54.
Andersen, Erling B. & Mette Madsen. 1972. “Estimating the Parameters of the Latent Population Distribution.” Psychometrika. 42(3):357–374.
Anderson, Lee F., Meredith W. Watts & Allen Wilcox. 1966. Legislative Roll-Call Analysis. Evanston: Northwestern University Press.
Bailey, Michael. 1998. “A Random Effects Approach to Legislative Ideal Point Estimation.” Presented at the 1998 Midwest Political Science Association Meeting, Chicago.
Black, Duncan. 1958. The Theory of Committees and Elections. Cambridge, England: Cambridge University Press.
Bock, R. Darrell & Marcus Lieberman. 1970. “Fitting a Response Model for n Dichotomously Scored Items.” Psychometrika. 35(2):179–197.
Bock, R. Darrell & Murray Aitkin. 1981. “Marginal Maximum Likelihood Estimation of Item Parameters: Application of the EM Algorithm.” Psychometrika. 46(4):443–459.
Brady, Henry E. 1989. “Factor and Ideal Point Analysis for Interpersonally Incomparable Data.” Psychometrika. 54(2):181–202.
Butler, J. S. & Robert Moffitt. 1982. “A Computationally Efficient Quadrature Procedure for the One-Factor Multinomial Probit Model.” Econometrica. 50(3):761–764.
Converse, Philip & Gregory A. Markus. 1979. “Plus Ça Change...: The New CPS Election Study Panel.” American Political Science Review. 73(1):32–49.
Cressie, Noel & Paul W. Holland. 1983. “Characterizing the Manifest Probabilities of Latent Trait Models.” Psychometrika. 48(1):129–143.
Dempster, A. P., D. B. Rubin & R. K. Tsutakawa. 1981. “Estimation in Covariance Components Models.” Journal of the American Statistical Association. 76(June):341–353.
Dempster, A. P., N. M. Laird & D. B. Rubin. 1977. “Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society, Series B. 39(1):1–38.
Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper & Row.
Dubin, Jeffrey A. & Elisabeth R. Gerber. 1992. “Patterns of Voting on Ballot Propositions: A Mixture Model of Voter Types.” Social Science Working Paper 795, California Institute of Technology.
Fienberg, Stephen E. & Michael M. Meyer. 1983. “Loglinear Models and Categorical Data Analysis with Psychometric and Econometric Applications.” Journal of Econometrics. 22:191–214.
Fiorina, Morris. 1992. “An Era of Divided Government.” Political Science Quarterly. 33(2):387–410.
Follmann, Dean. 1988. “Consistent Estimation in the Rasch Model Based on Nonparametric Margins.” Psychometrika. 53(4):553–562.
Follmann, Dean A. & Diane Lambert. 1989. “Generalizing Logistic Regression by Nonparametric Mixing.” Journal of the American Statistical Association. 84(March):295–300.
Gerber, Elisabeth R. 1991. Legislative Politics and the Direct Ballot: Comparing Policy Outcomes across Institutional Arrangements. PhD thesis, University of Michigan, Ann Arbor.
Gerber, Elisabeth R. 1996a. “Legislative Response to the Threat of Popular Initiatives.” American Journal of Political Science. 40:99–128.
Gerber, Elisabeth R. 1996b. “Legislatures, Initiatives, and Representation: The Effects of Institutions on Policy.” Political Research Quarterly. 49:263–286.
Gerber, Elisabeth R. & Adam S. Many. 1996. “Incumbent-Led Ideological Balancing: A Hybrid Theory of Split-Ticket Voting.” Presented at the 1996 Midwest Political Science Association Annual Meetings, Chicago.
Groseclose, Tim. 1994. “The Committee Outlier Debate: A Review and a Reexamination of Some of the Evidence.” Public Choice. 80.
Haberman, Shelby J. 1977. “Maximum Likelihood Estimation in Exponential Response Model.” The Annals of Statistics. 6(5):815–841.
Heckman, James J. & Bo E. Honoré. 1990. “The Empirical Content of the Roy Model.” Econometrica. 58(5):1121–1149.
Heckman, James J. & James M. Snyder. 1997. “Linear Probability Models of the Demand for Attributes with an Empirical Application to Estimating the Preferences of Legislators.” The RAND Journal of Economics. 28:S142–169.
Heckman, James J. & Robert J. Willis. 1977. “A Beta-logistic Model for the Analysis of Sequential Labor Force Participation by Married Women.” Journal of Political Economy. 85(1):27–58.
Heinen, Ton. 1996. Latent Class and Discrete Latent Trait Models. Thousand Oaks: Sage Publications.
Heron, Michael. 1998. “Some Consequences of the Lack of Micro-foundations in Aggregate Voting Data.” Prepared for presentation at the 1998 Annual Meetings of the American Political Science Association, Boston, MA.
Hinich, Melvin J. & James M. Enelow. 1984. The Spatial Theory of Voting: An Introduction. Cambridge, England: Cambridge University Press.
Hotelling, H. 1929. “Stability in Competition.” Economic Journal. 39:41–57.
Kiefer, J. & J. Wolfowitz. 1956. “Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters.” Annals of Mathematical Statistics. 27:887–906.
King, Gary. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton: Princeton University Press.
Kuklinski, James H. 1978. “Representation and Elections: A Policy Analysis.” American Political Science Review. 72:165–177.
Ladha, Krishna K. 1991. “A Spatial Model of Legislative Voting with Perceptual Error.” Public Choice. 68:151–174.
Laird, Nan. 1978. “Nonparametric Maximum Likelihood Estimation of a Mixing Distribution.” Journal of the American Statistical Association. 73(December):805–811.
Lewis, Jeffrey B. 1996. “Referendums, Roll-calls, and Constituencies.” Presented at the 1996 Annual Meetings of the American Political Science Association, San Francisco, CA.
Lewis, Jeffrey B. 1997a. “To the Victors Go the Rollcalls.” Presented at the 1997 Annual Meetings of the Midwest Political Science Association, Chicago, IL.
Lewis, Jeffrey B. 1997b. Who Do Representatives Represent? The Importance of Electoral Coalition Preferences in California. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.
Lindsay, Bruce, Clifford C. Clogg & John Grego. 1991. “Semiparametric Estimation in the Rasch Model and Related Exponential Response Models, Including a Simple Latent Class Model for Item Analysis.” Journal of the American Statistical Association. 86(March):96–107.
Londregan, John. n.d. “Estimating Preferred Points in Small Legislatures: The Case of the Chilean Senate Committees.” Unpublished working paper, University of California, Los Angeles.
Londregan, John & James M. Snyder. 1994. “Comparing Committee and Floor Preferences.” Legislative Studies Quarterly. 19(2):233–265.
Mislevy, Robert J. 1984. “Estimating Latent Distributions.” Psychometrika. 49(3):359–381.
Mroz, Thomas A. 1997. “Discrete Factor Approximations in Simultaneous Equations Models: Estimating the Impact of a Dummy Endogenous Variable on a Continuous Outcome.” Unpublished working paper 97-2, Department of Economics, University of North Carolina, Chapel Hill.
Neyman, J. & Elizabeth L. Scott. 1948. “Consistent Estimates Based on Partially Consistent Observations.” Econometrica. 16(1):1–32.
Peltzman, Sam. 1985a. “Constituent Interest and Congressional Voting.” Journal of Law & Politics. 27:181–210.
Peltzman, Sam. 1985b. “An Economic Interpretation of the History of Congressional Voting in the Twentieth Century.” American Economic Review. 75:657–675.
Poole, Keith & Howard Rosenthal. 1985. “A Spatial Model for Legislative Roll Call Analysis.” American Journal of Political Science. 29:355–384.
Poole, Keith T. & Howard Rosenthal. 1991. “Patterns of Congressional Voting.” American Journal of Political Science. 35(1):228.
Poole, Keith T. & Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll Call Voting. Oxford: Oxford University Press.
Redner, Richard A. & Homer F. Walker. 1984. “Mixture Densities, Maximum Likelihood and the EM Algorithm.” SIAM Review. 26(2):195–239.
Rigdon, Steven F. & Robert R. Tsutakawa. 1983. “Parameter Estimation in Latent Trait Models.” Psychometrika. 48(4):567–574.
Sanathanan, Lalitha & Saul Blumenthal. 1978. “The Logistic Model and Estimation of Latent Structures.” Journal of the American Statistical Association. 73(December):794–799.
Snyder, James M. 1996. “The Dimensions of Constituency Preferences: Evidence from California Ballot Propositions, 1974–1990.” Legislative Studies Quarterly. 21(4):463–488.
Stroud, A. H. & Don Secrest. 1966. Gaussian Quadrature Formulas. Englewood Cliffs, NJ: Prentice-Hall.
Thurstone, L. L. 1931. “The Isolation of Blocs in a Legislative Body by the Voting Records of Its Members.” Journal of Social Psychology. 3:425–433.
Tong, Y. L. 1988. “Some Majorization Inequalities in Multivariate Statistical Analysis.” SIAM Review. 30(4):602–622.
Zwinderman, Aeilko H. 1991. “A Generalized Rasch Model for Manifest Predictors.” Psychometrika. XX:589–599.