"Sink or Swim": What Happened to California's Bilingual Students after Proposition 227?

Valentina A. Bali
California Institute of Technology
[email protected]
August 10th, 2000

Abstract

Proposition 227, passed in California in 1998, aimed to dismantle bilingual programs in the state's public schools. Using individual level data from a southern California school district, I find that in 1998, before Proposition 227, limited-English-proficient (LEP) students enrolled in bilingual classes had lower scores in reading than LEP students not enrolled in bilingual classes: 2.4 points less on a scale from 1 to 99. In math these bilingual students scored 0.5 points higher than non-bilingual LEPs. But in 1999, after Proposition 227, the same set of students had scores no worse than non-bilingual LEP students in reading and were still 0.5 points higher in math. Proposition 227, which interrupted bilingual programs early and emphasized English instruction, therefore did not set bilingual students back relative to non-bilingual LEPs and may even have benefited them.

I thank Matthew O. Jackson, R. Michael Alvarez, Christine Rossell and Fred Boehmke for helpful comments and discussion. I am grateful as well to William Bibbiani of the Research, Evaluation and Testing Department at Pasadena Unified School District for providing the data, and for insightful discussions.

1. INTRODUCTION

Proposition 227 passed in California's June 1998 primary election with an ample margin of approval: 61% of voters supported the measure statewide, while only two counties in the state, San Francisco and Alameda, voted against it. The main goal of the initiative was to dismantle long-standing bilingual programs in public schools. Specifically, after Proposition 227, a child could be kept in a bilingual education program only if the parents requested a waiver and it was approved by school authorities. This deceptively simple new standard was implemented to different degrees throughout the state. Many school districts fully complied with the new law while others unsuccessfully sought legal exemptions. As a result, the overall percent of LEP students enrolled in bilingual programs declined from 29% in 1998 to 11% in 1999.

This shift of educational regimes, from one encouraging primary language instruction to one emphasizing early English instruction, is taking place in many of California's districts and may possibly take place in other states as well. It is important, then, to understand its immediate impact on students. The purpose of this paper is to examine the academic performance of limited-English-proficient (LEP) students after Proposition 227. In particular, the focus is on former bilingual LEP students: LEPs who were enrolled in bilingual programs in 1998 but not in 1999, and whose academic performance can be tracked in both years.

The contributions in this paper are three-fold. First, from a substantive point of view, looking at individual level data, I find that former bilingual students in a southern California district were not set back by Proposition 227 relative to non-bilingual LEPs. In particular, controlling for background characteristics, LEP students enrolled in bilingual classes in 1998 but not in 1999 had standardized scores 2.4 points less in reading and 0.5 more in math (on a scale from 1 to 99) than LEP students not enrolled in bilingual classes in 1998, whom I refer to here as non-bilingual LEPs.
After Proposition 227, I find that once former bilingual students were placed in English classrooms with special support, their scores are never worse than those of non-bilingual LEPs in reading and are still 0.5 higher in math. In general, then, former bilingual LEPs caught up with non-bilingual LEPs after the implementation of Proposition 227.

The second contribution is methodological. A common, yet often ignored, problem in assessing English instruction programs is that students with limited English skills may be exempted from taking standardized tests in English. Bilingual LEPs, in particular, are more likely to be exempted from testing. In those cases, when we compare bilingual LEP students, who are exempted more often, to non-bilingual LEP students, who are exempted less often, the conclusions can be biased towards favoring bilingual programs if mostly the better performing bilingual students are being tested. In general, not accounting for the exemption process can produce inconsistent estimates of expected performance. In this paper I account for the selection process with a model that explicitly takes into consideration the test-taking process. If the selection model is not used, bilingual LEP students seemingly fare better when compared to non-bilingual LEPs.

Finally, to check the generalizability of the school district results, I look at county level data. I find that counties with higher percentages of bilingual LEP students experienced gains in their test scores that are statistically no different than those of counties with fewer bilingual LEP students. On the other hand, counties with higher percentages of Hispanic students experienced larger gains than counties with smaller percentages of Hispanic students. More specifically, after racial and economic controls, when comparing counties with no Hispanics to counties with 40% Hispanics, such as Los Angeles, I find that 1.2% more students score above the 50th national percentile ranking (NPR). The fact that counties with more Hispanics experienced higher gains is consistent with Proposition 227 not setting English learner students back.

These findings have important implications. Given that bilingual students do as well as non-bilingual students when bilingual programs are dismantled, this suggests that interrupting or shortening their stay in a bilingual program does not set these students back. More English instruction does not seem to hamper them, at least in the short run. Clearly, one-year effects are short term and what matters are long term effects. The next several years should provide ample opportunities to explore whether the long-term effects of this large reform are positive or negative. Understanding the effects of Proposition 227 is crucial given that similar measures are being considered in other states such as Arizona, Massachusetts and New York, and given the increasing size of the immigrant population.

The paper is organized as follows. The next section briefly reviews the passage of Proposition 227. Section 3 discusses the hypothesis while Section 4 reviews the data and the methods. Section 5 then presents the main results by comparing bilingual LEPs' scores in 1998 and 1999 to those of non-bilingual LEPs and non-LEPs. Section 6 looks at county level results. Alternative specifications and caveats are in Section 7, while conclusions and discussion are in Section 8.

2. THE PASSAGE OF PROPOSITION 227

Proposition 227, sponsored by a citizen organization, passed in California with 61% of voter approval.
Only two counties, San Francisco and Alameda, out of the 58 California counties, did not support the initiative. The level of voter approval across the state surprised observers at the time, given that many teacher unions and Hispanic organizations had strongly mobilized against Proposition 227. Moreover, many counties have large Hispanic populations. Los Angeles, for example, has over 45% Hispanics while the state overall has close to 30% (see the Department of Finance of California website, http://www.dof.ca.gov, for demographic information on 2000 projections).

This remarkable uniformity of voter approval did not necessarily imply an informed consensus on the educational merits of bilingual instruction. Voters did not base their decision exclusively on their views on bilingual instruction. In a probit analysis of exit-poll voters, Alvarez (1999) shows that racial and ideological identifications were driving factors for the passage of Proposition 227, independent of opinions on the efficacy of bilingual instruction (see also Cornelius and Martinez (2000) and Ji (2000)).

Bilingual education is a racially and ideologically charged issue for voters, and this educational method is also controversial for researchers (for a description of the different bilingual programs in the country and their methods see Faltis and Hudelson (1998); for a history of bilingual education and its politics see Crawford (1995)). The academic literature has not arrived at a general consensus on the benefits of bilingual programs. This lack of consensus stems at times from ideological biases, problematic methodology, or simply intellectual disagreement. Greene (1998) and Rossell and Baker (1996) have the most recent comprehensive surveys of the efficacy of bilingual education, yet they reach quite opposite conclusions. Rossell and Baker review 72 methodologically acceptable studies from a pool of 300 studies. Their main conclusion is that the research evidence does not support bilingual programs as a strictly better form of instruction than English-as-a-second-language (ESL) programs or structured English immersion (SEI). Only in a minority of studies is bilingual instruction better than a regular English classroom. But other researchers have concluded that bilingual programs can be as effective as English-only ones and sometimes even more effective (Willig (1985), Collier and Thomas (1989, 1997), Garcia (1991), Krashen (1998, 1999), Hakuta (1994)). In a meta-analysis of bilingual efficacy that used 11 studies, Greene (1998) finds that bilingual instruction is superior to programs emphasizing early English instruction. Similarly, Thomas and Collier (1997) concluded that bilingual instruction, particularly when literacy in both languages is emphasized, was better than any other program for LEP students.

One of the better known long term studies was conducted by Ramirez (1992), who tracked students for over four years in various different programs of instruction for LEP students. Ramirez and his associates found that bilingual programs of the early-exit type (where the goal is to exit the students as soon as they learn English) are better than immersion (all-English special instruction) programs, but only in the early years. In later years the benefits from bilingual instruction disappear. Importantly though, the Ramirez study did not statistically account for the fact that fewer bilingual students were tested than those in other programs.
For example, only 29% of bilingual students were tested while 42% of the alternative immersion program students were tested. This can induce a favorable bias toward bilingual programs since only the better performing bilingual students get tested (Rossell (1999)).

The majoritarian opinion of California's voters was for reform, but it was not entirely based on assessments of educational outcomes. Before Proposition 227 there was no broad agreement among voters, teachers or academics about the potential effects of this initiative. After the public release of the 1999 aggregate school level scores, which showed small increases, the reactions were mixed. Those who advocated bilingual instruction cautioned against ignoring across-the-board increases when looking at LEPs' gains in scores (Hakuta (1999)). Others compared the gains between districts which thoroughly complied with Proposition 227's mandate and those that maintained bilingual programs and concluded that the initiative had worked (Amselle (1999)). The next sections, by focusing on individual data and county level data, will hopefully provide further understanding of the impact of the initiative.

3. THE HYPOTHESIS

The main hypothesis to test is whether Proposition 227 had a negative impact in 1999 on former bilingual LEPs compared to former non-bilingual LEPs, relative to their baseline performances in 1998. To test the main hypothesis one needs to look at both 1998 and 1999 overall scores or, alternatively, individual 1998-1999 gains. We may expect that, overall, students enrolled in 1998 in bilingual classes had lower 1998 scores than non-bilingual LEPs and non-LEPs, since bilingual students were exposed to much less English, the standardized tests were fully in English and designed for fluent students, and different studies have suggested that full fluency can take from 5 to 7 years (National Research Council (1998), Collier and Thomas (1989)). Controlling for background information and school effects, I will test the first hypothesis:

H.1: Bilingual LEP students had statistically lower scores than non-bilingual LEP students in 1998.

Dismantling bilingual instruction can have a negative effect on bilingual students if bilingual instruction is a superior program or if, regardless of the merit of the program, interrupting it and expecting English competence in a short period of time sets students back. Dismantling bilingual instruction can positively affect bilingual students if bilingual instruction is not a superior program (in and of itself or due to implementation) or if bilingual instruction is a beneficial program only for short periods of time. The goal of this paper is to estimate the effect of the reform or, more precisely, the effect of interrupting bilingual instruction. In particular, I will test the hypothesis held by bilingual advocates that LEP students would not benefit from the reform compared to other LEP students enrolled in less adequate programs or already exited from bilingual instruction.

H.2: The gap in test scores between former bilingual LEP students and continuing non-bilingual LEP students increased in 1999.

Ideally we would first compare the performance of former bilingual LEPs against continuing bilingual LEPs rather than continuing non-bilingual LEPs. This comparison would hold constant their 1998 bilingual background while varying their 1999 status. The problem with this comparison is the possibility of strong biases in determining who is assigned to each program alternative.
The decision to place or continue a student in a bilingual instructional setting is most likely non-random. However, in the southern California district of my study, this potentially problematic comparison is actually not possible since only 200 bilingual students continued in bilingual instruction and they were all exempted from test-taking in 1999. Moreover, the continuing bilingual students were all from the same school and their waivered status was mostly the result of activist teachers at that particular school. Therefore, although I cannot compare former and continuing bilingual LEPs, I can compare former bilingual students and non-bilingual students with the confidence that there are no large biases in the composition of the groups due to Proposition 227-induced changes.

4. THE DATA AND THE METHODS

4.1 The District

Pasadena Unified School District (PUSD) is in Los Angeles County, southern California. In 1998-1999 it had a total population of approximately 22,000 students, of whom 18,300 were eligible for testing and took the Stanford 9 tests. (Stanford 9 tests are the standardized tests which by law since 1998 all students in California in grades 1-11 must take. The tests cover math, reading, language and subject areas.) Pasadena is quite diverse in its population, as can be seen in Table 1. Compared to California's total 1999 averages, Pasadena has a larger minority student body and its students come from backgrounds that are more disadvantaged, as seen by the percentages of students from families qualifying for Aid to Families with Dependent Children (AFDC) and Free Lunch programs. Pasadena's LEP percentage, 26.3, is, on the other hand, very similar to California's percentage of 24.6. In terms of academic performance Pasadena lags California in every grade as measured by national percentile rankings (NPR) from reading Stanford 9 scores. The last two rows in Table 1 show the difference for second and eleventh grades.

(Table 1 about here).

Overall, PUSD is a good representative school district in which to study the effects of Proposition 227 in that it has a large Hispanic and LEP student body, one which is comparable to the state's demographics. The fact that it has a possibly more disadvantaged student body and in general lower scores can make it more difficult for any reform to succeed, but if positive effects are found then they are all the more convincing.

The language of Proposition 227 implied that each district had to inform parents about the new regime, and inform them of their option to request a waiver to keep their child in a bilingual program. In the 1998 academic year there were approximately 5,400 LEP students; 2,900 of these (or 16% of the student population including kindergarten) were enrolled in bilingual classes, primarily in grades K-4. Less than 3% of bilingual students were from an Armenian background, the other minority group which was offered bilingual instruction in Pasadena. By 1999, after the passage of Proposition 227, the district largely dismantled its bilingual programs after few requests for waivers from parents were received. The majority of the bilingual students were placed in structured English immersion (SEI) classes, where English is taught at the students' level, while non-bilingual LEPs continued in regular classrooms or classrooms with some English-as-a-second-language support. Approximately 200 waivers were requested by parents and accepted, all coming from the most heavily Hispanic school.
The district went from roughly two-thirds of its 30 schools offering bilingual programs in 1998 to just one school in 1999. (Since continuing bilingual students are not included in this study, I will often refer in the remainder of this paper to former bilingual LEP students simply as bilingual LEPs. Similarly, I will refer to continuing non-bilingual LEPs as non-bilingual LEPs.)

4.2 The Data

To test the hypothesis I use multivariate analysis in which the dependent variable is test scores in reading and math and the explanatory variables correspond to background and school information. To measure the performance of students I use their 1998 and 1999 Stanford 9 test scores. This is the test that, by law, all California students in grades 2-11 have had to take since 1998. Only math, reading and language are tested across all grade levels, and I will focus the analysis on total reading and math. (The test scores are normal curve equivalent (NCE) scores, which are obtained by first scaling the raw score of a student given the difficulty of the questions, such that an increase of a point at one place in the scale is equal to a point increase anywhere else in the scale. Next, these scaled scores are translated into a national percentile rank (NPR), which is the percentage of the national norming sample who scored equal to or less than the student. Finally, the NPR is re-expressed as a value from a normal curve. The benefit of using NCE scores is that comparisons can be made across subjects and grades.)
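As a concrete illustration of the last step of that transformation (my own sketch, not part of the paper), the conventional NPR-to-NCE conversion re-expresses a percentile rank as a point on a normal curve with mean 50 and standard deviation of roughly 21.06, so that NCE values 1, 50 and 99 coincide with percentile ranks 1, 50 and 99:

# Illustrative only: the conventional NPR-to-NCE transformation (mean 50, SD ~21.06).
from scipy.stats import norm

def npr_to_nce(npr: float) -> float:
    """Convert a national percentile rank (1-99) to a normal curve equivalent score."""
    return 50 + 21.06 * norm.ppf(npr / 100.0)

for npr in (1, 25, 50, 75, 99):
    print(npr, round(npr_to_nce(npr), 1))   # 1 -> 1.0, 50 -> 50.0, 99 -> 99.0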
In general, the variables to be incorporated in the analysis can, somewhat arbitrarily, be grouped into three categories: individual, group and school variables. The individual variables correspond to those describing a student's English proficiency classification. The group, or family, variables are: race, socioeconomic level, welfare (AFDC), Free Lunch program, and the residence values of Both Parents, Mother and Father. Socioeconomic level can take three values, low, mid, and high; these levels are derived from relative real estate values at a student's address. AFDC is welfare for families with children, and Free Lunch captures students enrolled in the federally funded program of free/reduced-price lunches. Both Parents, Mother and Father correspond to the various types of residence, or the guardians the student lives with in the household. In general, lower SES or welfare variables are expected to be associated with lower scores, while relatively more stable households composed of both parents are expected to be associated with higher scores.

The school variables are Class Size, Percent Full Credentials and Magnet. Class Size is the average class size of a school, while Percent Full Credentials is the percent of credentials held in a school which are full credentials, as opposed to emergency or interim credentials. Magnet is an indicator variable for the three magnet schools in the district. I expect larger class sizes to have a negative impact on scores, while I expect higher percentages of full credentials to be associated with higher scores.

Below is a brief description of the variables that capture the level of English proficiency, which are the focus of most of the analysis.

LEP/Non-LEP: A LEP student is not yet deemed proficient in English, as measured by a standardized evaluation. (Standardized evaluations of LEP students can be problematic. The category of LEP does not exclusively include students learning English but may also include students who are now fluent but were not so previously, and in some cases even students who know no language other than English. What all LEP students have is a family member who does not speak English, and low scores.) If a student is assessed as LEP in 1999 then they were also LEP in 1998. Non-LEP students are either fluent natives or students who have been redesignated as proficient, or former LEPs. LEP students in general score significantly less than non-LEPs, and this gap is to be expected in the Pasadena district as well. Moreover, in any given year the gains of LEPs are expected to be larger than those of non-LEPs, since LEPs have gains that include increased comprehension of English and not just expanded acquisition of the material (Rossell and Baker (1996)).

Former Bilingual LEP/Non-Bilingual LEP: In 1998 a LEP student could be enrolled in bilingual classes or not. If they were enrolled in bilingual classes I refer to them as former bilingual LEP (or just bilingual LEP); otherwise they are non-bilingual LEP 1998. Note that non-bilingual LEPs may have been enrolled in bilingual classes before 1998. After Proposition 227, most bilingual LEPs were assigned to SEI classrooms or mainstream classrooms with some English support. Non-bilingual LEPs largely continued in their previous program (mostly regular classrooms with English support) unless redesignated as non-LEPs or fluent, in which case they attended regular classrooms.

4.3 Methods

The assignment in 1998 of students into a given English learning program was determined in great part by the district's assessment and subsequent recommendation to the parents. Students were clearly not randomly selected into bilingual or all-English programs, so we must address other factors, apart from enrollment in or out of bilingual classes, that may have influenced students' scores. In addition, out of a pool of 14,000 students enrolled in the district in both years, more than 1,000 students were exempted while close to 1,000 students skipped the reading and math tests (in the data set the students who were exempted or missed are indistinguishable). If the exemptions and misses are not random, these underlying selection processes must be taken into account; otherwise the estimates of the coefficients will be inconsistent. A first conjecture with regards to the direction of the bias is that the less proficient LEPs are being exempted while the lower achieving students are missing the tests.

In this paper I will use a method of estimation, Heckman's selection model, that explicitly models the selection process. That is, two equations are actually estimated. The first equation is the one that explains test scores and the one we are interested in. Without a selection process this equation would be estimated by standard ordinary least squares (OLS) techniques. The second equation, the selection equation, predicts whether a score is observed or not. Separate from the scores equation, this equation could be estimated as a discrete binary choice model (probit). In Heckman's model, the coefficients and parameters in both equations are simultaneously estimated by maximizing the likelihood of observing the data. (See Green (1998) or Maddala (1996) for a detailed explanation of the method.) An important parameter that is estimated is the correlation, ρ, between the errors (the non-deterministic components) in the two equations. If the correlation is significantly different from zero, this suggests the two processes, scores and test-taking, are interdependent and the selection model is an appropriate approach. Moreover, when a coefficient appears in both equations, the total marginal effect will depend on the effect in the scores equation plus a correction term that is linearly weighted by the correlation between the errors (see Appendix A for details of the model).
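To make the estimation strategy concrete, the following is a minimal, self-contained sketch of a Heckman-type selection model fitted by maximum likelihood on simulated data: a probit test-taking equation and a linear scores equation with correlated errors. This is my illustration rather than the author's code; the variable names, simulated coefficients, and sample size are placeholders.

# Sketch of a Heckman selection model estimated by maximum likelihood (simulated data).
# The selection equation governs whether a score is observed; the scores equation is the
# one of substantive interest; rho is the correlation between the two error terms.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
x = np.column_stack([np.ones(n), rng.normal(size=n)])           # scores-equation regressors
w = np.column_stack([np.ones(n), x[:, 1], rng.normal(size=n)])  # selection-equation regressors
beta_true = np.array([50.0, -2.4])
gamma_true = np.array([0.5, -0.8, 0.6])
sigma_true, rho_true = 15.0, 0.5
u, e = rng.multivariate_normal([0.0, 0.0],
                               [[1.0, rho_true * sigma_true],
                                [rho_true * sigma_true, sigma_true ** 2]], size=n).T
take = w @ gamma_true + u > 0                  # True if the student is tested
y = np.where(take, x @ beta_true + e, np.nan)  # scores are only observed for test-takers

def neg_loglik(theta):
    kb, kg = x.shape[1], w.shape[1]
    beta, gamma = theta[:kb], theta[kb:kb + kg]
    sigma, rho = np.exp(theta[-2]), np.tanh(theta[-1])  # keep sigma > 0 and |rho| < 1
    resid = (y - x @ beta) / sigma
    wg = w @ gamma
    # Tested students: density of the score times the probability of being tested given it
    ll_obs = (norm.logpdf(resid) - np.log(sigma)
              + norm.logcdf((wg + rho * resid) / np.sqrt(1.0 - rho ** 2)))
    # Exempted or missing students: only the probability of not being tested enters
    ll_mis = norm.logcdf(-wg)
    return -np.sum(np.where(take, np.nan_to_num(ll_obs), ll_mis))

start = np.r_[np.nanmean(y), 0.0, np.zeros(w.shape[1]), np.log(np.nanstd(y)), 0.0]
fit = minimize(neg_loglik, start, method="BFGS")
print("beta estimates:", fit.x[:2], " rho estimate:", np.tanh(fit.x[-1]))

The estimated correlation plays the role of ρ in the text: if it is significantly different from zero, scores and test-taking are interdependent and the selection correction matters.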
5. BEFORE AND AFTER PROPOSITION 227

5.1 Overall View

I begin the analysis by looking at the average scores without controlling for background or school information. The student population studied is those enrolled in the district in both years, 1998 and 1999, and tested in 1999. (As discussed in Section 6, the same qualitative results hold when comparing all students who took the tests in each year, without the present restrictions.) Some of these students may be missing scores in either year but they were enrolled in the district. Table 2 presents their average scores in 1998 and 1999 for reading and math by level of English proficiency. In parentheses the number of students who actually took the test and the standard deviation of their scores are also included. (1998 scores include test-takers in grades 1-10 and 1999 scores include test-takers who have moved on to grades 2-11.)

LEP students who were enrolled in bilingual classes in 1998 increased their average scores by 4.4 points in reading and 4 points in math in 1999. Non-bilingual LEPs, on the other hand, experienced smaller increases in 1999: 1.5 and 2.5 points in reading and math respectively. Further, bilingual LEPs' 1999 scores in reading and math are statistically indistinguishable from those of non-bilingual LEPs. Non-LEP students have much higher average scores than either bilingual or non-bilingual LEPs, but their average gains are much smaller: 0.7 points in reading and 1.2 in math. This preliminary breakdown already suggests that bilingual LEP students caught up to non-bilingual LEP students after Proposition 227. These numbers, though, do not include statistical controls for background. Moreover, this simple inspection does not account for the fact that many bilingual students did not take the tests in 1998. The next section addresses these problems using a selection model that includes controls for background characteristics and estimates a scores equation and an equation explaining who is more likely to be tested.

(Table 2 about here).

5.2 The Baseline in 1998

Table 3 below presents the results of the selection model for 1998; the selection-equation coefficients discussed here are significant at the 95% level. Being a former bilingual LEP, LEP, Hispanic, or black decreases a student's chances of taking the tests. Similarly, this probability decreases further if the school has a large percentage of Hispanic teachers. Students in magnet schools are more likely to be tested, as are those in grades greater than 3 (not shown).

(Table 3 about here).

I begin by looking at the background and school information variables that appear only in the scores equation. The coefficients for these variables have a straightforward interpretation, as in an OLS model, without any corrections. In general the signs of the effects are all in the expected direction. For example, all else equal, students with low SES backgrounds have lower scores in reading (-3.2) and math (-3.3) than students from high SES backgrounds. Family stability, on the other hand, corresponds to significant increases in scores: having both parents in the family is associated with 2.4 points more in reading and 3.15 points more in math in comparison to living with step-parents, a foster family or in an institutional setting, the excluded categories.
With regards to the school variables, or the policy variables upon which the district has direct influence, I find that the coefficient on Percent Full Credentials, 0.20, is positive and significant for both reading and math. All else equal, if we compare a school with 65% of its credentials being full, close to the district's average, with a hypothetical school with 100% of its credentials being full, the increase in the percentage of full credentials corresponds to an increase of 7 points in reading and math. These increases are large when compared to the average increases in scores, close to 3 percentile points in the national ranking, ascribed to the reduction in class size in California's schools (Los Angeles Times (June, 1999)).

I focus next on the variables LEP and Bilingual LEP, which are the subject of this analysis. As I have modeled the selection process, these variables can influence both a score and the probability of taking a test. The variable LEP is a discrete variable which is one when a student is bilingual LEP or non-bilingual LEP in 1998 and zero otherwise. Therefore, if the variable Bilingual LEP is significant it means there is an extra effect for former bilingual students. Table 4 below summarizes the results (the exact determination of the total effects is included in Appendix B). The net effect of having been enrolled in bilingual instruction in 1998 for a LEP student is 2.4 points less in reading than a non-bilingual LEP and 0.5 points more in math. These differences are significant at the 95% level. While in bilingual classes, LEP students did worse in reading than non-bilingual LEP students, as would be expected given that they were exposed to less English. The fact that their scores in math are virtually the same suggests that the assignment into the two groups was based on language skills rather than academic skills. These effects are significant and confirm the intuition behind Hypothesis 1: bilingual students enrolled in 1998 had statistically lower scores than non-bilingual LEPs in subjects that stress English skills.

(Table 4 about here).

With regards to the other indicator of English proficiency, LEP 1998, I find that LEP students have much lower scores than non-LEP students, the excluded indicator variable. When we combine the effects of the LEP variable from both the scores and the selection equations, a representative (Hispanic and non-bilingual) LEP student scores 13.1 points less than a non-LEP student in reading and 9.7 less in math. These differences are large and significant at the 95% level. Difficulty with the English language, being a LEP or not, is the variable with the strongest impact on a student's score. Furthermore, after controlling for LEP status, Hispanic students still score close to 6.3 and 7.3 points less in reading and math than white students. For black students the gap with respect to white students is larger: 9.8 points in reading and 11.9 points in math. My present analysis does not include further controls, such as parents' education and at-home behavior, which can reduce the gap between whites and Hispanic and black students. On the other hand, this gap is consistent with many similar findings in the literature of a persistent racial gap after ever more thorough controls. (See Jencks and Phillips (1998) for an excellent account of the test-score gap between black and white students, and the NCES 95-767 report on Hispanics in education for a discussion of their gap in scores with regards to white students.)

5.3 Good News after Proposition 227?
Bilingual students had lower scores in reading than non-bilingual students most likely because they had less exposure to English, since their math scores were actually slightly higher than those of non-bilingual LEPs. In 1999, most bilingual LEP students were placed in structured English immersion classrooms where the content is, in theory, the same as in regular classrooms but the English is adapted to suit the student's level. That is, in 1999 many bilingual students had their educational program interrupted (especially if the student was entering second or third grade, given that the average stay in bilingual programs was above two years) and they were placed in a different educational program that heavily emphasized English acquisition. What happened in 1999, after Proposition 227, to these former bilingual LEPs? I find that in 1999 former bilingual LEP students had scores in reading that were not statistically different from those of non-bilingual LEPs.

(Table 5 about here).

Table 5 presents the results from a selection model analysis where scores from 1999 are explained by the same independent variables as in the scores equation for 1998. Complete results are included in Appendix C. As in the previous section, the students included in the analysis are those in the district in both years. In reading, bilingual LEPs had scores 0.37 less than non-bilingual LEPs (p-value > 0.2). That is, at the 95% level bilingual LEPs scored statistically indistinguishably from non-bilingual LEPs. Likewise for math, the positive coefficient, 0.49, is not significantly different from zero (p-value > 0.43). Non-bilingual students' scores also went up: in 1999 they are 12.8 points less than non-LEPs in reading, rather than 13.1 in 1998, and 9.4 points less in math rather than 9.8 points. (With regards to the other independent variables in the model, we would not expect them to have a differential effect one year later, and they display essentially the same magnitude and direction as in 1998.) From Table 2 we know that non-LEP scores went up as well, though by a much smaller amount. So the reduction in the gap between bilingual LEPs and non-bilingual LEPs is not due to the "top" performing students going down. Rather, it seems the "bottom" performing students caught up a bit. Exposing bilingual students to a program that emphasized English acquisition did not set these students back relative to non-bilingual LEP students. In this way we can refute the second hypothesis. Therefore: the gap in scores between bilingual LEP students and non-bilingual LEP students decreased in 1999.

An alternative way to analyze the impact of the reform is to look at individual gains in scores. Table 6 below presents the overall gains for students in the district with test scores in both years. The gains of bilingual LEPs in reading are statistically higher than those of non-bilingual LEPs at the 95% level. In math the gains of the two groups of students are statistically indistinguishable at the 95% level. This preliminary inspection is consistent with the previous findings. A more thorough analysis taking into consideration control variables runs into difficulties: gains are not well explained by the independent variables previously introduced (the adjusted R²'s are 0.06 and 0.04 in reading and math). That is, apart from language proficiency, the other independent variables cannot explain much of the gains.
This is consistent with the fact that, in theory, standardized tests are designed such that a student who has learned his grade level material will test at the same percentile level as the previous year (Rossell and Baker (1996)).

(Table 6 about here).

Finally, an interesting question remains with regards to alternative scenarios. Specifically, what would PUSD scores have been in 1999 if Proposition 227 had not been passed? Figure 1 below shows three different average scores: average predicted scores in 1999 for an alternative, counterfactual scenario in which Proposition 227 was not implemented, average predicted scores from the selection model in 1999, and average actual scores in 1999. (The average score for the counterfactual scenario is obtained by adding to the predicted average score of a non-bilingual LEP in the 1999 selection model the marginal effect from bilingual instruction in the 1998 selection model.) The average scores in reading and math for the scenario without implementation, 29.56 and 40.43, are lower than both the predicted and actual average scores. As might be expected, the difference is larger for reading than for math. These results suggest that although increases in scores were to be expected independently of Proposition 227, due to students becoming more comfortable with the test formats or teachers stressing preparation, if the measure had not been implemented then average scores would have been lower.

(Figure 1 about here).

Looking at individual level data from a representative district, I found that bilingual students caught up to non-bilingual students after bilingual instruction was dismantled. A counterfactual scenario suggests that if the measure had not been implemented, scores would have been lower. In general, the emphasis on English instruction did not seem to hamper former bilingual students' academic performance. Having said this, these findings are not conclusive since they may be in part due to district effects not captured in my analysis or a possibly spurious short term effect. The next section looks at county level data to check for general trends.

6. COUNTY LEVEL RESULTS

The results I obtained are for PUSD, a district that I consider representative of California's districts. Yet it is important to check whether similar results hold at more general levels. To do so, I analyze county level data for California in 1998 and 1999. The dependent variable is the 1998-1999 gain in the percent of students scoring above the 50th NPR, and the explanatory variables are the county percentages of bilingual LEP students in 1998, Hispanic, Black, White, AFDC, Free Lunch, and LEP students. I also include the percentage of teachers with full credentials. The standard caveats for any aggregate analysis of educational data hold in this case: there is multicollinearity among variables and there is always the possibility of committing ecological fallacies. I view the aggregate analysis as a check of the previously observed results.

(Table 7 about here).
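For concreteness, the county regression just described might be set up as in the following sketch. The data here are simulated and the variable names are placeholders (the sketch also omits a few of the controls listed above); it illustrates the structure of the analysis, not the paper's actual estimates.

# Sketch of the county-level OLS: the 1998-1999 gain in the percent of students scoring
# above the 50th NPR regressed on county shares (simulated, illustrative data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_counties = 58  # California has 58 counties
df = pd.DataFrame({
    "pct_bilingual_lep_98": rng.uniform(0, 30, n_counties),
    "pct_hispanic": rng.uniform(5, 70, n_counties),
    "pct_black": rng.uniform(0, 15, n_counties),
    "pct_free_lunch": rng.uniform(10, 70, n_counties),
    "pct_full_credentials": rng.uniform(70, 100, n_counties),
})
# Simulated outcome: gain in the percent of students scoring above the 50th NPR
df["gain_above_50th_npr"] = (0.03 * df["pct_hispanic"] - 0.05 * df["pct_black"]
                             + 0.04 * df["pct_full_credentials"]
                             + rng.normal(0, 1.5, n_counties))

X = sm.add_constant(df.drop(columns="gain_above_50th_npr"))
ols = sm.OLS(df["gain_above_50th_npr"], X).fit()
print(ols.summary())

# A county with 40% Hispanics vs. one with none: the predicted difference is the
# coefficient on pct_hispanic times 40 (the paper reports 0.029 * 40, about 1.2).
print(40 * ols.params["pct_hispanic"])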
The fit of the analysis is 0.35 for reading and 0.19 for math, of a similar order to the fit from an OLS analysis of the PUSD data. The only coefficients that are significant at the 90% level are Percent Hispanic, Percent Black, and Percent Full Credentials. The coefficient on Percent Hispanic (0.029) is positive for reading, indicating that counties with larger percentages of Hispanics had more students scoring above the median. In particular, if the comparison is made between a county with no Hispanics and a county with 40% Hispanics, then 1.2% more students score above the 50th NPR in reading (0.029 × 40 ≈ 1.2). For math the predicted increase is 5%. These values would be consistent with Proposition 227 not setting Hispanic students back, though a more definitive test would compare gains from other years, for example 1997-98, to those experienced in 1998-99. Systematic testing by law only started after 1998, so this comparison cannot be made.

For reading the coefficient on Bilingual LEPs is negative, -0.01, but not significant at the 90% level (p-value = 0.29), and for math it is positive, 0.001, though also not significant (p-value = 0.96). The fact that the coefficient is negative for reading may partly be the result of only the higher performing bilingual LEP students being tested in 1998. Across the state, 57% of LEPs were tested in 1998, while in 1999 the percent tested ranged from 63% to 82% (the uncertainty in the latter is due to as yet unresolved problems by the test-makers when collecting language fluency information). These results, higher-Hispanic counties experiencing higher gains and counties with more bilingual students experiencing the same gains as those with fewer bilingual students, suggest that Proposition 227 did not set LEP students back. But again the findings are not definitive since one cannot compare the current gains to gains experienced in other years nor account for the bias induced by a more selective pool of 1998 test-takers.

7. ROBUSTNESS

There are several clarifications necessary when assessing educational reforms in general, and this study in particular. These are addressed below.

Selection Model vs OLS: If a standard linear model (OLS) is estimated instead of the selection model estimated in Section 4, then bilingual LEPs score 1.8 less in reading and 0.73 more in math than non-bilingual LEPs (p-values are 0.005 and 0.287 respectively). These numbers imply a smaller gap between the two groups than those obtained earlier with the selection model: 2.4 less in reading and 0.5 points more in math. The differences between the two estimates (selection vs OLS) are somewhat small considering that close to 50% of bilingual students did not take the tests compared to 11% of non-bilingual LEPs. The selection model is the correct methodology to analyze the PUSD data (the estimates are consistent and the model is identified) but it may not be capturing the test-taking process completely. As a result, the selection model estimates are not dramatically different from the OLS estimates. Using a selection model, future research should incorporate further variables into the test-taking equation, such as English proficiency level or teacher certification and racial background.

Redesignation: Bilingual and non-bilingual students were redesignated in 1999 as non-LEPs. This would have occurred with or without the reform. If I repeat the same analysis done throughout this paper but exclude bilingual and non-bilingual students who were redesignated as non-LEPs in 1999, then the same qualitative results obtain. Bilingual LEPs (the reduced set) had lower scores in 1998 than non-bilingual LEPs (also a reduced set), but in 1999 they caught up with them.

Stable population of students bias: The analysis in this paper looks at students who were in the district in both years, before and after Proposition 227. The rationale was that in this way the general district-wide impact would be held constant. Bilingual students who arrive may have had very different experiences in their bilingual instruction, further complicating the comparisons with non-bilingual LEPs. This choice of a more stable population can induce bias.
The direction of the bias, though, is not clear, since anecdotal accounts often refer to the high mobility of low income students, yet the data suggest otherwise. In PUSD the group who left in 1998 had statistically the same percentage of LEP students and mean SES level as those who stayed. Moreover, the percentage of white students who left was slightly higher (significant at the 95% level) than of those who stayed. Repeating the analysis done in the paper, but without the restriction of enrollment in two consecutive years, yields the same qualitative results. In 1999, bilingual LEPs scored indistinguishably from non-bilingual LEPs.

Technical Issues: The data present some heteroskedasticity due to boundary effects. That is, larger errors occur in the estimation when predicting close to the boundaries. All estimations were done without including robust standard errors, to minimize the chances of incorrectly concluding a variable had significant effects. If the 1999 results are checked with a probit analysis in which a one codes for scores above 50 and a zero otherwise, then again bilingual LEP students scored no differently than non-bilingual LEPs in 1999.

8. DISCUSSION

In this paper I have shown that before Proposition 227 bilingual LEP students were scoring lower in reading than non-bilingual students, as would be expected since they had not yet been redesignated and they were taught with a heavy emphasis on their primary language. One year later, these former bilingual students have reading scores that are indistinguishable from those of non-bilingual LEP students, who in principle already had a better command of English. Non-bilingual LEP students themselves had better scores in 1999 than in 1998, implying that the lower performing students, the bilingual LEPs, were catching up to them, rather than the non-bilinguals doing less well. I conclude, then, that interrupting bilingual students' stay in bilingual programs did not set them back relative to non-bilingual LEP students. A counterfactual analysis of the Pasadena data further suggests that if Proposition 227 had not been implemented then bilingual students would have had lower scores. The effect of the initiative may have been positive after all.

The methodology used for the analysis is a selection model that estimates scores and the probability of taking a test. I argue this is an appropriate methodology given the potential bias among the population of test-takers towards higher performing students. Further research will hopefully focus on the test-taking process and incorporate more predictors of this process. At the county level I find that counties with larger Hispanic populations had higher gains than counties with fewer Hispanics. Counties with more bilingual students, on the other hand, had gains no different from those with fewer bilingual students. These findings are consistent with the initiative having no deleterious effects on students but, without comparisons to gains in other years, cannot be definitive.

Some caveats are in order. First, reforms that completely dismantle bilingual instruction may not be beneficial either. What this paper has suggested is that interrupting bilingual instruction does not set bilingual students back. The key to success may actually be in "small doses" of bilingual instruction early on, and in fact some studies have found that exposure to the primary language is beneficial (Rossell and Baker (1996)). Further, an important point to remember is that factors other than programs for English learning have a larger impact on student scores.
Factors such as socioeconomic advantage cannot be modified by the schools or district. But full credentialing can be required by schools. It is possible that, on average, an emphasis on teacher standards may help students (and English learners) more than an emphasis on finding the best language program (Hakuta (1999)). And finally, as often mentioned in the paper, long term effects are what actually matter, and for this data the next three or four years should prove enlightening.

REFERENCES

[1] Alvarez, R. M. (1999). "Why did Proposition 227 pass?" Working Paper 1062, Caltech.

[2] Amselle, J. (1999). "Teaching English Wins: An Analysis of California Test Scores After Proposition 227." Read Perspectives, Abstract, Fall.

[3] Bali, V. (2000). "Proposition 227 and California: The Efficacy of Bilingual Education Revisited." Mimeo, Caltech.

[4] Baker, K. and de Kanter, A. (1983). "Federal Policy and the Effectiveness of Bilingual Education." In K. Baker and A. de Kanter (Eds.), Bilingual Education, 33-86. Lexington, MA.

[5] Collier, V. (1992). "A synthesis of studies examining long-term language-minority student data on academic achievement." Bilingual Research Journal, 16 (1&2), 187-212.

[6] Collier, V. and Thomas, W. P. (1989). "How Quickly Can Immigrants Become Proficient in School English?" The Journal of Educational Issues of Language Minority Students, 5, Fall, 26-38.

[7] Cornelius, W. A. and Martinez, F. J., eds. (2000). "Educating California's Immigrant Children: The Origins and Implementation of Proposition 227." Monograph No. 2, La Jolla, Calif.: Center for Comparative Immigration Studies, University of California-San Diego.

[8] Crawford, J. (1995). "Bilingual Education: History, Politics, Theory and Practice." Crane, Trenton, New Jersey.

[9] Faltis and Hudelson (1998). "Bilingual Education in Elementary and Secondary Communities." Allyn and Bacon, Massachusetts.

[10] Garcia, E. (1991). "The Education of Linguistically and Culturally Diverse Students: Effective Instructional Practices." Report from the National Center for Research on Cultural Diversity and Second Language Learning.

[11] Greene, J. (1998). "A Meta-Analysis of the Effectiveness of Bilingual Education." Mimeo, University of Texas.

[12] Green, W. (1993). "Econometric Analysis." Prentice Hall, New Jersey.

[13] Hakuta, K. (1999). "What Legitimate Inferences Can Be Made from the 1999 Release of SAT9 Scores with Respect to the Impact of Proposition 227 on the Performance of LEP Students?" Release on website http://www.stanford.edu/~hakuta/SAT9.

[14] Krashen, S. D. (1996). "Under Attack: The Case Against Bilingual Education." Language Education Associates, Culver City, California.

[15] Krashen, S. D. (1999). "Condemned without a Trial: Bogus Arguments Against Bilingual Education." Heinemann, Portsmouth, New Hampshire.

[16] Ji, Chang-Ho C. (2000). "Education and Ballot Measures in California: Reflections on the Thirty-Year Experience." Working Paper, University of California, Riverside.

[17] Maddala, G. (1983). "Limited Dependent and Qualitative Variables in Econometrics." Cambridge University Press, New York.

[18] National Research Council Report (1998). "Educating Language-Minority Children." National Academy Press, Washington D.C.

[19] NCES Report 95-767 (1995). "The Condition of Education: The Educational Progress of Hispanic Students." National Center for Education Statistics.

[20] Ramirez, J. D. (1992). "Executive Summary." Bilingual Research Journal, 16 (1&2), 1-61.
[21] Rossell, C. and Baker, K. (1996). "The educational effectiveness of bilingual education." Research in the Teaching of English, 30 (1), 7-74.

[22] Rossell, C. (1998). "Mystery on the Bilingual Express: A Critique of the Thomas and Collier Study." Read Perspectives, V(2), Fall, 5-32.

[23] Willig, A. (1985). "A Meta-Analysis of Selected Studies on the Effectiveness of Bilingual Education." Review of Educational Research, 55, 269-317.

APPENDIX A: ESTIMATION OF THE TOTAL MARGINAL EFFECT IN HECKMAN'S SELECTION MODEL

Following Green's notation (1998), the total effect is calculated as follows. Consider two equations, one for the selection process and the other for scores:

  z*_i = w_i'γ + u_i        (Selection)
  y_i  = x_i'β + ε_i        (Scores)

Then for an observed y_i we have

  E[y_i | y_i is observed] = E[y_i | z*_i > 0] = x_i'β + E[ε_i | u_i > -w_i'γ] = x_i'β + β_λ m_i(α_u),   (1)

where α_u = -w_i'γ/σ_u and m_i(α_u) = φ(α_u)/[1 - Φ(α_u)] is the inverse Mills ratio; φ(·) is the density of a standard normal and Φ(·) is its cumulative distribution function. The term β_λ = ρσ_ε is often referred to as lambda in the economics literature. In the present case, when a variable x_k that appears in both equations takes on values 0 or 1, the marginal effect is

  E[y_i | z*_i > 0, x_k = 1] - E[y_i | z*_i > 0, x_k = 0] = β_k + β_λ [m_i(x_k = 1) - m_i(x_k = 0)].

APPENDIX B: TOTAL MARGINAL EFFECTS FOR READING 1998

To calculate the total marginal effects we first obtain m_i(x_k = 1) - m_i(x_k = 0), holding all other variables in w_i other than x_k at their mean value (or modal value if more appropriate). Below I present the estimates of the differences in the inverse Mills ratio as well as the total effect for the reading variables in 1998:

  Total Effect = Score Effect + Lambda × [Mills(1) - Mills(0)]

  Bilingual LEP 1998:   -2.4  = -9.2   + 15.8 × [0.65 - 0.22]    (2)
  LEP 1998:             -13.1 = -14.7  + 15.8 × [0.22 - 0.12]    (3)
  Hispanic:             -6.3  = -7.28  + 15.8 × [0.12 - 0.065]   (4)
  Black:                -9.8  = -11.06 + 15.8 × [0.14 - 0.065]   (5)

In (2) the difference in the Mills ratio was obtained for a LEP and Hispanic student who went from non-bilingual to bilingual, while all other variables in the selection equation were set at their median. For (3), Bilingual LEP was zero and Hispanic was one. For (4), Bilingual was zero and LEP was one. And for (5), Bilingual and LEP were zero.
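As a quick arithmetic check of the decompositions above (my own verification, not part of the paper), the reported score-equation effects, lambda, and inverse Mills ratio differences reproduce the total effects up to rounding of the reported inputs:

# Numerical check of Appendix B: total effect = score effect + lambda * [Mills(1) - Mills(0)].
effects = {
    "Bilingual LEP 1998": (-9.2, 0.65, 0.22),
    "LEP 1998": (-14.7, 0.22, 0.12),
    "Hispanic": (-7.28, 0.12, 0.065),
    "Black": (-11.06, 0.14, 0.065),
}
lam = 15.8
for name, (score_effect, mills1, mills0) in effects.items():
    print(name, round(score_effect + lam * (mills1 - mills0), 2))
# Prints roughly -2.41, -13.12, -6.41, -9.88, matching the reported -2.4, -13.1, -6.3, -9.8
# up to rounding of the inputs.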